Dec 20, 2022
Project for the Deep Learning and Neural Networks course taught by Professor Richard Zemel.
Framework
Empowered by deep learning, Music Information Retrieval (MIR) is a rising domain in which the major tasks rely on deep representation learning. As a crucial sub-field of MIR, music representation learning aims to capture useful musical information in an embedding space, so that the learned representations generalize across a variety of downstream tasks.
Previous work has demonstrated the effectiveness of music representations learned in different frameworks. Pair-based and proxy-based metric learning have been the most commonly used paradigms for representation learning, while recent work also indicates the advantages of deriving representations from generative models.
In this report, we implement three types of frameworks: the proxy-based method from Proxy-NCA, the pair-based contrastive learning method from CLMR, and the generative pretraining method from Jukebox. We evaluate the quality of the representations learned within these frameworks by comparing and analyzing their performance on audio instrument and genre classification tasks. Finally, we demonstrate the advantages of representations learned within the generative pretraining paradigm over the other two methods.
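To make the difference between the proxy-based and pair-based objectives concrete, here is a minimal NumPy sketch of the two loss functions, not the code used in this project. It assumes the standard formulations: a Proxy-NCA loss that pulls each embedding toward its class proxy and pushes it away from the other proxies, and the NT-Xent contrastive loss used in SimCLR-style methods such as CLMR. The function names, the temperature of 0.5, and the forward-only computation (no gradients or training loop) are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along `axis`."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def proxy_nca_loss(embeddings, labels, proxies):
    """Proxy-based loss: one learnable proxy per class (assumes >= 2 classes).

    embeddings: (N, D) array, labels: (N,) int array, proxies: (C, D) array.
    """
    x = l2_normalize(embeddings)
    p = l2_normalize(proxies)
    # Squared Euclidean distance between every embedding and every proxy: (N, C).
    dist = np.sum((x[:, None, :] - p[None, :, :]) ** 2, axis=-1)
    rows = np.arange(x.shape[0])
    pos = dist[rows, labels]
    # Exclude the positive proxy from the denominator.
    neg_logits = -dist
    neg_logits[rows, labels] = -np.inf
    # -log( exp(-pos) / sum_neg exp(-dist) ) = pos + logsumexp(-dist_neg)
    return float(np.mean(pos + logsumexp(neg_logits, axis=1)))


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Pair-based contrastive loss over two augmented views of the same batch.

    z_a, z_b: (N, D) arrays; row i of z_a and row i of z_b form a positive pair.
    """
    n = z_a.shape[0]
    z = l2_normalize(np.concatenate([z_a, z_b], axis=0))  # (2N, D)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    rows = np.arange(2 * n)
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    pos = sim[rows, pos_idx]
    return float(np.mean(logsumexp(sim, axis=1) - pos))
```

In this sketch the proxy-based loss compares each sample only against the C class proxies, whereas the pair-based loss compares each sample against every other sample in the batch, which is why the latter needs no labels but depends on batch size and augmentation quality.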
Codes
References
@article{lee2020metric,
  title={Metric learning vs classification for disentangled music representation learning},
  author={Lee, Jongpil and Bryan, Nicholas J and Salamon, Justin and Jin, Zeyu and Nam, Juhan},
  journal={arXiv preprint arXiv:2008.03729},
  year={2020}
}

@article{spijkervet2021contrastive,
  title={Contrastive learning of musical representations},
  author={Spijkervet, Janne and Burgoyne, John Ashley},
  journal={arXiv preprint arXiv:2103.09410},
  year={2021}
}

@article{dhariwal2020jukebox,
  title={Jukebox: A generative model for music},
  author={Dhariwal, Prafulla and Jun, Heewoo and Payne, Christine and Kim, Jong Wook and Radford, Alec and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2005.00341},
  year={2020}
}

@inproceedings{movshovitz2017no,
  title={No fuss distance metric learning using proxies},
  author={Movshovitz-Attias, Yair and Toshev, Alexander and Leung, Thomas K and Ioffe, Sergey and Singh, Saurabh},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={360--368},
  year={2017}
}