Nov 23, 2022
Work in progress supervised and mentored by Aurora Cramer, Professor Magdalena Fuentes, and Professor Juan Bello.
Proof of Concept
- To test how far an audio encoder can be pushed, we examined the representations learned by convolutional neural network architectures across two stages of transfer learning: the first in audio-visual scene correspondence and audio-visual navigation, and the second in various downstream tasks that holistically evaluate how well the learned audio representations generalize.
- In audio-visual correspondence pre-training, we adopted a contrastive learning method on egocentric videos with stereo audio.
- In semantic audio-visual navigation fine-tuning, the acoustic, directional, and semantic features of the binaural sound are learned through a reinforcement learning approach with actions, reward, and memory.
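As a rough illustration of the contrastive audio-visual correspondence objective, the sketch below computes a symmetric InfoNCE-style loss over a batch of paired audio and video embeddings. The specific loss, the temperature value, and the function names (`avc_contrastive_loss`, `l2_normalize`, `log_softmax`) are assumptions made for illustration; the project description only states that a contrastive learning method is used, so this is not necessarily the loss the project implements.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def avc_contrastive_loss(audio_emb, video_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumed choice, not confirmed by the source).

    audio_emb, video_emb: arrays of shape (batch, dim), where row i of each
    array comes from the same video clip (the positive pair). All other
    rows in the batch act as negatives.
    """
    a = l2_normalize(audio_emb)
    v = l2_normalize(video_emb)
    logits = a @ v.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(len(a))                  # positives sit on the diagonal
    loss_a2v = -log_softmax(logits)[idx, idx].mean()    # audio -> video
    loss_v2a = -log_softmax(logits.T)[idx, idx].mean()  # video -> audio
    return 0.5 * (loss_a2v + loss_v2a)

# Toy usage: random vectors stand in for encoder outputs.
rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 16))
video = audio + 0.05 * rng.normal(size=(8, 16))  # nearly aligned pairs
print(avc_contrastive_loss(audio, video))
```

The loss is low when each audio clip is most similar to its own video frames and high when the pairing is broken, which is the signal that pushes the audio encoder toward scene-aware representations.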
Up-upstream: contrastive audio-visual correspondence
Upstream: embodied semantic audio-visual navigation
Downstream: sound source localization in HEAR benchmarks
Code in Progress
sound-spaces
marl • Updated Nov 4, 2022
hear-semav-embedding
marl • Updated Sep 27, 2022
hear-baseline
auroracramer • Updated Jul 27, 2022
References
Cover video from https://soundspaces.org/.

@inproceedings{chen2021semantic,
  title={Semantic audio-visual navigation},
  author={Chen, Changan and Al-Halah, Ziad and Grauman, Kristen},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15516--15525},
  year={2021}
}

@article{wu2022listen,
  title={How to Listen? Rethinking Visual Sound Localization},
  author={Wu, Ho-Hsiang and Fuentes, Magdalena and Seetharaman, Prem and Bello, Juan Pablo},
  journal={arXiv preprint arXiv:2204.05156},
  year={2022}
}

@article{turian2022hear,
  title={Hear 2021: Holistic evaluation of audio representations},
  author={Turian, Joseph and Shier, Jordie and Khan, Humair Raj and Raj, Bhiksha and Schuller, Bj{\"o}rn W and Steinmetz, Christian J and Malloy, Colin and Tzanetakis, George and Velarde, Gissel and McNally, Kirk and others},
  journal={arXiv preprint arXiv:2203.03022},
  year={2022}
}