Automatic Speech Recognition (ASR) with Wav2Vec 2.0 & Whisper @Columbia DSI & Accenture

Created
Sep 22, 2022 10:17 PM
Last updated
December 13, 2022
Domain
Music/Audio Technology
Time
2022
💡
Nov 23, 2022: Work in progress with the Columbia Data Science Institute and Accenture as my capstone project.

Framework

  • Study the robustness, generalizability, and transferability of two speech representation models: self-supervised Wav2Vec 2.0 and weakly supervised Whisper.
  • Evaluate these speech representations in challenging scenarios: noisy speech, low-quality (downsampled) speech, low-resource languages, languages distant from English (such as Chinese, Korean, Hebrew, and Telugu), accented English, and sung speech. In short, all kinds of speech that are neither standard nor clean. Compare token-level confidence scores across these conditions.
  • Demonstrate the robustness gained from Whisper's large-scale, multi-task training regime.
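Two of the challenging conditions above, noisy speech and low-quality (downsampled) speech, can be simulated directly from clean audio. The sketch below is illustrative only, not the project's actual evaluation code: it mixes noise into a waveform at a chosen signal-to-noise ratio and crudely downsamples by dropping samples (a real pipeline would low-pass filter first, e.g. with `scipy.signal.decimate`). Function names and parameters are my own placeholders.

```python
import numpy as np

def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into a clean waveform at a target signal-to-noise ratio (dB)."""
    noise = noise[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so that 10*log10(clean_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def downsample(wave: np.ndarray, factor: int) -> np.ndarray:
    """Crude low-quality simulation: keep every `factor`-th sample.
    A realistic pipeline should apply an anti-aliasing low-pass filter first."""
    return wave[::factor]

# Toy example: 1 second of a 440 Hz tone at 16 kHz, corrupted at 5 dB SNR.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
noisy = add_noise_at_snr(clean, noise, snr_db=5.0)
low_quality = downsample(clean, factor=2)  # 16 kHz -> 8 kHz
```

The corrupted waveforms can then be fed to both models to compare how word error rate degrades as SNR drops or the sampling rate shrinks.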

Code in Progress
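For the token-level confidence comparison mentioned above, one simple recipe for a CTC model like Wav2Vec 2.0 is to greedily decode the frame-level logits and score each emitted token by the mean probability of the frames that produced it. This is a minimal sketch of that idea, assuming greedy decoding and blank id 0; it is not the project's actual evaluation code.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def token_confidences(logits: np.ndarray, blank_id: int = 0):
    """Greedy CTC decode with a per-token confidence score.

    logits: (time, vocab) frame-level scores. Repeated frame labels are
    collapsed and blanks dropped, as in standard CTC greedy decoding; a
    token's confidence is the mean frame probability over its run.
    """
    probs = softmax(logits)
    ids = probs.argmax(axis=-1)
    tokens, confs = [], []
    prev, frame_probs = None, []
    for t, i in enumerate(ids):
        if i != prev:
            if prev is not None and prev != blank_id:
                tokens.append(int(prev))
                confs.append(float(np.mean(frame_probs)))
            frame_probs = []
            prev = i
        frame_probs.append(probs[t, i])
    if prev is not None and prev != blank_id:
        tokens.append(int(prev))
        confs.append(float(np.mean(frame_probs)))
    return tokens, confs

# Toy logits: frames argmax to [1, 1, 0, 2] -> tokens [1, 2] after collapsing.
logits = np.array([[0., 5., 0.],
                   [0., 5., 0.],
                   [5., 0., 0.],
                   [0., 0., 5.]])
tokens, confs = token_confidences(logits, blank_id=0)
```

Comparing these per-token scores between clean and corrupted inputs (or between Wav2Vec 2.0 and Whisper) shows where each model becomes uncertain.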

Poster


References

@article{radford2022robust,
  title={Robust speech recognition via large-scale weak supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  journal={OpenAI Blog},
  year={2022}
}

@article{baevski2020wav2vec,
  title={wav2vec 2.0: A framework for self-supervised learning of speech representations},
  author={Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  pages={12449--12460},
  year={2020}
}

@article{ou2022towards,
  title={Towards Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription},
  author={Ou, Longshen and Gu, Xiangming and Wang, Ye},
  journal={arXiv preprint arXiv:2207.09747},
  year={2022}
}