Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults


Journal article


Ahmed Adel Attia, Jing Liu, Wei Ai, Dorottya Demszky, Carol Espy-Wilson
arXiv preprint arXiv:2309.07927, 2023

DOI: https://doi.org/10.48550/arXiv.2309.07927


APA
Attia, A. A., Liu, J., Ai, W., Demszky, D., & Espy-Wilson, C. (2023). Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults. arXiv preprint arXiv:2309.07927. https://doi.org/10.48550/arXiv.2309.07927


Chicago/Turabian
Attia, Ahmed Adel, Jing Liu, Wei Ai, Dorottya Demszky, and Carol Espy-Wilson. “Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults.” arXiv preprint arXiv:2309.07927 (2023).


MLA
Attia, Ahmed Adel, et al. “Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults.” arXiv preprint arXiv:2309.07927, 2023, doi: https://doi.org/10.48550/arXiv.2309.07927.


BibTeX

@article{ahmed2023a,
  title = {Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults},
  year = {2023},
  journal = {arXiv preprint arXiv:2309.07927},
  doi = {10.48550/arXiv.2309.07927},
  author = {Attia, Ahmed Adel and Liu, Jing and Ai, Wei and Demszky, Dorottya and Espy-Wilson, Carol}
}

Abstract

Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data. However, this progress doesn't readily extend to ASR for children due to the limited availability of suitable child-specific databases and the distinct characteristics of children's speech. A recent study investigated leveraging the My Science Tutor (MyST) children's speech corpus to enhance Whisper's performance in recognizing children's speech and demonstrated some improvement on a limited test set. This paper builds on these findings by enhancing the utility of the MyST dataset through more efficient data preprocessing. We reduce the Word Error Rate (WER) on the MyST test set from 13.93% to 9.11% with Whisper-Small and from 13.23% to 8.61% with Whisper-Medium, and show that this improvement generalizes to unseen datasets. We also highlight important challenges towards improving children's ASR performance. The results showcase the viable and efficient integration of Whisper for effective children's speech recognition.
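
For readers unfamiliar with the metric, the sketch below shows one common way to compute WER (word-level edit distance divided by the number of reference words) and how the absolute figures in the abstract translate into relative reductions. It is an illustration only: the function names `word_error_rate` and `relative_reduction` are hypothetical, the whitespace tokenization is an assumption, and nothing here reproduces the authors' scoring or text-normalization pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes plain whitespace tokenization with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to its start."""
    return (before - after) / before


# WER figures quoted in the abstract (percent, MyST test set).
small = relative_reduction(13.93, 9.11)
medium = relative_reduction(13.23, 8.61)
print(f"Whisper-Small:  {small:.1%} relative WER reduction")
print(f"Whisper-Medium: {medium:.1%} relative WER reduction")
```

Under these assumptions, the reported drops correspond to relative WER reductions of roughly 35% for both model sizes.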
