Common Voice
Improving the Spontaneous Speech English dataset: lifting the lid on speech data quality uplift techniques
Firstly, we’d like to thank you for your patience. After introducing Spontaneous Speech early in 2025, we released the datasets for most locales when the Mozilla Data Collective platform launched in alpha in September of this year. However, on inspection, the English Spontaneous Speech dataset required some remedial work prior to