Common Voice - Mozilla Data Collective

Common Voice

Pashto becomes third-highest language by volume of data in Common Voice v24

Key highlights from the Common Voice v24 Scripted Speech and v2 Spontaneous Speech release.

Common Voice

Improving the Spontaneous Speech English dataset: lifting the lid on speech data quality uplift techniques

Firstly, we’d like to thank you for your patience. After introducing Spontaneous Speech early in 2025, we released most locale datasets when the Mozilla Data Collective platform launched in alpha in September of this year. However, upon inspection, the English Spontaneous Speech dataset required some remedial work prior to

Common Voice

We're Changing Access to Older Versions of Common Voice datasets

We’re tightening the circulation of old datasets to protect contributors, while keeping a clear, documented path for researchers who need them.

News

Shared Task: Mozilla Common Voice Spontaneous Speech ASR

Quick links:

* Registration Form
* Link to datasets
* Codabench page (to submit results during the testing period)
* Contact: sharedtask@mozillafoundation.org

Overview: Automatic speech recognition (ASR) has come a long way – but most systems are still trained on polished, read-aloud speech. So we set out to build a model that can

News

Common Voice 23.0 Live On Mozilla Data Collective

Common Voice 23.0 is now live – and available for download via Mozilla Data Collective. Mozilla Data Collective is a sister platform from the team behind Common Voice, designed to let dataset owners and creators offer their data on their own terms. Mozilla Data Collective was built in response to