Common Voice 23.0 Live On Mozilla Data Collective
Common Voice 23.0 is now live – and available for download via Mozilla Data Collective.
Mozilla Data Collective is a sister platform from the team behind Common Voice, designed to let dataset owners and creators offer their data on their own terms. Mozilla Data Collective was built in response to community feedback, as many of you asked for more flexibility in licensing and ways to host and manage datasets collected outside the Common Voice platform.
Common Voice 23.0 features:
This release adds 2,100 new hours of open speech data, bringing the total to 35,921 hours. Almost 2,000 newly validated hours have been added to Common Voice (24,600 validated in total to date).
With 149 new languages added, the dataset now represents 286 languages, more than double the previous total. This unprecedented expansion of new contributions spans the globe, including Lassi, Scots, Tupuri and Puno Quechua.
Spontaneous Speech data is now available:
Spontaneous Speech mode launched this year on Common Voice, allowing contributors to record natural responses to prompts in their own words for the first time. Common Voice 23.0 features 357 hours of Spontaneous Speech data across 51 languages, many of which have been previously excluded from open speech datasets (e.g., Betawi, Western Penan and Ligurian).
Introducing datasheets:
Each dataset in Common Voice 23.0 is now accompanied by a detailed datasheet to help users understand the dataset's terms, the language it covers and other key features of the data.
Mozilla Data Collective offers more download support:
Common Voice 23.0 – and all future releases – can now be downloaded through Mozilla Data Collective.
We’ve also added a download via API option, as well as an API-supported open-source Python library, allowing for programmatic downloads of datasets. You can learn more about the Mozilla Data Collective Alpha launch here.
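For readers who want a feel for what a programmatic download might look like, here is a minimal sketch using only the Python standard library. It is not the Mozilla Data Collective library or its API. The function name `download_dataset`, the bearer-token header and the URL in the usage comment are all assumptions made for illustration, so please check the platform's own documentation for the real endpoints and authentication scheme.

```python
import urllib.request
from pathlib import Path


def download_dataset(url: str, dest: Path, api_key: str, chunk_size: int = 1 << 20) -> int:
    """Stream the resource at `url` to `dest` and return the number of bytes written.

    Hypothetical helper: the bearer-token header is an assumption,
    not a documented Mozilla Data Collective authentication scheme.
    """
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    # Create the target directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        # Read in fixed-size chunks so large archives never sit fully in memory.
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Example call (placeholder URL and key, not real values):
# download_dataset(
#     "https://example.org/datasets/common-voice-23.0/sw.tar.gz",
#     Path("data/cv-23.0-sw.tar.gz"),
#     api_key="YOUR_API_KEY",
# )
```

Streaming in chunks matters here because speech dataset archives can run to many gigabytes, and reading one into memory in a single call would be wasteful or fail outright.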
Common Voice is a community effort:
We want to extend our thanks to the language communities, activists, developers and community members who made this release possible. Special thanks go to Zakia Mustafa for running a Kiswahili community event that validated over 2,400 clips. Zakia is a long-time advocate for open technology and language inclusion, with a passion for building resources that strengthen the representation of low-resource languages in digital spaces. She has previously worked as a Mozilla Community Champion and continues to support community-driven initiatives through Common Voice.
We’re taking ongoing applications for a volunteer program that helps community members like Zakia do outreach in their language communities, with funding and logistics support. You can learn more about the program here or apply directly via this form.
Tell us what you think:
We’re always excited for your feedback. You can reach us on Discord, Matrix, GitHub or email us directly at commonvoice@mozilla.com.
We also run regular drop-in calls with the community and will be running office hours calls in October (23-10-2025, 7am GMT) and November (20-11-2025, 5pm GMT).
We’d love to hear what you think about this release and what you’d like to see next.