News
Updates to MDC REST API and Python Library
Tl;dr: Update to the newest version of the MDC Python library (0.2.0 or newer) to continue downloading datasets
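If you are unsure whether your installed copy of the MDC Python library meets the 0.2.0 minimum, a version comparison like the minimal sketch below may help. It assumes you already have the installed version as a plain dotted string (for example, as reported by your package manager). The helper names `parse_version` and `meets_minimum` are hypothetical and are not part of the MDC library.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as "0.2.0" into a tuple of ints.

    Assumes purely numeric components; pre-release tags such as "0.2.0rc1"
    are not handled by this sketch.
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = "0.2.0") -> bool:
    """Return True if `installed` is at least `minimum` (default: 0.2.0).

    Shorter versions are padded with zeros, so "0.2" compares equal to "0.2.0".
    """
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for v in ["0.1.9", "0.2", "0.2.0", "0.10.1", "1.0.0"]:
        print(v, meets_minimum(v))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.1" would sort below "0.2.0", even though it is the newer release.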
Common Voice
Key highlights from the Common Voice v24 Scripted Speech and v2 Spontaneous Speech release.
Common Voice
Firstly, we’d like to thank you for your patience. After introducing Spontaneous Speech early in 2025, we released most locale datasets when the Mozilla Data Collective platform launched in alpha in September of this year. However, upon inspection, the English Spontaneous Speech dataset required some remedial work prior to
News
Nahuatl, Bahasa Indonesia and Bulgarian all feature in our very first community-curated datasets to be uploaded to the Mozilla Data Collective platform.
Common Voice
We’re tightening the circulation of old datasets to protect contributors, while keeping a clear, documented path for researchers who need them.
News
Quick links
* Registration Form
* Link to datasets
* Codabench page (to submit results during the testing period)
* Contact: sharedtask@mozillafoundation.org
Overview
Automatic speech recognition (ASR) has come a long way – but most systems are still trained on polished, read-aloud speech. So we set out to build a model that can
Common Voice 23.0 is now live – and available for download via Mozilla Data Collective. Mozilla Data Collective is a sister platform from the team behind Common Voice, designed to let dataset owners and creators offer their data on their own terms. Mozilla Data Collective was built in response to
Mozilla Data Collective is now in live alpha, offering the Common Voice 23.0 datasets.
token
Kathy Reid's presentation to PyCon AU in Melbourne, Australia, covers tokenomics, harvesting tokens, and how Mozilla Data Collective offers a better way forward.
News
We're excited to share a high-level roadmap for the Mozilla Data Collective platform, leading up to our 1.0 launch in early Q1 2026:
September: Mozilla Data Collective Alpha Launch
October: New Datasets Available
November: Mozilla Data Collective Beta Launch
* Dataset and datasheet on-boarding and upload flow
* Feature
News
Over the last eight years, the Common Voice community has shared wishlists with us for ways to create, curate, and control their data that extend beyond our current platform capabilities. For example, supporting the collection and release of datasets under licences other than CC-0, and the ability to contribute datasets