FAQ
FAQ: What does it mean to exclusively host my dataset on Mozilla Data Collective?
When you exclusively host your dataset on MDC, you choose to make it available only on our site.
FAQ
Our priority is technology that is more multilingual, multicultural, and multi-modal.
FAQ
When you use MDC to share or download datasets, you enter into an agreement directly with the data provider/consumer.
News
Nahuatl, Bahasa Indonesia and Bulgarian all feature in the very first community-curated datasets uploaded to the Mozilla Data Collective platform.
FAQ
We limit access to old versions of Common Voice to respect those speakers who have withdrawn their consent to be included in the dataset.
Common Voice
We’re tightening the circulation of old datasets to protect contributors, while keeping a clear, documented path for researchers who need them.
FAQ
Depending on the nature of the issue, you can contact the dataset uploader, use the report dataset link, or contact us directly.
Join us for the official kick‑off of the Shared Task: Mozilla Common Voice Spontaneous Speech! This live, online gathering will bring together researchers, engineers, and language‑technology enthusiasts from around the world to launch the challenge focused on building robust, multilingual ASR systems for under‑represented languages. In a
FAQ
By exclusively hosting Common Voice datasets on MDC, we are best able to govern and respect the wishes of contributors.
News
Quick links
* Registration Form
* Link to datasets
* Codabench page (to submit results during the testing period)
* Contact: sharedtask@mozillafoundation.org
Overview
Automatic speech recognition (ASR) has come a long way – but most systems are still trained on polished, read-aloud speech. So we set out to build a model that
Common Voice 23.0 is now live – and available for download via Mozilla Data Collective. Mozilla Data Collective is a sister platform from the team behind Common Voice, designed to let dataset owners and creators offer their data on their own terms. Mozilla Data Collective was built in response to
Mozilla Data Collective is now in live alpha, offering the Common Voice 23.0 datasets.