data
What makes a good dataset sample — and how to create one
In this post, we walk you through how to create a useful dataset sample as a preview of your dataset, and guide you in uploading it to the MDC platform.
data
Learn how we automatically generate Croissant metadata to describe datasets on the Mozilla Data Collective platform, making them more discoverable.
data
Panjebar Semangat, a weekly Javanese-language magazine established before Indonesian independence, is collaborating with Mozilla Data Collective to advance community-governed language dataset frameworks.
Guide
In this guide, you will learn how to use the MDC Python SDK to download datasets from the Mozilla Data Collective website.
Common Voice
Firstly, we’d like to thank you for your patience. After introducing Spontaneous Speech early in 2025, we released most locale datasets when the Mozilla Data Collective platform launched in alpha in September of this year. However, upon inspection, the English Spontaneous Speech dataset required some remedial work prior to
News
Nahuatl, Bahasa Indonesia and Bulgarian all feature in our very first community-curated datasets to be uploaded to the Mozilla Data Collective platform.
Common Voice
We’re tightening the circulation of old datasets to protect contributors, while keeping a clear, documented path for researchers who need them.
token
Kathy Reid's presentation at PyCon AU in Melbourne, Australia covers tokenomics, token harvesting, and how Mozilla Data Collective offers a better way forward.