data
Metadata magic: making datasets discoverable and tractable with Croissant
Learn how we automatically generate Croissant metadata to describe datasets on the Mozilla Data Collective platform, making them more discoverable.
Engineer. Data. Machine learning. Voice AI. Conscientious technologist. Knitter. Hearts your dog. *Contains some replacement parts. ** No I haven’t finished my PhD yet, don’t @ me
data
Learn how we automatically generate Croissant metadata to describe datasets on the Mozilla Data Collective platform, making them more discoverable.
News
We're excited to share that the Mozilla Data Collective Data Assistant is now available in Alpha. Visit https://mozilladatacollective.com/chat to get started.
By Kathy Reid · Mozilla Data Collective Ask a Queenslander to say "no worries" into a Home Assistant Voice Preview. There's a decent chance it mishears them. Ask someone from Radelaide and things get worse. Ask anyone who grew up calling things "heaps good" and
Common Voice
Key highlights from the Common Voice v24 Scripted Speech and v2 Spontaneous Speech release.
Common Voice
Firstly, we’d like to thank you for your patience. After introducing Spontaneous Speech early in 2025, we released most locale datasets when the Mozilla Data Collective platform launched in alpha in September of this year. However, upon inspection, the English Spontaneous Speech dataset required some remedial work prior to
News
Nahuatl, Bahasa Indonesia and Bulgarian all feature in our very first community curated datasets to be uploaded to the Mozilla Data Collective platform.
token
Kathy Reid's presentation to PyConAU in Melbourne, Australia covers tokenomics, harvesting tokens, and how Mozilla Data Collective offers a better way forward.