Kathy Reid - Mozilla Data Collective

data

What makes a good dataset sample — and how to create one

In this post, we walk you through how to create a useful dataset sample as a preview of your dataset, and guide you in uploading it to the MDC platform.

data

Metadata magic: making datasets discoverable and tractable with Croissant

Learn how we automatically generate Croissant metadata to describe datasets on the Mozilla Data Collective platform, making them more discoverable.

News

The Mozilla Data Collective Data Assistant is now available in Alpha

We're excited to share that the Mozilla Data Collective Data Assistant is now available in Alpha. Visit https://mozilladatacollective.com/chat to get started.

Why Whisper Still Struggles with Australian English - and What We Did About It

By Kathy Reid · Mozilla Data Collective Ask a Queenslander to say "no worries" into a Home Assistant Voice Preview. There's a decent chance it mishears them. Ask someone from Radelaide and things get worse. Ask anyone who grew up calling things "heaps good" and

Common Voice

Pashto becomes third-highest language by volume of data in Common Voice v24

Key highlights from the Common Voice v24 Scripted Speech and v2 Spontaneous Speech release.

Common Voice

Improving the Spontaneous Speech English dataset: lifting the lid on speech data quality uplift techniques

Firstly, we’d like to thank you for your patience. After introducing Spontaneous Speech early in 2025, we released most locale datasets when the Mozilla Data Collective platform launched in alpha in September of this year. However, upon inspection, the English Spontaneous Speech dataset required some remedial work prior to

Three people in front of a green background, smiling,under the text "people powered data, people centered tech"

News

What do the languages Nahuatl, Bahasa Indonesia and Bulgarian have in common?

Nahuatl, Bahasa Indonesia and Bulgarian all feature in our very first community curated datasets to be uploaded to the Mozilla Data Collective platform.

Image shows two people giving a thumbs up sign in front of a mobile phone.

token

Your datasets, under your control: Mozilla Data Collective at PyConAU in Melbourne, Australia

Kathy Reid's presentation to PyConAU in Melbourne, Australia covers tokenomics, harvesting tokens, and how Mozilla Data Collective offers a better way forward.