Exciting Updates for Mozilla Data Collective

Today we’re excited to announce platform changes to Mozilla Data Collective that bring us closer to fulfilling our promise: to give everyone the Data Platform for Human Agency and Fair Value Exchange.

Request to Access feature

We have always wanted to give communities more choice about who accesses their datasets, which is why the platform already supports dozens of licenses and full downloader authentication. But we also know that some organisations want to vet every downloader themselves, and we want to give them the tools to do that.

With the new Request to Access feature, uploaders can require downloaders to request access to their dataset, so that uploaders can confirm they’re comfortable sharing with each specific requester. Imagine your dataset is a multimodal medical corpus intended for research, or a children’s speech corpus intended only for education non-profits or local start-ups: now you can double-check that each downloader is aligned with the intention of your license.

Until you, the uploader, approve the request, the platform won’t enable the dataset download.

This feature is available for uploaders to use today!

Data Assistant

Finding the right dataset for your machine learning project should not take hours. Whether you are building a speech recognition system, training a text-to-speech model, or assembling a machine translation corpus, Mozilla Data Collective holds a substantial and growing library of high-quality, ethically sourced datasets.

Until now, searching the datasets on MDC required you to know exactly which task types and languages you were looking for. The upcoming Data Assistant feature aims to reduce that friction. By understanding your requirements in plain language and matching them against Mozilla Data Collective's dataset catalogue, it narrows the gap between "I need data for this project" and "here are the datasets you should look at."

The MDC Data Assistant will be available starting on May 06, 2026.

Payments and Compensation features

Datasets represent hours of labour, care and attention. Curating a multimodal dataset for an underserved community involves linguists, speakers, transcribers, annotators and quality assurance teams, not to mention tooling, hosting and downloader support. Many of our users open source their datasets, in an incredible act of generosity and commons-building.

But for many, this isn’t possible, or even fair or desirable: we have heard from communities who have lived experience of colonialism and exploitation, who insist that their labour and expertise must be valued and recognised. We’ve spoken to people working in industries whose traditional sustainability models are being cannibalised, and who need new revenue streams to ensure flourishing futures.

We are excited that we will be releasing a set of compensation and payment features, so that downloaders can pay for a license to use datasets. 100% of the license fee, which is set by the uploader, goes directly to the uploader; we never take a cut of it. Instead, we charge the downloader a modest 5% fee to cover our infrastructure, hosting and support costs.

We expect to launch this feature in the next few weeks.

Differentiated Access

One of our biggest motivations for building Mozilla Data Collective was enabling organisations to set different terms for different contexts. You might be happy to share a sample of your simulated doctor-patient dialogues dataset for free with students and researchers, but expect a large pharmaceutical company that wants the entire dataset to pay for a license.

By combining the Payments and Request to Access features, you will be able to set up multiple listings and offer differentiated access to your dataset. With these features, you can share your dataset in line with your values.

A dedicated entity based in the UK

To power all this for our community, we knew we needed the capabilities and agility of a company, the trustworthiness and purpose of a non-profit, and the data protections of Europe. That’s why we’re excited to announce our new home in the United Kingdom! Mozilla Data Collective is now structured as a mission-locked British company, backed, incubated and governed by Mozilla Foundation, the non-profit that fights for alternative digital futures and makes good tech the norm.

We are a social enterprise, and proud to be so. In a world where grant funding can disappear in line with political realities, we want to be firmly self-sustaining and independent.

These features are being released in Alpha, so as always we welcome your feedback, ideas, requests and suggestions for how to make them even better! Reach out at support@mozilladatacollective.com.
