FAQ: Who is behind Mozilla Data Collective?
We are backed and stewarded by the Mozilla Foundation, the non-profit, movement-building, and philanthropy arm of Mozilla.
For more than 75 years, Radio Free Europe/Radio Liberty (RFE/RL) has promoted democratic values by providing accurate, uncensored news and debate in countries where a free press is threatened. RFE/RL reaches more than 44 million people every week across 18 countries, in 24 languages, including Persian, Russian,
In this post, we walk you through how to create a useful dataset sample as a preview of your dataset, and guide you in uploading it to the MDC platform.
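The post's own sampling recipe isn't reproduced here. As a rough illustration only, one generic way to draw a preview sample is reservoir sampling (Algorithm R): it picks a fixed number of rows uniformly at random in a single pass, so it works even when the full dataset is too large to load into memory. This sketch assumes tabular data in a CSV file with a header row. The helper names `reservoir_sample` and `make_preview` are hypothetical, not part of any MDC tooling, and the sketch does not cover uploading to the MDC platform.

```python
import csv
import io
import random
from typing import Iterable, List, TextIO


def reservoir_sample(rows: Iterable[List[str]], k: int, seed: int = 0) -> List[List[str]]:
    """Return up to k rows chosen uniformly at random in one pass (Algorithm R).

    Hypothetical helper for illustration; not an MDC API.
    """
    rng = random.Random(seed)  # fixed seed so the same preview can be regenerated
    sample: List[List[str]] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Row i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def make_preview(src: TextIO, dst: TextIO, k: int = 100, seed: int = 0) -> int:
    """Copy the header plus a k-row random sample from src to dst.

    Assumes src is CSV with a header row. Returns the number of data rows written.
    Note: sampled rows are not guaranteed to keep their original order.
    """
    reader = csv.reader(src)
    header = next(reader)
    sample = reservoir_sample(reader, k, seed)
    writer = csv.writer(dst)
    writer.writerow(header)
    writer.writerows(sample)
    return len(sample)


if __name__ == "__main__":
    # Build a small in-memory CSV and write a 10-row preview of it.
    data = "id,text\n" + "".join(f"{i},row {i}\n" for i in range(1000))
    out = io.StringIO()
    written = make_preview(io.StringIO(data), out, k=10, seed=42)
    print(written)
    print(out.getvalue())
```

Fixing the seed means a contributor can regenerate exactly the same preview later. If the dataset has fewer than `k` rows, the whole dataset is returned as the "sample". A useful preview may also need to cover every label, speaker, or language in the data, which plain uniform sampling does not guarantee.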
The problem with "low-resource" machine translation
Most production machine-translation systems in 2026 are still trained on a fairly narrow set of language pairs: the 50 or so for which the open web supplies enough parallel text to push BLEU scores into useful territory. Below that line, MT quality
A curated list of 15 text-to-speech training datasets for teams shipping production voice models in 2026, covering emotional, multi-speaker, audiobook-derived, non-Latin-script, and indigenous-language datasets, and more.