FAQ: Why can't I re-host or share Common Voice datasets that I download from MDC?

Mozilla Data Collective (MDC) aims to provide effective, ethical stewardship for datasets. That is difficult or impossible when datasets are mirrored or split across multiple forks. Our principles include enabling good governance, such as the right to be forgotten, as far as possible, which requires us to maintain robust versioning and communication channels. CC0 remains the license for computational use; the restriction on mirroring the datasets is a separate platform term.

Common Voice communities are now starting to collect data under a range of different licenses. Some of these licenses will include conditions that require stronger transparency measures on downloads and use. Meeting those conditions is only possible with safeguards around redistribution.

If you have an edge case that might need support, we're always happy to connect and discuss it. You can reach us at commonvoice@mozilla.com or mozilladatacollective@mozillafoundation.org.