FAQ: Can I get the Common Voice or other MDC datasets from other platforms like GitHub or Hugging Face?

We have no plans to host Mozilla community datasets through third parties at this time, because doing so makes governance and stewardship extremely challenging. For this reason, Mozilla community datasets, including Mozilla Common Voice datasets, are available exclusively through MDC, and our new terms reflect this. Some open datasets from our contributors are also available in other places.

We want to make sure that those of you who enjoy Hugging Face’s great model and training features can still use them easily, so we’ve published an API reference page that explains how to create access credentials and download datasets programmatically.
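The download step itself is covered by the API reference page and is not reproduced here. Once you have downloaded and extracted a release, the sketch below shows one way to hand it to the Hugging Face `datasets` library. It assumes the extracted folder follows the familiar Common Voice layout: a `clips/` directory of audio files plus per-split TSV files (for example `train.tsv`) whose header includes `path` and `sentence` columns. The helper names (`parse_clip_table`, `load_split`, `to_hf_dataset`) and the directory layout are hypothetical, so check them against the files you actually receive.

```python
import csv
import os
from typing import Iterable


def parse_clip_table(lines: Iterable[str], clips_dir: str) -> list[dict]:
    """Turn rows of a Common Voice-style TSV into simple records.

    Assumption: the header contains a `path` column (audio file name)
    and a `sentence` column (transcript). Other columns are ignored.
    """
    # QUOTE_NONE keeps quotation marks inside transcripts as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        records.append(
            {
                "audio": os.path.join(clips_dir, row["path"]),
                "sentence": row["sentence"],
            }
        )
    return records


def load_split(dataset_dir: str, split: str = "train") -> list[dict]:
    """Read <dataset_dir>/<split>.tsv, resolving audio paths under <dataset_dir>/clips.

    This directory layout is an assumption, not a documented MDC guarantee.
    """
    tsv_path = os.path.join(dataset_dir, f"{split}.tsv")
    with open(tsv_path, encoding="utf-8", newline="") as f:
        return parse_clip_table(f, os.path.join(dataset_dir, "clips"))


def to_hf_dataset(records: list[dict], sampling_rate: int = 16_000):
    """Wrap the records in a Hugging Face Dataset with an Audio column."""
    # Imported lazily so the parsing helpers work without `datasets` installed.
    from datasets import Audio, Dataset

    ds = Dataset.from_list(records)
    # Treat the "audio" file paths as audio, decoded and resampled on access.
    return ds.cast_column("audio", Audio(sampling_rate=sampling_rate))
```

For example, `to_hf_dataset(load_split("/path/to/extracted-release", "train"))` would give you a dataset you can pass to the usual Hugging Face preprocessing and training tools. Keeping the TSV parsing separate from the `datasets` call means you can inspect or filter the records with plain Python before committing to a particular training stack. Decoding audio typically requires the audio extras for `datasets` to be installed.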