FAQ: What kind of datasets can I publish on Mozilla Data Collective?
Our priority is technology that is more multilingual, multicultural, and multi-modal. We prioritise helping communities unlock content that is not already on the web, and we prefer audio, image, and video formats, though we will also accept text documents that advance these goals. We expect each dataset to be prepared (or preparable) in a way that enables its use in machine learning contexts, or to be intended for such use in research, evaluation, training, or similar endeavours. The specific details of how each dataset can be used are up to you, and are set via the terms on your dataset's corresponding datasheet.
Datasets should be organized in a way that makes sense for their contents and intended use. To upload a dataset to Mozilla Data Collective, you will need to package its contents as a .tar.gz archive and upload that archive as a single file.
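As an illustration of the packaging step, here is a minimal sketch using Python's standard tarfile module. The function names, the my_dataset folder, and the output filename are hypothetical examples, not requirements of Mozilla Data Collective; any tool that produces a valid .tar.gz archive should work equally well.

```python
import tarfile
from pathlib import Path


def package_dataset(dataset_dir, archive_path):
    """Pack a dataset folder into a single gzip-compressed tar archive.

    Both arguments are hypothetical example paths. Keep archive_path
    outside dataset_dir so the archive does not try to include itself.
    """
    dataset_dir = Path(dataset_dir)
    archive_path = Path(archive_path)
    if not dataset_dir.is_dir():
        raise NotADirectoryError(f"{dataset_dir} is not a directory")

    # "w:gz" writes a tar archive with gzip compression (.tar.gz).
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps only the folder name inside the archive,
        # rather than the full path on your machine.
        tar.add(str(dataset_dir), arcname=dataset_dir.name)
    return archive_path


def list_archive(archive_path):
    """Return the sorted member names so you can check the archive before uploading."""
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Assumes a folder named "my_dataset" exists in the current directory.
    archive = package_dataset("my_dataset", "my_dataset.tar.gz")
    print("\n".join(list_archive(archive)))
```

Listing the archive contents before uploading is a quick way to confirm that the folder structure you intended is what will actually be published.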
Datasets must adhere to the Mozilla Data Collective terms of use. When you upload a dataset to Mozilla Data Collective, you are responsible for ensuring that you have the rights to distribute it and that it does not contain any data listed in the Prohibited Data Content section of the terms.