FAQ: What kind of datasets can I publish on Mozilla Data Collective?
Our priority is technology that is more multilingual, multicultural, and multimodal. We prioritise helping communities unlock content that is not already on the web, and we prefer audio, image, and video formats, though we also accept text documents that advance these goals. We expect each dataset to be prepared, or to be capable of being prepared, for use in machine learning contexts, or to be intended for machine learning use in research, evaluation, training, or similar purposes. The specific details of how each dataset can be used are up to you and are set through the terms in your dataset's datasheet.