Shared Task: Mozilla Common Voice Spontaneous Speech ASR

Quick links
- Registration Form
- Link to datasets
- Codabench page (to submit results during the testing period)
- Contact: sharedtask@mozillafoundation.org
Overview
Automatic speech recognition (ASR) has come a long way – but most systems are still trained on polished, read-aloud speech. So we set out to build a model that can handle the messy, beautiful reality of spontaneous responses and languages long ignored by mainstream tech. We’re raising the bar for accuracy and building speech technology that works for everyone, not just the few.
So along with Mozilla Data Collective’s new Spontaneous Speech datasets, we’re launching a shared task that challenges researchers and developers to push ASR further, across 21 underrepresented languages from Africa, Asia, Europe and the Americas.
The goal of this shared task is to promote the development of robust automatic speech recognition (ASR) systems for spontaneous speech in a number of lower-resource languages that have historically been underrepresented in speech technology research.
Many of the large, widely-used ASR datasets are either read speech (previous releases from Mozilla Common Voice), predominantly English (Switchboard datasets), or both (LibriSpeech, WSJ).
This shared task is based on the recently released spontaneous speech datasets from Mozilla Common Voice. In these datasets, participants freely respond to prompts, and the responses are transcribed and validated. The available spontaneous speech datasets represent a wide range of under-served language communities.
The task will evaluate systems based on overall performance, the best improvement over the baseline for any single language, and resource-constrained system performance, encouraging innovative approaches to handle the nuances of spontaneous speech recognition.
Tasks
The shared task includes one main task:
- Multilingual ASR Performance (Task 1): The average Word Error Rate (WER) across all languages (excluding the unseen languages); a scoring sketch follows this list.
and 3 subtasks:
- Best improvement on a single language (Task 2): The largest improvement in WER for any single language compared to our baseline system's performance.
- Model-size-constrained improvement on a single language (Task 3): The best WER improvement over the baseline for any language, achieved with a model that is less than 500 MB in size.
- Unseen language ASR (Task 4): In addition to the set of languages that have training data (see the Data section below for details), we also include 5 languages for which no training data will be provided. We will provide the language names, but it is up to the participating teams to find additional data or leverage cross-lingual techniques. The best average WER on the set of unseen languages wins this task. Any additional data used must be openly shareable.
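To make the scoring concrete, here is a minimal sketch of how the per-task average WER described above could be computed. It assumes per-language two-column TSVs of (audio filename, transcription) for both references and hypotheses, and uses the jiwer library for WER; this is not the official evaluation code, and the file layout and handling of missing utterances are assumptions.

```python
# Illustrative scoring sketch, not the official evaluation code.
# Assumes per-language two-column TSVs (audio filename, transcription) for both
# references and hypotheses; jiwer is an assumed dependency (pip install jiwer).
import csv
from pathlib import Path

import jiwer


def load_tsv(path: Path) -> dict[str, str]:
    """Read a two-column TSV mapping audio filename -> transcription."""
    with path.open(newline="", encoding="utf-8") as f:
        return {row[0]: row[1] for row in csv.reader(f, delimiter="\t") if len(row) >= 2}


def language_wer(ref_path: Path, hyp_path: Path) -> float:
    """WER for one language; utterances missing from the hypothesis file are
    treated here as blank predictions (an assumption about the scorer)."""
    refs = load_tsv(ref_path)
    hyps = load_tsv(hyp_path)
    references = list(refs.values())
    hypotheses = [hyps.get(name, "") for name in refs]
    return jiwer.wer(references, hypotheses)


def task_score(ref_dir: Path, hyp_dir: Path) -> float:
    """Average WER over all languages in a task (21 languages for the general and
    small-model submissions, 5 for the unseen-language task). A language with no
    submitted TSV counts as WER 1.0, as described in the Submission section."""
    wers = []
    for ref_path in sorted(ref_dir.glob("*.tsv")):
        hyp_path = hyp_dir / ref_path.name
        wers.append(language_wer(ref_path, hyp_path) if hyp_path.exists() else 1.0)
    return sum(wers) / len(wers)
```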
Data
The dataset (available on Mozilla Data Collective here) includes approximately 9 hours of speech for each of the 21 languages, drawn from Africa, the Americas, Europe, and Asia. Each language's dataset is available on the Mozilla Data Collective website. The following table lists all 21 languages:
| No. | Language | ISO 639 |
|---|---|---|
| Africa | | |
| 1 | Bukusu | bxk |
| 2 | Chiga | cgg |
| 3 | Nubi | kcn |
| 4 | Konzo | koo |
| 5 | Lendu | led |
| 6 | Kenyi | lke |
| 7 | Thur | lth |
| 8 | Ruuli | ruc |
| 9 | Amba | rwm |
| 10 | Rutoro | ttj |
| 11 | Kuku | ukv |
| Americas | | |
| 12 | Wixárika | hch |
| 13 | Southwestern Tlaxiaco Mixtec | meh |
| 14 | Michoacán Mazahua | mmc |
| 15 | Papantla Totonac | top |
| 16 | Toba Qom | tob |
| Europe | | |
| 17 | Gheg Albanian | aln |
| 18 | Cypriot Greek | el-CY |
| 19 | Scots | sco |
| Asia | | |
| 20 | Betawi | bew |
| 21 | Western Penan | pne |
Additionally, for Task 4, we include 5 languages for which only test data will be released: Adyghe (ady), Kabardian (kbd), Basaa (bas), Puno Quechua (qxp), and Ushojo (ush).
For these languages, teams are encouraged to seek out potentially useful data and/or leverage cross-lingual approaches. Any data used must be openly licensed to facilitate reproducibility.
Prizes
- Task 1: $5,000 USD
- Tasks 2-4: $2,000 USD each
Note: Contestants are not eligible to receive prizes if they are on the US Specifically Designated Nationals (SDN) list or if there are sanctions against the contestant’s country such that Mozilla is prohibited from paying them.
Registration
Please register for the competition through the following form.
Important dates
- 26th September, 2025: Train/Dev data released (via Mozilla Data Collective)
- 1st October, 2025: Shared task announced
- 1st December, 2025: Test data released
- 8th December, 2025: Deadline for submitting final results and system description paper
- 12th December, 2025: Winners announced
Submission
Once we release the test data (audio only) on 1st December, teams will have one week to submit their system's predicted transcriptions for the relevant tasks on the shared task Codabench page. Submissions should take the form of a zip file containing three subdirectories, each with a set of one or more TSV files (one per language being attempted):
- multilingual-general
- aln.tsv
- bew.tsv
- …
- small-model
- aln.tsv
- bew.tsv
- …
- unseen-langs
- ady.tsv
- bas.tsv
- …
The TSV files should have two columns: the first is the name of the audio file, and the second is the predicted transcription. The score for each task is the average over all of the languages in that task (21 for the general and small-model tasks, 5 for the unseen-language task). If you do not submit transcriptions for a given language, we will treat it as a blank transcription, resulting in a WER of 1.0. For the best-improvement-over-baseline tasks (Tasks 2 and 3), we will automatically select the language from the multilingual-general and small-model submissions that improves the most over our baseline.
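As a convenience, below is a minimal sketch (in Python) of how a submission could be packaged into the layout above. The in-memory `predictions` structure and the example filename are purely illustrative, and the exact TSV conventions (e.g. no header row) are assumptions rather than an official specification; check the Codabench page for the authoritative format.

```python
# Minimal packaging sketch for the submission zip described above.
# The `predictions` structure, the example filenames, and the absence of a
# header row in the TSVs are assumptions, not an official specification.
import csv
import zipfile
from pathlib import Path

# predictions[subdirectory][language code] -> list of (audio filename, transcription)
predictions: dict[str, dict[str, list[tuple[str, str]]]] = {
    "multilingual-general": {
        "aln": [("example_clip.mp3", "an example predicted transcription")],  # hypothetical entry
    },
    "small-model": {},
    "unseen-langs": {},
}

out_root = Path("submission")
for subdir, langs in predictions.items():
    for lang, rows in langs.items():
        tsv_path = out_root / subdir / f"{lang}.tsv"
        tsv_path.parent.mkdir(parents=True, exist_ok=True)
        with tsv_path.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f, delimiter="\t").writerows(rows)

# Bundle the three subdirectories into a single zip, preserving the layout.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for tsv_path in out_root.rglob("*.tsv"):
        zf.write(tsv_path, tsv_path.relative_to(out_root))
```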
The team with the best performance in each task will be asked to submit their model and a script to perform inference, so that we can reproduce the results (to avoid the possibility of, e.g., post-editing predicted transcriptions to improve performance). If we are unable to reproduce the system performance, the team will be disqualified and we will request the model and inference script from the next-highest-scoring team.
System description papers
Each team must submit, in addition to their system's predicted transcriptions on the test data, a system description paper of 4-8 pages (excluding acknowledgments and references). Please use the ACL Template.
Submissions should omit the author names for review.
Organisers
Programme chairs:
- Francis M. Tyers, Indiana University
- Robert Pugh, Mozilla Data Collective
- Anastasia Kuznetsova, Rev.com
- Jean Maillard, Meta
Programme committee:
- Antonios Anastasopoulos, George Mason University
- Kathy Reid, Australian National University
- Miguel del Rio, Rev.com
- Pooneh Mousavi, MILA
- Abteen Ebrahimi, University of Colorado, Boulder
- Ximena Gutierrez Vasquez, UNAM
Advisory committee:
- Emmanuel Ngué Um, University of Yaounde 1
- Belu Ticona, George Mason University
- Jennifer Smith, University of Glasgow
- Joyce Nabende, Makerere University
- Jonathan Mukiibi, Makerere University
- Elwin Huaman, Innsbruck University
- Yacub Fahmilda, Universitas Gadjah Mada
- Riska Legistari Febri, Universitas Gadjah Mada
- Murat Topçu, Okan University
- Rosario de Fátima Alvarez García, Universidad Autónoma Metropolitana
- Athziri Madeleine Vega Martínez, Universidad Nacional Autónoma de México
- Marlon Vargas Méndez, Escuela Nacional de Antropología e Historia
- Antonio Hayuaneme García Mijarez, Nación Wixárika
- Vivian Stamou, Archimedes Athena Research Centre
- Meesum Alam, Indiana University
- Jonathan Lewis-Jong, University of Oxford