Community Authors

On contributing my Thorsten-Voice voice datasets

Thorsten Müller has created five TTS voice datasets totaling 40 hours of German speech data. In this community-authored post, he speaks to the importance of sharing his voice.

MDC Community

26 Mar 2026 — 6 min read

Photo by Amin Asbaghipour

Author: Thorsten Müller
(You can also read this post in German.)

Was it my original idea to publish my voice under a CC0 license for everyone to use freely? Honestly: no. In fact, it was almost the opposite.

When I started diving deeper into speech technology in 2019, I had a very different goal in mind. I wanted to build my own voice assistant, but one that ran locally, without any cloud dependency. I didn’t want a microphone sitting in my home that constantly communicates with some server on the internet and streams my voice data somewhere else. Given how sensitive voice data and private conversations are, this dependency on US-based cloud services felt uncomfortable to me.

My fascination with voice interaction, however, goes much further back. As a teenager in the 1990s, watching shows like Knight Rider or Star Trek, I was already fascinated by the idea of humans talking to machines and machines responding. Back then, it was pure science fiction and Hollywood stories. Decades later, it suddenly became real.

So I started exploring open-source projects that could fit my idea, including Mycroft. In that environment, I also met people who are deeply committed to open technologies and data. Among them was Kathy Reid, who has been advocating for open systems for many years and is now active in the Mozilla Data Collective.

Technically, everything was exciting. But the quality of text-to-speech (TTS) voices, especially in German, was not great. Those classic eSpeak-style voices were fast and efficient, but completely robotic. Fine for testing, but nothing you would actually want to listen to every day. At some point while reading the Mycroft documentation, I came across the idea that I could train a synthetic voice using my own recordings. If you read that today, in 2026, you might think: just record a few seconds and you’re done. Back in 2019, it was a very different story. The recommendation was to record at least 16 hours of clean, neutral audio.

So I started recording. Evenings. Weekends. Sentence by sentence. Month after month. And despite all the motivation and excitement, I made a lot of mistakes. I used a cheap USB headset instead of a high-quality microphone. I tried to speak as “cleanly” as possible and ended up losing any natural flow. Simple sentences sounded more like an exaggerated news anchor than a human being. After thousands of recordings (around 10,000), I trained my first model. My computer was running for days. And the result?

Well... you could kind of recognize my voice. But it was far from good. Alongside the speech, there was noise, humming, and echo. So I asked the Mycroft community for help and two interesting things happened. First, there was real interest. German-speaking community members asked whether I planned to release the recordings or the trained voice. Interest? In my voice? Even though I know there are many better voices out there. That felt... surprisingly good.

And second, something much more important happened. Dominik Kreutz reached out and offered to analyze my recordings. His message was simple: “Send me your recordings, I’ll take a look.” And I thought: wait a second. The whole reason I started this project was to avoid putting my voice on the internet. And now I’m about to send it to a stranger from an online community? I had to think about that for a few days. In the end, I made a conscious decision: I chose to trust. So I sent him the data. And I have never regretted that decision.

His feedback was tough but honest: the recordings were not good. Some might be salvageable, but most were not usable.

That hit hard! I had spent months recording and now I had to accept that much of it was basically useless. Only when I listened at full volume did I notice the real issues: noise, interference, inconsistent setup, unnatural speech patterns.

That was the moment I learned one of the most important lessons in this whole journey: Shit in, shit out. Or more politely: the quality of your data defines the quality of your results.

If your input data contains noise, the machine learning model will take it seriously and reproduce that noise in the TTS output. At that point, I had a choice: stop or start over. The tech enthusiast in me was already hooked, so I kept going. I bought better equipment, built a small recording booth with wood, carpet and acoustic foam panels, and spent my free time over the next months recording again. At the same time, I kept sharing progress with the community and realized that the interest in open German speech data was real.

And then came the big question. What do I do with all of this? Do I publish nothing? Do I publish only a trained model? Or do I publish the raw data as well? And if I publish it, with restrictions or completely open?

I was very aware of what this meant. It was obvious even then that speech technology was going to become more important. And I knew that if I released my voice, I would give up control. Maybe I would even limit future options for myself. For example, voice-based authentication (opening doors, accessing sensitive systems) becomes questionable if your voice is publicly available.

And of course, the obvious concerns came up. What if someone uses my voice for things I strongly disagree with? Political content? Extremism? Fraud? I didn’t feel fear, but I definitely felt respect for these risks.

After thinking about it for a few days and discussing it with family and friends, I made my decision. If I do this, I do it properly. CC0! No restrictions. Just like Mozilla Common Voice. I didn’t want to exclude anyone. Not research, not commercial use. I didn’t want to start adding “yes, but only if...”. If open, then truly open.

Looking back now, it’s still surprising to me what has grown out of “Thorsten-Voice” over the years. Both in very positive ways and in some uncomfortable ones.

At one point, someone sent me a video from the so-called “Reichsbürger” scene in Germany. My voice was used in a context that I personally strongly reject.

That was the moment when a theoretical risk became reality. Not a great feeling. And still, I have never regretted my decision, because the positive impact clearly outweighs it.

I have received so many encouraging messages. A computer science teacher in Berlin told me that his students can now build real speech systems locally, without cloud dependency. The Swiss “Lernstick” uses voices like this for accessibility in education. People use it in smart home setups. Someone told me my voice is speaking from the ceiling of a house on a finca in Mallorca. And there are use cases in screen readers supporting people with visual or reading impairments.

These are the moments where you realize: it actually makes a difference.

I always include a personal note with my datasets. Not as a restriction, but as a personal statement: I cannot control what is done and said with my voice, but I can communicate what I stand for as a person.

"I believe that all people are equal, regardless of gender, sexual orientation, religion, skin color, or where they were born. I believe in a global world where everyone is welcome. And I believe that knowledge and education should be freely accessible to everyone. And I believe that, as humans, we are capable of achieving amazing things if we trust each other."

Today, the situation has evolved even further. Creating synthetic voices has become much easier. You no longer need 16 hours of audio. In many cases, a few seconds are enough. A simple voice message can be sufficient.

This creates both opportunities and challenges. That is exactly why I believe open, transparent, and ethically sourced datasets are so important, and why platforms like Mozilla Data Collective matter. Many modern AI systems have been trained on data whose origin is unclear and whose consent is questionable. Open data provides a real alternative.

Organizations like Mozilla have been laying the groundwork for this for years with projects like Common Voice and now the Data Collective. Over time, I’ve had the chance to connect with people in this space who are deeply committed to openness and responsible data practices. Maybe this short story encourages others to contribute in their own way. I can say: it doesn’t hurt.

Looking back now, after several years, I can say with full conviction: I have never regretted donating my voice. I would do it again. Despite the risks. Because I strongly believe:

If we trust each other more,
if we share knowledge,
if we collaborate openly,
then we can achieve a lot — together.

And that is exactly why I chose to share my voice.
