Your Data, Your Rules: A Community Workshop for Dataset Governance


A practical guide for communities creating datasets together, no legal expertise required. Based on our data governance workshop at Mozilla Festival Zambia 2024.


Your community has created something valuable: a dataset. Maybe it's voice recordings in your language. Maybe it's traditional knowledge, local photographs, or cultural stories. Whatever it is, you built it together.

Now comes the hard question: What happens to it next? Who gets to use it? For what purposes? Do they need to pay? Credit you? Ask permission first?

These aren't just legal questions—they're questions about power, values, and what your community wants to see in the world. But you don't need to be a lawyer to figure this out. You just need to have honest conversations with your community.

This guide walks you through a participatory workshop to do exactly that.


Before You Start

Who should be in the room? Bring together people who contributed to the dataset, community leaders, and anyone who cares about how the data gets used. Aim for 8-20 people—enough for diverse perspectives, small enough for real conversation.

What you'll need:

  • A facilitator (can be anyone comfortable guiding discussion)
  • Colored cards or sticky notes (green, yellow, red)
  • A large wall or table for sorting
  • About 2-3 hours
  • Snacks (always snacks)

Part 1: The Feeling Check (45 minutes)

Before diving into legal language, start with gut reactions. This exercise surfaces what your community actually cares about—which might surprise you.

How it works

Read out each scenario below, one at a time. Give people 30 seconds to think silently, then everyone holds up a card:

  • 🟢 Green: This is good / I like it
  • 🟡 Yellow: This is okay / I don't mind
  • 🔴 Red: This is bad / I don't like it

After each reveal, spend 2-3 minutes discussing: Why did people feel differently? What concerns came up?

The Scenarios

Scenario 1: A large tech company downloads your data and uses it to expand free tools like translators that anyone can use.

Scenario 2: A large tech company downloads your data and uses it to build commercial services, charging people a fee to use them.

Scenario 3: A local nonprofit uses your data to build tools for social good in your community.

Scenario 4: Researchers use your dataset extensively and produce important findings, but they never cite, credit, or acknowledge your community publicly.

Scenario 5: Engineers use your dataset to build somewhat controversial tools—like software that guesses someone's gender or nationality from their name or voice.

Scenario 6: A local startup uses your data to build commercial products in your community's language—creating jobs locally, but also making profit.

Scenario 7: People who contribute data get paid when they create it.

Scenario 8: People who contribute data receive non-cash benefits: training programs, computing resources, or job opportunities.

Scenario 9: A government uses your data to improve helpful citizen services. Later, after an election, a new government uses the same data to try to identify and monitor certain communities.

Scenario 10: An outside company hosts a copy of your dataset for free—saving you money, but you lose visibility into who downloads it.

Scenario 11: Your licensing rules become so complicated that almost no one uses the dataset anymore.

After all scenarios

Take 10 minutes to reflect together:

  • Where did we mostly agree?
  • Where were we most divided?
  • What themes kept coming up?
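If someone in the group is comfortable with a little code, the card counts can be tallied afterwards to see where you agreed and where you were divided. The following is only a rough sketch: the vote data, the function names, and the 75% agreement threshold are all made up for illustration, so adjust them to whatever fits your group.

```python
from collections import Counter

# Hypothetical votes: scenario number -> card colors people held up.
votes = {
    1: ["green", "green", "yellow", "green"],
    4: ["red", "red", "red", "yellow"],
    6: ["green", "red", "yellow", "red"],
}

def summarize(cards):
    """Return the most common color and the share of people who chose it."""
    counts = Counter(cards)
    color, n = counts.most_common(1)[0]
    return color, n / len(cards)

def split_by_agreement(votes, threshold=0.75):
    """Split scenarios into 'mostly agreed' and 'divided'.

    The threshold is an assumption: a scenario counts as agreed when at
    least that share of the room held up the same color.
    """
    agreed, divided = [], []
    for scenario, cards in votes.items():
        _, share = summarize(cards)
        (agreed if share >= threshold else divided).append(scenario)
    return agreed, divided

agreed, divided = split_by_agreement(votes)
print("Mostly agreed:", agreed)
print("Most divided:", divided)
```

The numbers are a conversation starter, not a verdict. A scenario that splits the room is usually the one worth more discussion time.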

Part 2: Building Your License (60 minutes)

Now that you've surfaced your values, let's get concrete. A data license is just a document that tells people: "Here's what you can and can't do with our stuff."

You're going to build one together—not by writing legal text, but by sorting what matters to you.

The Building Blocks

Write each of these on separate cards and spread them on a table:

Who can use it?

  • Anyone
  • Only nonprofits
  • Only organizations in our region
  • Only small organizations (under X employees)
  • Only organizations we approve one-by-one
  • Only researchers/academics

What can they use it for?

  • Anything at all
  • Only non-commercial purposes
  • Only research
  • Only purposes that benefit our community
  • Only specific applications we approve

What can they NOT use it for?

  • Military or weapons
  • Surveillance
  • Tools that could discriminate against people
  • Training AI without explicit permission
  • Purposes that harm our community

Do they need to give us credit?

  • Yes, always—publicly and prominently
  • Yes, but just in documentation
  • Not required

Do they need to pay?

  • Never—always free
  • Free for nonprofits, paid for corporations
  • Everyone pays a licensing fee
  • Sliding scale based on organization size
  • Percentage of any revenue they make using it

What happens if someone breaks the rules?

  • We can revoke their access
  • They face legal penalties
  • We publicly name them
  • We haven't thought about this yet

Who decides about edge cases?

  • Our community makes decisions together
  • A small committee we elect
  • The organization hosting the data
  • Decisions are automatic based on written rules

How to sort them

Create three zones on your table or wall:

  • WANT: This should definitely be in our license
  • DON'T WANT: We don't want this
  • UNSURE: We need to discuss more / it depends

Work through the cards together. For each one, discuss briefly and place it in a zone. It's okay to have lots in "unsure"—that's where the real learning happens.
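Photos of the wall are usually enough, but if you want a record that is easy to share and compare at a later workshop, the sorted cards can also be written down as structured data. This is a minimal sketch, assuming made-up zone names and a handful of example cards; none of it is a required format.

```python
import json

# Zone names are an assumption; they mirror WANT / DON'T WANT / UNSURE.
ZONES = ("want", "dont_want", "unsure")

def record_sort(placements):
    """Group cards by zone. `placements` maps card text -> zone name."""
    result = {zone: [] for zone in ZONES}
    for card, zone in placements.items():
        if zone not in result:
            raise ValueError(f"Unknown zone {zone!r} for card {card!r}")
        result[zone].append(card)
    return result

# Hypothetical outcome of one group's sorting session.
placements = {
    "Only non-commercial purposes": "unsure",
    "No surveillance": "want",
    "Everyone pays a licensing fee": "dont_want",
    "Credit required, publicly and prominently": "want",
}

sorted_cards = record_sort(placements)
print(json.dumps(sorted_cards, indent=2, ensure_ascii=False))
```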


Part 3: The Hard Trade-offs (30 minutes)

Here's the uncomfortable truth: you can't have everything. Some choices conflict with others.

Discuss these tensions as a group:

Control vs. Impact
The more restrictions you add, the fewer people will use your dataset. A dataset nobody uses doesn't help anyone, but a dataset used in harmful ways doesn't either. Where's your balance point?

Money vs. Mission
Charging fees can sustain your community and feels fair. But it might price out the small local organizations you most want to help. How do you handle this?

Simplicity vs. Precision
A simple license is easy to understand and comply with. A detailed license protects against more misuse scenarios. Which matters more to you?

Trust vs. Verification
Do you trust users to follow the rules, or do you need ways to check? Verification takes resources. Is it worth it?


Part 4: What's Your Starting Point? (15 minutes)

Based on everything you've discussed, try to summarize your community's position in plain language. Something like:

"We want our dataset to be used freely by nonprofits and researchers, but commercial companies should pay a fee. Everyone must credit our community. We don't want the data used for surveillance or discrimination. A small committee will review requests from large corporations."

This isn't legal text—it's a values statement. You can later work with legal experts to turn it into proper licensing language, but this gives them clear direction.
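To test whether a values statement is clear enough to act on, it can help to write it down as simple rules and try a few imaginary requests against it. The sketch below encodes the example statement above. The category names, the `Request` fields, and the order in which rules are checked are all assumptions made for illustration, and none of this is a substitute for real license language.

```python
from dataclasses import dataclass

# Taken from the example statement; the labels themselves are hypothetical.
PROHIBITED_USES = {"surveillance", "discrimination"}
FREE_USER_TYPES = {"nonprofit", "researcher"}

@dataclass
class Request:
    user_type: str     # e.g. "nonprofit", "researcher", "company", "large_corporation"
    purpose: str       # short label for the intended use
    will_credit: bool  # has the requester agreed to credit the community?

def evaluate(request):
    """Apply the example values statement to one request."""
    if request.purpose in PROHIBITED_USES:
        return "refuse: prohibited use"
    if not request.will_credit:
        return "refuse: credit is required"
    if request.user_type == "large_corporation":
        return "committee review, fee applies"
    if request.user_type in FREE_USER_TYPES:
        return "allow: free"
    return "allow: fee applies"

print(evaluate(Request("nonprofit", "translation tool", True)))
print(evaluate(Request("company", "surveillance", True)))
print(evaluate(Request("large_corporation", "voice assistant", True)))
```

If the group argues about what a request should return, that is useful: it points to a part of the statement that still needs discussion.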


What Comes Next

Your workshop outputs—the scenario reactions, the sorted cards, and your values statement—are gold. Here's what to do with them:

  1. Document everything. Take photos of your sorted cards. Write up the values statement.
  2. Share with your community. Not everyone could be in the room. Get feedback from the broader group.
  3. Find legal help. Organizations like Creative Commons, Mozilla, and various digital rights groups can help translate your values into actual license language.
  4. Revisit regularly. Your values might change. Technology will definitely change. Plan to have this conversation again in a year or two.

Common Questions

"Can we just use an existing license like Creative Commons?"

You can! Existing licenses are simpler and more widely understood. But they might not capture everything your community cares about. Many communities use a standard license as a base and add specific terms on top.

"What if someone ignores our license?"

This is hard. Enforcement depends on your resources and the legal system in your jurisdiction. Some communities focus on building relationships and trust rather than enforcement. Others partner with larger organizations that can help with legal action if needed.

"What if we can't agree?"

That's normal. Start with what you do agree on. For contested areas, consider: Can you pilot different approaches? Can you revisit the decision in six months with more information? Is there a compromise position?

"Do we need a lawyer for this?"

For the workshop? No. For the final license document? Ideally yes, especially if you want it to be legally enforceable. But going to a lawyer with clear values and priorities is much better than going with nothing.


A Final Thought

Your dataset exists because your community came together to create something. The license should reflect that same spirit—collective, intentional, and rooted in what you actually care about. There's no perfect answer. There's just the answer that's right for your community, right now.

Good luck! Your data, your rules.


This workshop framework is adapted from the Mozilla Data Collective's work on alternative data licenses. For support developing your license, reach out to us at mozilladatacollective@mozillafoundation.org