How can a nearly century-old publisher remain relevant and continue to grow?

Panjebar Semangat, a weekly Javanese-language magazine established before Indonesian independence, is collaborating with Mozilla Data Collective to advance community-governed language dataset frameworks.

By: Yacub Fahmilda & Panjebar Semangat

Most international and national languages are well resourced and widely supported in AI and other technology products. Meanwhile, many local languages remain less visible, low-resource, and under-represented in technological advancement. This gap can alienate native speakers in small communities from their own languages, and it raises demands for linguistic justice.

Adapting under-represented languages to the technological ecosystem is one of the key steps. A community should have its own space to participate and to control its data. This idea of adaptation has been taken up by Panjebar Semangat, a weekly Javanese-language magazine established before Indonesian independence. The magazine is recognized as a pioneering media outlet that remains relevant in today's and future technological contexts.

Taken by Yustri Agung Prastiyono & Masyhuri Farhan.

The Long Journey of Panjebar Semangat

Panjebar Semangat, a leading dataset contributor from Indonesia, may truly “spread the collective spirit”. This reflects the meaning of its name, which signifies initiating, encouraging, and inspiring others. Dr. Soetomo was one of the key figures who established Panjebar Semangat on September 2nd, 1933, together with his colleague Imam Supardi, who served as the executor of the initiative. Since then, the magazine publisher has continued to contribute to Javanese communities across dynamic social, political, and cultural transformations, as outlined in the chronology below.

  1. Pre-independence (1933-1942)

During the Dutch East Indies colonial period, the Javanese language was a powerful means of evoking nationalism across Javanese communities, as the colonizers did not understand it. This spirit grew and contributed to the idea of uniting the archipelago into Indonesia as one nation with a national identity.

  2. Post-independence (after 1949)

Panjebar Semangat fell silent between 1942 and 1945, when it was shut down during the Japanese occupation. In 1949 came the second Dutch military aggression, and the publisher later regained its footing after independence. During this post-independence period, Panjebar Semangat played a crucial role in sustaining the spirit of nationalism and fostering nation-building among Javanese communities.

  3. Industrialisation and transmigration (1990s-2000s)

This period marked a critical phase for the Javanese community, as the number of speakers started to decline amid industrialisation and state-led transmigration programs. A large number of Javanese people moved to new settlements in other regions, such as Aceh, Sumatra, Lampung, Jakarta, Kalimantan, Sulawesi, Nusa Tenggara, and other parts of eastern Indonesia. Panjebar Semangat took the initiative to preserve the language and culture through its periodicals, reaching Javanese transmigrant communities and presenting Javanese perspectives through arts, cultural, and literary works.

  4. The digital era (from 2018 onwards)

Over the last five years, Panjebar Semangat has taken strategic action to embrace technology and serve its valued readers by providing the magazine in digital formats. Interestingly, some subscribers from overseas, including the Netherlands, are seeking to learn Javanese. This shift demonstrates that technological adaptation allows a local publisher to gradually expand to a global scale.

Taken by Yustri Agung Prastiyono & Masyhuri Farhan.

More recently, Panjebar Semangat has extended this adaptation into the field of artificial intelligence through community-led governance. The publisher has shared nearly 2 million words from its publications of the last three years through Mozilla Data Collective. This number represents only a small part of its total archive, indicating even greater potential for future contributions and collaborations. Through this contribution, the publisher encourages those who work with linguistic and cultural minorities to take real and measurable action in technology.

Staying Relevant in a Changing World

Adapting to dynamic social changes, including changes in technology and people's lifestyles, allows Panjebar Semangat to stay relevant for present and future Javanese communities. Rather than allowing unregulated data extraction, the publisher curated and shared texts for a dataset titled “Korpus Majalah Bahasa Jawa Panjebar Semangat”, intended as training data for NLP tasks and other socially valuable technological products. This contribution represents a small-scale yet globally connected effort.

Beyond language communities, language-learning platforms such as Italki, HelloTalk, Tandem, and other educational technologies may also benefit from this shared dataset. This reflects Panjebar Semangat’s institutional resilience and courage in adapting across historical periods, from local print culture to global digital ecosystems.

Panjebar Semangat is one of many language and cultural advocates across the Indonesian archipelago who share a similar spirit, commitment, and motivation. I believe it is a role model that other communities can follow in taking part in ethical, impactful, and community-governed data initiatives.

From Archive to Community Governance

The rapid expansion of AI systems has exposed a structural weakness in global language datasets: minority and indigenous languages are included without governance, consent, or reciprocity. While open data initiatives have accelerated innovation, they have also unintentionally normalized extraction-based practices that marginalize the very communities they claim to include.

In response to this challenge, Panjebar Semangat presents a practical counter-model: community-centered data governance, developed through its collaboration with Mozilla Data Collective. The “Korpus Majalah Bahasa Jawa Panjebar Semangat” consists of curated Javanese-language text, treated not as raw data but as a governed linguistic asset that is editorially vetted, context-rich, and incrementally released. This initiative demonstrates that language datasets can be treated as cultural infrastructure, not disposable inputs. Panjebar Semangat advances a Community-Governed Language Dataset (CGLD) framework that prioritizes community authority, purpose-bound usage, and benefit reciprocity. The model aligns directly with Mozilla’s mission to promote trustworthy, human-centered AI while reducing ethical, reputational, and regulatory risks.

For Mozilla Data Collective, CGLD offers a scalable governance layer that strengthens dataset legitimacy without undermining openness. It reframes “open” not as unrestricted extraction, but as accountable collaboration. Institutions that adopt this approach gain more resilient datasets, deeper community trust, and long-term alignment with emerging AI governance norms.

The strategic question is no longer how to include more languages in AI, but how to include them without reproducing historical inequities. Panjebar Semangat’s collaboration with Mozilla Data Collective provides a real, replicable answer, one that positions Mozilla not just as a data aggregator, but as a global standard-setter for ethical language AI.