Content moderation plays a pivotal role in shaping digital discourse and our democracies: it determines what content is visible, removed or amplified across online platforms. Current methods of moderating content online face well-documented challenges. Human moderators are vital for contextual understanding, but human review is costly, difficult to scale and often psychologically harmful. Automated moderation addresses scalability to some extent, but falls short on transparency, interpretability and fairness.
LLMs—powerful generative AI systems trained on vast textual datasets—are increasingly used in content moderation for tasks such as automated removal, flagging, behaviour detection or user notification. This new ECNL report critically examines the growing use of LLMs in content moderation and explores how they can be used more responsibly to reduce harm to civil society, human rights and democratic discourse while challenging dominant narratives.
Here are the 6 key takeaways from the report:
- LLM moderation has strong impacts on human rights and the space for civil society. The research breaks down in detail the positive and negative impacts on core civic freedoms, including privacy, freedom of expression, freedom of information, freedom of opinion, freedom of peaceful assembly, freedom of association, non-discrimination, and the rights to participation and remedy. Our goal for this assessment is to prompt and guide future human rights or fundamental rights impact assessments carried out by AI developers and deployers.
- LLMs exacerbate and accelerate existing risks. While they offer scalability and adaptability, LLMs also replicate many of the risks seen in earlier machine learning systems, exacerbating issues like systemic discrimination, censorship and surveillance. Importantly, these systems often over- or under-enforce content policies, disproportionately impacting marginalised groups. Legitimate speech may be silenced, while harmful content slips through, exposing already at-risk communities to hate and violence.
- LLM moderation is especially unreliable, and potentially harmful, for non-dominant languages and in the Global Majority. Most models are trained on data rooted in colonial and imperialist dynamics, leading to discriminatory outcomes, especially for communities in the Global Majority and marginalised groups. Debiasing efforts have shown limited effectiveness, and significant performance gaps persist between dominant (colonial) and underrepresented languages.
- Rights-based LLM moderation is possible. The best use cases of LLMs are not around content removal but around supporting broader content moderation policies and enforcement. For example, platforms could use LLMs to assist moderators by triaging simpler cases, summarising emerging trends or offering second opinions. They could support participatory design, generate clearer explanations for moderation decisions, speed up appeals and identify systemic errors (an illustrative sketch of the triage idea follows this list).
- Concentration of power: a small number of foundational LLMs dictate global online speech norms—creating a form of “algorithmic monoculture”. Since most platforms fine-tune foundational models rather than developing their own, decisions made at the training stage of LLMs cascade down across multiple platforms, shaping how content is moderated online. This centralisation of content moderation risks reinforcing systemic biases and ideological homogeneity, limiting diversity in the content we see online and stifling alternative views.
- From the ground up: community-led initiatives can potentially offer a more inclusive path forward. Community-led initiatives in the Global Majority focus on public-interest-driven AI development, culturally informed moderation and decentralisation. These models, though smaller in scale, demonstrate comparable performance in tasks like translation and sentiment analysis, highlighting the potential for more rights-based, participatory LLM development that does not rely on monopolistic providers.
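The report discusses these assistive uses at a policy level. Purely as an illustration of the "triage and second opinion" idea mentioned above, the hypothetical Python sketch below routes only confident, low-stakes cases away from the human queue and attaches a draft rationale to everything else; the names used (`llm_assess`, `Assessment`, `confidence_floor`) are our own placeholders, not any platform's actual system or API.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class Assessment:
    """A hypothetical structured answer from whichever LLM a platform uses."""
    label: Literal["likely_ok", "likely_violation", "uncertain"]
    confidence: float   # model's self-reported confidence, 0.0-1.0
    rationale: str      # draft explanation a human moderator can edit or reject

def llm_assess(post_text: str) -> Assessment:
    """Placeholder for the actual model call (prompting with the policy and the post).

    Stubbed here for illustration only; no real API is implied.
    """
    return Assessment(label="uncertain", confidence=0.55, rationale="(stub)")

def triage(post_text: str, confidence_floor: float = 0.9) -> str:
    """Route a post so the LLM only advises; humans keep the final decision."""
    a = llm_assess(post_text)
    if a.label == "likely_ok" and a.confidence >= confidence_floor:
        return "auto-resolve, logged for spot-check audits"
    if a.label == "likely_violation" and a.confidence >= confidence_floor:
        return "fast-track to human reviewer with draft rationale: " + a.rationale
    return "standard human review queue, LLM second opinion attached"

if __name__ == "__main__":
    print(triage("example post text"))
```

The point of the sketch is the routing logic, not the model: ambiguous or high-stakes cases always reach a human, and every automated resolution remains auditable.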
We extend our sincere gratitude to everyone who generously contributed their invaluable time, insights, and expertise to the preparation of this report. Your thoughtfulness and creativity have greatly enriched the quality and depth of our findings. We thank Betsy Popken and Jonathan Stray of the UC Berkeley Human Rights Center; Corynne McSherry from the Electronic Frontier Foundation (EFF); Daniel Leufer and Eliska Pirkova from Access Now; Dave Willner of Stanford University; Dunstan Alison Hope; Justin Hendrix of Tech Policy Press; Mike Masnick of Techdirt; Paul Barrett from New York University; Sabina Nong of Stanford AI Alignment; Tarunima Prabhakar from Tattle; Vladimir Cortes; Evani Radiya-Dixit from the American Civil Liberties Union (ACLU); Lindsey Andersen from Business for Social Responsibility (BSR); Mona Elswah and Aliya Bhatia from the Center for Democracy and Technology (CDT); independent researcher and policy expert Luca Belli; Roya Pakzad from Taraaz; representatives of Meta’s Human Rights Team, the Policy and Safety Machine Learning Teams at Discord, and the Research Team at Jigsaw.
This work was made possible through the generous support of the Omidyar Network.
For easier access, we have divided the report into individually downloadable sections, each addressing a particular right along with preliminary recommendations. The chapters can also be read independently, as each examines the significant impact of content moderation on online dialogue.