Media Bias Analysis – slanted news coverage identification

CONTACT

Anastasia Zhukova
Timo Spinde
Dr. Felix Hamborg

LINKS

▸ GitHub news-please
▸ GitHub NewsBird (MNA)
▸ GitHub Giveme5W1H
▸ GitHub XCoref
▸ GitHub NewsWCL50
▸ GitHub DA-ROBERTa
▸ GitHub NewsMTSC
▸ Zenodo POLUSA
▸ GitHub Lexically diverse CDCR datasets
▸ Zenodo MBIC
▸ GitHub MBIB
▸ GitHub BABE
▸ GitHub domain-adapted bias detection

The following group of projects seeks to (semi-)automatically identify slanted news coverage, i.e., media bias, in news articles. Fundamentally, we aim to approach the issue of media bias by combining the expertise of two academic disciplines: computer science and the social sciences. Specifically, we combine automated and efficient text analysis methods, such as natural language processing (NLP), with manual and effective media bias analysis concepts, such as frame analysis.

news-please is an open-source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and older, archived articles.
NewsBird is a news aggregator that implements matrix-based news aggregation (MNA). In MNA, users explore different perspectives in news coverage by visually inspecting a two-dimensional matrix, which, for example, shows the main media perspective within one country about another country.
Giveme5W1H is a system that extracts the journalistic five W and one H (5W1H) questions from news articles, i.e., who did what, when, where, why, and how.
XCoref is an end-to-end cross-document coreference resolution (CDCR) system that resolves mentions of entities, events, and more abstract concepts across a set of related articles, even when the word choice and labeling of those mentions vary widely.
DA-RoBERTa is a state-of-the-art transformer-based model adapted to the media bias domain that identifies sentence-level bias and outperforms all previous models.
newsalyze is the first system to automatically identify and then communicate person-targeting forms of bias in news articles reporting on policy issues by 1) applying XCoref to resolve mentions referring to person entities, 2) extracting frames by identifying how semantic concepts are portrayed in a given news text, e.g., positively or negatively, and 3) visualizing how the identified persons are portrayed in a set of related news articles. The prototype was presented by Felix Hamborg in his Ph.D. thesis "Towards Automated Frame Analysis: Natural Language Processing Techniques to Reveal Media Bias in News Articles".
Domain-adaptive Pre-training Approach for Language Bias Detection in News introduces DA-RoBERTa, a state-of-the-art transformer-based model adapted to the media bias domain that identifies sentence-level bias. It addresses the challenging task of detecting linguistically complex biased word choices despite the lack of representative gold-standard corpora.
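
To make the matrix idea behind MNA more concrete, the following is a minimal sketch and not NewsBird's actual implementation. It assumes each article has already been tagged with a publisher country and a mentioned country; the toy articles and the function names build_matrix and render are hypothetical and used only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (publisher_country, mentioned_country, headline).
# In a real pipeline these tags would come from crawling and analysis steps.
ARTICLES = [
    ("DE", "RU", "Berlin debates new sanctions"),
    ("DE", "RU", "Sanctions hit energy sector"),
    ("US", "RU", "Washington weighs response"),
    ("RU", "US", "Moscow criticizes US policy"),
]


def build_matrix(articles):
    """Group headlines into cells keyed by (publisher country, mentioned country)."""
    matrix = defaultdict(list)
    for publisher, mentioned, headline in articles:
        matrix[(publisher, mentioned)].append(headline)
    return dict(matrix)


def render(matrix):
    """Return a plain-text table of article counts, one row per publisher country."""
    rows = sorted({publisher for publisher, _ in matrix})
    cols = sorted({mentioned for _, mentioned in matrix})
    lines = ["     " + " ".join(f"{c:>4}" for c in cols)]
    for r in rows:
        counts = " ".join(f"{len(matrix.get((r, c), [])):>4}" for c in cols)
        lines.append(f"{r:>4} " + counts)
    return "\n".join(lines)


matrix = build_matrix(ARTICLES)
print(render(matrix))
```

Each cell collects the articles that one country's outlets published about another country, so a reader can compare perspectives by moving across a row or down a column. A full system would summarize each cell's content rather than only count its articles.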

Additionally, we create and annotate datasets that facilitate research on media bias identification.

NewsWCL50 is an evaluation dataset for methods seeking to identify bias by word choice and labeling, e.g., CDCR with high lexical diversity.
NewsMTSC ((Multi-)Target-dependent Sentiment Classification) is a dataset for target-dependent sentiment classification (TSC) on news articles reporting on policy issues.
Media bias teaching platform is a survey system that combines surveys with a tool-testing and annotation section to gather information about the effectiveness of visualizing bias.
POLUSA is a dataset that consists of 0.9M political news articles balanced by time and outlet popularity.
BABE (Bias Annotations By Experts) is a robust and diverse dataset, annotated by trained experts, for classifying text as biased/non-biased and opinionated/factual/mixed. The dataset offers better annotation quality and higher inter-annotator agreement than existing work. It consists of 3,700 sentences balanced among topics and outlets, containing media bias labels on the word and sentence level.
MBIC (Media Bias Including Characteristics) comprises a matrix-based methodology for crowdsourcing media bias annotations with a self-developed annotation platform, and a dataset with a first sample of 1,700 statements representing various media bias instances. The dataset aims to fill a gap in current media bias detection research by providing a robust, representative, and diverse collection of annotated biased words and sentences. In particular, existing datasets do not control for the individual background of annotators, which may affect their assessment and thus represents critical information for contextualizing their annotations.
CDCR-GLUE is a collection of diverse datasets for cross-document coreference resolution (CDCR) that aims to evaluate how well models understand different types and strengths of coreference relations, which are often affected by context-specific relations arising from bias by word choice and labeling.
MBIB is the first Media Bias Identification Benchmark task and dataset collection, i.e., a comprehensive benchmark that groups different types of media bias (e.g., linguistic, cognitive, political) under a common framework to test how prospective detection techniques generalize. After reviewing 115 datasets, we select nine tasks and carefully propose 22 associated datasets for evaluating media bias detection techniques. We evaluate MBIB using state-of-the-art Transformer techniques (e.g., T5, BART).
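
Several of the datasets above are characterized by their inter-annotator agreement. As a rough illustration of what such a figure measures, the sketch below computes Cohen's kappa for two annotators. This choice of metric is an assumption made for the example; the individual dataset papers may report a different agreement measure, and the labels shown are invented toy data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented toy annotations for four sentences.
annotator_1 = ["biased", "biased", "non-biased", "non-biased"]
annotator_2 = ["biased", "non-biased", "non-biased", "non-biased"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance, which is why agreement scores are a useful signal of annotation quality for subjective tasks such as bias labeling.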

Our group has been awarded a 3-year research grant by the Heidelberger Akademie der Wissenschaften for our interdisciplinary research on media bias.

Check out our ongoing projects that are available as B.Sc./M.Sc. theses here.

Publications

The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection
T. Horych, C. Mandl, T. Ruas, A. Greiner-Petter, B. Gipp, A. Aizawa, and T. Spinde
in Findings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: NAACL 2025, Albuquerque, USA, 2025.
PDF
Automated Detection of Media Bias: From the Conceptualization of Media Bias to its Computational Classification
T. Spinde
Springer Vieweg Wiesbaden, 2025.
PDF
MAGPIE: Multi-Task Analysis of Media-Bias Generalization with Pre-Trained Identification of Expressions
T. Horych, M. Wessel, J. P. Wahle, T. Ruas, J. Wassmuth, A. Greiner-Petter, A. Aizawa, B. Gipp, and T. Spinde
in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024.
PDF
What's in the News? Towards Identification of Bias by Commission, Omission, and Source Selection (COSS)
A. Zhukova, T. L. Ruas, F. Hamborg, K. Donnay, and B. Gipp
in 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2023.
PDF
Introducing MBIB – the first Media Bias Identification Benchmark Task and Dataset Collection
M. Wessel, T. Horych, T. Ruas, A. Aizawa, B. Gipp, and T. Spinde
in Proceedings of 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), New York, NY, USA, 2023.
PDF
A Domain-adaptive Pre-training Approach for Language Bias Detection in News
D. Krieger, T. Spinde, T. Ruas, J. Kulshrestha, and B. Gipp
in 2022 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2022.
PDF
Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes
A. Zhukova, F. Hamborg, and B. Gipp
in Proceedings of the 13th Language Resources and Evaluation Conference, 2022.
PDF
Towards Automated Frame Analysis: Natural Language Processing Techniques to Reveal Media Bias in News Articles
F. Hamborg
PhD Thesis, University of Konstanz, Dept. of Computer and Information Science, 2022.
PDF
XCoref: Cross-document Coreference Resolution in the Wild
A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp
in Proceedings of the iConference 2022, 2022.
PDF
Exploiting Transformer-based Multitask Learning for the Detection of Media Bias in News Articles
T. Spinde, J. Krieger, T. Ruas, J. Mitrovic, F. Goetz-Hahn, A. Aizawa, and B. Gipp
in Proceedings of the iConference 2022, 2022.
PDF
Towards Target-dependent Sentiment Classification in News Articles
F. Hamborg, K. Donnay, and B. Gipp
in Proceedings of the iConference 2021, 2021.
PDF
Newsalyze: Effective Communication of Person-Targeting Biases in News Articles
F. Hamborg, K. Heinser, A. Zhukova, K. Donnay, and B. Gipp
in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021.
Identification of Biased Terms in News Articles by Comparison of Outlet-specific Word Embeddings
T. Spinde, F. Hamborg, L. Rudnitckaia, and B. Gipp
in Proceedings of the iConference 2021, 2021.
PDF
MBIC – A Media Bias Annotation Dataset Including Annotator Characteristics
T. Spinde, L. Rudnitckaia, K. Sinha, F. Hamborg, B. Gipp, and K. Donnay
in Proceedings of the 16th International Conference (iConference 2021), 2021.
PDF
How Can the Perception Of Media Bias in News Articles Be Objectively Measured
T. Spinde, C. Kreuter, W. Gaissmaier, F. Hamborg, B. Gipp, and H. Giese
in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021.
PDF
Automated identification of bias inducing words in news articles using linguistic and context-oriented features
T. Spinde, L. Rudnitckaia, J. Mitrovic, F. Hamborg, M. Granitzer, B. Gipp, and K. Donnay
Information Processing & Management, vol. 58, iss. 3, 2021.
PDF
Towards A Reliable Ground-Truth For Biased Language Detection
T. Spinde, D. Krieger, M. Plank, and B. Gipp
in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021.
PDF
Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons
A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp
in Proceedings of the iConference 2021, 2021.
PDF
Media Bias in German News Articles: A Combined Approach
T. Spinde, F. Hamborg, and B. Gipp
in Proceedings of the 8th International Workshop on News Recommendation and Analytics (INRA 2020), Virtual event, 2020.
PDF
An Integrated Approach to Detect Media Bias in German News Articles
T. Spinde, F. Hamborg, and B. Gipp
in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
PDF
Newsalyze: Enabling News Consumers to Understand Media Bias
F. Hamborg, A. Zhukova, and B. Gipp
in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
PDF
Enabling News Consumers to View and Understand Biased News Coverage: A Study on the Perception and Visualization of Media Bias
T. Spinde, F. Hamborg, A. Becerra, K. Donnay, and B. Gipp
in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
PDF
Automated Identification of Media Bias by Word Choice and Labeling in News Articles
F. Hamborg, A. Zhukova, and B. Gipp
in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
PDF
Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling
F. Hamborg, A. Zhukova, and B. Gipp
in Proceedings of the iConference 2019, 2019.
PDF
Giveme5W1H: A Universal System for Extracting Main Events from News Articles
F. Hamborg, C. Breitinger, and B. Gipp
in Proceedings of the 13th ACM Conference on Recommender Systems, 7th International Workshop on News Recommendation and Analytics (INRA 2019), 2019.
PDF
Automated Identification of Media Bias in News Articles: An Interdisciplinary Literature Review
F. Hamborg, K. Donnay, and B. Gipp
International Journal on Digital Libraries (IJDL), 2018.
PDF
Bias-aware News Analysis using Matrix-based News Aggregation
F. Hamborg, N. Meuschke, and B. Gipp
International Journal on Digital Libraries (IJDL), 2018.
PDF
Extraction of Main Event Descriptors from News Articles by Answering the Journalistic Five W and One H Questions
F. Hamborg, C. Breitinger, M. Schubotz, S. Lachnit, and B. Gipp
in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2018.
PDF
news-please: A Generic News Crawler and Extractor
F. Hamborg, N. Meuschke, C. Breitinger, and B. Gipp
in Proceedings of the 15th International Symposium on Information Science, 2017.
PDF