Media Bias Analysis (2024-05-16)

Media Bias Analysis – slanted news coverage identification

The following group of projects seeks to (semi-)automatically identify slanted news coverage, i.e., media bias, in news articles. We approach media bias by combining the expertise of two academic disciplines: computer science and the social sciences. Specifically, we combine automated, efficient text analysis methods, such as natural language processing (NLP), with manual, effective media bias analysis concepts, such as frame analysis.

  • news-please is an open-source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and old, archived articles.
  • NewsBird is a news aggregator that implements matrix-based news aggregation (MNA). In MNA, users explore different perspectives in news coverage by visually inspecting a two-dimensional matrix, which, for example, shows the main media perspective within one country about another country.
  • Giveme5W1H is a system that extracts answers to the journalistic five W and one H (5W1H) questions from news articles, i.e., who did what, when, where, why, and how.
  • XCoref is an end-to-end cross-document coreference resolution (CDCR) system that resolves mentions of entities, events, and more abstract concepts that show diverse word choice and labeling across a set of related articles.
  • DA-RoBERTa is a new state-of-the-art transformer-based model adapted to the media bias domain. It identifies sentence-level bias and outperforms all previous models.
  • newsalyze is the first system to automatically identify and then communicate person-targeting forms of bias in news articles reporting on policy issues by 1) applying XCoref to resolve mentions referring to person entities, 2) extracting frames by identifying how semantic concepts are portrayed in a given news text, e.g., positively or negatively, and 3) visualizing how the identified persons are portrayed in a set of related news articles. The prototype is presented in Felix Hamborg's Ph.D. thesis “Towards Automated Frame Analysis: Natural Language Processing Techniques to Reveal Media Bias in News Articles”.
  • Domain-adaptive Pre-training Approach for Language Bias Detection in News presents DA-RoBERTa, a new state-of-the-art transformer-based model adapted to the media bias domain that identifies sentence-level bias. It tackles the challenging task of detecting linguistically complex biased word choices despite the lack of representative gold-standard corpora.
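
The two-dimensional matrix used in matrix-based news aggregation can be illustrated with a minimal sketch. It is not NewsBird's implementation: the toy articles, the choice of dimensions (publishing country by mentioned country), and the helper names build_matrix and render are all assumptions made for illustration, and each cell simply collects headlines rather than summarizing a perspective.

```python
from collections import defaultdict

# Hypothetical toy articles: (publisher_country, mentioned_country, headline).
ARTICLES = [
    ("DE", "RU", "Berlin criticizes Moscow over sanctions"),
    ("RU", "DE", "Moscow rejects Berlin's accusations"),
    ("US", "RU", "Washington weighs new sanctions"),
    ("DE", "RU", "German firms fear sanctions fallout"),
]


def build_matrix(articles):
    """Group headlines into cells keyed by (publisher country, mentioned country)."""
    matrix = defaultdict(list)
    for publisher, mentioned, headline in articles:
        matrix[(publisher, mentioned)].append(headline)
    return dict(matrix)


def render(matrix):
    """Return a plain-text table showing the number of articles in each cell."""
    rows = sorted({publisher for publisher, _ in matrix})
    cols = sorted({mentioned for _, mentioned in matrix})
    lines = ["     " + " ".join(f"{c:>4}" for c in cols)]
    for r in rows:
        counts = " ".join(f"{len(matrix.get((r, c), [])):>4}" for c in cols)
        lines.append(f"{r:>4} {counts}")
    return "\n".join(lines)


matrix = build_matrix(ARTICLES)
print(render(matrix))
```

Reading one row shows what outlets in one country publish about each other country. A real system would replace the headline lists with a summary of the main perspective in each cell.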

Additionally, we create and annotate datasets that facilitate research on media bias identification.

  • NewsWCL50 is an evaluation dataset for methods seeking to identify bias by word choice and labeling, e.g., CDCR with high lexical diversity.
  • NewsMTSC ((Multi-)Target-dependent Sentiment Classification) is a dataset for target-dependent sentiment classification (TSC) on news articles reporting on policy issues.
  • Media bias teaching platform is a survey system that combines surveys with a tool-testing and annotation section to gather information about the effectiveness of visualizing bias.
  • POLUSA is a dataset that consists of 0.9M political news articles balanced by time and outlet popularity.
  • BABE (Bias Annotations By Experts) is a robust and diverse dataset, annotated by trained experts, for classifying text as biased/non-biased and as opinionated/factual/mixed. It offers better annotation quality and higher inter-annotator agreement than existing work. It consists of 3,700 sentences balanced among topics and outlets, with media bias labels on the word and sentence level.
  • MBIC (Media Bias Including Characteristics) comprises a matrix-based methodology for crowdsourcing media bias annotations using a self-developed annotation platform, and a dataset with a first sample of 1,700 statements representing various media bias instances. It aims to fill a gap in current media bias detection research by providing a robust, representative, and diverse dataset of annotated biased words and sentences. In particular, existing datasets do not control for the individual background of annotators, which may affect their assessment and is therefore critical information for contextualizing their annotations.
  • CDCR-GLUE is a collection of diverse datasets for cross-document coreference resolution (CDCR). It is intended for evaluating how well models understand different types and strengths of coreference relations, which are often affected by context-specific relations arising from bias by word choice and labeling.
  • MBIB is the first Media Bias Identification Benchmark task and dataset collection, i.e., a comprehensive benchmark that groups different types of media bias (e.g., linguistic, cognitive, political) under a common framework to test how prospective detection techniques generalize. After reviewing 115 datasets, we select nine tasks and carefully propose 22 associated datasets for evaluating media bias detection techniques. We evaluate MBIB using state-of-the-art Transformer techniques (e.g., T5, BART).

Our group has been awarded a 3-year research grant by the Heidelberger Akademie der Wissenschaften for our interdisciplinary research on media bias.

Check out our ongoing projects that are available as B.Sc./M.Sc. theses here.

Publications

  1. What’s in the News? Towards Identification of Bias by Commission, Omission, and Source Selection (COSS)
    A. Zhukova, T. L. Ruas, F. Hamborg, K. Donnay, and B. Gipp
    in 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2023.
  2. Introducing MBIB – the first Media Bias Identification Benchmark Task and Dataset Collection
    M. Wessel, T. Horych, T. Ruas, A. Aizawa, B. Gipp, and T. Spinde
    in Proceedings of 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), New York, NY, USA, 2023.
  3. A Domain-adaptive Pre-training Approach for Language Bias Detection in News
    D. Krieger, T. Spinde, T. Ruas, J. Kulshrestha, and B. Gipp
    in 2022 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2022.
  4. Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes
    A. Zhukova, F. Hamborg, and B. Gipp
    in Proceedings of the 13th Language Resources and Evaluation Conference, 2022.
  5. Towards Automated Frame Analysis: Natural Language Processing Techniques to Reveal Media Bias in News Articles
    F. Hamborg
    PhD Thesis, University of Konstanz, Dept. of Computer and Information Science, 2022.
  6. XCoref: Cross-document Coreference Resolution in the Wild
    A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp
    in Proceedings of the iConference 2022, 2022.
  7. Exploiting Transformer-based Multitask Learning for the Detection of Media Bias in News Articles
    T. Spinde, J. Krieger, T. Ruas, J. Mitrovic, F. Goetz-Hahn, A. Aizawa, and B. Gipp
    in Proceedings of the iConference 2022, 2022.
  8. Towards Target-dependent Sentiment Classification in News Articles
    F. Hamborg, K. Donnay, and B. Gipp
    in Proceedings of the iConference 2021, 2021.
  9. Newsalyze: Effective Communication of Person-Targeting Biases in News Articles
    F. Hamborg, K. Heinser, A. Zhukova, K. Donnay, and B. Gipp
    in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021.
  10. Identification of Biased Terms in News Articles by Comparison of Outlet-specific Word Embeddings
    T. Spinde, F. Hamborg, L. Rudnitckaia, and B. Gipp
    in Proceedings of the iConference 2021, 2021.
  11. MBIC – A Media Bias Annotation Dataset Including Annotator Characteristics
    T. Spinde, L. Rudnitckaia, K. Sinha, F. Hamborg, B. Gipp, and K. Donnay
    in Proceedings of the 16th International Conference (iConference 2021), 2021.
  12. How Can the Perception Of Media Bias in News Articles Be Objectively Measured
    T. Spinde, C. Kreuter, W. Gaissmaier, F. Hamborg, B. Gipp, and H. Giese
    in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021.
  13. Automated identification of bias inducing words in news articles using linguistic and context-oriented features
    T. Spinde, L. Rudnitckaia, J. Mitrovic, F. Hamborg, M. Granitzer, B. Gipp, and K. Donnay
    Information Processing & Management, vol. 58, iss. 3, 2021.
  14. Towards A Reliable Ground-Truth For Biased Language Detection
    T. Spinde, D. Krieger, M. Plank, and B. Gipp
    in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021.
  15. Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons
    A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp
    in Proceedings of the iConference 2021, 2021.
  16. Media Bias in German News Articles: A Combined Approach
    T. Spinde, F. Hamborg, and B. Gipp
    in Proceedings of the 8th International Workshop on News Recommendation and Analytics (INRA 2020), Virtual event, 2020.
  17. An Integrated Approach to Detect Media Bias in German News Articles
    T. Spinde, F. Hamborg, and B. Gipp
    in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
  18. Newsalyze: Enabling News Consumers to Understand Media Bias
    F. Hamborg, A. Zhukova, and B. Gipp
    in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
  19. Enabling News Consumers to View and Understand Biased News Coverage: A Study on the Perception and Visualization of Media Bias
    T. Spinde, F. Hamborg, A. Becerra, K. Donnay, and B. Gipp
    in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
  20. Automated Identification of Media Bias by Word Choice and Labeling in News Articles
    F. Hamborg, A. Zhukova, and B. Gipp
    in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
  21. Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling
    F. Hamborg, A. Zhukova, and B. Gipp
    in Proceedings of the iConference 2019, 2019.
  22. Giveme5W1H: A Universal System for Extracting Main Events from News Articles
    F. Hamborg, C. Breitinger, and B. Gipp
    in Proceedings of the 13th ACM Conference on Recommender Systems, 7th International Workshop on News Recommendation and Analytics (INRA 2019), 2019.
  23. Automated Identification of Media Bias in News Articles: An Interdisciplinary Literature Review
    F. Hamborg, K. Donnay, and B. Gipp
    International Journal on Digital Libraries (IJDL), 2018.
  24. Bias-aware News Analysis using Matrix-based News Aggregation
    F. Hamborg, N. Meuschke, and B. Gipp
    International Journal on Digital Libraries (IJDL), 2018.
  25. Extraction of Main Event Descriptors from News Articles by Answering the Journalistic Five W and One H Questions
    F. Hamborg, C. Breitinger, M. Schubotz, S. Lachnit, and B. Gipp
    in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2018.
  26. news-please: A Generic News Crawler and Extractor
    F. Hamborg, N. Meuschke, C. Breitinger, and B. Gipp
    in Proceedings of the 15th International Symposium on Information Science, 2017.

