Our Natural Language Processing (NLP) research addresses both basic research challenges and applied problems, partly in cooperation with industry partners.

Our main goal is to enable computers to understand and use natural language to support humans with numerous tasks, such as text classification, machine translation, question answering, sentiment analysis, text summarization, text generation, language modeling, dialog systems, and word sense disambiguation.

OriginStamp is a web-based, trusted timestamping service that uses the decentralized Bitcoin blockchain to store anonymous, tamper-proof timestamps for any digital content. OriginStamp allows users to hash files, emails, or plain text, store the resulting hashes in the Bitcoin blockchain, and later retrieve and verify timestamps that have been committed to the blockchain. OriginStamp is free of charge and easy to use. It thus allows anyone, e.g., students, researchers, authors, journalists, or artists, to prove that they were the originator of certain information at a given point in time.
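The hashing and verification step can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function (the text above does not name one) and uses hypothetical helper names, `content_fingerprint` and `verify`. It does not talk to OriginStamp or to the Bitcoin blockchain; it only shows why a stored hash is enough to later prove that a piece of content existed unchanged.

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the content (assumed hash function)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded_digest: str) -> bool:
    """Check whether the content still matches a previously recorded digest."""
    return content_fingerprint(data) == recorded_digest


if __name__ == "__main__":
    original = b"Draft of my manuscript, version 1"
    digest = content_fingerprint(original)  # this value would be timestamped

    print(verify(original, digest))          # unchanged content matches
    print(verify(original + b"!", digest))   # any modification breaks the match
```

Because only the digest is published, the content itself stays private, which is consistent with the anonymity described above.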

HyPlag is a system that implements hybrid plagiarism detection (hybridPD), a novel approach capable of detecting even heavily disguised plagiarism in academic texts. The hybridPD approach combines the analysis of non-textual content in academic documents, such as citations, images, and mathematical expressions, with traditional text similarity analysis. Existing plagiarism detection software examines only text similarity and thus typically fails to detect disguised forms of plagiarism, including paraphrases, translations, and idea plagiarism. hybridPD addresses this shortcoming by additionally analyzing non-textual content to form a language-independent semantic “fingerprint” of document similarity.

The hybridPD approach implemented in HyPlag integrates and continues several of our previous research projects, particularly on Citation-based Plagiarism Detection (CbPD) and Mathematics-based Plagiarism Detection (MathPD).
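To illustrate why combining textual and non-textual signals helps, here is a deliberately simplified sketch. It is not the HyPlag, CbPD, or MathPD algorithm. The function names, the use of Jaccard overlap on word 3-grams and on sets of cited references, and the equal weighting are all assumptions chosen for brevity.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def word_shingles(text: str, n: int = 3) -> set:
    """Set of lowercase word n-grams, a common basis for text similarity."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def hybrid_similarity(text_a: str, cites_a: list, text_b: str, cites_b: list,
                      text_weight: float = 0.5) -> float:
    """Weighted mix of text overlap and citation overlap (illustrative only)."""
    text_score = jaccard(word_shingles(text_a), word_shingles(text_b))
    cite_score = jaccard(set(cites_a), set(cites_b))
    return text_weight * text_score + (1.0 - text_weight) * cite_score


if __name__ == "__main__":
    # A paraphrase shares no word 3-grams with its source but cites the same works.
    score = hybrid_similarity(
        "the cat sat on the mat", ["Smith2010", "Lee2015"],
        "a feline rested upon a rug", ["Smith2010", "Lee2015"],
    )
    print(score)  # 0.5: text overlap is 0.0, citation overlap is 1.0
```

In this toy case a purely text-based check would report no similarity at all, while the citation signal still flags the pair for review.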

The following group of projects seeks to (semi-)automatically identify slanted news coverage, i.e., media bias, in news articles. Current projects include news-please (an integrated web crawler and information extractor for news articles), NewsBird (a news aggregator that reveals different perspectives in international news topics), and Giveme5W1H (a system that extracts phrases answering the journalistic 5W1H questions).

As part of the DFG-funded research project GI 1259/1-1: Methods and Tools to Advance the Retrieval of Mathematical Knowledge from Digital Libraries for Search-, Recommendation- and Assistance-Systems, we investigate fundamental methods and tools for making mathematical knowledge accessible to information retrieval tools.