Doctoral Thesis Norman Meuschke
Below you will find the data, source code, and other resources for the doctoral thesis:
Analyzing Non-Textual Content Elements to Detect Academic Plagiarism
Norman Meuschke, University of Konstanz, 2021.
Grade: summa cum laude
Winner of the Airbus Research Award “Claude Dornier” 2022
PDF | DOI | BibTeX | Springer Book
Ceremony for Airbus Research Award “Claude Dornier” 2022
Doctoral Defense
Slides
Click on the image to download the slides for the defense talk (PDF, 7 MB)
Data & Source Code
Hybrid Plagiarism Detection System HyPlag
- Demo system (user: | pw: hybridPD)
- Source code (login to GitHub first! user: hyplag-guest | pw: hybridPD20)
Citation-based Plagiarism Detection
- Source Code: see HyPlag source code above
- Data:
- Reference collection: 185,170 documents from the PMC OAS collection, provided as part of the CITREC dataset (5 GB zipped, ~20 GB raw); includes document metadata, citation data, and pre-computed similarity scores
- User-perceived cases of plagiarism (available upon request)
Image-based Plagiarism Detection
- Source Code
- Data: 15 test cases embedded into a reference collection of 10,000 images extracted from PMC OAS documents (547 MB zipped)
Mathematics-based Plagiarism Detection
- Source Code: see HyPlag source code above
- Data:
- Test cases: 10 confirmed cases of plagiarism available as PDF and TEI (login to GitHub first! user: hyplag-guest | pw: hybridPD20)
- Reference collection: 105,120 arXiv documents converted to XHTML