Doctoral Thesis Norman Meuschke

Doctoral Defense
Slides
Click on the image to download the slides for the defense talk (PDF, 7 MB)
Data & Source Code
Hybrid Plagiarism Detection System HyPlag
- Demo system (user: guest@hyplag.org | pw: hybridPD)
- Source code (login to GitHub first! user: hyplag-guest | pw: hybridPD20)
Citation-based Plagiarism Detection
- Source Code: see HyPlag source code above
- Data:
- Reference collection: 185,170 documents from the PMC OAS collection, provided as part of the CITREC dataset (5 GB zipped, ~20 GB raw); includes document metadata, citation data, and pre-computed similarity scores
- User-perceived cases of plagiarism (available upon request)
Image-based Plagiarism Detection
- Source Code
- Data: 15 test cases embedded into a reference collection of 10,000 images extracted from PMC OAS documents (547 MB zipped)
Mathematics-based Plagiarism Detection
- Source Code: see HyPlag source code above
- Data:
- Test cases: 10 confirmed cases of plagiarism available as PDF and TEI (login to GitHub first! user: hyplag-guest | pw: hybridPD20)
- Reference collection: 105,120 arXiv documents converted to XHTML