CitePlag is the first prototype of a Citation-based Plagiarism Detection (CbPD) system. The prototype was recently demonstrated at the SIGIR 2013 conference.
What makes CitePlag novel?
Unlike existing text-based approaches to plagiarism detection, CitePlag does not rely solely on literal text matches to judge whether a document is suspicious. Instead, it uses the placement of citations in the full text of documents to determine similarity and detect potential plagiarism.
By examining citation placement, position, and order, CitePlag forms a text-independent, and even language-independent, “fingerprint” of the semantic content of a document, which can then be used to detect potential unoriginality and plagiarism.
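To give a rough feel for how citation order alone can act as a fingerprint, here is a minimal Python sketch. It is not CitePlag's actual algorithm: it assumes each document has already been reduced to a list of reference identifiers in the order they are cited, and it uses a plain longest-common-subsequence comparison as a stand-in. The function names and the example identifiers are made up for illustration.

```python
from typing import Sequence


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> list[str]:
    """Return one longest run of citations that appears in the same order in both documents."""
    m, n = len(a), len(b)
    # table[i][j] holds the length of the longest common subsequence of a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover the shared citations themselves
    shared: list[str] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


def citation_order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical score: shared ordered citations divided by the shorter citation list."""
    if not a or not b:
        return 0.0
    return len(longest_common_citation_sequence(a, b)) / min(len(a), len(b))


# Hypothetical citation sequences; the identifiers are invented for this example
doc_a = ["smith2001", "lee2005", "kim2008", "wu2010"]
doc_b = ["smith2001", "jones1999", "lee2005", "wu2010"]

shared = longest_common_citation_sequence(doc_a, doc_b)
score = citation_order_similarity(doc_a, doc_b)
```

Because the comparison only looks at which sources are cited and in what order, the wording of the surrounding text, or even its language, does not affect the score. A real system would also need to weigh how common each cited source is and how close together the shared citations appear.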
CitePlag has come a long way from its humble beginnings in 2010, when we proposed the first citation-based approach to detect semantic similarity between documents for use in plagiarism detection. A year later, we developed the algorithms, and today we have a working prototype available for public use.
CitePlag now has a new user interface with improved functionality.
You can:
- upload your own files (PDF or text documents)
- examine recent plagiarism findings and examples of retracted plagiarism cases
- compare any two publications from the Open Access subset of PubMed’s database (200,000+ medical publications)
Test the CitePlag prototype for yourself at its new home on the web: http://citeplag.org/
If you’re curious about the project, see our related publications or the doctoral thesis of Bela Gipp, which covers Citation-based Plagiarism Detection in depth.