Co-Citation Proximity Analysis (CPA) is a method for measuring semantic similarity between academic documents, both locally and globally, by examining how close citations appear to each other in the documents' full texts.
CPA was developed with two applications in mind: recommender systems and clustering. Regarding the first application, an improved measure of document semantic similarity, which computes similarity at a more fine-grained resolution, has the potential to significantly improve the relevance of academic literature recommendations. Regarding the second application, a more granular measure of document similarity allows the development of more precise clustering algorithms for academic literature.
The CPA approach extends the well-known and widely used co-citation analysis. Unlike plain co-citation analysis, CPA was the first approach to propose weighting co-citations by their proximity to each other within an article’s full text. The underlying idea is that the closer two citations appear to each other in the full text of a document, the more likely the cited documents are related.
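The proximity weighting described above can be illustrated with a minimal Python sketch. It is not the original implementation. The weight values in `PROXIMITY_WEIGHTS`, the `(section, paragraph, sentence)` position format, and the function names `proximity_level` and `cpa_scores` are all assumptions made for illustration. The sketch also assumes that when a pair of documents is co-cited several times in one citing document, only the closest occurrence counts.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights per proximity level (assumed for illustration only):
# the closer two citations are, the higher the weight.
PROXIMITY_WEIGHTS = {
    "sentence": 1.0,
    "paragraph": 0.5,
    "section": 0.25,
    "document": 0.125,
}


def proximity_level(pos_a, pos_b):
    """Return the smallest structural unit that contains both citations.

    Each position is a (section, paragraph, sentence) tuple, where the
    paragraph index is relative to its section and the sentence index is
    relative to its paragraph.
    """
    if pos_a[0] != pos_b[0]:
        return "document"
    if pos_a[1] != pos_b[1]:
        return "section"
    if pos_a[2] != pos_b[2]:
        return "paragraph"
    return "sentence"


def cpa_scores(citing_documents):
    """Sum proximity weights for every pair of co-cited documents.

    citing_documents is a list of citing documents; each citing document
    is a list of (cited_id, position) tuples.
    """
    scores = defaultdict(float)
    for citations in citing_documents:
        best = {}  # closest co-citation per pair within this citing document
        for (id_a, pos_a), (id_b, pos_b) in combinations(citations, 2):
            if id_a == id_b:
                continue  # the same document cited twice is not a co-citation
            pair = tuple(sorted((id_a, id_b)))
            weight = PROXIMITY_WEIGHTS[proximity_level(pos_a, pos_b)]
            best[pair] = max(best.get(pair, 0.0), weight)
        for pair, weight in best.items():
            scores[pair] += weight
    return dict(scores)


# Two made-up citing documents that both cite A, B and C.
doc1 = [("A", (0, 0, 0)), ("B", (0, 0, 0)), ("C", (0, 1, 2)), ("D", (3, 0, 0))]
doc2 = [("A", (1, 0, 0)), ("B", (1, 0, 1)), ("C", (2, 0, 0))]

scores = cpa_scores([doc1, doc2])
print(scores[("A", "B")])  # 1.5   (same sentence once, same paragraph once)
print(scores[("A", "C")])  # 0.375 (same section once, different sections once)
```

In this made-up example, plain co-citation analysis would give the pairs A-B and A-C the same count of 2, whereas the proximity-weighted score ranks A-B higher because those citations appear closer together.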
Compared to existing approaches such as bibliographic coupling, co-citation analysis, or keyword-based similarity computations, CPA achieves higher precision and makes it possible to pinpoint related chapters, sections, or paragraphs within the texts of academic documents. CPA also allows more precise automatic document classification.