
Mathematical Information Retrieval (MathIR)

As part of the DFG-funded research project GI 1259/1-1, "Methods and Tools to Advance the Retrieval of Mathematical Knowledge from Digital Libraries for Search-, Recommendation- and Assistance-Systems," we investigate fundamental methods and tools for making mathematical knowledge accessible to information retrieval systems.

Achieving this goal requires methods that reliably extract mathematical knowledge from documents. In natural language processing (NLP), a number of well-established, general-purpose text processing methods and tools exist that are applied to text to enable domain-specific extraction tasks. Our research will determine how comparable tools for processing mathematical language can be realized, analogous to state-of-the-art text processing toolkits such as the Stanford NLP toolkit.

Our approach is to expand upon the concept of Mathematical Language Processing (MLP), whose feasibility we demonstrated at the ACM SIGIR conference in 2016. In this project, we build on that preliminary research to make the approach more effective and more applicable to real-world mathematical information retrieval applications. Specifically, the project has the following objectives:

  1. Identify mathematical formulae and expressions in documents, and reliably differentiate them from similar or neighboring structures.
  2. Perform type detection and tokenization of mathematical expressions.
  3. Extract the corresponding mathematical concepts from the tokenized mathematical formulae and expressions.

Our goal is to enable other scientists to use our methods and tools for mathematical language processing to tackle their own novel problems. We hope that MLP will continue to improve through this process, as was once the case for early NLP approaches.

A wide variety of applications would benefit from advances in mathematical information retrieval. In the STEM disciplines, these include academic literature search, literature recommendation, and even plagiarism prevention. Expert search and applications in pure mathematics, such as theorem search or definition lookup, would also benefit significantly from our developments. Applications beyond STEM fields include tutoring assistance tools as well as patent search and enterprise search, which could become more valuable to companies that integrate math-aware information retrieval methods.

RELATED PUBLICATIONS

2020

  1. Discovering Mathematical Objects of Interest – a Study of Mathematical Notations
    A Greiner-Petter, M Schubotz, F Müller, C Breitinger, HS Cohl, A Aizawa, B Gipp
    Proceedings of the Web Conference 2020 (WWW’20), April 20–24, 2020, Taipei, Taiwan
    DOI: 10.1145/3366423.3380218  Core Rank A*

2019

  1. Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations
    N Meuschke, V Stange, M Schubotz, M Kramer, B Gipp
    Proceedings of the Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL)
    DOI: 10.1109/JCDL.2019.00026  Core Rank A*
  2. AnnoMathTeX – a Formula Identifier Annotation Recommender System for STEM Documents
    P Scharpf, I Mackerracher, M Schubotz, J Beel, C Breitinger, B Gipp
    Proceedings of the 13th ACM Conference on Recommender Systems 2019, Copenhagen, Denmark, September 16-20, 2019
    DOI: 10.1145/3298689.3347042  Core Rank B
  3. Why Machines Cannot Learn Mathematics, Yet
    A Greiner-Petter, T Ruas, M Schubotz, A Aizawa, W Grosky, B Gipp
    4th Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries co-located with the 42nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
  4. Semantic Preserving Bijective Mappings for Expressions Involving Special Functions in Computer Algebra Systems and Document Preparation Systems
    A Greiner-Petter, M Schubotz, HS Cohl, B Gipp
    Aslib Journal of Information Management
    DOI: 10.1108/AJIM-08-2018-0185
  5. Towards Formula Concept Discovery and Recognition
    P Scharpf, M Schubotz, HS Cohl, B Gipp
    Proceedings of the 4th Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2019) co-located with the 42nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, July 25, 2019.
  6. Forms of Plagiarism in Digital Mathematical Libraries
    M Schubotz, O Teschke, V Stange, N Meuschke, B Gipp
    Proceedings of the International Conference on Intelligent Computer Mathematics (CICM)
    DOI: 10.1007/978-3-030-23250-4_18

2018

  1. Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
    M Schubotz, A Greiner-Petter, P Scharpf, N Meuschke, HS Cohl, B Gipp
    Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (JCDL)
    DOI: 10.1145/3197026.3197058  Core Rank A*
  2. HyPlag: A Hybrid Approach to Academic Plagiarism Detection
    N Meuschke, V Stange, M Schubotz, B Gipp
    Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
    DOI: 10.1145/3209978.3210177  Core Rank A*
  3. Automated Symbolic and Numerical Testing of DLMF Formulae Using Computer Algebra Systems
    HS Cohl, A Greiner-Petter, M Schubotz
    Intelligent Computer Mathematics – 11th International Conference, CICM 2018, Hagenberg, Austria, August 13-17, 2018, Proceedings
    DOI: 10.1007/978-3-319-96812-4_4
  4. MathTools: An open API for convenient MathML handling
    A Greiner-Petter, M Schubotz, HS Cohl, B Gipp
    Intelligent Computer Mathematics – 11th International Conference, CICM 2018, Hagenberg, Austria, August 13-17, 2018, Proceedings
    DOI: 10.1007/978-3-319-96812-4_9
  5. Towards Formula Translation Using Recursive Neural Networks
    F Petersen, M Schubotz, B Gipp
    Proceedings of the 11th Conference on Intelligent Computer Mathematics (CICM)
  6. Representing Mathematical Formulae in Content MathML Using Wikidata
    P Scharpf, M Schubotz, B Gipp
    Proceedings of the 3rd Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2018) co-located with the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), Ann Arbor, USA, July 12, 2018.
  7. Generating OpenMath Content Dictionaries from Wikidata
    M Schubotz
    Joint Proceedings of the CME-EI, FMM, CAAT, FVPS, M3SRD, OpenMath Workshops, Doctoral Program and Work in Progress at the Conference on Intelligent Computer Mathematics 2018 co-located with the 11th Conference on Intelligent Computer Mathematics (CICM 2018)
    DOI: 10.5281/zenodo.1409946
  8. Mathematische Formeln in Wikipedia
    M Schubotz
    Beiträge zum Mathematikunterricht 2018
    DOI: 10.17877/de290r-19676
  9. Introducing MathQA – a Math-Aware Question Answering System
    M Schubotz, P Scharpf, K Dudhat, Y Nagar, F Hamborg, B Gipp
    Proceedings of the Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL), Workshop on Knowledge Discovery
    DOI: 10.1108/IDD-06-2018-0022

