Introduction to Information Retrieval and Natural Language Processing

Processing natural language and retrieving information from large collections, especially the World Wide Web, have become integral parts of information systems for both personal and business use. The rapid growth of digital collections since the 1950s gave rise to two research fields: Information Retrieval (IR) and Natural Language Processing (NLP).

This course introduces core concepts and technologies of Information Retrieval and Natural Language Processing, particularly for the Web.

Course Content

The lecture will cover the following topics:

  • Basics: background, documents, terms, vocabulary, inverted index
  • Boolean retrieval, positional retrieval, tolerant retrieval
  • Efficient index construction, index compression
  • Term weighting, relevance scoring, ranked retrieval
  • Semantic text analysis, link analysis
  • Complete retrieval systems
  • Results visualization and exploration
  • Evaluation of retrieval systems

In the exercise sessions, students work on applied research projects, individually or in teams, that address complex information retrieval and natural language processing tasks. Projects must be implemented in Python, and presenting the intermediate and final results is mandatory.

After successfully completing the course, students should be able to:

  • Summarize major IR and NLP applications
  • Explain important IR and NLP algorithms and data structures
  • Determine the conceptual requirements of specific IR and NLP problems
  • Compare the suitability of algorithms and data structures for specific tasks
  • Devise solutions for complex IR and NLP tasks by implementing and adapting suitable algorithms and data structures
  • Evaluate IR and NLP methods and systems quantitatively and qualitatively

The course provides a good foundation for a bachelor's or master's thesis in our group. Check this page for our current thesis proposals.

Requirements

  • Knowledge of at least one object-oriented programming language, preferably Python, is required.
  • For participants unfamiliar with Python, we provide a self-study course that covers all the essentials needed for this course.

Exam

  • Applied research project (including teaser, intermediate, and final presentations) – 67% of the final grade
  • Written test (90 min.) or oral exam (approx. 20 min.) on the lecture content – 33% of the final grade

Schedule

Day   Time            Periodicity   Dates                     Room   Type
Wed   10:15 – 11:45   weekly        2023-10-23 – 2024-02-09   tba    Lecture
Wed   12:00 – 13:30   weekly        2023-10-23 – 2024-02-09   tba    Exercise