Introduction to Information Retrieval and Natural Language Processing

Retrieving information from large collections, especially the World Wide Web, has become an integral part of information systems for personal and business use cases alike.
The need to organize vast amounts of information for effective retrieval long predates the World Wide Web and even computers. Traditional libraries were the birthplace of many techniques for organizing and retrieving information effectively. The introduction of computers redefined how we store, access, and search for information and gave rise to a new research field: Information Retrieval. The diversity of information on the World Wide Web introduced new retrieval tasks, which drove both the advancement of traditional information retrieval technologies and the creation of new ones.
This course introduces core concepts and technologies of both traditional information retrieval and information retrieval on the Web.

The lecture will cover the following topics:

  • Basics: Background, Documents, Terms, Vocabulary, Inverted Index
  • Boolean Retrieval, Positional Retrieval, Tolerant Retrieval
  • Efficient Index Construction, Index Compression
  • Term Weighting, Relevance Scoring, Ranked Retrieval
  • Semantic Text Analysis, Link Analysis
  • Complete Retrieval Systems
  • Results Visualization and Exploration
  • Evaluation of Retrieval Systems

Upon completing the course, participants will know the predominant information retrieval tasks, e.g., Web search and recommendation. They will understand the conceptual requirements of specific retrieval tasks and be able to devise retrieval approaches, consisting of suitable data structures and algorithms, that address these tasks. Participants will also be able to critically evaluate the strengths and weaknesses of retrieval approaches and to implement prototypes of suitable approaches that solve complex, practical information retrieval problems.

Requirements

Exam

  • Successful completion of programming projects (including teaser, intermediate, and final presentations)
  • Final exam (oral or written)
  • Up to 50% of the final exam's questions may concern the programming project.

Time schedule

Day   Time            Periodicity   Dates                      Room   Type
Wed   10:15 – 11:45   weekly        06.04.2022 – 13.07.2022    SUB    Lecture
TBA   TBA             weekly        06.04.2022 – 13.07.2022    SUB    Exercise

The exercise sessions will focus on individual programming projects (teamwork is possible) that will address complex information retrieval tasks.
The projects must be implemented in Python, and presenting their intermediate and final results is mandatory.


The course provides a good foundation for a bachelor’s or master’s thesis in our group. 
Visit https://gipplab.org/students-corner/graduation-projects for our current thesis proposals.