Introduction to Information Retrieval and Natural Language Processing
Retrieving information from large collections, especially the World Wide Web, has become an integral part of information systems for personal and business use cases alike.
The need to organize vast amounts of information for effective retrieval long predates the World Wide Web and even computers. Traditional libraries were the birthplace of many techniques for organizing and retrieving information effectively. The introduction of computers redefined our methods of storing, accessing, and searching for information and gave rise to a new research field: information retrieval. The diversity of information on the World Wide Web introduced new retrieval tasks, which drove both the advancement of traditional information retrieval technologies and the creation of new ones.
This course introduces core concepts and technologies of both traditional information retrieval and information retrieval on the Web.

The lecture will cover the following topics:
- Basics: Background, Documents, Terms, Vocabulary, Inverted Index
- Boolean Retrieval, Positional Retrieval, Tolerant Retrieval
- Efficient Index Construction, Index Compression
- Term Weighting, Relevance Scoring, Ranked Retrieval
- Semantic Text Analysis, Link Analysis
- Complete Retrieval Systems
- Results Visualization and Exploration
- Evaluation of Retrieval Systems
After completing the course, participants will know the predominant information retrieval tasks, e.g., Web search and recommendation. They will understand the conceptual requirements of specific retrieval tasks and be able to devise retrieval approaches, consisting of suitable data structures and algorithms, to address these tasks. They will also be able to critically evaluate the strengths and weaknesses of retrieval approaches and to implement prototypes of suitable approaches to solve complex practical information retrieval problems.
Requirements
- Knowledge of at least one object-oriented programming language, preferably Python, is required.
- Python is used in the exercise sessions. For participants who are unfamiliar with Python, we provide a free introductory course covering all essential topics: https://isgroup.atlassian.net/wiki/spaces/STUD/pages/1682899068/Python+Basics
Exam
- Successful completion of the programming projects (including teaser, intermediate, and final presentations)
- Final exam (oral or written)
- Up to 50% of the final exam questions may concern the programming project.
Time schedule
Day | Time | Periodicity | Duration | Room | Type |
---|---|---|---|---|---|
Wed | 10:15 – 11:45 | weekly | 06.04.2022 – 13.07.2022 | SUB | lecture |
tba | tba | weekly | 06.04.2022 – 13.07.2022 | SUB | exercise |
The exercise sessions will focus on individual programming projects (teamwork is possible) that will address complex information retrieval tasks.
Using the programming language Python and presenting the intermediate and final results of the projects are mandatory.
The course provides a good foundation for a bachelor’s or master’s thesis in our group.
Visit https://gipplab.org/students-corner/graduation-projects for our current thesis proposals.