These lab sessions are designed to help you follow along with the contents presented during the lectures, and introduce you to the skills and tools needed to complete the final projects.
The lab sessions will be a mix of tutorials and exercises: the tutorials present modern frameworks and tools for implementing advanced NLP analyses and pipelines, while the exercises build the skills needed for the final projects. Here is a brief overview of the schedule:
Some notes:
- The core contents are covered in the first few weeks of the course to kickstart your work. Exercise sessions are dropped from week 6 onwards to allow you to focus on the final project.
- Participation in the lab sessions is highly encouraged, as they cover fundamental notions for the assignment portfolios and the final projects. Instructors will be available to answer questions and provide guidance.
The lab sessions make use of the Jupyter environment. You can use the following links to get started:
Alternatively, you can run the notebooks in the Google Colab web environment simply by clicking the button at the beginning of each notebook. If you're running on Windows, we recommend following along in Colab. If you're using a Linux distribution or macOS, either approach described here works. For an intro to the Colab environment, refer to:
Since the lab sessions introduce you to OSS libraries such as spaCy, Stanza, Scikit-learn, 🤗 Transformers and 🤗 Datasets, most of the first few sessions' contents are adapted from official tutorials and docs. Here is a non-exhaustive list of the most relevant sources for additional reference:
- Advanced NLP with spaCy
- Stanza tutorials
- spaCy Linguistic Features
- HuggingFace Course, Chapter 1
- HuggingFace Transformers Docs
- HuggingFace Datasets Docs
- Scikit-learn "Working with Text Data" Tutorial
- NLP class materials by Dirk Hovy
- HuggingFace "How to Generate" Tutorial
- A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes
- HuggingFace PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware
The file `requirements.txt` in this repository lists all the packages required to run the lab sessions. You can create a Python virtual environment (Python >= 3.6) and install them using the following commands:

```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Make sure the virtual environment is activated before running Jupyter. If you are using Colab, simply run the cell at the beginning of each notebook to install the required packages. Refer to Using a Python Virtual Environment for more details on how to create and activate a virtual environment. Alternatively, you can use Poetry to manage the dependencies.
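If you suspect some packages failed to install, a small stdlib-only helper (hypothetical, not part of the repo) can compare the environment against `requirements.txt` before you start a session:

```python
from importlib import metadata
from pathlib import Path


def missing_packages(requirements_path):
    """Return requirement lines whose package is not installed
    in the current environment."""
    missing = []
    for line in Path(requirements_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Keep only the package name: drop environment markers
        # and version pins like "spacy>=3.0".
        name = line.split(";")[0]
        for sep in ("==", ">=", "<=", "~=", ">", "<"):
            name = name.split(sep)[0]
        try:
            metadata.version(name.strip())
        except metadata.PackageNotFoundError:
            missing.append(line)
    return missing


if __name__ == "__main__":
    print(missing_packages("requirements.txt"))
```

This is a rough sketch: it does not handle extras like `package[extra]` or requirements spread across multiple lines, but it is usually enough to spot a missing dependency quickly.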
For any troubleshooting, please consult the FAQ before asking for help. You are encouraged to contribute to it by adding your solutions!
These lab sessions have received contributions from several teachers and teaching assistants, listed here and below (under TA Alumni). We are particularly grateful to Gabriele Sarti for building the first version of this repo.
Arianna Bisazza is an Associate Professor in Computational Linguistics and Natural Language Processing at the Computational Linguistics Group of the University of Groningen. She is passionate about the study of human languages, how they differ from each other, and how they can be modeled by computational tools. Her primary interest is in the development of language technologies supporting a large variety of languages around the world. She is also interested in the new knowledge that computational models can reveal about the nature of language.
Jirui Qi is a PhD student in the Computational Linguistics Group of the University of Groningen. He is part of the Dutch consortium LESSEN, and his research mainly focuses on low-resource conversational generation, the generalization of factual knowledge across languages, and prompt-based learning for classification.
Leo Zotos is a PhD student in the Computational Linguistics Group of the University of Groningen. He works on the intersection between language modelling and human learning, with a focus on multifaceted event understanding. His current focus is on multiple-choice assessment methods and how these tests can be better designed to improve long-term retention.
Francesca Padovani is a 2nd-year PhD student in the Computational Linguistics Group of the University of Groningen. Her project is part of the Polyglot Machines VIDI Grant, working at the intersection between Language Acquisition, Linguistic Theory and Computational Modelling.
Please open an issue here on GitHub! This is the second year we are using these contents for the course, and although most of them come from battle-tested online tutorials, we are always looking for feedback and suggestions.
We thank our past students Georg Groenendaal, Robin van der Noord, Ayça Avcı and Remco Leijenaar for their contributions in spotting errors in the course materials.
Teaching Assistants Alumni
2022–2025

Gabriele Sarti is a postdoc in the BauLab at Northeastern University, working on the National Deep Inference Fabric (NDIF) project to empower interpretability researchers with intuitive and extensible interfaces. Previously, he was a PhD student at the University of Groningen, where he completed his thesis on actionable interpretability for machine translation as a member of the InCLoW team, the GroNLP group, and the Dutch InDeep consortium. His supervisors were Arianna Bisazza, Malvina Nissim, and Grzegorz Chrupała. Before that, he was an applied scientist intern at Amazon Translate NYC, a research scientist at Aindo, and a Data Science MSc student at the University of Trieste, where he helped found the AI Student Society. His research aims to translate theoretical advances in language model interpretability into actionable insights for improving trustworthiness and human-AI collaboration.
2023

Ludwig Sickert was an MSc candidate in AI at the University of Groningen. He attended the IK-NLP course in 2022 and worked on interpreting formality in machine translation systems for his master's thesis under the supervision of Gabriele and Arianna. He served as TA for the 2023 edition of the course.
2022

Anjali Nair was an MSc candidate in AI at the University of Groningen. She served as teaching assistant for the 2022 edition of the course.



