These lab sessions are designed to help you follow along with the contents presented during the lectures, and introduce you to the skills and tools needed to complete the final projects.
The lab sessions will be a mix of tutorials and exercises: the tutorials present modern frameworks and tools for implementing advanced NLP analyses and pipelines, while the exercises build the skills needed for the final projects. Here is a brief overview of the schedule:
Some notes:
- The core contents are covered in the first few weeks of the course to kickstart your work. Exercise sessions are dropped from week 6 onwards to allow you to focus on the final project.
- Participation in the lab sessions is highly encouraged, as they cover fundamental notions for the assignment portfolios and the final projects. Instructors will be available to answer questions and provide guidance.
The lab sessions make use of the Jupyter environment. You can use the following links to get started:
Alternatively, you can run the notebooks in the Google Colab web environment simply by clicking the button at the beginning of each notebook. If you're running on Windows, we recommend following along in Colab. If you're using a Linux distribution or macOS, either approach described here works. For an intro to the Colab environment, refer to:
Since the lab sessions introduce you to OSS libraries such as spaCy, Stanza, Scikit-learn, 🤗 Transformers and 🤗 Datasets, most of the first few sessions' contents are adapted from official tutorials and docs. Here is a non-exhaustive list of the most relevant sources for additional reference:
- Advanced NLP with spaCy
- Stanza tutorials
- spaCy Linguistic Features
- HuggingFace Course, Chapter 1
- HuggingFace Transformers Docs
- HuggingFace Datasets Docs
- Scikit-learn "Working with Text Data" Tutorial
- NLP class materials by Dirk Hovy
- HuggingFace "How to Generate" Tutorial
- A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes
- HuggingFace PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware
The file `requirements.txt` in this repository lists all the packages required to run the lab sessions. You can create a Python virtual environment (Python >= 3.6) and install them using the following commands:

```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Make sure the virtual environment is activated before running Jupyter. If you are using Colab, simply run the cell at the beginning of each notebook to install the required packages. Refer to Using a Python Virtual Environment for more details on how to create and activate a virtual environment. Alternatively, you can use Poetry to manage the dependencies.
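If you suspect some packages failed to install, a small stdlib-only helper (hypothetical, not part of the repo) can compare the environment against `requirements.txt` before you start a session:

```python
from importlib import metadata
from pathlib import Path


def missing_packages(requirements_path):
    """Return requirement lines whose package is not installed
    in the current environment."""
    missing = []
    for line in Path(requirements_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Keep only the package name: drop environment markers
        # and version pins like "spacy>=3.0".
        name = line.split(";")[0]
        for sep in ("==", ">=", "<=", "~=", ">", "<"):
            name = name.split(sep)[0]
        try:
            metadata.version(name.strip())
        except metadata.PackageNotFoundError:
            missing.append(line)
    return missing


if __name__ == "__main__":
    print(missing_packages("requirements.txt"))
```

This is a rough sketch: it does not handle extras like `package[extra]` or requirements spread across multiple lines, but it is usually enough to spot a missing dependency quickly.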
For any troubleshooting, please consult the FAQ before asking for help. You are encouraged to contribute to it by adding your solutions!
These lab sessions have received contributions from several teachers and teaching assistants, listed here and below (under TA Alumni). We are particularly grateful to Gabriele Sarti for building the first version of this repo.
Arianna Bisazza is an Associate Professor in Computational Linguistics and Natural Language Processing at the Computational Linguistics Group of the University of Groningen. She is passionate about the study of human languages, how they differ from each other, and how they can be modeled by computational tools. Her primary interest is in the development of language technologies supporting a large variety of languages around the world. She is also interested in the new knowledge that computational models can reveal about the nature of language.
Jirui Qi is a PhD student in the Computational Linguistics Group of the University of Groningen. He is part of the Dutch consortium LESSEN, and his research mainly focuses on low-resource conversational generation, the generalization of factual knowledge across languages, and prompt-based learning for classification.
Leo Zotos is a PhD student in the Computational Linguistics Group of the University of Groningen. He works on the intersection between language modelling and human learning, with a focus on multifaceted event understanding. His current focus is on multiple-choice assessment methods and how these tests can be better designed to improve long-term retention.
Francesca Padovani is a 2nd-year PhD student in the Computational Linguistics Group of the University of Groningen. Her project is part of the Polyglot Machines VIDI Grant, working at the intersection between Language Acquisition, Linguistic Theory and Computational Modelling.
Please open an issue here on GitHub! This is the second year we are using these contents for the course, and although most of them come from battle-tested online tutorials, we are always looking for feedback and suggestions.
We thank our past students Georg Groenendaal, Robin van der Noord, Ayça Avcı and Remco Leijenaar for their contributions in spotting errors in the course materials.
Teaching Assistants Alumni
2022–2025

Gabriele Sarti is a postdoc in the BauLab at Northeastern University, working on the National Deep Inference Fabric (NDIF) project to empower interpretability researchers with intuitive and extensible interfaces. Previously, he was a PhD student at the University of Groningen, where he completed his thesis on actionable interpretability for machine translation as a member of the InCLoW team, the GroNLP group, and the Dutch InDeep consortium. His supervisors were Arianna Bisazza, Malvina Nissim, and Grzegorz Chrupała. Before that, he was an applied scientist intern at Amazon Translate NYC, a research scientist at Aindo, and a Data Science MSc student at the University of Trieste, where he helped found the AI Student Society. His research aims to translate theoretical advances in language model interpretability into actionable insights for improving trustworthiness and human-AI collaboration.
2023

Ludwig Sickert was an MSc candidate in AI at the University of Groningen. He attended the IK-NLP course in 2022 and worked on interpreting formality in machine translation systems for his master's thesis under the supervision of Gabriele and Arianna. He served as TA for the 2023 edition of the course.
2022

Anjali Nair was an MSc candidate in AI at the University of Groningen. She served as teaching assistant for the 2022 edition of the course.



