nicolaiberk/llm_ws
Crashcourse: LLMs for Social Scientists

This workshop introduces the fundamentals of NLP and LLMs for social scientists. It covers basic and advanced text representation, fundamentals of machine learning, transformer architectures, and applied questions regarding LLMs. The course consists of twelve 90-minute sessions, most of them split into a conceptually focused lecture and a tutorial covering implementation in Python. It is designed as a fast overview of the major topics in applying LLMs: most topics are treated at a high level, with the aim of giving students a good intuition for each concept and code they can use as a starting point for implementing their own ideas.

Day 1

Representing Text

Session 1

Intro to Python & Text Representation

🖥️ Lecture Slides

🧑‍💻 Tutorial 1: Intro to Python

🧑‍💻 Tutorial 2: Pandas & basic text representation
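As a rough illustration of what "basic text representation" means here, the following minimal sketch builds a bag-of-words document-term matrix using only the Python standard library. It is not taken from the tutorial notebooks; the function names `tokenize` and `bag_of_words` and the two example sentences are assumptions made up for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, document-term matrix) for a list of strings."""
    # One Counter of token frequencies per document.
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocab = sorted(set().union(*counts))
    # Each row holds one document's counts, ordered like the vocabulary.
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Hypothetical example documents, invented for illustration.
docs = [
    "The party supports the reform.",
    "The opposition rejects the reform.",
]
vocab, dtm = bag_of_words(docs)
print(vocab)
print(dtm)
```

Each row of `dtm` represents one document as a vector of word counts, discarding word order. In practice, a library such as pandas or scikit-learn would typically handle this step.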

Further Reading

  • Introduction to text representation for social scientists: Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Bag of Words. In Text as data: A new framework for machine learning and the social sciences. Princeton University Press.
  • pandas cheatsheet
  • Regular Expressions Cheatsheet

Session 2

Embeddings

🖥️ Lecture Slides

🧑‍💻 Tutorial 1: Intro to embedding manipulation with gensim

🧑‍💻 Tutorial 2: Scaling Word Embeddings & Document Embeddings
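The core operation behind most embedding manipulation is comparing vectors by cosine similarity. The sketch below shows that computation in plain Python; it is not the tutorial's code, and the three-dimensional vectors in `toy_vectors` are made-up values chosen for illustration, not output from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (invented numbers, not from a real model).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

sim_king_queen = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
sim_king_apple = cosine_similarity(toy_vectors["king"], toy_vectors["apple"])
print(sim_king_queen, sim_king_apple)
```

With these toy values, "king" is closer to "queen" than to "apple". With real embeddings, the same comparison would be run on vectors loaded from a trained model, for example through gensim.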

Further Reading

Foundational Papers

  • Word Embeddings: Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” arXiv Preprint arXiv:1301.3781.
  • Document Embeddings: Le, Quoc, and Tomas Mikolov. 2014. “Distributed Representations of Sentences and Documents.” In International Conference on Machine Learning, 1188–96. PMLR.
  • Embedding Regression: Rodriguez, Pedro L, Arthur Spirling, and Brandon M Stewart. 2023. “Embedding Regression: Models for Context-Specific Description and Inference.” American Political Science Review 117 (4): 1255–74.

Social Science Applications

  • Studying Word Meaning with Embeddings: Kozlowski, Austin C, Matt Taddy, and James A Evans. 2019. “The Geometry of Culture: Analyzing the Meanings of Class Through Word Embeddings.” American Sociological Review 84 (5): 905–49.
  • Measuring Bias and Stereotypes with Word Embeddings: Kroon, Anne C, Damian Trilling, and Tamara Raats. 2021. “Guilty by Association: Using Word Embeddings to Measure Ethnic Stereotypes in News Coverage.” Journalism & Mass Communication Quarterly 98 (2): 451–77.
  • Scaling Representatives with Document Embeddings: Rheault, Ludovic, and Christopher Cochrane. 2020. “Word Embeddings for the Analysis of Ideological Placement in Parliamentary Corpora.” Political Analysis 28 (1): 112–33.
  • Studying Over-Time Changes in Word Meaning: Rodman, Emma. 2020. “A Timely Intervention: Tracking the Changing Meanings of Political Concepts with Word Vectors.” Political Analysis 28 (1): 87–111.

Day 2

Machine Learning

Session 1

Intro to Supervised Machine Learning

🖥️ Lecture Slides

🧑‍💻 Tutorial 1: Intro to Supervised Machine Learning with scikit-learn

🧑‍💻 Tutorial 2: Hackathon

Further Reading

Session 2

Subword Tokenization, Attention & the Transformer Architecture

🖥️ Lecture Slides

🧑‍💻 Tutorial 1: Contextualized Embeddings, Tokenization, and Inference with Transformers

🧑‍💻 Tutorial 2: Fine-tuning Transformer Models

Further Reading

Interactive Tools

Day 3

Generative Models

Session 1

Generative Transformers

🖥️ Lecture Slides

🧑‍💻 Tutorial 1: Annotation with Generative Models

🧑‍💻 Tutorial 2: API Calls & Structured Output

Further Reading

Visual Guides

Session 2

Using LLMs in Social Science Research

🖥️ Lecture Slides

🧑‍💻 Tutorial 1: Informed Prompting and Retrieval-Augmented Generation

Further Reading
