This workshop introduces social scientists to the fundamentals of NLP and LLMs. It covers basic and advanced text representation, fundamentals of machine learning, transformer architectures, and applied questions around LLMs. The course consists of twelve 90-minute sessions, most of which pair a lecture with a conceptual focus with a tutorial covering implementation in Python. The course is designed as a fast overview of the major topics in the application of LLMs: it prioritizes breadth over depth, aiming to give students a solid intuition for each concept and code that serves as a starting point for implementing their own ideas.
Intro to Python & Text Representation
🧑💻 Tutorial 1: Intro to Python
🧑💻 Tutorial 2: Pandas & basic text representation
- Introduction to text representation for social scientists: Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Bag of Words. In Text as data: A new framework for machine learning and the social sciences. Princeton University Press.
- pandas cheatsheet
- Regular Expressions Cheatsheet
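As a taste of the bag-of-words representation covered in the readings, here is a minimal sketch of building a document-term matrix with pandas and a regular-expression tokenizer. The two-document corpus is made up for illustration:

```python
import re
from collections import Counter

import pandas as pd

# A toy corpus; real applications would load documents from files or a DataFrame.
docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]

def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    return re.findall(r"\w+", text.lower())

# Count token frequencies per document and assemble a document-term matrix:
# one row per document, one column per vocabulary term.
counts = [Counter(tokenize(doc)) for doc in docs]
dtm = pd.DataFrame(counts).fillna(0).astype(int)

print(dtm)
```

Each cell holds how often a term occurs in a document; this matrix is the input to most of the classic text-as-data methods discussed in Grimmer, Roberts, and Stewart (2022).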
Embeddings
🧑💻 Tutorial 1: Intro to embedding manipulation with gensim
🧑💻 Tutorial 2: Scaling Word Embeddings & Document Embeddings
- Explainer on Algorithm behind Word Embeddings: McCormick, C. (2016, April 19). Word2Vec tutorial - The skip-gram model. Chris McCormick's Blog.
- gensim documentation and tutorials on embeddings
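The kind of embedding manipulation the tutorials cover can be sketched with the classic king − man + woman ≈ queen analogy. The four-dimensional vectors below are invented for illustration; real word2vec embeddings have hundreds of dimensions and would be loaded via gensim:

```python
import numpy as np

# Toy 4-dimensional vectors standing in for real word embeddings.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
    "man":   np.array([0.5, 0.9, 0.0, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9, 0.1]),
}

def cosine(a, b):
    """Cosine similarity: the standard closeness measure in embedding space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The classic analogy: king - man + woman should land near queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max(("queen", "man", "woman"), key=lambda w: cosine(target, vectors[w]))
print(best)  # → queen
```

With gensim, the same operation is a single call to `most_similar(positive=..., negative=...)` on a trained model's word vectors.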
Foundational Papers
- Word Embeddings: Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” arXiv Preprint arXiv:1301.3781.
- Document Embeddings: Le, Quoc, and Tomas Mikolov. 2014. “Distributed Representations of Sentences and Documents.” In International Conference on Machine Learning, 1188–96. PMLR.
- Embedding Regression: Rodriguez, Pedro L, Arthur Spirling, and Brandon M Stewart. 2023. “Embedding Regression: Models for Context-Specific Description and Inference.” American Political Science Review 117 (4): 1255–74.
Social Science Applications
- Studying word Meaning with Embeddings: Kozlowski, Austin C, Matt Taddy, and James A Evans. 2019. “The Geometry of Culture: Analyzing the Meanings of Class Through Word Embeddings.” American Sociological Review 84 (5): 905–49.
- Measuring Bias and Stereotypes with Word Embeddings: Kroon, Anne C, Damian Trilling, and Tamara Raats. 2021. “Guilty by Association: Using Word Embeddings to Measure Ethnic Stereotypes in News Coverage.” Journalism & Mass Communication Quarterly 98 (2): 451–77.
- Scaling Representatives with Document Embeddings: Rheault, Ludovic, and Christopher Cochrane. 2020. “Word Embeddings for the Analysis of Ideological Placement in Parliamentary Corpora.” Political Analysis 28 (1): 112–33.
- Studying over-time Changes in Word Meaning: Rodman, Emma. 2020. “A Timely Intervention: Tracking the Changing Meanings of Political Concepts with Word Vectors.” Political Analysis 28 (1): 87–111.
Intro to Supervised Machine Learning
🧑💻 Tutorial 1: Intro to Supervised Machine Learning with scikit-learn
- Google's Machine Learning Crash Course
- scikit-learn documentation. Not only the reference for the major Python machine learning library, but also a great collection of tutorials and explainers on machine learning.
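A minimal supervised text-classification sketch in the spirit of the scikit-learn tutorial; the tiny corpus and sentiment labels are made up, and a real project would use hundreds of annotated texts and a held-out test set:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled corpus: 1 = positive, 0 = negative.
texts = [
    "great wonderful loved it", "fantastic and enjoyable",
    "terrible awful hated it", "boring and disappointing",
]
labels = [1, 1, 0, 0]

# A pipeline chains vectorization and classification into a single estimator,
# so the same preprocessing is applied at training and prediction time.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["loved it, fantastic"]))
```

The pipeline pattern is the central abstraction here: swapping in a different vectorizer or classifier changes one line, not the whole workflow.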
Subword Tokenization, Attention & the Transformer Architecture
🧑💻 Tutorial 1: Contextualized Embeddings, Tokenization, and Inference with Transformers
🧑💻 Tutorial 2: Fine-tuning Transformer Models
- Huggingface Explainer on Subword Tokenization
- Simple Explanation of Attention & Transformer Architecture: Tunstall, L., von Werra, L., & Wolf, T. (2022). Hello Transformers. In Natural language processing with transformers. O'Reilly Media.
- Original Transformer Paper: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
- Paper introducing BERT Architecture: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., NAACL 2019)
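The scaled dot-product attention at the heart of Vaswani et al. (2017) fits in a few lines of NumPy. This toy version omits the learned projection matrices (W_Q, W_K, W_V) and the multi-head machinery of a real transformer; the input matrix is random stand-in data:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as in Vaswani et al. (2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Three toy token representations in a 4-dimensional space.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))

# Self-attention: queries, keys, and values all come from the same tokens.
out, weights = scaled_dot_product_attention(X, X, X)
print(weights.round(2))  # each row sums to 1
```

Each output row is a weighted mixture of all token representations, which is precisely the "contextualization" that distinguishes transformer embeddings from static word vectors.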
Interactive Tools
- Interactive Neural Network Playground by TensorFlow. Play around with network architecture and hyperparameter choices to gain an intuitive understanding of neural networks.
Generative Transformers
🧑💻 Tutorial 1: Annotation with Generative Models
🧑💻 Tutorial 2: API Calls & Structured Output
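A minimal sketch of the structured-output pattern from Tutorial 2. The reply string, label set, and field names here are all hypothetical; in practice the JSON string would come back from an API call to a generative model prompted to annotate text:

```python
import json

# Hypothetical raw reply from a model prompted to return its annotation as JSON.
raw_reply = '{"label": "positive", "confidence": 0.87}'

ALLOWED_LABELS = {"positive", "negative", "neutral"}

def parse_annotation(reply):
    """Parse and validate a model's structured output, failing loudly on malformed replies."""
    data = json.loads(reply)
    if data.get("label") not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {data.get('label')!r}")
    if not 0.0 <= data.get("confidence", -1.0) <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return data

annotation = parse_annotation(raw_reply)
print(annotation["label"])
```

Validating every reply like this matters in annotation pipelines: generative models occasionally return prose, malformed JSON, or out-of-vocabulary labels, and silent failures bias downstream analyses.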
Visual Guides
- LLM Visualization by Brendan Bycroft: Full interactive visualization of the GPT architecture with simple explanations of each step.
- Jay Alammar's Illustrated Transformer: Accessible visual explanation of the transformer architecture.
- Transformer Explainer: Interactive visualization of the transformer forward pass, focusing on attention and the impact of specific hyperparameters.
Using LLMs in Social Science Research
🧑💻 Tutorial 1: Informed Prompting and Retrieval-Augmented Generation
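The retrieval-augmented generation pattern from the tutorial can be sketched with a toy bag-of-words retriever. The documents and query below are invented, and a real pipeline would retrieve with neural embeddings and then send the augmented prompt to an LLM:

```python
import re
from collections import Counter
from math import sqrt

# Toy document store; a real pipeline would embed documents with a neural encoder.
documents = [
    "The party manifesto promises higher spending on public transport.",
    "The committee rejected the proposed tax reform in 2021.",
    "Turnout in the last election reached a historic low.",
]

def bow(text):
    """Bag-of-words representation as a token-count dictionary."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    return dot / (sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values())))

def retrieve(query, docs):
    """Return the document most similar to the query (here: bag-of-words cosine)."""
    q = bow(query)
    return max(docs, key=lambda d: cosine(q, bow(d)))

query = "What did the manifesto say about public transport?"
context = retrieve(query, documents)

# Augment the prompt with the retrieved passage before sending it to an LLM.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

The key idea is the separation of concerns: retrieval grounds the model in a specific corpus, while generation is left to the LLM, which sees only the retrieved context rather than the full document store.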
