Class notes for University of Pennsylvania course on machine learning for social science (CRIM6012/SOCI6012)

gregridgeway/ML4SocialScience

ML4SocialScience

These notes are best viewed at the ML4SocialScience github.io site.

These are the class notes for my course on machine learning for social science (CRIM6012/SOCI6012) that I have taught at the University of Pennsylvania since 2024. The course aims to

  • build foundational skills essential for machine learning (calculus, linear algebra, probability)
  • cover a range of machine learning methods, primarily supervised learning methods
  • show applications on a range of social science problems
    • predicting recidivism risk
    • predicting high school dropout (NELS88)
    • exploring media censorship (Varieties of Democracy, V-Dem)
    • studying links between arrest and opioid use (NSDUH)
    • measuring racial disparities in pedestrian stop outcomes
    • text analysis of officer-involved shooting incident reports
    • building a small language model based only on Crime and Punishment

Table of contents

  1. Probability review
  2. Naïve Bayes classifier
  3. Prediction, bias, variance, and noise
    • k-nearest neighbor regression and classification
    • Example: Predict dropout risk from the NELS88 data
    • Spam example
  4. Differential calculus review
  5. Classification and regression trees
  6. Linear algebra
    • Basic matrix operations, including matrix derivatives
    • Ordinary least squares and ridge regression
    • Multivariate Taylor series, Newton-Raphson, logistic regression, iteratively reweighted least squares (IRLS)
  7. Singular value decomposition
    • Image compression
    • Image classification with emojis
  8. Boosting and L1 regularization
    • Lasso
    • Forward stagewise selection
    • Gradient boosting
  9. Propensity score estimation
    • Simpson's paradox and confounders
    • Neyman-Rubin causal model
    • Propensity score weighting
      • using machine learning to estimate propensity scores
      • fastDR package
  10. Neural networks
    • Backpropagation "by hand"
    • neuralnet package
    • TensorFlow and Keras
    • Convolutional layers
    • MNIST handwritten digits dataset
  11. Text analysis
    • Working with text2vec
    • Document-term matrix (DTM) and TF-IDF
    • SVD for text
  12. Long short-term memory (LSTM) neural networks
    • LSTM models
    • a small language model
