Maxi520206/causal-nlp

Causal NLP Project

This repository studies shortcut learning and causal mechanisms in NLP.

Modern NLP models often achieve high accuracy by exploiting spurious correlations in the training data (shortcuts) rather than learning stable causal mechanisms, which leaves them brittle under distribution shift.

This project builds an experimental pipeline to investigate this problem using the IMDB sentiment classification task.

Research pipeline:

  1. Baseline sentiment classifier
  2. Spurious correlation injection
  3. Counterfactual data augmentation
  4. Causal adjustment methods
  5. Representation probing
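Steps 2 and 3 can be illustrated with a short, dependency-free sketch. It is not the repository's code: the shortcut token `zxq`, the injection probabilities, and the tiny antonym lexicon `SWAP` are hypothetical. `inject_spurious` adds a sentiment-neutral token with a label-dependent probability, so the token becomes predictive of the label without causing it. `counterfactual` swaps sentiment words and flips the label, one simple form of counterfactual data augmentation. `shortcut_rule_accuracy` (also hypothetical) measures how well the rule "positive iff the token is present" fits a dataset.

```python
import random

SHORTCUT = "zxq"  # hypothetical sentiment-neutral token used as the injected shortcut

def inject_spurious(examples, p_pos=0.9, p_neg=0.1, seed=0):
    """Prepend SHORTCUT with a label-dependent probability.

    examples: list of (text, label) pairs, label 1 = positive, 0 = negative.
    With p_pos > p_neg the token becomes predictive of the positive class
    even though it carries no sentiment.
    """
    rng = random.Random(seed)
    out = []
    for text, label in examples:
        p = p_pos if label == 1 else p_neg
        out.append((f"{SHORTCUT} {text}" if rng.random() < p else text, label))
    return out

# Hypothetical antonym lexicon; real CDA lexicons are far larger.
SWAP = {"good": "bad", "bad": "good", "great": "terrible", "terrible": "great",
        "love": "hate", "hate": "love"}

def counterfactual(text, label):
    """Swap sentiment words and flip the label; return None if no word was swapped."""
    tokens = text.split()
    swapped = [SWAP.get(t, t) for t in tokens]
    if swapped == tokens:
        return None
    return " ".join(swapped), 1 - label

def augment(examples):
    """Original examples plus every counterfactual that could be generated."""
    extra = (counterfactual(text, label) for text, label in examples)
    return examples + [cf for cf in extra if cf is not None]

def shortcut_rule_accuracy(examples):
    """Accuracy of the rule 'positive iff SHORTCUT is present'; 0.5 on balanced data means no signal."""
    hits = sum((SHORTCUT in text.split()) == bool(label) for text, label in examples)
    return hits / len(examples)
```

For example, with `p_pos=1.0` and `p_neg=0.0` on a balanced toy set where every review contains a lexicon word, the shortcut rule is perfect on the injected data and falls to 0.5 after `augment`: each counterfactual keeps the shortcut token (it is not in the lexicon) while its label flips. The whitespace tokenizer misses words with attached punctuation (`good.`), and naive word swaps can yield ungrammatical or inconsistent text.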

Baseline v1 (implemented)

The first step of the pipeline is a reproducible IMDB sentiment classification baseline using TF-IDF + Logistic Regression.
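To make the model concrete, here is a dependency-free sketch of TF-IDF features plus logistic regression on a toy corpus. It is not the code in `src/baseline/`: the lowercase whitespace tokenizer and the gradient-descent hyperparameters (`epochs`, `lr`, `l2`) are assumptions. The IDF smoothing, `log((1 + N) / (1 + df)) + 1`, and the L2 row normalization are chosen to match scikit-learn's defaults; in practice scikit-learn's `TfidfVectorizer` and `LogisticRegression` are the usual choice.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; real pipelines also strip punctuation and HTML.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed IDF per term: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(doc, idf):
    """Sparse TF-IDF vector (dict term -> weight), L2-normalized; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(x, w, b):
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())

def train_logreg(X, y, epochs=200, lr=0.5, l2=1e-3):
    """Binary logistic regression fitted by full-batch gradient descent with an L2 penalty."""
    w, b, n = {}, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, yi in zip(X, y):
            err = sigmoid(score(x, w, b)) - yi  # derivative of the log loss w.r.t. the score
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t in set(w) | set(grad_w):
            w[t] = w.get(t, 0.0) - lr * (grad_w[t] / n + l2 * w.get(t, 0.0))
        b -= lr * grad_b / n
    return w, b

def predict(x, w, b):
    return int(score(x, w, b) >= 0.0)
```

Each review becomes a sparse, L2-normalized vector of term weights, and the classifier learns one weight per term. Because those weights can be read off directly, a linear baseline makes it easy to check whether a shortcut token has picked up a large weight.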

Reproducibility protocol:

  • fixed dataset split
  • 3 random seeds (1, 2, 3)
  • model artifacts saved for each run
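The protocol above can be sketched with the standard library alone. The split seed `SPLIT_SEED = 0`, the 10% validation fraction, the `experiments/baseline/seed_<n>/` layout, the JSON file names, and the helper names are assumptions for illustration; the repository may organize this differently. The main design choice is giving the split its own RNG, so changing the run seed changes only run-level randomness and never which examples are held out.

```python
import json
import random
from pathlib import Path

SPLIT_SEED = 0           # hypothetical: one seed reserved for the data split
TRAIN_SEEDS = (1, 2, 3)  # the three run seeds from the protocol

def fixed_split(n_examples, val_fraction=0.1, split_seed=SPLIT_SEED):
    """Shuffle indices with a dedicated RNG so the split never depends on the run seed."""
    idx = list(range(n_examples))
    random.Random(split_seed).shuffle(idx)
    n_val = int(n_examples * val_fraction)
    return sorted(idx[n_val:]), sorted(idx[:n_val])

def seed_everything(seed):
    """Seed the stdlib RNG; add numpy/torch seeding here if those libraries are used."""
    random.seed(seed)

def save_run(root, seed, split, metrics):
    """Write one artifact folder per seed (hypothetical layout under experiments/baseline/)."""
    out = Path(root) / "experiments" / "baseline" / f"seed_{seed}"
    out.mkdir(parents=True, exist_ok=True)
    (out / "split.json").write_text(json.dumps({"train": split[0], "val": split[1]}))
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return out

def run_protocol(root, n_examples, train_fn):
    """Reuse one fixed split across all seeds; train_fn(split, seed) returns a metrics dict."""
    split = fixed_split(n_examples)
    for seed in TRAIN_SEEDS:
        seed_everything(seed)
        save_run(root, seed, split, train_fn(split, seed))
```

Saving the split indices next to the metrics makes it possible to confirm later that all three runs were evaluated on the same held-out examples.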

Example run:

python -m src.baseline.train --seed 1
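A module run as `python -m src.baseline.train --seed 1` commonly exposes an `argparse` entry point. The sketch below assumes that shape: only `--seed` comes from this README, while the `--output-dir` flag, its default, and the body of `main` are hypothetical.

```python
import argparse
import random

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        prog="python -m src.baseline.train",
        description="Train the TF-IDF + logistic regression baseline for one seed.",
    )
    parser.add_argument("--seed", type=int, default=1,
                        help="random seed for this run (the protocol uses 1, 2 and 3)")
    # Hypothetical flag, not confirmed by the repository.
    parser.add_argument("--output-dir", default="experiments/baseline",
                        help="parent directory for per-seed artifacts")
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    random.seed(args.seed)
    run_dir = f"{args.output_dir}/seed_{args.seed}"
    # Load the fixed split, fit the model, and write artifacts to run_dir here.
    return run_dir

if __name__ == "__main__":
    main()
```

With this shape the full protocol is three invocations (`--seed 1`, `--seed 2`, `--seed 3`), and `main` can be called directly with an argument list in tests.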

Project Structure

causal-nlp
├── config/
├── docs/
│   └── research_logs/
├── experiments/
│   ├── baseline/
│   ├── spurious/
│   ├── cda/
│   ├── adjustment/
│   └── probing/
├── notebooks/
├── src/
│   └── baseline/
├── requirements.txt
├── .gitignore
└── README.md

About

Research experiments on shortcut learning and causal mechanisms in NLP models, with a focus on distribution shift, causal probing, and representation-level interventions.
