This repository studies shortcut learning and causal mechanisms in NLP.
Modern NLP models often achieve high accuracy by exploiting spurious correlations rather than learning stable causal mechanisms.
This project builds an experimental pipeline to investigate this problem using the IMDB sentiment classification task.
Research pipeline:
- Baseline sentiment classifier
- Spurious correlation injection
- Counterfactual data augmentation
- Causal adjustment methods
- Representation probing
The first step of the pipeline is a reproducible IMDB sentiment classification baseline using TF-IDF + Logistic Regression.
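The baseline described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the repository's actual training code: the hyperparameters (`max_features`, `ngram_range`, `max_iter`) and the toy reviews standing in for IMDB data are assumptions.

```python
# Minimal sketch of a TF-IDF + Logistic Regression sentiment baseline.
# Hyperparameters and the toy data are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

def build_baseline(seed: int = 1) -> Pipeline:
    """TF-IDF features feeding a logistic-regression classifier."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))),
        ("clf", LogisticRegression(max_iter=1000, random_state=seed)),
    ])

# Toy usage on stand-in reviews (a real run would use the IMDB split):
texts = [
    "a wonderful, moving film",
    "dull and badly acted",
    "great cast, great story",
    "a total waste of time",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative
model = build_baseline(seed=1)
model.fit(texts, labels)
preds = model.predict(texts)
```

A linear model over sparse n-gram features is a deliberately simple starting point: its coefficients are easy to inspect, which matters later when probing for injected spurious correlations.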
Reproducibility protocol:
- fixed dataset split
- 3 random seeds (1, 2, 3)
- model artifacts saved for each run
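The protocol above — one run per seed, one saved artifact per run — can be sketched as below. The output directory, filename pattern, and `train_fn` hook are illustrative assumptions, not the repository's actual layout.

```python
# Sketch of the multi-seed reproducibility loop: train once per seed and
# save one artifact per run. Paths and the train_fn hook are assumptions.
import pickle
from pathlib import Path

SEEDS = (1, 2, 3)  # the three fixed seeds in the protocol

def run_all(train_fn, out_dir: str = "experiments/baseline") -> list[Path]:
    """Train one model per seed and pickle each resulting artifact."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for seed in SEEDS:
        model = train_fn(seed)  # e.g. fit the TF-IDF baseline with this seed
        path = out / f"model_seed{seed}.pkl"
        with path.open("wb") as f:
            pickle.dump(model, f)
        paths.append(path)
    return paths
```

Saving an artifact per seed lets later pipeline stages (probing, causal adjustment) reload the exact trained models rather than retraining.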
Example run:
python -m src.baseline.train --seed 1
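The command above implies a small CLI in `src/baseline/train.py`. A minimal sketch of what that entry point might look like follows; the argument set (just `--seed`) is an assumption about the actual module.

```python
# Illustrative sketch of the CLI behind `python -m src.baseline.train`;
# the single --seed flag is an assumption about the real entry point.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train the IMDB TF-IDF baseline.")
    parser.add_argument("--seed", type=int, default=1,
                        help="random seed for the run (protocol uses 1, 2, 3)")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
```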
causal-nlp
├── config/
├── docs/
│   └── research_logs/
├── experiments/
│   ├── baseline/
│   ├── spurious/
│   ├── cda/
│   ├── adjustment/
│   └── probing/
├── notebooks/
├── src/
│   └── baseline/
├── requirements.txt
├── .gitignore
└── README.md