Causal AI Scientist: Facilitating Causal Data Science with Large Language Models

Causal AI Scientist (CAIS) is an LLM-powered tool for generating data-driven answers to natural language causal queries. It takes three inputs: a natural language query (for example, "Does participating in a job training program lead to higher income?"), an accompanying dataset, and a description of that dataset. CAIS then frames a suitable causal estimation problem by selecting appropriate treatment and outcome variables. It selects a suitable method for causal effect estimation, implements it, runs diagnostic tests, and finally interprets the numerical results in the context of the original query.

This repo includes instructions on both using the tool to perform causal analysis on a dataset of interest and reproducing results from our paper.

Note: This repository is a work in progress and will be updated with additional instructions and files.

Getting Started

🔧 Environment Installation

Prerequisites:

Python 3.10 (create a new conda environment first)
Required Python libraries (specified in requirements.txt)

Step 1: Copy the example configuration

cp .env.example .env

Step 2: Create Python 3.10 environment

# Create a new conda environment with Python 3.10
conda create -n cais python=3.10
conda activate cais
pip install -r requirements.txt

Step 3: Set up the cais library

pip install -e .

Dataset Information

All datasets used to evaluate CAIS and the baseline models are available in the data/ directory. Specifically:

all_data: Folder containing all CSV files from the QRData and real-world study collections.
synthetic_data: Folder containing all CSV files corresponding to synthetic datasets.
qr_info.csv: Metadata for QRData files. For each file, this includes the filename, description, causal query, reference causal effect, intended inference method, and additional remarks.
real_info.csv: Metadata for the real-world datasets.
synthetic_info.csv: Metadata for the synthetic datasets.
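Before running CAIS, it can help to check what a metadata file actually contains. The following is a minimal sketch using only the Python standard library; `summarize_metadata` is a hypothetical helper written for this README and is not part of the cais package. It makes no assumption about column names and simply reports whatever header the CSV has.

```python
import csv
from pathlib import Path


def summarize_metadata(csv_path):
    """Return (column_names, row_count) for a metadata CSV file.

    Hypothetical helper for illustration; not part of the cais package.
    """
    # "utf-8-sig" reads plain UTF-8 and also tolerates a leading byte-order mark.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    # File names follow the data/ layout described above; adjust if yours differs.
    for name in ("qr_info.csv", "real_info.csv", "synthetic_info.csv"):
        path = Path("data") / name
        if path.exists():
            columns, rows = summarize_metadata(path)
            print(f"{name}: {rows} rows, columns = {columns}")
        else:
            print(f"{name}: not found at {path}")
```

Run it from the repository root so that the relative data/ paths resolve.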

Run

To execute CAIS, run:

python run_cais.py \
    --metadata_path {path_to_metadata} \
    --data_dir {path_to_data_folder} \
    --output_dir {output_folder} \
    --output_name {output_filename} \
    --llm_name {llm_name} \
    --llm_provider {llm_provider}

Args:

metadata_path (str): Path to the CSV file containing the queries, dataset descriptions, and data file names
data_dir (str): Path to the folder containing the data in CSV format
output_dir (str): Path to the folder where the output JSON results will be saved
output_name (str): Name of the JSON file where the outputs will be saved
llm_name (str): Name of the LLM to be used (e.g., 'gpt-4', 'claude-3', etc.)
llm_provider (str): Name of the LLM service provider (e.g., 'openai', 'anthropic', 'together', etc.)

For example:

python run_cais.py \
    --metadata_path "data/qr_info.csv" \
    --data_dir "data/all_data" \
    --output_dir "output" \
    --output_name "results_qr_4o" \
    --llm_name "gpt-4o-mini" \
    --llm_provider "openai"
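If you want to launch several runs from a script (for example, one per metadata file), the command can be assembled in Python. This is a sketch only: `build_cais_command` is a hypothetical helper, not part of the repository, and it assumes `run_cais.py` is invoked from the repository root as in the example above. The flags are exactly those listed under Args.

```python
import shlex
import sys


def build_cais_command(metadata_path, data_dir, output_dir, output_name,
                       llm_name, llm_provider, script="run_cais.py"):
    """Assemble the argument list for one CAIS run.

    Hypothetical helper for illustration; it only builds the command
    and does not execute it.
    """
    return [
        sys.executable, script,
        "--metadata_path", metadata_path,
        "--data_dir", data_dir,
        "--output_dir", output_dir,
        "--output_name", output_name,
        "--llm_name", llm_name,
        "--llm_provider", llm_provider,
    ]


if __name__ == "__main__":
    cmd = build_cais_command(
        metadata_path="data/qr_info.csv",
        data_dir="data/all_data",
        output_dir="output",
        output_name="results_qr_4o",
        llm_name="gpt-4o-mini",
        llm_provider="openai",
    )
    # Print the command for inspection. To actually run it, pass the list
    # to subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.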

Reproducing paper results

Instructions for reproducing the paper results will be added soon.

⚠️ Important Notes:

Keep your .env file secure and never commit it to version control.

License

Distributed under the MIT License. See LICENSE for more information.
