We release three benchmark datasets:
- Real-world Correlations: Variable pairs with observed correlations, extracted from the Cause-Effect and Kaggle datasets. For the Kaggle portion, we build on the dataset curated by Trummer et al.
- Counterfactual Correlations: Cause-effect pairs with hypothetical contexts that reverse their original correlations.
  → benchmark/counterfactual_correlations.csv
- Chicago Correlations: A set of 115 correlations calculated on Chicago Open Data, sampled from the dataset released by the Nexus authors. Of these, 15 are marked as hypothesis-worthy (hypothesis=True in the CSV file), based on the annotations reported in Table 2 of the original Nexus evaluation (see the loading sketch after this list).
  → benchmark/chicago_correlations.csv
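As a quick sanity check, the benchmark files can be loaded with pandas. This is a minimal sketch: only the hypothesis column is documented above, so any other column names should be verified against the CSV itself.

import pandas as pd

# Load the Chicago benchmark and pull out the hypothesis-worthy correlations.
# Only the `hypothesis` column is documented; other columns may vary.
df = pd.read_csv("benchmark/chicago_correlations.csv")
hypothesis_worthy = df[df["hypothesis"] == True]  # noqa: E712

print(f"{len(df)} correlations total, {len(hypothesis_worthy)} marked hypothesis-worthy")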
We also release the raw experimental outputs corresponding to the three benchmarks above, available in the outputs directory. This data can be used to reproduce all the results reported in the paper.
In addition, we include the raw experiment data for the RoBERTa classifier in the outputs/roberta_classifier directory.
$ cd correlation_prior
$ conda create -n corr_prior python=3.11 -y
$ conda activate corr_prior
$ pip install -r requirements.txt
$ export PYTHONPATH="$(pwd):$PYTHONPATH"  # make correlation_prior importable
$ export OPENAI_API_KEY="your_api_key_here"
The script eval_llm_prior_parallel.py generates different types of correlation priors using LLMs. The LLM calls are parallelized. Its full usage is listed below:
usage: eval_llm_prior_parallel.py [-h] [--input_file INPUT_FILE] [--output_dir OUTPUT_DIR] [--model MODEL] [--prior PRIOR] [--num_iter NUM_ITER]
[--ref_file REF_FILE] [--workers WORKERS]
Elicit various types of correlation priors from LLMs.
options:
-h, --help show this help message and exit
--input_file INPUT_FILE
Path to the input correlations file.
--output_dir OUTPUT_DIR
Path to the output directory.
--model MODEL The LLM model to use.
--prior PRIOR Prior type to use. Options: "gaussian_prior", "kde_prior", "lc_prior"
--num_iter NUM_ITER Number of iterations to run.
--ref_file REF_FILE Path to a previous output file for lookup; any correlation already in this file will be reused instead of being reprocessed.
--workers WORKERS The number of workers to use for parallel processing.
The generated priors are saved as CSV files in the specified output folder. For LCP, the discrete probability distribution is stored; when used online, these are converted into continuous distributions.
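For example, to generate LCP priors for the Chicago benchmark with GPT-4o (the output path and worker count here are illustrative, and the command assumes the script sits at the repository root):

$ python eval_llm_prior_parallel.py \
    --input_file benchmark/chicago_correlations.csv \
    --output_dir outputs/chicago_correlations/ \
    --model gpt-4o \
    --prior lc_prior \
    --num_iter 1 \
    --workers 8

The discrete-to-continuous conversion for LCP could look like the sketch below. It assumes, hypothetically, that the stored distribution is a vector of probabilities over equal-width bins on [-1, 1]; check the generated CSVs for the actual format.

import numpy as np

def lcp_to_density(bin_probs):
    """Turn a discrete distribution over equal-width bins on [-1, 1] into a
    piecewise-constant density. The bin layout is an assumption, not the
    repository's documented format."""
    probs = np.asarray(bin_probs, dtype=float)
    probs = probs / probs.sum()                       # normalize to a probability vector
    width = 2.0 / len(probs)                          # each bin spans `width` of [-1, 1]
    edges = np.linspace(-1.0, 1.0, len(probs) + 1)

    def pdf(r):
        i = min(np.searchsorted(edges, r, side="right") - 1, len(probs) - 1)
        return probs[i] / width                       # bin mass divided by bin width
    return pdf

density = lcp_to_density([0.05, 0.10, 0.25, 0.40, 0.20])  # toy 5-bin example
print(density(0.3))                                       # 0.40 / 0.4 = 1.0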
To reproduce all prior-generation methods used in the paper, simply run:
$ sh run_exp.sh
In the paper, we define several metrics to evaluate the quality of a prior, such as sign accuracy, absolute error, and information content. You can run the eval/process_results.py script to compute each metric for every prior on a benchmark.
Example Usage:
python eval/process_results.py \
--benchmark_name "real_world_correlations" \
--output_dir "outputs/real_world_correlations/" \
--num_iter 1 \
--model_type "gpt-4o" \
--priors "Uniform,Gaussian,KDE,LCP"Example Output:
+----------+---------------+-----------+-----------+---------------------+--------------+
| Method | Sign Accuracy | Error | p(r) | Information Content | 95% coverage |
+----------+---------------+-----------+-----------+---------------------+--------------+
| Uniform | 0.511 | 0.51±0.29 | 0.50±0.00 | 0.69±0.00 | 92.3% |
| Gaussian | 0.731 | 0.26±0.28 | 1.73±2.80 | 4.10±7.56 | 49.1% |
| KDE | 0.788 | 0.26±0.27 | 1.61±1.89 | 1.73±4.48 | 59.9% |
| LCP | 0.788 | 0.26±0.27 | 0.92±0.38 | 0.27±0.82 | 89.2% |
+----------+---------------+-----------+-----------+---------------------+--------------+
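To make the metrics concrete, the sketch below scores a single Gaussian prior against one observed correlation. The definitions are our reading of the table above (for instance, the Uniform row's information content of 0.69 ≈ −ln 0.5 suggests a negative log density at the observed r); the authoritative definitions live in the paper and eval/process_results.py.

import numpy as np
from scipy import stats

def score_gaussian_prior(mu, sigma, r_obs):
    """Score one Gaussian prior against an observed correlation r_obs.
    These metric definitions are assumptions inferred from the table,
    not a verified copy of eval/process_results.py."""
    prior = stats.norm(mu, sigma)
    lo, hi = prior.interval(0.95)             # central 95% credible interval
    p_r = prior.pdf(r_obs)                    # prior density at the observed r
    return {
        "sign_accuracy": float(np.sign(mu) == np.sign(r_obs)),
        "error": abs(mu - r_obs),             # absolute error of the prior mean
        "p(r)": p_r,
        "information_content": -np.log(p_r),  # surprisal at the observed r
        "95%_coverage": lo <= r_obs <= hi,
    }

print(score_gaussian_prior(mu=0.4, sigma=0.15, r_obs=0.55))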