Exploiting LLMs for Automatic Hypothesis Assessment via a Logit-Based Calibrated Prior

This repository provides data, code, and experiments for evaluating how well LLMs can predict statistical correlations. We propose and assess the Logit-Based Calibrated Prior (LCP), a novel method for using LLM logits to estimate correlation priors in a calibrated way.

Data

Benchmarks

We release three benchmark datasets:

  1. Real-world Correlations: Variable pairs with observed correlations, extracted from the Cause-Effect and Kaggle datasets. For the Kaggle portion, we build on the dataset curated by Trummer et al.
    benchmark/real_world_correlations.csv

  2. Counterfactual Correlations: Cause-effect pairs with hypothetical contexts that reverse their original correlations.
    benchmark/counterfactual_correlations.csv

  3. Chicago Correlations: A set of 115 correlations calculated on Chicago Open Data, sampled from the dataset released by the Nexus authors. Of these, 15 are marked as hypothesis-worthy (hypothesis=True in the CSV file), based on the annotations reported in Table 2 of the original Nexus evaluation.
    benchmark/chicago_correlations.csv
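
To get a feel for the format, here is a minimal pandas sketch for loading a benchmark and selecting the hypothesis-worthy subset (the hypothesis column name comes from the description above; the other column names depend on the CSV):

import pandas as pd

# load the Chicago benchmark shipped with this repository
df = pd.read_csv("benchmark/chicago_correlations.csv")

# keep only the correlations annotated as hypothesis-worthy
# (if pandas parses the column as strings, compare against "True" instead)
hypothesis_worthy = df[df["hypothesis"] == True]
print(len(hypothesis_worthy))  # expected: 15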

Experiment Data

We also release the raw experimental outputs corresponding to the three benchmarks above, available in the outputs directory. This data can be used to reproduce all the results reported in the paper.

In addition, we include the raw experiment data for the RoBERTa classifier in the outputs/roberta_classifier directory.

Installation

$ cd correlation_prior
$ conda create -n corr_prior python=3.11 -y
$ conda activate corr_prior
$ pip install -r requirements.txt
$ export PYTHONPATH="$(pwd):$PYTHONPATH" # make the correlation_prior importable

Setting your OpenAI API Key

export OPENAI_API_KEY="your_api_key_here"
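
If you call the API from Python, the official openai client (v1+) picks this variable up automatically; a quick sanity check (a sketch, assuming the repository uses the official openai Python package):

import os
from openai import OpenAI

# OpenAI() reads OPENAI_API_KEY from the environment by default
# and raises an error if it is unset
client = OpenAI()
print("API key loaded:", bool(os.environ.get("OPENAI_API_KEY")))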

Build Correlation Priors

The script eval_llm_prior_parallel.py generates different types of correlation priors using LLMs; the LLM calls are parallelized. Full usage is listed below:

usage: eval_llm_prior_parallel.py [-h] [--input_file INPUT_FILE] [--output_dir OUTPUT_DIR] [--model MODEL] [--prior PRIOR] [--num_iter NUM_ITER]
                                  [--ref_file REF_FILE] [--workers WORKERS]

Elicit various types of correlation priors from LLMs.

options:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE
                        Path to the input correlations file.
  --output_dir OUTPUT_DIR
                        Path to the output directory.
  --model MODEL         The LLM model to use.
  --prior PRIOR         Prior type to use. Options: "gaussian_prior", "kde_prior", "lc_prior"
  --num_iter NUM_ITER   Number of iterations to run.
  --ref_file REF_FILE   Path to a previous output file for lookup; any correlation already in this file will be reused instead of being reprocessed.
  --workers WORKERS     The number of workers to use for parallel processing.
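
For example, a hypothetical invocation that builds LCP priors for the real-world benchmark (the flag values here are illustrative, not prescribed by the repository):

$ python eval_llm_prior_parallel.py \
    --input_file benchmark/real_world_correlations.csv \
    --output_dir outputs/real_world_correlations/ \
    --model gpt-4o \
    --prior lc_prior \
    --num_iter 1 \
    --workers 8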

The generated priors are saved as CSV files in the specified output directory. For LCP, the discrete probability distribution is stored; when used online, it is converted into a continuous distribution.
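
As an illustration of that conversion (a sketch, not the repository's code), one way to turn a discrete distribution over correlation bins into a continuous density is interpolation followed by renormalization; the bin centers and probabilities below are made up:

import numpy as np
from scipy.interpolate import interp1d

# hypothetical discrete LCP output: probabilities over correlation bin centers
bins = np.linspace(-0.9, 0.9, 10)
probs = np.array([0.01, 0.02, 0.05, 0.10, 0.12,
                  0.20, 0.25, 0.15, 0.07, 0.03])

# interpolate onto a fine grid over [-1, 1] and renormalize into a density
grid = np.linspace(-1.0, 1.0, 401)
pdf = interp1d(bins, probs, kind="cubic",
               bounds_error=False, fill_value=0.0)(grid)
pdf = np.clip(pdf, 0.0, None)           # cubic interpolation can dip below zero
pdf /= pdf.sum() * (grid[1] - grid[0])  # now integrates to 1 on the grid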

To reproduce all prior-generation methods used in the paper, simply run:

$ sh run_exp.sh

Evaluate the Quality of a Correlation Prior

In the paper we define several metrics, such as sign accuracy, absolute error, and information content, to evaluate the quality of a prior. You can run the process_results.py script to compute each metric for every prior on a benchmark.

Example Usage:

python eval/process_results.py \
  --benchmark_name "real_world_correlations" \
  --output_dir "outputs/real_world_correlations/" \
  --num_iter 1 \
  --model_type "gpt-4o" \
  --priors "Uniform,Gaussian,KDE,LCP"

Example Output:

+----------+---------------+-----------+-----------+---------------------+--------------+
|  Method  | Sign Accuracy |   Error   |    p(r)   | Information Content | 95% coverage |
+----------+---------------+-----------+-----------+---------------------+--------------+
| Uniform  |     0.511     | 0.51±0.29 | 0.50±0.00 |      0.69±0.00      |    92.3%     |
| Gaussian |     0.731     | 0.26±0.28 | 1.73±2.80 |      4.10±7.56      |    49.1%     |
|   KDE    |     0.788     | 0.26±0.27 | 1.61±1.89 |      1.73±4.48      |    59.9%     |
|   LCP    |     0.788     | 0.26±0.27 | 0.92±0.38 |      0.27±0.82      |    89.2%     |
+----------+---------------+-----------+-----------+---------------------+--------------+
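
For intuition, here is a minimal sketch of two of these metrics, sign accuracy and absolute error, computed from a prior's point estimates (this is not the repository's evaluation code, and the values are made up):

import numpy as np

# observed correlations vs. a prior's point estimates (made-up values)
r_true = np.array([0.62, -0.31, 0.05, -0.78])
r_hat  = np.array([0.40, -0.10, 0.20, -0.55])

sign_accuracy = np.mean(np.sign(r_hat) == np.sign(r_true))
abs_error = np.abs(r_hat - r_true)
print(f"Sign Accuracy: {sign_accuracy:.3f}")
print(f"Error: {abs_error.mean():.2f}±{abs_error.std():.2f}")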
