Exploiting LLMs for Automatic Hypothesis Assessment via a Logit-Based Calibrated Prior

This repository provides data, code, and experiments for evaluating how well LLMs can predict statistical correlations. We propose and assess the Logit-Based Calibrated Prior (LCP), a novel method for using LLM logits to estimate correlation priors in a calibrated way.

Data

Benchmarks

We release three benchmark datasets:

  1. Real-world Correlations: Variable pairs with observed correlations, extracted from the Cause-Effect and Kaggle datasets. For the Kaggle portion, we build on the dataset curated by Trummer et al.
    benchmark/real_world_correlations.csv

  2. Counterfactual Correlations: Cause-effect pairs with hypothetical contexts that reverse their original correlations.
    benchmark/counterfactual_correlations.csv

  3. Chicago Correlations: A set of 115 correlations calculated on Chicago Open Data, sampled from the dataset released by the Nexus authors. Of these, 15 are marked as hypothesis-worthy (hypothesis=True in the CSV file), based on the annotations reported in Table 2 of the original Nexus evaluation.
    benchmark/chicago_correlations.csv
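
To get a feel for the format, here is a minimal pandas sketch for loading a benchmark and selecting the hypothesis-worthy subset (the hypothesis column name comes from the description above; the other column names depend on the CSV):

import pandas as pd

# load the Chicago benchmark shipped with this repository
df = pd.read_csv("benchmark/chicago_correlations.csv")

# keep only the correlations annotated as hypothesis-worthy
# (if pandas parses the column as strings, compare against "True" instead)
hypothesis_worthy = df[df["hypothesis"] == True]
print(len(hypothesis_worthy))  # expected: 15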

Experiment Data

We also release the raw experimental outputs corresponding to the three benchmarks above, available in the outputs directory. This data can be used to reproduce all the results reported in the paper.

In addition, we include the raw experiment data for the RoBERTa classifier in the outputs/roberta_classifier directory.

Installation

$ cd correlation_prior
$ conda create -n corr_prior python=3.11 -y
$ conda activate corr_prior
$ pip install -r requirements.txt
$ export PYTHONPATH="$(pwd):$PYTHONPATH" # make the correlation_prior importable

Setting your OpenAI API Key

export OPENAI_API_KEY="your_api_key_here"
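
If you call the API from Python, the official openai client (v1+) picks this variable up automatically; a quick sanity check (a sketch, assuming the repository uses the official openai Python package):

import os
from openai import OpenAI

# OpenAI() reads OPENAI_API_KEY from the environment by default
# and raises an error if it is unset
client = OpenAI()
print("API key loaded:", bool(os.environ.get("OPENAI_API_KEY")))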

Build Correlation Priors

The script eval_llm_prior_parallel.py generates different types of correlation priors using LLMs; the LLM calls are parallelized. Full usage is listed below:

usage: eval_llm_prior_parallel.py [-h] [--input_file INPUT_FILE] [--output_dir OUTPUT_DIR] [--model MODEL] [--prior PRIOR] [--num_iter NUM_ITER]
                                  [--ref_file REF_FILE] [--workers WORKERS]

Elicit various types of correlation priors from LLMs.

options:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE
                        Path to the input correlations file.
  --output_dir OUTPUT_DIR
                        Path to the output directory.
  --model MODEL         The LLM model to use.
  --prior PRIOR         Prior type to use. Options: "gaussian_prior", "kde_prior", "lc_prior"
  --num_iter NUM_ITER   Number of iterations to run.
  --ref_file REF_FILE   Path to a previous output file for lookup; any correlation already in this file will be reused instead of being reprocessed.
  --workers WORKERS     The number of workers to use for parallel processing.
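
For example, a hypothetical invocation that builds LCP priors for the real-world benchmark (the flag values here are illustrative, not prescribed by the repository):

$ python eval_llm_prior_parallel.py \
    --input_file benchmark/real_world_correlations.csv \
    --output_dir outputs/real_world_correlations/ \
    --model gpt-4o \
    --prior lc_prior \
    --num_iter 1 \
    --workers 8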

The generated priors are saved as CSV files in the specified output directory. For LCP, the discrete probability distribution is stored; when used online, it is converted into a continuous distribution.
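
As an illustration of that conversion (a sketch, not the repository's code), one way to turn a discrete distribution over correlation bins into a continuous density is interpolation followed by renormalization; the bin centers and probabilities below are made up:

import numpy as np
from scipy.interpolate import interp1d

# hypothetical discrete LCP output: probabilities over correlation bin centers
bins = np.linspace(-0.9, 0.9, 10)
probs = np.array([0.01, 0.02, 0.05, 0.10, 0.12,
                  0.20, 0.25, 0.15, 0.07, 0.03])

# interpolate onto a fine grid over [-1, 1] and renormalize into a density
grid = np.linspace(-1.0, 1.0, 401)
pdf = interp1d(bins, probs, kind="cubic",
               bounds_error=False, fill_value=0.0)(grid)
pdf = np.clip(pdf, 0.0, None)           # cubic interpolation can dip below zero
pdf /= pdf.sum() * (grid[1] - grid[0])  # now integrates to 1 on the grid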

To reproduce all prior-generation methods used in the paper, simply run:

$ sh run_exp.sh

Evaluate the Quality of a Correlation Prior

In the paper we define several metrics, such as sign accuracy, absolute error, and information content, to evaluate the quality of a prior. You can run the process_results.py script to compute each metric for every prior on a benchmark.

Example Usage:

python eval/process_results.py \
  --benchmark_name "real_world_correlations" \
  --output_dir "outputs/real_world_correlations/" \
  --num_iter 1 \
  --model_type "gpt-4o" \
  --priors "Uniform,Gaussian,KDE,LCP"

Example Output:

+----------+---------------+-----------+-----------+---------------------+--------------+
|  Method  | Sign Accuracy |   Error   |    p(r)   | Information Content | 95% coverage |
+----------+---------------+-----------+-----------+---------------------+--------------+
| Uniform  |     0.511     | 0.51±0.29 | 0.50±0.00 |      0.69±0.00      |    92.3%     |
| Gaussian |     0.731     | 0.26±0.28 | 1.73±2.80 |      4.10±7.56      |    49.1%     |
|   KDE    |     0.788     | 0.26±0.27 | 1.61±1.89 |      1.73±4.48      |    59.9%     |
|   LCP    |     0.788     | 0.26±0.27 | 0.92±0.38 |      0.27±0.82      |    89.2%     |
+----------+---------------+-----------+-----------+---------------------+--------------+
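
For intuition, here is a minimal sketch of two of these metrics, sign accuracy and absolute error, computed from a prior's point estimates (this is not the repository's evaluation code, and the values are made up):

import numpy as np

# observed correlations vs. a prior's point estimates (made-up values)
r_true = np.array([0.62, -0.31, 0.05, -0.78])
r_hat  = np.array([0.40, -0.10, 0.20, -0.55])

sign_accuracy = np.mean(np.sign(r_hat) == np.sign(r_true))
abs_error = np.abs(r_hat - r_true)
print(f"Sign Accuracy: {sign_accuracy:.3f}")
print(f"Error: {abs_error.mean():.2f}±{abs_error.std():.2f}")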
