Figure 1: Uncertainty Decomposition with Auxiliary Data (Above). Decomposition for Two-Moons Dataset (Below).
To install, clone this repository and navigate to it in your terminal, then create an environment with your preferred package manager.
Note: conda can be replaced with micromamba or uv.
conda create -n vud python=3.10
conda activate vud
pip install vllm
pip install ipykernel
pip install -U ipywidgets
pip install nbconvert
pip install accelerate
pip install pandas matplotlib datasets scikit-learn flask
pip install gpytorch botorch
To run an experiment, first serve the language model in one terminal.
bash run_llm.sh
Then in a different terminal, run the desired experiments.
Example Scripts:
python run_toy_classification.py --dataset_name [NAME_OF_DATASET]
python run_toy_regression.py --dataset_name [NAME_OF_DATASET]
Parameters:
API Parameters
- model_name: The name of the model to use for predictions. Options: Qwen/Qwen2.5-7B, Qwen/Qwen2.5-14B, and meta-llama/Meta-Llama-3-8B. Default is Qwen/Qwen2.5-14B.
- model_port: The port number of the model server. Default is 8000.
- model_ip: The IP address of the model server. Default is localhost.
- model_temperature: The sampling temperature for the model. Default is 1.0.
- is_local_client: Whether to use a local client for the model. Default is 1 (True); use 0 for the OpenAI API.
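As a rough sketch of how these API parameters fit together, the snippet below builds an endpoint URL and request body from model_ip, model_port, model_name, and model_temperature. It assumes run_llm.sh starts vLLM's OpenAI-compatible server, which this README does not confirm; build_completion_request and its max_tokens argument are hypothetical and not part of this repository.

```python
import json

def build_completion_request(prompt, model_name="Qwen/Qwen2.5-14B",
                             model_ip="localhost", model_port=8000,
                             model_temperature=1.0, max_tokens=1):
    """Return (url, JSON body) for an OpenAI-style /v1/completions request.

    Hypothetical helper: the repository's own client code may differ.
    """
    url = f"http://{model_ip}:{model_port}/v1/completions"
    body = {
        "model": model_name,
        "prompt": prompt,
        "temperature": model_temperature,
        "max_tokens": max_tokens,  # assumed value, not taken from the README
    }
    return url, json.dumps(body)

url, body = build_completion_request("x: 0.5, y:")
print(url)
```

Nothing is sent over the network here; the request is only assembled so the role of each parameter is visible.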
Dataset Parameters
- dataset_name: The name of the dataset to use. Options: logistic_regression, moons_1, moons_2, spirals, linear_regression, gaps. Default is logistic_regression for run_toy_classification.py and linear_regression for run_toy_regression.py.
- D_size: The size of the dataset D. Default is 15.
X Parameters
- x_row_method: The method used to generate the x rows. Options: x_range, x_features, sample. Default is x_range.
  - x_range: Generates x values on a grid over the range of the features.
  - x_features: Uses a specified set of x values. Default is None.
  - sample: Randomly samples x values from the dataset that are not in the context.
- num_x_samples: If x_row_method is sample, this is the number of x values to sample. Default is 1.
- x_features: If x_row_method is x_features, this is the set of x values to use. Provide it as a string of a dictionary, e.g. for the x values (0.5, 0.3) and (0.6, 0.4) the input would be "{'feature1': [0.5, 0.6], 'feature2': [0.3, 0.4]}". Default is None.
- x_range: If x_row_method is x_range, this is the grid of x values to use. Provide it as a string of a dictionary, e.g. for a grid where feature1 covers the range [0.5, 0.6) with step 0.1 and feature2 covers the range [0.3, 0.4) with step 0.1, the input would be "{'feature1': [0.5, 0.6, 0.1], 'feature2': [0.3, 0.4, 0.1]}". Default is None.
- x_sample_seed: The seed for sampling x values. Default is 0.
- decimal_places: The number of decimal places to round the x values to. Default is 1.
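To make the x_range format concrete, here is a minimal sketch of how such a string could be expanded into grid points. It assumes the string is parsed as a Python dictionary literal and that each [start, stop, step] triple describes a half-open interval, as the [0.5, 0.6) notation above suggests; expand_x_range is a hypothetical helper and the repository's own parsing may differ.

```python
import ast
import itertools

def expand_x_range(x_range_arg, decimal_places=1):
    """Expand "{'f': [start, stop, step], ...}" into a list of grid points."""
    spec = ast.literal_eval(x_range_arg)  # safe parsing of the dict string
    axes = {}
    for name, (start, stop, step) in spec.items():
        values = []
        i = 0
        # Half-open interval [start, stop); index-based to limit float drift.
        while round(start + i * step, 10) < stop:
            values.append(round(start + i * step, decimal_places))
            i += 1
        axes[name] = values
    names = list(axes)
    # Cartesian product of the per-feature axes gives the full grid.
    return [dict(zip(names, combo))
            for combo in itertools.product(*axes.values())]

grid = expand_x_range("{'feature1': [0.0, 0.3, 0.1], 'feature2': [0.3, 0.4, 0.1]}")
for point in grid:
    print(point)
```

Under these assumptions, the README's own example (feature1 in [0.5, 0.6) and feature2 in [0.3, 0.4), both with step 0.1) would yield the single point (0.5, 0.3).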
Seed Parameters
- numpy_seed: The seed for NumPy random number generation. Default is 0.
- data_split_seed: The seed for splitting the ICL dataset. Default is 0.
- icl_sample_seed: The seed for sampling from the ICL dataset. Default is 0.
- fixed_permutation_seed: If permute_context is 0, this seed is used for permuting the context. Default is 0.
Permutation Related Parameters
- num_permutations: The number of ICL permutations to average over. Default is 5.
- permute_context: If 1, the context is permuted when sampling. If 0, the context is not permuted. Default is 1.
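The idea of averaging over ICL permutations can be sketched as follows. This is an illustration only: predict stands in for a call to the served language model, and average_over_permutations and its seed argument are hypothetical names, not the repository's implementation.

```python
import random

def average_over_permutations(context, predict, num_permutations=5, seed=0):
    """Average predict(ordering) over random orderings of the ICL context."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_permutations):
        ordering = list(context)   # copy so the caller's list is untouched
        rng.shuffle(ordering)      # one random permutation of the examples
        outputs.append(predict(ordering))
    return sum(outputs) / len(outputs)

# Toy context of (x, y) pairs and an order-sensitive stand-in for the model:
# it simply echoes the label of the last in-context example.
context = [(0.1, 0), (0.2, 1), (0.3, 1), (0.4, 0)]
last_label = lambda ordering: float(ordering[-1][1])
avg = average_over_permutations(context, last_label)
print(avg)
```

Averaging reduces the dependence of the prediction on the arbitrary order in which the in-context examples happen to be listed.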
Z Parameters
- num_z: The number of auxiliary z values to use. Default is 15.
- perturb_about_x: If 1, the z values are perturbed about the x values. If 0, the z values are perturbed about the mean of the ICL data. Default is 1.
- perturbation_std: The amount by which the standard deviation of the Gaussian perturbations (for generating the z values) is scaled. Default is 0.1.
- num_bo_z: The number of z values to use for Bayesian Optimization. The first num_z - num_bo_z z values are randomly sampled. If 0, no Bayesian Optimization is performed. Default is 0.
- num_candidates: The number of candidates to generate for Bayesian Optimization. Default is 3.
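A minimal sketch of the random (non-Bayesian-Optimization) z sampling is given below. It assumes that the scaled quantity is the per-feature standard deviation of the ICL inputs and that z values are rounded like x values; neither detail is stated in this README, and sample_z is a hypothetical helper.

```python
import random
import statistics

def sample_z(x, icl_inputs, num_z=15, perturb_about_x=1,
             perturbation_std=0.1, decimal_places=1, seed=0):
    """Draw num_z Gaussian perturbations around x or around the ICL mean."""
    rng = random.Random(seed)
    dims = range(len(x))
    means = [statistics.fmean(row[d] for row in icl_inputs) for d in dims]
    # Assumption: the base scale is the per-feature std of the ICL inputs.
    stds = [statistics.pstdev([row[d] for row in icl_inputs]) for d in dims]
    centre = x if perturb_about_x else means
    return [
        tuple(round(rng.gauss(centre[d], perturbation_std * stds[d]),
                    decimal_places) for d in dims)
        for _ in range(num_z)
    ]

icl_inputs = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
zs = sample_z((0.5, 0.3), icl_inputs)
print(zs[:3])
```

Setting perturb_about_x=0 in this sketch centres the draws on the ICL mean (here (1.0, 2.0)) instead of on the query point.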
Other parameters
- run_name: The name of the run. Default is test.
- save_directory: The sub-directory within /results/toy_classification or /results/toy_regression (respectively) to save the results in. Default is other.
- verbose_output: If 1, verbose output is printed. Default is 0.
Example Scripts:
python run_bandit_classification.py
python run_bandit_classification.py --model_temperature 2.0 --bandit_num_arms 10 --bandit_midpoint 0.6 --bandit_gap 0.1 --bandit_exploration_rate 1.0 --num_trials 100 --num_random_trials 10 --uncertainty_type total --run_name buttons_midpoint_0.6_gap_0.1 --save_directory 10_arm_bandit
Parameters:
API Parameters
- model_name: The name of the model to use for predictions. Options: Qwen/Qwen2.5-7B, Qwen/Qwen2.5-14B, and meta-llama/Meta-Llama-3-8B. Default is Qwen/Qwen2.5-14B.
- model_port: The port number of the model server. Default is 8000.
- model_ip: The IP address of the model server. Default is localhost.
- model_temperature: The sampling temperature for the model. Default is 1.0.
- is_local_client: Whether to use a local client for the model. Default is 1 (True); use 0 for the OpenAI API.
Bandit Parameters
- bandit_name: Name of the bandit to be used. Default is "buttons".
- bandit_num_arms: Number of arms for the bandit. Default is 5.
- bandit_midpoint: Midpoint reward probability for the bandit. Default is 0.5.
- bandit_gap: Gap between the best and worst arm. Default is 0.2.
- bandit_seed: Seed for the bandit reward generation. Default is 0.
- bandit_exploration_rate: Exploration rate for the bandit algorithm. Default is 2.0.
- is_contextual_bandit: 0 if a contextual bandit problem, 1 otherwise. Default is 0.
Experiment Parameters
- num_trials: Number of trials to run. Default is 10.
- num_random_trials: Number of random trials to run. Default is 3.
- uncertainty_type: Type of uncertainty to use. Options are "epistemic", "total", and "ucb1". Default is "epistemic".
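For orientation, the "ucb1" option presumably refers to the classic UCB1 rule, which can be sketched as below. Several details are assumptions rather than facts about this repository: arm probabilities are taken to be evenly spaced so that best minus worst equals bandit_gap, num_random_trials is read as a number of initial uniformly random pulls, and the exploration rate is placed inside the square root so that 2.0 recovers the textbook bonus sqrt(2 ln t / n). Both function names are hypothetical.

```python
import math
import random

def make_arm_probs(num_arms=5, midpoint=0.5, gap=0.2):
    """Assumed layout: evenly spaced probabilities centred on midpoint."""
    if num_arms == 1:
        return [midpoint]
    low = midpoint - gap / 2
    return [low + gap * i / (num_arms - 1) for i in range(num_arms)]

def run_ucb1(probs, num_trials=10, num_random_trials=3,
             exploration_rate=2.0, seed=0):
    """Play a Bernoulli bandit with UCB1 and return pull counts per arm."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    rewards = [0.0] * len(probs)
    for t in range(1, num_trials + 1):
        if t <= num_random_trials:
            arm = rng.randrange(len(probs))   # initial random exploration
        elif 0 in counts:
            arm = counts.index(0)             # pull each unseen arm once
        else:
            # Empirical mean plus exploration bonus.
            arm = max(
                range(len(probs)),
                key=lambda a: rewards[a] / counts[a]
                + math.sqrt(exploration_rate * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts

probs = make_arm_probs()
counts = run_ucb1(probs, num_trials=100)
print(probs, counts)
```

The "epistemic" and "total" options replace this count-based bonus with uncertainty estimates from the model; that part is not sketched here.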
Seed Parameters
- numpy_seed: The seed for NumPy random number generation. Default is 0.
- fixed_permutation_seed: If permute_context is 0, this seed is used for permuting the context. Default is 0.
Permutation Related Parameters
- num_permutations: The number of ICL permutations to average over. Default is 10.
- permute_context: If 1, the context is permuted when sampling. If 0, the context is not permuted. Default is 1.
Z Parameters
- num_z: The number of auxiliary z values to use. Default is 1.
- perturbation_std: The amount by which the standard deviation of the Gaussian perturbations (for generating the z values) is scaled. Default is 1.0.
- decimal_places: The number of decimal places to round the x values to. Default is 1.
- min_KL_rank: Chooses the z value with the lowest Va from the z values with the k smallest KL values. Default is k = 1.
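The min_KL_rank rule can be read as a two-stage selection: shortlist the k candidates with the smallest KL values, then pick the one with the lowest Va. The sketch below follows that reading; select_z and the (z, kl, va) tuple layout are hypothetical, and the numbers are made up for illustration.

```python
def select_z(candidates, min_KL_rank=1):
    """Pick a z from (z, kl, va) tuples using the min_KL_rank rule."""
    # Stage 1: keep the k candidates with the smallest KL values.
    shortlist = sorted(candidates, key=lambda c: c[1])[:min_KL_rank]
    # Stage 2: among those, return the z with the lowest Va.
    return min(shortlist, key=lambda c: c[2])[0]

# Illustrative values only.
candidates = [("z1", 0.30, 0.10), ("z2", 0.05, 0.40), ("z3", 0.10, 0.20)]
print(select_z(candidates, min_KL_rank=1))
print(select_z(candidates, min_KL_rank=2))
```

With k = 1 the choice is driven by KL alone; larger k lets a candidate with slightly higher KL win if its Va is lower.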
Other parameters
- run_name: The name of the run. Default is test.
- save_directory: The sub-directory within /results/bandits to save the results in. Default is other.
- verbose_output: If 1, verbose output is printed. Default is 0.
Available built-in question-answering datasets to run:
BoolQA: https://arxiv.org/abs/1905.10044
HotPotQA: https://arxiv.org/abs/1809.09600
PubMedQA: https://aclanthology.org/D19-1259/
Scripts:
python run_qa.py --id [NAME_OF_ID_DATASET] --ood [NAME_OF_OOD_DATASET]
python run_qa.py --id boolqa --ood pubmedqa
Parameters:
- id: The name of the in-distribution dataset to use. Options: boolqa, hotpotqa, pubmedqa. Default is boolqa.
- ood: The name of the out-of-distribution dataset to use. Options: boolqa, hotpotqa, pubmedqa. Default is pubmedqa.
- num_D: Number of in-context training examples. Default is 15.
- num_z: Number of z perturbations. Default is 20.
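To run every in-distribution/out-of-distribution pairing, the command lines can be generated rather than typed by hand. The sketch below only builds the strings; it assumes num_D and num_z are passed as --num_D and --num_z, following the pattern of --id and --ood, and qa_commands is a hypothetical helper.

```python
import itertools
import shlex

QA_DATASETS = ["boolqa", "hotpotqa", "pubmedqa"]

def qa_commands(num_D=15, num_z=20):
    """Return one run_qa.py command per ordered (id, ood) dataset pair."""
    commands = []
    for id_name, ood_name in itertools.permutations(QA_DATASETS, 2):
        args = ["python", "run_qa.py", "--id", id_name, "--ood", ood_name,
                "--num_D", str(num_D), "--num_z", str(num_z)]
        commands.append(shlex.join(args))  # shell-safe joining of arguments
    return commands

for command in qa_commands():
    print(command)
```

Each printed line can be pasted into the second terminal while the model server is running.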
Evaluation:
Before evaluating out-of-distribution results, ensure that the data paths are updated in eval_ood.py.
python eval_ood.py
Please consider citing our paper if you find it helpful. Thank you 😀!
@misc{jayasekera2025variationaluncertaintydecompositionincontext,
title={Variational Uncertainty Decomposition for In-Context Learning},
author={I. Shavindra Jayasekera and Jacob Si and Filippo Valdettaro and Wenlong Chen and A. Aldo Faisal and Yingzhen Li},
year={2025},
eprint={2509.02327},
archivePrefix={arXiv},
primaryClass={stat.ML},
url={https://arxiv.org/abs/2509.02327},
}
