Learning vs Retrieval: The Role of In-Context Examples in Regression with Large Language Models (NAACL 2025, Outstanding Paper Award)
This project explores the in-context learning (ICL) capabilities of Large Language Models (LLMs) on regression tasks. It proposes a framework for evaluating ICL mechanisms and measures the extent to which LLMs retrieve internal knowledge versus learn from the in-context examples, using real-world datasets.
Key aspects of the project include:
- Demonstrating LLMs' ability to perform regression on realistic datasets.
- Measuring the balance between internal knowledge retrieval and learning from in-context examples.
- Analyzing factors that influence these mechanisms, such as prior knowledge about tasks and the information provided by in-context examples.
- Providing insights for prompt engineering to leverage meta-learning and foster knowledge retrieval.
Figure 1: Overview of the main prompt configurations used in our experiments.
- Named Features (Configuration a): Reveals actual names of features and target variable
- Anonymized Features (Configuration b): Uses generic names like "Feature #" and "Output"
- Randomized Ground Truth (Configuration c): Maintains named features but replaces ground truth with random numbers
- Direct Question Answering (Direct QA): Asks LLM to estimate target variable without in-context examples
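The actual prompt templates are implemented in utils.py; the snippet below is only a minimal sketch, with hypothetical helper, feature, and target names, of how in-context regression prompts for configurations (a)-(c) could be assembled:

```python
import random

def build_prompt(examples, query, config="Named_Features", target_name="Insurance Cost"):
    """Assemble an in-context regression prompt (illustrative sketch only).

    examples: list of (feature_dict, target_value) pairs used as in-context examples
    query:    feature_dict of the instance the LLM should predict
    config:   "Named_Features", "Anonymized_Features", or "Randomized_Ground_Truth"
    """
    def fmt(features):
        if config == "Anonymized_Features":
            # Hide the real feature names behind generic labels.
            features = {f"Feature {j}": v for j, v in enumerate(features.values(), 1)}
        return ", ".join(f"{k}: {v}" for k, v in features.items())

    label = "Output" if config == "Anonymized_Features" else target_name
    lines = []
    for features, target in examples:
        if config == "Randomized_Ground_Truth":
            # Keep the named features but replace the ground truth with a random number.
            target = round(random.uniform(1_000, 50_000), 2)
        lines.append(f"{fmt(features)} -> {label}: {target}")
    # With an empty example list this reduces to the Direct QA setting.
    lines.append(f"{fmt(query)} -> {label}:")
    return "\n".join(lines)

prompt = build_prompt(
    examples=[({"Age": 19, "BMI": 27.9}, 16884.92)],
    query={"Age": 23, "BMI": 30.1},
    config="Anonymized_Features",
)
```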
The project uses the following datasets:
- Insurance_Cost
- Admission_Chance
- Used_Car_Prices

The repository contains the following scripts:
- main_openai.py: Script for running experiments with OpenAI models
- main_llama.py: Script for running experiments with Meta's LLaMA model
- test.py: Script for evaluating model performance
- utils.py: Utility functions for file naming, example creation, and response processing
- reader.py: Functions for reading and preprocessing datasets
To run experiments with OpenAI models:
python main_openai.py --api-key-token YOUR_API_KEY [other arguments]
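Under the hood, main_openai.py sends each prompt to the OpenAI chat completions endpoint. A minimal sketch of such a call with the current openai Python client (the script's actual client version, prompts, and decoding parameters may differ):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

# Hypothetical single query; the real prompts are built from dataset rows and in-context examples.
prompt = "Age: 23, BMI: 30.1 -> Insurance Cost:"

response = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # assumption: near-deterministic decoding for numeric answers
)
print(response.choices[0].message.content)
```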
To run experiments with Meta's LLaMA model:
python main_llama.py --api-key-token YOUR_API_KEY [other arguments]
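main_llama.py accesses LLaMA through Replicate. A minimal sketch of such a call (parameter names follow the model's Replicate interface and may differ from what the script actually uses):

```python
import os
import replicate

# The replicate client reads the API token from this environment variable.
os.environ["REPLICATE_API_TOKEN"] = "YOUR_API_KEY"

prompt = "Age: 23, BMI: 30.1 -> Insurance Cost:"  # hypothetical query

# replicate.run returns an iterator of text chunks for streaming models.
output = replicate.run(
    "meta/meta-llama-3-70b-instruct",
    input={"prompt": prompt, "max_tokens": 64, "temperature": 0.01},
)
print("".join(output))
```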
To evaluate the performance of the models:
python test.py [arguments]
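test.py compares the saved LLM outputs against the ground truth. As a rough, hypothetical sketch of that kind of evaluation (the file name and column names below are assumptions, not the repository's actual output format), standard regression metrics can be computed with scikit-learn:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical results file with one row per test instance.
results = pd.read_csv("LLM_Results/Insurance_Cost_predictions.csv")

mae = mean_absolute_error(results["ground_truth"], results["prediction"])
r2 = r2_score(results["ground_truth"], results["prediction"])
print(f"MAE: {mae:.2f}  R^2: {r2:.3f}")
```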
All scripts share a common set of command-line arguments, summarized below (an illustrative argparse sketch follows the list):
- --datasets: List of datasets to evaluate (default: ["Insurance_Cost", "Admission_Chance", "Used_Car_Prices"])
- --models: List of models to use (varies by script)
- --in-context-numbers: List of in-context example counts (default: [0, 10, 30, 100])
- --feature-nums: List of feature counts to use (default: [1, 2, 3, 4])
- --configs: List of prompt configurations
- --api-key-token: Required API key for the respective service
- --test-sample-num: Number of test samples to evaluate (default: 300)
- --max-retries: Number of retries before skipping an instance (default: 10)
- --output-folder: Folder to save output results (default: LLM_Results)
- --testing-sampling: Sampling number for outputs (default: 0)
- --seed: Random seed for reproducibility (OpenAI script only, default: 100)
- --input-folder: Folder to read LLM results from (test script only, default: LLM_Results)
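For illustration, a subset of these shared arguments could be declared with argparse as follows (a sketch based on the defaults listed above, not the scripts' actual parser):

```python
import argparse

parser = argparse.ArgumentParser(description="Run LLM regression experiments")
parser.add_argument("--api-key-token", required=True, help="API key for the LLM service")
parser.add_argument("--datasets", nargs="+",
                    default=["Insurance_Cost", "Admission_Chance", "Used_Car_Prices"])
parser.add_argument("--in-context-numbers", nargs="+", type=int, default=[0, 10, 30, 100])
parser.add_argument("--feature-nums", nargs="+", type=int, default=[1, 2, 3, 4])
parser.add_argument("--test-sample-num", type=int, default=300)
parser.add_argument("--max-retries", type=int, default=10)
parser.add_argument("--output-folder", default="LLM_Results")
args = parser.parse_args()
```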
Example commands:

python main_llama.py --api-key-token YOUR_API_KEY --datasets Insurance_Cost --models meta/meta-llama-3-70b-instruct --in-context-numbers 10 30 --feature-nums 1 2 --configs Named_Features --test-sample-num 200 --output-folder output_llama

python main_openai.py --api-key-token YOUR_API_KEY --datasets Admission_Chance --models gpt-4-0125-preview --in-context-numbers 10 --feature-nums 1 3 --configs Anonymized_Features --seed 42 --output-folder output_openai

python test.py --datasets Used_Car_Prices --models gpt-3.5-turbo-0125 --in-context-numbers 0 --feature-nums 1 4 --configs Reasoning --input-folder LLM_Results --output-folder Evaluation_Results

For instructions on data preprocessing and training traditional machine learning models, please refer to the readme.md file in the Datasets folder.

The scripts depend on the following Python packages:
- pandas
- numpy
- scikit-learn
- openai
- replicate
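If no pinned requirements file is provided, these can be installed directly, for example:

pip install pandas numpy scikit-learn openai replicate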
To cite our work, please use the following BibTeX entry:
@inproceedings{nafar-etal-2025-learning,
title = "Learning vs Retrieval: The Role of In-Context Examples in Regression with Large Language Models",
author = "Nafar, Aliakbar and
Venable, K. Brent and
Kordjamshidi, Parisa",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.naacl-long.417/",
pages = "8206--8229",
ISBN = "979-8-89176-189-6",
abstract = "Generative Large Language Models (LLMs) are capable of being in-context learners. However, the underlying mechanism of in-context learning (ICL) is still a major research question, and experimental research results about how models exploit ICL are not always consistent. In this work, we propose a framework for evaluating in-context learning mechanisms, which we claim are a combination of retrieving internal knowledge and learning from in-context examples by focusing on regression tasks. First, we show that LLMs can solve real-world regression problems and then design experiments to measure the extent to which the LLM retrieves its internal knowledge versus learning from in-context examples. We argue that this process lies on a spectrum between these two extremes. We provide an in-depth analysis of the degrees to which these mechanisms are triggered depending on various factors, such as prior knowledge about the tasks and the type and richness of the information provided by the in-context examples. We employ three LLMs and utilize multiple datasets to corroborate the robustness of our findings. Our results shed light on how to engineer prompts to leverage meta-learning from in-context examples and foster knowledge retrieval depending on the problem being addressed."
}