Conformal Context Engineering for RAG


Supplementary materials for the ECIR 2026 short paper:

"Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction"


Contents

  • prompts/ - Prompt templates for relevance labeling and LLM scoring
  • examples/ - Example inputs and outputs

Relevance Labeling

Binary relevance judgments are generated using Llama 3.3-70B-Instruct with the following prompt:

Does this snippet contain or support the following fact?

Fact: {fact}

Snippet: {snippet}

Answer YES if the snippet contains this information, NO otherwise.
Think step by step:
1. What is the fact claiming?
2. Does the snippet mention this information?
3. Is the information in the snippet consistent with the fact?

Format: YES/NO
Answer:

See prompts/relevance_labeling.txt for full details.
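A minimal sketch of how this prompt might be filled in and its answer parsed into a binary label. The function names and the YES/NO parsing convention are illustrative assumptions, not the repository's actual code:

```python
RELEVANCE_PROMPT = """Does this snippet contain or support the following fact?

Fact: {fact}

Snippet: {snippet}

Answer YES if the snippet contains this information, NO otherwise.
Think step by step:
1. What is the fact claiming?
2. Does the snippet mention this information?
3. Is the information in the snippet consistent with the fact?

Format: YES/NO
Answer:"""


def build_prompt(fact: str, snippet: str) -> str:
    """Fill the relevance-labeling template for one (fact, snippet) pair."""
    return RELEVANCE_PROMPT.format(fact=fact, snippet=snippet)


def parse_label(response: str) -> int:
    """Map the model's free-form answer to a binary relevance label.

    Assumption: any response containing "YES" (case-insensitive) counts as
    relevant; everything else counts as not relevant.
    """
    return 1 if "YES" in response.strip().upper() else 0
```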


Model Configurations

| Component | Model | Details |
|---|---|---|
| Language Model | Llama 3.3-70B-Instruct | Relevance labeling, LLM scoring |
| Language Model | GPT-4o | LLM scoring (Conformal-LLM) |
| Embedding Model | Qwen3-Embedding-8B | - |

Conformal Filtering

Conformal-Embedding

Scoring Function:

A_emb(q,s) = 1 - cos(emb(q), emb(s))
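The embedding nonconformity score is one minus the cosine similarity between query and snippet embeddings. A direct sketch (assuming precomputed embedding vectors):

```python
import numpy as np


def a_emb(q_vec: np.ndarray, s_vec: np.ndarray) -> float:
    """Nonconformity score A_emb(q, s) = 1 - cos(emb(q), emb(s)).

    Lower scores mean the snippet embedding is more aligned with the query
    embedding; identical directions give 0, orthogonal ones give 1.
    """
    cos = float(np.dot(q_vec, s_vec) / (np.linalg.norm(q_vec) * np.linalg.norm(s_vec)))
    return 1.0 - cos
```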

Conformal-LLM

Scoring Function:

A_LLM(q,s) = 1 - rating

Where rating ∈ [0,1] is the LLM-provided relevance score. The same prompt was used for both GPT-4o and Llama 3.3-70B-Instruct. See prompts/llm_scoring.txt for the full prompt.
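A small sketch of turning an LLM response into this score. How the rating is extracted from the model's output is an assumption here (the actual prompt and format are in prompts/llm_scoring.txt):

```python
import re


def a_llm(llm_response: str) -> float:
    """Nonconformity score A_LLM(q, s) = 1 - rating.

    Assumption: the rating is the first number in the response and lies in
    [0, 1]; the repository's real parsing may differ.
    """
    match = re.search(r"\d*\.?\d+", llm_response)
    if match is None:
        raise ValueError("no numeric rating found in response")
    rating = float(match.group())
    if not 0.0 <= rating <= 1.0:
        raise ValueError(f"rating {rating} outside [0, 1]")
    return 1.0 - rating
```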

Threshold Calibration

The threshold τ̂_α is the empirical (1-α)-quantile of nonconformity scores over calibration pairs labeled relevant:

τ̂_α = Quantile_{1-α}({A(q,s) : r(q,s) = 1})

At test time, a snippet s is retained for query q iff A(q,s) ≤ τ̂_α, yielding the filtered context set K_q = {s : A(q,s) ≤ τ̂_α}.

Coverage Guarantee:

P(s ∈ K_q | r(q,s) = 1) ≥ 1 - α
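The calibration step can be sketched with a standard split-conformal quantile. The finite-sample (n+1)/n correction below is the usual split-conformal convention and an assumption about the paper's exact implementation:

```python
import numpy as np


def calibrate_threshold(cal_scores, alpha: float) -> float:
    """Split-conformal threshold tau_hat_alpha.

    Takes nonconformity scores A(q, s) of calibration pairs with r(q, s) = 1
    and returns the ceil((n + 1) * (1 - alpha)) / n empirical quantile.
    """
    scores = np.asarray(cal_scores, dtype=float)
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(scores, level, method="higher"))


def conformal_filter(scores, tau: float):
    """Indices of snippets kept in K_q: those with A(q, s) <= tau."""
    return [i for i, a in enumerate(scores) if a <= tau]
```

For example, with ten calibration scores and α = 0.5, the threshold is the ceil(11 · 0.5)/10 = 0.6 quantile of the calibration scores.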

Snippet Extraction

| Parameter | Value |
|---|---|
| Chunk size | 500 characters |
| Overlap | 100 characters |
| Boundary handling | Preserve sentence boundaries |
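One way to combine these parameters is a greedy sentence-packing chunker: fill each chunk up to the size limit without splitting sentences, then carry whole trailing sentences forward as overlap. This is an illustrative sketch; the paper's exact boundary rules are not specified here:

```python
import re


def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into ~`size`-character chunks with ~`overlap`-character
    overlap, keeping sentence boundaries intact (sentences are never cut).
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, length = [], [], 0
    for sent in sentences:
        if current and length + len(sent) + 1 > size:
            chunks.append(" ".join(current))
            # Overlap: carry whole trailing sentences, up to ~`overlap` chars.
            tail, tail_len = [], 0
            for s in reversed(current):
                if tail_len + len(s) > overlap:
                    break
                tail.insert(0, s)
                tail_len += len(s) + 1
            current, length = tail, tail_len
        current.append(sent)
        length += len(sent) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Note that a single sentence longer than `size` becomes its own oversized chunk rather than being split mid-sentence, matching the boundary-preservation setting above.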

Datasets

| Dataset | Calibration | Test | Split Strategy |
|---|---|---|---|
| NeuCLIR | 1,440 snippets | 740 snippets | Disjoint query topics |
| RAGTIME | 1,710 snippets | 560 snippets | Disjoint query topics |

Results Summary

| Method | α | F1 | ConRed% |
|---|---|---|---|
| Conformal-Embedding | 0.05 | 0.720† | 22.2 |
| Conformal-Embedding | 0.10 | 0.700† | 35.0 |
| Conformal-Embedding | 0.20 | 0.680 | 52.8 |
| Conformal-LLM | 0.05 | 0.710† | 46.5 |
| Conformal-LLM | 0.10 | 0.700† | 58.0 |
| Conformal-LLM | 0.20 | 0.680 | 57.8 |
| Unfiltered Baseline | - | 0.69 | 0 |

† indicates significant improvement over unfiltered baseline (p<0.05).

Key findings:

  • Conformal methods achieve target coverage guarantees
  • 2-3× context reduction while maintaining factual accuracy
  • F1 improves under strict filtering (α ≤ 0.10)

Citation

@misc{chakraborty2025principledcontextengineeringrag,
      title={Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction}, 
      author={Debashish Chakraborty and Eugene Yang and Daniel Khashabi and Dawn Lawrie and Kevin Duh},
      year={2025},
      eprint={2511.17908},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2511.17908}, 
}

License

MIT License - see LICENSE for details.
