Supplementary materials for the ECIR 2026 short paper:
"Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction"
- prompts/ - Prompt templates for relevance labeling and LLM scoring
- examples/ - Example inputs and outputs
Binary relevance judgments are generated using Llama 3.3-70B-Instruct with the following prompt:
```
Does this snippet contain or support the following fact?
Fact: {fact}
Snippet: {snippet}
Answer YES if the snippet contains this information, NO otherwise.
Think step by step:
1. What is the fact claiming?
2. Does the snippet mention this information?
3. Is the information in the snippet consistent with the fact?
Format: YES/NO
Answer:
```
See prompts/relevance_labeling.txt for full details.
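A minimal sketch of how the template above can be filled and the model's completion mapped to a binary label. The helper names and the YES/NO parsing rule are illustrative assumptions; the exact parsing used in the paper is in prompts/relevance_labeling.txt.

```python
def fill_prompt(template: str, fact: str, snippet: str) -> str:
    """Substitute the {fact} and {snippet} placeholders in the labeling prompt."""
    return template.format(fact=fact, snippet=snippet)

def parse_label(completion: str) -> int:
    """Map the model's YES/NO completion to a binary relevance label r(q, s).
    Assumption: the answer starts with YES or NO, as the prompt requests."""
    return 1 if completion.strip().upper().startswith("YES") else 0
```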
| Component | Model | Details |
|---|---|---|
| Language Model | Llama 3.3-70B-Instruct | Relevance labeling, LLM scoring |
| Language Model | GPT-4o | LLM scoring (Conformal-LLM) |
| Embedding Model | Qwen3-Embedding-8B | Query/snippet embeddings (Conformal-Embedding) |
Embedding Scoring Function:
A_emb(q,s) = 1 - cos(emb(q), emb(s))
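This nonconformity score can be sketched in a few lines of plain Python (both embedding vectors are assumed nonzero):

```python
import math

def a_emb(q_emb, s_emb):
    """A_emb(q, s) = 1 - cosine similarity of query and snippet embeddings.
    Assumes both vectors are nonzero."""
    dot = sum(q * s for q, s in zip(q_emb, s_emb))
    norm_q = math.sqrt(sum(x * x for x in q_emb))
    norm_s = math.sqrt(sum(x * x for x in s_emb))
    return 1.0 - dot / (norm_q * norm_s)
```

Identical directions give a score of 0 (most relevant); orthogonal vectors give 1.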
LLM Scoring Function:
A_LLM(q,s) = 1 - rating
where rating ∈ [0,1] is the LLM-provided relevance score. The same prompt was used for both GPT-4o and Llama 3.3-70B-Instruct. See prompts/llm_scoring.txt for the full prompt.
Calibration Threshold:
τ̂_α = Quantile_{1-α}({A(q,s) : r(q,s) = 1})
Coverage Guarantee:
P(s ∈ K_q | r(q,s) = 1) ≥ 1 - α
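The calibration and filtering steps can be sketched as follows. The finite-sample rank ⌈(n+1)(1-α)⌉ is the standard split-conformal choice and an assumption here (the formula above states the (1-α) empirical quantile); function names are illustrative.

```python
import math

def calibrate_threshold(cal_scores, alpha):
    """tau_hat_alpha: empirical (1 - alpha) quantile of nonconformity scores
    A(q, s) over relevant calibration pairs (r(q, s) = 1).
    Assumption: finite-sample rank ceil((n + 1) * (1 - alpha))."""
    scores = sorted(cal_scores)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if k > n else scores[k - 1]

def conformal_filter(snippet_scores, tau):
    """K_q: indices of snippets whose nonconformity A(q, s) <= tau."""
    return [i for i, a in enumerate(snippet_scores) if a <= tau]

tau = calibrate_threshold([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], alpha=0.25)
kept = conformal_filter([0.05, 0.65, 0.5], tau)
```

With seven calibration scores and α = 0.25, the threshold is the 6th-smallest score (0.6), so snippets scoring 0.05 and 0.5 are kept and 0.65 is pruned.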
| Parameter | Value |
|---|---|
| Chunk size | 500 characters |
| Overlap | 100 characters |
| Boundary handling | Preserve sentence boundaries |
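A hypothetical sketch of the chunking settings in the table above: 500-character windows with 100-character overlap, snapping each chunk end back to the last sentence boundary when one exists. The boundary heuristic (splitting on ". ") is an assumption, not the paper's exact implementation.

```python
def chunk_text(text, size=500, overlap=100):
    """Sliding-window chunking with overlap, preferring sentence boundaries.
    Illustrative sketch; the actual boundary handling may differ."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + size, len(text))
        window = text[start:end]
        if end < len(text):
            cut = window.rfind(". ")  # snap back to the last full sentence
            if cut != -1:
                end = start + cut + 1
                window = text[start:end]
        chunks.append(window.strip())
        if end >= len(text):
            break
        start = max(end - overlap, start + 1)  # step forward, keep overlap
    return chunks
```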
| Dataset | Calibration | Test | Split Strategy |
|---|---|---|---|
| NeuCLIR | 1,440 snippets | 740 snippets | Disjoint query topics |
| RAGTIME | 1,710 snippets | 560 snippets | Disjoint query topics |
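The "disjoint query topics" strategy can be sketched as splitting at the topic level, so no topic contributes snippets to both calibration and test. The 70/30 fraction, seed, and pair layout are illustrative assumptions.

```python
import random

def split_by_topic(pairs, cal_frac=0.7, seed=0):
    """Split (topic_id, snippet) pairs so calibration and test share no topics.
    cal_frac and seed are illustrative, not the paper's settings."""
    topics = sorted({t for t, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(topics)
    n_cal = int(len(topics) * cal_frac)
    cal_topics = set(topics[:n_cal])
    cal = [p for p in pairs if p[0] in cal_topics]
    tst = [p for p in pairs if p[0] not in cal_topics]
    return cal, tst
```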
| Method | α | F1 | ConRed (context reduction, %) |
|---|---|---|---|
| Conformal-Embedding | 0.05 | 0.720† | 22.2 |
| Conformal-Embedding | 0.10 | 0.700† | 35.0 |
| Conformal-Embedding | 0.20 | 0.680 | 52.8 |
| Conformal-LLM | 0.05 | 0.710† | 46.5 |
| Conformal-LLM | 0.10 | 0.700† | 58.0 |
| Conformal-LLM | 0.20 | 0.680 | 57.8 |
| Unfiltered Baseline | - | 0.690 | 0 |
† indicates significant improvement over unfiltered baseline (p<0.05).
Key findings:
- Conformal methods achieve target coverage guarantees
- 2-3× context reduction while maintaining factual accuracy
- F1 improves under strict filtering (α ≤ 0.10)
```bibtex
@misc{chakraborty2025principledcontextengineeringrag,
  title={Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction},
  author={Debashish Chakraborty and Eugene Yang and Daniel Khashabi and Dawn Lawrie and Kevin Duh},
  year={2025},
  eprint={2511.17908},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2511.17908},
}
```

MIT License - see LICENSE for details.