Kalibr Resilience Benchmark

Claim: When your best execution path degrades, Kalibr routes around it automatically. Hardcoded systems keep failing until a human intervenes.

What This Tests

This benchmark tests execution path routing, not just model routing.

Each path is a complete strategy that pairs a model with a search tool:

Path ID	Model	Search tool
`gpt4o-serper`	gpt-4o	Serper
`gpt4o-tavily`	gpt-4o	Tavily
`gpt4o-mini-tavily`	gpt-4o-mini	Tavily
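
As a minimal sketch, the three paths above could be represented as plain data. The `ExecutionPath` class and the `paths_avoiding` helper are hypothetical names used for illustration; they are not taken from `resilience_benchmark.py` or the Kalibr SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExecutionPath:
    """One complete strategy: a model paired with a search tool (illustrative)."""
    path_id: str
    model: str
    search_tool: str


# The three paths from the table above.
PATHS = [
    ExecutionPath("gpt4o-serper", "gpt-4o", "serper"),
    ExecutionPath("gpt4o-tavily", "gpt-4o", "tavily"),
    ExecutionPath("gpt4o-mini-tavily", "gpt-4o-mini", "tavily"),
]


def paths_avoiding(tool: str) -> List[str]:
    """Return the IDs of paths that do not depend on the given search tool."""
    return [p.path_id for p in PATHS if p.search_tool != tool]
```

Treating the path as the routing unit means a failing search tool can be avoided without giving up the model: `paths_avoiding("serper")` still leaves a gpt-4o path available.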

The Agent

The agent is a 5-step research pipeline:

Plan → Generate search queries (LLM)
Search → Call Serper or Tavily API
Extract → Pull facts with sources (LLM)
Synthesize → Write cited answer (LLM)
Validate → Verify citations
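
The five steps above can be sketched as a single function. This is an illustration under stated assumptions, not the benchmark's implementation: `run_agent`, the prompt strings, the `url`/`snippet` result shape, and the citation check are all hypothetical, and the LLM and search backends are passed in as plain callables so the sketch runs without API keys.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]                        # prompt -> completion
Search = Callable[[str], List[Dict[str, str]]]    # query -> [{"url", "snippet"}]


def run_agent(question: str, llm: LLM, search: Search) -> Dict[str, object]:
    # 1. Plan: ask the LLM for search queries, one per line.
    plan = llm(f"List search queries for: {question}")
    queries = [q.strip() for q in plan.splitlines() if q.strip()]

    # 2. Search: call the search tool (Serper or Tavily) for each query.
    results = [hit for q in queries for hit in search(q)]

    # 3. Extract: pull facts with their sources.
    context = "\n".join(f"{r['url']}: {r['snippet']}" for r in results)
    facts = llm("Extract facts with sources:\n" + context)

    # 4. Synthesize: write a cited answer.
    answer = llm(f"Answer '{question}' citing sources:\n{facts}")

    # 5. Validate: assumed check that the answer cites at least one
    #    URL that actually came back from the search step.
    known_urls = {r["url"] for r in results}
    cited = sorted(u for u in known_urls if u in answer)
    return {"answer": answer, "cited": cited, "valid": bool(cited)}
```

Under this sketch a failed search step returns no results, so validation fails and the whole path counts as a failed task.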

Phases

Phase	Tasks	Description
Learning	15	Normal operation
Degraded	25	Serper fails 70% of the time
Recovery	10	Measure adaptation
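
A rough sketch of how the phase schedule and failure injection might look. The task counts come from the table above; everything else is an assumption. In particular, the README does not say whether Serper stays degraded during Recovery, so the 0.7 rate there is a guess based on the hardcoded results, and `schedule` and `serper_call_fails` are hypothetical names.

```python
import random
from typing import Iterator, List, Tuple

# (phase name, task count, assumed Serper failure rate)
PHASES: List[Tuple[str, int, float]] = [
    ("learning", 15, 0.0),
    ("degraded", 25, 0.7),
    ("recovery", 10, 0.7),  # assumption: Serper is still degraded here
]


def schedule() -> Iterator[Tuple[str, int, float]]:
    """Yield (phase, task index, failure rate) for every task in order."""
    task = 0
    for name, count, rate in PHASES:
        for _ in range(count):
            yield name, task, rate
            task += 1


def serper_call_fails(rate: float, rng: random.Random) -> bool:
    """Simulate one Serper call failing with the given probability."""
    return rng.random() < rate
```

Passing an explicit `random.Random(seed)` keeps a simulated run reproducible between the hardcoded and routed configurations.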

Results

Phase	Hardcoded	Kalibr	Delta (percentage points)
Learning	100%	100%	+0
Degraded	~25%	~90%	+65
Recovery	~25%	~100%	+75

Kalibr shifts traffic to the healthy Tavily paths, while the hardcoded baseline keeps failing.
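
To illustrate the general idea of routing on observed outcomes, here is a deliberately simple sliding-window router. It is not Kalibr's algorithm, which the README does not describe; `WindowedRouter` and its methods are hypothetical names, and the greedy selection rule is a stand-in chosen for brevity.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable


class WindowedRouter:
    """Pick the path with the best recent success rate (illustrative only)."""

    def __init__(self, path_ids: Iterable[str], window: int = 10) -> None:
        self.path_ids = list(path_ids)
        # Keep only the last `window` outcomes per path (1 = success, 0 = failure).
        self.outcomes: Dict[str, Deque[int]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def success_rate(self, path_id: str) -> float:
        seen = self.outcomes[path_id]
        # Untried paths get an optimistic rate of 1.0 so they are eligible.
        return sum(seen) / len(seen) if seen else 1.0

    def choose(self) -> str:
        # max() returns the first path on ties, so list order is the preference.
        return max(self.path_ids, key=self.success_rate)

    def record(self, path_id: str, success: bool) -> None:
        self.outcomes[path_id].append(1 if success else 0)


router = WindowedRouter(["gpt4o-serper", "gpt4o-tavily", "gpt4o-mini-tavily"])
for _ in range(3):
    router.record("gpt4o-serper", success=False)
next_path = router.choose()  # a Tavily path, since Serper's recent rate is 0.0
```

A purely greedy rule like this never retries a path once it looks bad, so it cannot notice when that path recovers. A production router would need some form of exploration to handle that case.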

Run It

pip install -r requirements.txt

export KALIBR_API_KEY=your-key
export KALIBR_TENANT_ID=your-tenant
export OPENAI_API_KEY=your-key
export SERPER_API_KEY=your-key
export TAVILY_API_KEY=your-key

python resilience_benchmark.py
python resilience_benchmark.py --quick  # faster
python resilience_benchmark.py --full   # more tasks
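
Before a run, it can help to confirm that all five environment variables listed above are set. The helper below is a small sketch for that purpose; `missing_vars` is a hypothetical function and is not part of the repository.

```python
import os
from typing import List, Mapping

# Variable names taken from the export commands above.
REQUIRED_VARS = [
    "KALIBR_API_KEY",
    "KALIBR_TENANT_ID",
    "OPENAI_API_KEY",
    "SERPER_API_KEY",
    "TAVILY_API_KEY",
]


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_vars()` with no arguments checks the current process environment; an empty list means every key is present.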

License

MIT
