A complete Python benchmarking and evaluation framework for LLM systems. Supports GSM8K, ARC, CMT, and custom datasets with grading, visualization, and PDF report generation.
- Prompt and dataset loading (JSONL / Hugging Face)
- Automated evaluation runners with concurrency
- Grading and metric calculation
- Visualization (matplotlib) and PDF reports (reportlab + pypdf)
- CMT extraction from PDFs
- Reproducible benchmark pipelines
- Clone and install

  ```bash
  cd syntra-testing-refactor
  python3 -m venv venv
  source venv/bin/activate  # or venv\Scripts\activate on Windows
  pip install -r requirements.txt
  pip install -e .
  ```
- Configure (optional)

  ```bash
  export OPENAI_API_KEY="sk-..."
  export HF_TOKEN="hf_..."  # for datasets
  mkdir -p data/ runs/
  ```
- Run a benchmark

  ```bash
  # Run full evaluation
  python -m src.syntra_testing.runners.eval_runner --dataset gsm8k --output runs/gsm8k/

  # Generate visualizations and PDF report
  python -m src.syntra_testing.tools.visualization.viz_hf_cmt --input runs/ --output runs/report.pdf
  ```
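At its core, a concurrent evaluation runner is a thread pool mapped over prompts. A sketch assuming a hypothetical `model_fn` callable that makes one provider request per prompt (not the runner's actual interface):

```python
from concurrent.futures import ThreadPoolExecutor

def run_eval(prompts: list[str], model_fn, max_workers: int = 8) -> list[str]:
    """Evaluate prompts concurrently.

    model_fn is a placeholder for the provider call (e.g. a chat
    completion request); threads suit this I/O-bound workload.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with prompts
        return list(pool.map(model_fn, prompts))
```

Because `map` keeps results in input order, graded outputs can be zipped directly back against the dataset rows.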
- Run tests

  ```bash
  pytest tests/ -q
  ```
- src/syntra_testing/: Core package
- tools/: Benchmark and visualization tools
- prompts/: Prompt templates
- benchmarks/: Configuration and results
- runs/: Output directory (gitignored)
Set environment variables for providers:
- OPENAI_API_KEY
- ANTHROPIC_API_KEY
- HF_TOKEN (for datasets)
See FIXES.md for troubleshooting.
MIT