Catch hallucinations, drift, and unused context before your users do.
Groundedness scoring for RAG pipelines and AI coding agents, with a one-call path to upgrade data quality — from messy input files to fully generated markdown and knowledge graphs — plus a high-performance open-source retrieval engine.
Grade your RAG pipelines and harnesses on real data.
Quickstart • Trace • Upgrade Data Quality • Upgrade Retrieval • Trace Reference • Full Tutorial
```bash
pip install latence
export LATENCE_API_KEY="lat_..."
```

```python
from latence import Latence

client = Latence()  # reads LATENCE_API_KEY from the environment

r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)
print(r.score, r.band, r.context_coverage_ratio, r.context_unused_ratio)
```

That's it. You now know whether the answer was grounded, how much of your retrieved context was actually used, and whether to trust it.
Three lanes, one mental model. Pick the one that matches what your app is doing right now.
```python
from latence import Latence

client = Latence()

r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)
print(r.score)                   # 0.0 - 1.0
print(r.band)                    # "green" | "amber" | "red" | "unknown"
print(r.context_coverage_ratio)  # how much of the answer is grounded in context
print(r.context_unused_ratio)    # how much retrieved context was dead weight
```

Chain turns with the opaque `next_session_state` handoff. The SDK never forces you to track session internals.
```python
turn1 = client.experimental.trace.code(
    response_text="def add(a, b): return a + b",
    raw_context="# utils.py\ndef sub(a, b): return a - b",
    response_language_hint="python",
)
turn2 = client.experimental.trace.code(
    response_text="def mul(a, b): return a * b",
    raw_context="# utils.py\ndef sub(a, b): return a - b",
    response_language_hint="python",
    session_state=turn1.next_session_state,  # chain turns
)
print(turn2.band)
print(turn2.session_signals.recommendation)  # "continue" | "re_anchor" | "fresh_chat"
```

Hosted Trace pricing is $0.008/request by default. For the higher-cost quality mode, pass `profile="quality"` to `trace.rag(...)` or `trace.code(...)`; quality requests bill at $0.016/request.
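For example, to opt a single high-stakes call into quality mode (same call as the quickstart; only `profile` changes):

```python
# Same call as above; profile="quality" opts this one request into the
# higher-accuracy mode billed at $0.016/request.
r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
    profile="quality",
)
```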
Stateless, CPU-only, sub-ms on the pod. Safe to call on every keystroke.
```python
rollup = client.experimental.trace.rollup(turns=[turn1, turn2])
print(rollup.noise_pct)              # fraction of turns flagged as noise
print(rollup.retrieval_waste_pct)    # fraction of retrieved context left unused
print(rollup.model_drift_pct)        # fraction of turns with drift
print(rollup.reason_code_histogram)  # why the turns failed, aggregated
print(rollup.risk_band_trail)        # per-turn band, chronological
print(rollup.recommendations)        # actionable session-level advice
```

The numbers above are not diagnostics. They are routing rules:
| Signal | Meaning | Next step |
|---|---|---|
| `band` amber/red, low `context_coverage_ratio` | The answer isn't grounded in what you retrieved. | Upgrade data quality — your upstream documents are the bottleneck. |
| High `context_unused_ratio`, `retrieval_waste_pct` > 30% | You retrieved the wrong chunks. | Upgrade retrieval — your retriever is the bottleneck. |
| `session_signals.recommendation` = `"re_anchor"` / `"fresh_chat"` on the code lane | Session drift is compounding. | Reset the agent's context on the next turn. |
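As a sketch, these rules can live in a small router in your session loop. The returned action names are placeholders for your own handlers, and the 0.30 threshold mirrors the 30% from the table:

```python
# Illustrative routing on Trace signals; the returned strings are placeholders
# for your own application logic, not SDK behavior.
def route(turn, rollup):
    if turn.session_signals.recommendation in ("re_anchor", "fresh_chat"):
        return "reset_agent_context"       # session drift is compounding
    if rollup.retrieval_waste_pct > 0.30:  # > 30% of retrieved context unused
        return "upgrade_retrieval"         # the retriever is the bottleneck
    if turn.band in ("amber", "red"):
        return "upgrade_data_quality"      # upstream documents are the bottleneck
    return "continue"
```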
Full reference: Trace docs and SDK tutorial §18.
Every method above has an awaitable twin under `AsyncLatence`:

```python
from latence import AsyncLatence

async with AsyncLatence() as client:
    r = await client.experimental.trace.rag(
        response_text="Paris is the capital of France.",
        raw_context="France's capital city is Paris.",
    )
```

Trace is showing low coverage or amber/red bands? The model is rarely the problem. It's usually the upstream data: un-OCR'd PDFs, missing entities, unresolved references. The Latence Data Intelligence Pipeline cleans that up in one call.
```python
job = client.pipeline.run(files=["contract.pdf"])
pkg = job.wait_for_completion()

print(pkg.document.markdown)                        # clean markdown
print(pkg.entities.summary)                         # {"total": 142, "by_type": {...}}
print(pkg.knowledge_graph.summary.total_relations)  # 87
pkg.download_archive("./results.zip")
```

Smart defaults: OCR → entity extraction → relation extraction. Configure any step explicitly:
```python
job = client.pipeline.run(
    files=["contract.pdf"],
    steps={
        "ocr": {"mode": "performance"},
        "redaction": {"mode": "balanced", "redact": True},
        "extraction": {"label_mode": "hybrid", "threshold": 0.3},
        "relation_extraction": {"resolve_entities": True},
    },
)
```

Every run returns a structured `DataPackage`:
- `pkg.document` — markdown + per-page layout (OCR)
- `pkg.entities` — entity list + summary (extraction)
- `pkg.knowledge_graph` — entities + relations + graph summary (relation extraction)
- `pkg.redaction` — cleaned text + PII list (redaction)
- `pkg.compression` — compressed text + ratio (compression)
- `pkg.quality` — per-stage confidence, latency, cost
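To close the loop, here's a sketch that re-scores the same answer against the cleaned markdown using only the calls shown earlier; `answer` and `messy_context` are placeholders for your real data:

```python
# Re-score the same answer before and after the pipeline upgrade.
# `answer` and `messy_context` are placeholders; pkg comes from the run above.
answer = "..."         # your model's answer
messy_context = "..."  # the raw pre-pipeline text

before = client.experimental.trace.rag(response_text=answer, raw_context=messy_context)
after = client.experimental.trace.rag(response_text=answer, raw_context=pkg.document.markdown)
print(before.context_coverage_ratio, "->", after.context_coverage_ratio)
```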
Power users: the typed PipelineBuilder accepts YAML and validates client-side. See docs/pipelines.md for the full orchestration reference (DAG execution, resumable jobs, progress callbacks).
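As a sketch only — the method names below (`from_yaml`, `validate`) are our assumptions, not confirmed API; docs/pipelines.md has the real surface:

```python
# Assumed API: the import path, from_yaml(), and validate() are illustrative
# names only; consult docs/pipelines.md for the actual PipelineBuilder surface.
from latence import PipelineBuilder

builder = PipelineBuilder.from_yaml("pipeline.yaml")  # load a YAML step spec
builder.validate()                                    # client-side validation
```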
Feed pipeline outputs into client.experimental.dataset_intelligence_service to build corpus-wide knowledge graphs, ontologies, and enriched feature spaces with incremental ingestion:
| Tier | Method | What it does |
|---|---|---|
| 1 | `di.enrich()` | Semantic feature vectors (CPU-only, fast) |
| 2 | `di.build_graph()` | Entity resolution, knowledge graph, link prediction |
| 3 | `di.build_ontology()` | Concept clustering, hierarchy induction |
| Full | `di.run()` | All three tiers sequentially |
See docs/dataset_intelligence.md.
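A minimal sketch of the tiers above; the method names come from the table, but the `documents=` parameter is our assumption, not a documented signature:

```python
# Method names are from the tier table; the documents= parameter is assumed.
di = client.experimental.dataset_intelligence_service

features = di.enrich(documents=[pkg.document.markdown])    # Tier 1: feature vectors
graph = di.build_graph(documents=[pkg.document.markdown])  # Tier 2: entity resolution + KG
everything = di.run(documents=[pkg.document.markdown])     # Full: all three tiers
```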
If Trace keeps flagging a high context_unused_ratio, or the session rollup shows retrieval_waste_pct > 30%, your model isn't the problem — your retrieval engine is shipping the wrong chunks.
→ ColSearch — high-performance late-interaction and multimodal search engine
ColSearch is our late-interaction retrieval engine: token-level ColBERT recall, native multimodal search over PDFs and images, and a drop-in replacement for the retrieval step in your RAG stack. Wire it in and `context_unused_ratio` collapses.
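One way to check that claim with calls you've already seen — `retrieve_old` and `retrieve_colsearch` below are placeholders for your current retriever and a ColSearch-backed one:

```python
# Placeholders: each retriever returns a list of context chunks for a query.
def retrieve_old(query): return []
def retrieve_colsearch(query): return []

for retrieve in (retrieve_old, retrieve_colsearch):
    chunks = retrieve("What is the capital of France?")
    r = client.experimental.trace.rag(
        response_text="Paris is the capital of France.",
        raw_context="\n\n".join(chunks),
    )
    print(retrieve.__name__, r.context_unused_ratio)  # lower is better
```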
```python
from latence import (
    LatenceError, AuthenticationError, InsufficientCreditsError,
    RateLimitError, JobError, JobTimeoutError, TransportError,
)

try:
    r = client.experimental.trace.rag(
        response_text="Paris is the capital of France.",
        raw_context="France's capital city is Paris.",
    )
except AuthenticationError:
    ...  # 401
except InsufficientCreditsError:
    ...  # 402
except RateLimitError as e:
    ...  # 429, retry after e.retry_after
except JobError as e:
    ...  # pipeline job failed; check e.is_resumable
except TransportError:
    ...  # network / DNS
```

The SDK retries on 429 and 5xx with exponential backoff (default 2 retries, respects `Retry-After`).
```bash
export LATENCE_API_KEY="lat_your_key"
```

```python
from latence import Latence
import latence

client = Latence(
    api_key="lat_...",       # or LATENCE_API_KEY env var
    base_url="https://...",  # or LATENCE_BASE_URL env var
    timeout=60.0,            # request timeout (default: 60s)
    max_retries=2,           # retry attempts (default: 2)
)

latence.setup_logging("DEBUG")  # logs every HTTP request/response
```

| | |
|---|---|
| Trace reference | docs/trace.md — parameters and full response schema |
| Full tutorial | SDK_TUTORIAL.md — every service, every parameter |
| API docs | docs.latence.ai |
| Portal | app.latence.ai |
MIT License • latence.ai