
Latence

Latence Python SDK

Catch hallucinations, drift, and unused context before your users do.
Groundedness scoring for RAG pipelines and AI coding agents, with a one-call path to upgrade data quality — from messy input files to fully generated markdown and knowledge graphs — as well as a high-performance retrieval engine (OSS).

Grade your RAG pipelines and agent harnesses on real data.


Quickstart • Trace • Upgrade Data Quality • Upgrade Retrieval • Trace Reference • Full Tutorial


Quickstart

pip install latence
export LATENCE_API_KEY="lat_..."
from latence import Latence

client = Latence()  # reads LATENCE_API_KEY from the environment

r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)
print(r.score, r.band, r.context_coverage_ratio, r.context_unused_ratio)

That's it. You now know whether the answer was grounded, how much of your retrieved context was actually used, and whether to trust it.


Step 1 — Trace your answers

Three lanes, one mental model. Pick the one that matches what your app is doing right now.

RAG groundedness — did the answer actually come from your context?

from latence import Latence

client = Latence()

r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)

print(r.score)                   # 0.0 - 1.0
print(r.band)                    # "green" | "amber" | "red" | "unknown"
print(r.context_coverage_ratio)  # how much of the answer is grounded in context
print(r.context_unused_ratio)    # how much retrieved context was dead weight

Code agents — catch phantom APIs and drift turn-over-turn

Chain turns with the opaque next_session_state handoff. The SDK never forces you to track session internals.

turn1 = client.experimental.trace.code(
    response_text="def add(a, b): return a + b",
    raw_context="# utils.py\ndef sub(a, b): return a - b",
    response_language_hint="python",
)

turn2 = client.experimental.trace.code(
    response_text="def mul(a, b): return a * b",
    raw_context="# utils.py\ndef sub(a, b): return a - b",
    response_language_hint="python",
    session_state=turn1.next_session_state,   # chain turns
)

print(turn2.band)
print(turn2.session_signals.recommendation)   # "continue" | "re_anchor" | "fresh_chat"

Hosted Trace pricing is $0.008/request by default. For the higher-cost quality mode, pass profile="quality" to trace.rag(...) or trace.code(...); quality requests bill at $0.016/request.
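
For example, the same RAG call from above opting into quality mode (profile is the only change):

r = client.experimental.trace.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
    profile="quality",  # bills at $0.016/request instead of $0.008
)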

Session rollup — one scoreboard for a live session

Stateless, CPU-only, sub-ms on the pod. Safe to call on every keystroke.

rollup = client.experimental.trace.rollup(turns=[turn1, turn2])

print(rollup.noise_pct)              # fraction of turns flagged as noise
print(rollup.retrieval_waste_pct)    # fraction of retrieved context left unused
print(rollup.model_drift_pct)        # fraction of turns with drift
print(rollup.reason_code_histogram)  # why the turns failed, aggregated
print(rollup.risk_band_trail)        # per-turn band, chronological
print(rollup.recommendations)        # actionable session-level advice

What the signals tell you to do next

The numbers above are not diagnostics. They are routing rules:

Signal | Meaning | Next step
band amber/red, low context_coverage_ratio | The answer isn't grounded in what you retrieved | Upgrade data quality: your upstream documents are the bottleneck
High context_unused_ratio, retrieval_waste_pct > 30% | You retrieved the wrong chunks | Upgrade retrieval: your retriever is the bottleneck
session_signals.recommendation = "re_anchor" / "fresh_chat" (code lane) | Session drift is compounding | Reset the agent's context on the next turn
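
A minimal triage sketch built on those fields (the 0.5 coverage and 30% waste cutoffs below are illustrative, not SDK defaults):

def route(turn, rollup):
    """Map Trace signals to a next step; thresholds are assumptions."""
    signals = getattr(turn, "session_signals", None)  # present on the code lane
    if signals and signals.recommendation in ("re_anchor", "fresh_chat"):
        return "reset_agent_context"       # session drift is compounding
    if turn.band in ("amber", "red") and turn.context_coverage_ratio < 0.5:
        return "upgrade_data_quality"      # upstream documents are the bottleneck
    if turn.context_unused_ratio > 0.5 or rollup.retrieval_waste_pct > 0.30:
        return "upgrade_retrieval"         # retriever shipped the wrong chunks
    return "continue"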

Full reference: Trace docs and SDK tutorial §18.

Async

Every method above has an await-able twin under AsyncLatence:

from latence import AsyncLatence

async with AsyncLatence() as client:
    r = await client.experimental.trace.rag(
        response_text="Paris is the capital of France.",
        raw_context="France's capital city is Paris.",
    )
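
To run that as a script, wrap it in a coroutine and hand it to asyncio.run (plain asyncio, nothing Latence-specific):

import asyncio
from latence import AsyncLatence

async def main():
    async with AsyncLatence() as client:
        r = await client.experimental.trace.rag(
            response_text="Paris is the capital of France.",
            raw_context="France's capital city is Paris.",
        )
        print(r.score, r.band)

asyncio.run(main())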

Step 2 — Upgrade data quality

Is Trace showing low coverage or amber/red bands? The model is rarely the problem. It's usually the upstream data: un-OCR'd PDFs, missing entities, unresolved references. The Latence Data Intelligence Pipeline cleans that up in one call.

job = client.pipeline.run(files=["contract.pdf"])
pkg = job.wait_for_completion()

print(pkg.document.markdown)                         # clean markdown
print(pkg.entities.summary)                          # {"total": 142, "by_type": {...}}
print(pkg.knowledge_graph.summary.total_relations)   # 87
pkg.download_archive("./results.zip")

Smart defaults: OCR → entity extraction → relation extraction. Configure any step explicitly:

job = client.pipeline.run(
    files=["contract.pdf"],
    steps={
        "ocr": {"mode": "performance"},
        "redaction": {"mode": "balanced", "redact": True},
        "extraction": {"label_mode": "hybrid", "threshold": 0.3},
        "relation_extraction": {"resolve_entities": True},
    },
)

Every run returns a structured DataPackage:

  • pkg.document — markdown + per-page layout (OCR)
  • pkg.entities — entity list + summary (extraction)
  • pkg.knowledge_graph — entities + relations + graph summary (relation extraction)
  • pkg.redaction — cleaned text + PII list (redaction)
  • pkg.compression — compressed text + ratio (compression)
  • pkg.quality — per-stage confidence, latency, cost

Power users: the typed PipelineBuilder accepts YAML and validates client-side. See docs/pipelines.md for the full orchestration reference (DAG execution, resumable jobs, progress callbacks).
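
A sketch of how that could look (the from_yaml, validate, and run entry points below are assumptions, not the documented surface; docs/pipelines.md is authoritative):

from latence import PipelineBuilder  # import path assumed

yaml_spec = """
steps:
  ocr: {mode: performance}
  extraction: {label_mode: hybrid, threshold: 0.3}
"""

builder = PipelineBuilder.from_yaml(yaml_spec)  # hypothetical constructor
builder.validate()                              # client-side validation, per the docs
job = builder.run(files=["contract.pdf"])       # hypothetical submission call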

Corpus-level: Dataset Intelligence

Feed pipeline outputs into client.experimental.dataset_intelligence_service to build corpus-wide knowledge graphs, ontologies, and enriched feature spaces with incremental ingestion:

Tier | Method | What it does
1 | di.enrich() | Semantic feature vectors (CPU-only, fast)
2 | di.build_graph() | Entity resolution, knowledge graph, link prediction
3 | di.build_ontology() | Concept clustering, hierarchy induction
Full | di.run() | All three tiers sequentially
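
In code, the tiers compose like this (method names are from the table above; argument shapes are omitted and may differ):

di = client.experimental.dataset_intelligence_service

features = di.enrich()          # Tier 1: semantic feature vectors, CPU-only
graph = di.build_graph()        # Tier 2: entity resolution + knowledge graph
ontology = di.build_ontology()  # Tier 3: concept clustering, hierarchy induction

result = di.run()               # or run all three tiers sequentially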

See docs/dataset_intelligence.md.


Step 3 — Upgrade retrieval

If Trace keeps flagging a high context_unused_ratio, or the session rollup shows retrieval_waste_pct > 30%, your model isn't the problem — your retrieval engine is shipping the wrong chunks.

ColSearch — high-performance late-interaction and multimodal search engine

ColSearch is our late-interaction retrieval engine: token-level ColBERT recall, native multimodal search over PDFs and images, and a drop-in replacement for the retrieval step in your RAG stack. Wire it in and context_unused_ratio collapses.


Error handling

from latence import (
    LatenceError, AuthenticationError, InsufficientCreditsError,
    RateLimitError, JobError, JobTimeoutError, TransportError,
)

try:
    r = client.experimental.trace.rag(
        response_text="Paris is the capital of France.",
        raw_context="France's capital city is Paris.",
    )
except AuthenticationError:
    ...  # 401
except InsufficientCreditsError:
    ...  # 402
except RateLimitError as e:
    ...  # 429, retry after e.retry_after
except JobError as e:
    ...  # pipeline job failed; check e.is_resumable
except TransportError:
    ...  # network / DNS

The SDK retries on 429 and 5xx with exponential backoff (default 2 retries, respects Retry-After).
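
If you exhaust the built-in retries and still want to wait it out, e.retry_after carries the server's hint. The loop below is an illustrative pattern on top of the SDK, not SDK machinery:

import time
from latence import RateLimitError

for attempt in range(3):
    try:
        r = client.experimental.trace.rag(
            response_text="Paris is the capital of France.",
            raw_context="France's capital city is Paris.",
        )
        break
    except RateLimitError as e:
        time.sleep(e.retry_after or 2 ** attempt)  # honor Retry-After when present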


Configuration

export LATENCE_API_KEY="lat_your_key"
from latence import Latence
import latence

client = Latence(
    api_key="lat_...",       # or LATENCE_API_KEY env var
    base_url="https://...",  # or LATENCE_BASE_URL env var
    timeout=60.0,            # request timeout (default: 60s)
    max_retries=2,           # retry attempts (default: 2)
)

latence.setup_logging("DEBUG")  # logs every HTTP request/response

Resources

  • Trace reference: docs/trace.md — parameters and full response schema
  • Full tutorial: SDK_TUTORIAL.md — every service, every parameter
  • API docs: docs.latence.ai
  • Portal: app.latence.ai

MIT License • latence.ai
