Skip to content

wharfe/agent-trust-telemetry

agent-trust-telemetry

Trust telemetry middleware for inter-agent communication — makes instruction contamination observable across traces.

Why

In agent orchestration, a trusted sender does not imply trusted content.

Tool descriptions from MCP servers, messages relayed via sampling, forwarded content from upstream agents — all of these can carry contaminated instructions even when the sender is legitimate. Single-message input filtering cannot capture this chain contamination.

Human → Agent A (contaminated) → Agent B → Agent C
              ↑
    Passes through because sender is trusted
    But the content is contaminated

agent-trust-telemetry makes this visible by adding security semantics over traces to your existing observability stack.

What It Does

  • Normalized message envelope with provenance, execution phase, and content hash
  • Policy violation taxonomy — 5 violation classes + 2 anomaly indicators
  • One-hop risk inheritance — upstream contamination score propagation
  • OTel-compatible evidence export — integrates with your existing tracing infrastructure

What It Does NOT Do

  • Guarantee complete prompt injection defense
  • Provide model-internal interpretability
  • Prove agent capability authenticity
  • Automate legal liability boundaries

3-Minute Repro (LangGraph)

git clone https://github.com/wharfe/agent-trust-telemetry.git
cd agent-trust-telemetry
pip install -e ".[dev,langgraph]"
python examples/langgraph/multi_agent_contamination.py

Expected behavior:

  • agent_a is flagged with action=quarantine (poisoned instruction detected)
  • downstream nodes inherit contamination context and remain quarantined
  • ancestor chain is preserved for traceability across hops

LangGraph Minimal Integration

from att.integrations.langgraph import TrustTelemetryCallback

callback = TrustTelemetryCallback()
result = graph.invoke(
    {"messages": messages},
    config={"callbacks": [callback]},
)

CLI Quick Start

pip install agent-trust-telemetry

# Evaluate a single message
att evaluate --message message.json

# Evaluate a JSONL stream
att evaluate --stream messages.jsonl

# View evaluation report
att report --input evaluations.jsonl --format table

Output Example

{
  "risk_score": 82,
  "severity": "high",
  "policy_classes": [
    {"name": "instruction_override", "confidence": 0.91, "severity": "high"}
  ],
  "anomaly_indicators": [
    {"name": "provenance_or_metadata_drift", "subclass": "parent_flagged_propagation", "confidence": 0.70, "severity": "high"}
  ],
  "evidence": [
    "override phrase detected near end of long context",
    "parent message msg:xxx was flagged (risk_score: 65, action: warn)"
  ],
  "recommended_action": "quarantine"
}

quarantine does not simply block a single message — it pauses propagation to downstream agents.

How It Relates to Existing Tools

Tool Focus
Lakera Guard Point-in-time content screening
LangSmith Observability / debugging / tracing
This tool Security semantics over traces + inter-agent trust propagation

Not a competitor — a complement. Designed to layer on top of existing OTel stacks without requiring a proprietary monitoring platform.

Architecture

Layer 1  Cheap Deterministic    Known markers, pattern matching
Layer 2  Contextual Classifier  Maps to policy violation classes (LLM used only for refinement)
Layer 3  Risk Aggregation       Upstream inheritance, session accumulation, role-transition drift

MVP implements Layer 1 only. The core evaluation engine has zero external LLM API dependencies.

Status

Alpha — MVP complete

  • Message Envelope Schema (JSON Schema Draft 2020-12)
  • Policy Violation Taxonomy v0.1
  • Architectural decisions documented (execution phases, propagation, tool metadata)
  • JSON Schema validation (envelope + output contract)
  • Layer 1 evaluation engine (regex pattern matching)
  • One-hop risk inheritance
  • Tool metadata drift detection
  • CLI (att evaluate / att report / att quarantine)
  • OTel span/event export
  • End-to-end demo (Docker Compose + Jaeger)

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Security

To report a vulnerability privately, use GitHub Security Advisories:

For non-sensitive security discussion, open an issue:

License

Apache License 2.0

About

Trust telemetry middleware for multi-agent systems — makes instruction contamination observable across agent-to-agent communication traces.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages