omega-walls is a stateful runtime defense for RAG and tool-using agents.
It is built to counter indirect, distributed, cocktail-style, and multi-step prompt injection attacks that arrive through untrusted content such as web pages, emails, tickets, and attachments.
Instead of treating each chunk in isolation, Omega Walls turns untrusted context into session-level risk state and emits deterministic runtime actions (Off, block, freeze, quarantine, attribution) before dangerous context formation or tool execution is allowed.
Quick links: What’s New | Problem | Why Omega Walls Is Different | Quickstart | How It Works | Integrate | Results | Benchmarks | Roadmap | OSS vs Enterprise | Limitations
## What’s New

- Framework connectors are now included for `LangChain` and `LlamaIndex` (adapter layer + framework smoke scripts/tests).
- README results are aligned to the run-frozen snapshot dated `2026-03-09`.
- WAInjectBench text charts are included as transparent external-anchor reporting (`partial_comparison`).
## Problem

RAG systems and agents consume untrusted text as if it were evidence.
That text may come from:
- web pages,
- emails,
- tickets,
- retrieved chunks,
- attachments,
- tool outputs.
Attackers exploit this by embedding instructions that are often not obvious in a single chunk or turn.
Real attacks are frequently:
- indirect — carried by external content rather than the user prompt,
- distributed — spread across multiple chunks or turns,
- cocktail-style — combining takeover, exfiltration, tool abuse, and evasion signals,
- multi-step — gradually shaping context before a dangerous action happens.
By the time a single turn looks clearly suspicious, the attack may already have influenced context assembly or triggered tools.
omega-walls addresses this as a stateful runtime trust boundary, not as a static prompt filter:
it accumulates risk across context and time, then emits deterministic actions before unsafe context or tool execution is allowed.
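As a toy illustration of why accumulation matters (all names, thresholds, and the decay factor here are hypothetical stand-ins, not the omega-walls API), consider a distributed injection whose individual chunks each look harmless:

```python
# Illustrative sketch only, not the omega-walls implementation.
# A stateless per-chunk filter misses a distributed attack whose chunks
# each score below the per-chunk threshold, while stateful accumulation
# lets the same weak signals cross a session-level threshold.

PER_CHUNK_THRESHOLD = 0.8   # hypothetical stateless cutoff
SESSION_THRESHOLD = 1.5     # hypothetical session-level cutoff

def per_chunk_filter(scores):
    """Stateless baseline: flags only individually suspicious chunks."""
    return [s for s in scores if s >= PER_CHUNK_THRESHOLD]

def session_risk(scores, decay=0.9):
    """Stateful accumulation: risk persists (with decay) across steps."""
    risk = 0.0
    for s in scores:
        risk = risk * decay + s
    return risk

# A distributed injection split across four individually weak chunks:
chunk_scores = [0.5, 0.4, 0.5, 0.4]

assert per_chunk_filter(chunk_scores) == []            # stateless filter sees nothing
assert session_risk(chunk_scores) > SESSION_THRESHOLD  # accumulated risk trips the gate
```

The point is the shape of the defense, not the specific numbers: signals too weak to matter in isolation become decisive once they persist across a session.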
## Why Omega Walls Is Different

Omega Walls is not just a prompt classifier.
It is designed around three ideas:

- **Stateful accumulation.** Weak signals can become dangerous when they repeat, combine, or persist across turns.
- **Deterministic enforcement.** Omega does not only score risk: it can block context, freeze tools, quarantine sources, and escalate.
- **Attribution.** When risk rises, Omega surfaces which documents or sources contributed most, so defenses can be selective instead of blind.

In practice, this makes Omega Walls a better fit for distributed, cocktail, and cross-session attack patterns than systems that only inspect one prompt or one chunk at a time.
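The attribution idea can be sketched in a few lines (hypothetical names, not the real `top_docs` implementation): aggregate per-source risk so that only the worst-offending sources are quarantined, leaving benign retrieval intact.

```python
# Hypothetical attribution sketch, not the omega-walls API:
# aggregate risk per source document so defenses can act selectively.
from collections import defaultdict

def top_contributors(scored_items, k=2):
    """scored_items: (doc_id, risk_score) pairs; returns top-k doc_ids by total risk."""
    totals = defaultdict(float)
    for doc_id, score in scored_items:
        totals[doc_id] += score
    return sorted(totals, key=totals.get, reverse=True)[:k]

scored = [
    ("wiki/faq", 0.1),    # benign reference page
    ("email/743", 0.6),   # suspicious email, repeated weak signals
    ("email/743", 0.7),
    ("ticket/22", 0.3),
]
assert top_contributors(scored, k=1) == ["email/743"]
```

Quarantining `email/743` alone preserves the rest of the context, which is the "selective instead of blind" behavior described above.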
## Quickstart

Runs locally, no API keys required.

```bash
# For now (before PyPI release):
pip install .

# PyPI soon:
# pip install omega-walls

# Optional dev setup:
# pip install -e .[dev]

# Optional framework integrations (LangChain + LlamaIndex):
# pip install -e ".[integrations]"
```

```bash
python -m omega demo attack
python -m omega demo benign
python -m omega eval --suite quick --strict
```

Expected demo output shape:
```jsonc
// attack
{
  "off": true,
  "reasons": {"reason_spike": true, "...": true},
  "v_total": [0.0, 3.55, 1.55, 0.0],
  "p": [0.0, 1.0, 1.0, 0.0],
  "m_next": [0.0, 3.55, 1.55, 0.0],
  "top_docs": ["..."],
  "actions": [{"type": "SOFT_BLOCK"}, {"type": "TOOL_FREEZE"}],
  "tool_executions_count": 0
}
```

```jsonc
// benign
{
  "off": false,
  "actions": [],
  "freeze_active": false
}
```

## How It Works

Omega Walls sits in two runtime positions:
- **Before final context assembly**, to score, filter, or quarantine untrusted retrieved content.
- **At the tool execution chokepoint**, to freeze or constrain dangerous tool actions.
At each step:

- The retriever builds an evidence packet `X_t` from untrusted content.
- `pi0` maps each item into a 4-wall risk vector `v(x)` plus evidence.
- Omega core runs `step()` and accumulates those signals into session-level risk state.
- Distributed and cocktail patterns increase risk when signals reinforce each other.
- Deterministic reason flags and actions are emitted.
- The tool gateway enforces the resulting decision.
Walls in v1:

- `override_instructions` (instruction takeover)
- `secret_exfiltration` (secrets)
- `tool_or_action_abuse` (action abuse)
- `policy_evasion` (jailbreak/evasion)
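The demo output's `v_total` and `m_next` are 4-vectors aligned with these walls. A minimal sketch of elementwise per-wall accumulation (hypothetical: the real `OmegaCoreV1.step()` applies its own dynamics, documented in `docs/math.md`):

```python
# Hypothetical per-wall accumulation sketch, not the real OmegaCoreV1.
# Each position in the 4-vector corresponds to one v1 wall.
WALLS = ["override_instructions", "secret_exfiltration",
         "tool_or_action_abuse", "policy_evasion"]

def accumulate(state, packet_vectors):
    """Sum per-item 4-wall risk vectors into the running session state."""
    for v in packet_vectors:
        state = [m + x for m, x in zip(state, v)]
    return state

# Two retrieved items, each contributing risk on walls 1 and 2:
state = accumulate([0.0] * 4, [[0.0, 2.0, 1.0, 0.0], [0.0, 1.55, 0.55, 0.0]])
assert abs(state[1] - 3.55) < 1e-9   # same shape as the demo's v_total
assert abs(state[2] - 1.55) < 1e-9
```

This is why distributed attacks raise risk here: contributions on the same wall add up even when no single item is alarming on its own.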
Deep docs: `docs/math.md`, `docs/architecture.md`, `docs/interfaces.md`.
## Integrate

Insert Omega in two places:
- Before context builder: project/filter retrieved chunks.
- At tool execution chokepoint: enforce freeze and allowlist.
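The chokepoint can be sketched as a simple guard. The decision record shape below (`"allow"` and `"frozen"` keys) is an assumption for illustration only; see `docs/interfaces.md` for the real `ToolGatewayV1` contract.

```python
# Hypothetical chokepoint guard. The decision dict keys ("allow", "frozen")
# are illustrative assumptions, not the omega-walls schema.

def execute_allowed_tools(tool_requests, tool_decisions, executor):
    """Run only the tool requests that the gateway decision allows."""
    results = []
    for req, decision in zip(tool_requests, tool_decisions):
        if decision.get("allow", False) and not decision.get("frozen", False):
            results.append(executor(req))
        else:
            # Frozen or disallowed requests are skipped, never executed.
            results.append({"skipped": req, "reason": decision})
    return results

reqs = [{"tool": "send_email"}, {"tool": "search"}]
decisions = [{"allow": False, "frozen": True}, {"allow": True, "frozen": False}]
out = execute_allowed_tools(reqs, decisions, executor=lambda r: {"ran": r["tool"]})
assert out[0]["skipped"] == {"tool": "send_email"}  # frozen request never runs
assert out[1] == {"ran": "search"}
```

The essential property is that the guard runs before any side effect: a frozen decision means the request is dropped, not merely logged.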
Minimal integration sketch:
```python
from omega.config.loader import load_resolved_config
from omega.core.omega_core import OmegaCoreV1
from omega.core.params import omega_params_from_config
from omega.policy.off_policy_v1 import OffPolicyV1
from omega.projector.pi0_intent_v2 import Pi0IntentAwareV2
from omega.rag.harness import OmegaRAGHarness, MockLLM
from omega.tools.tool_gateway import ToolGatewayV1

cfg = load_resolved_config(profile="dev").resolved

harness = OmegaRAGHarness(
    projector=Pi0IntentAwareV2(cfg),
    omega_core=OmegaCoreV1(omega_params_from_config(cfg)),
    off_policy=OffPolicyV1(cfg),
    tool_gateway=ToolGatewayV1(cfg),
    config=cfg,
    llm_backend=MockLLM(),
)

# packet_items: retrieved ContentItem list from your retriever
out = harness.run_step(user_query=query, packet_items=packet_items, tool_requests=tool_requests)

allowed_docs = out["allowed_items"]     # use only these for final context
tool_decisions = out["tool_decisions"]  # enforce before any tool call
off_event = out["off_event"]            # log for audit/replay
```

## Results

Canonical source documents:
- `docs/implementation/30_reproducibility_snapshot_2026-03-09.md`
- `docs/implementation/33_wainjectbench_text_eval_2026-03-09.md`
The current OSS snapshot shows that Omega Walls is strongest on stateful session attacks, especially where risk emerges across multiple steps rather than one explicit malicious prompt.
In the frozen session benchmark:

- core session attack off-rate reaches `0.9792` with a `0.0000` benign off-rate,
- cross-session attack off-rate reaches `0.8333`,
- the strongest slices are cocktail and multi-step session attacks.
See the frozen snapshot for exact run IDs, evaluation setup, and caveats.
| Slice | Run ID | attack_off_rate | benign_off_rate | Notes |
|---|---|---|---|---|
| Deepset hardening anchor | `rb_iter3_tool_soft_20260306T153418Z_453a7fd3c715` | 0.7500 | 0.0000 | rule-cycle milestone |
| Strict PI gate | `strict_pi_eval_w202610_20260308T234103Z` | 1.0000 | 0.0000 | f1=1.0000, gate pass |
| Attachment core gate | `attachment_eval_20260309T062851Z` | 1.0000 | 0.0000 | deferred policy bucket separated |
| Session canonical | `session_eval_w202611_20260309T131634Z` | 0.9792 | 0.0000 | cross_session.attack_off_rate=0.8333 |
Run ID: `wainject_eval_w202611_20260309T172201Z`

- `attack_off_rate=0.462159`
- `benign_off_rate=0.015885`
- `precision=0.914172`
- `recall=0.462159`
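For context, the F1 implied by these figures can be derived directly (F1 itself is not part of the published snapshot for this slice; this is only a sanity check on the reported precision and recall):

```python
# Derived from the reported WAInjectBench (text) numbers above.
precision = 0.914172
recall = 0.462159

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
assert 0.61 < f1 < 0.62  # pulled well below precision by the low recall
```

The harmonic mean sitting close to the low recall value reflects the tradeoff described below: high precision on flagged items, but limited coverage on this static text-only slice.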
This benchmark is published as a partial external anchor, not as a headline leaderboard claim.
It is included for transparency, but Omega Walls should primarily be evaluated on its intended problem class: stateful, distributed, cocktail, and cross-session attacks, where static text-only comparisons capture only part of the behavior.
Default demo/eval backend is mock for deterministic local runs.
Optional backends:

- `--llm-backend local --model-path <local_model_dir>`
- `--llm-backend ollama --ollama-model <model_name>`
Model weights are intentionally not stored in this repository.
For real local model smoke:
```powershell
$env:OMEGA_MODEL_PATH="<path-to-local-model>"
powershell -ExecutionPolicy Bypass -File scripts/run_real_smoke.ps1
```

## OSS vs Enterprise

| Area | OSS (this repo) | Enterprise layer |
|---|---|---|
| Core runtime (step, Off reasons, attribution) | Yes | Yes |
| Rule-based baseline projector (`pi0`) | Yes | Yes |
| Local demo + quick eval harness | Yes | Yes |
| Reference tool enforcement gateway | Yes | Yes |
| Control plane / policy UI / RBAC / SSO | No | Yes |
| SIEM integrations and managed audit pipelines | No | Yes |
| Hosted service / operator workflows / SLA | No | Yes |
## Limitations

- Baseline projector is rule-based (`pi0`), not a trained classifier.
- Quick suite is intentionally compact and local-first.
- Local retriever is reference-grade; production retrieval hardening is out of scope here.
- No enterprise control plane, identity, or SIEM integration in OSS.
- External benchmark comparability is still incomplete; some published results are intentionally labeled as partial anchors rather than leaderboard claims.
## Benchmarks

Current status is intentionally conservative:
| Benchmark | Status | Current note |
|---|---|---|
| BIPIA | available | Validation/eval path exists in repo; full comparability depends on local context readiness. |
| PINT | available | Eval path exists; direct comparison depends on local dataset readiness. |
| WAInjectBench (text) | available | Reported as `partial_comparison` in current snapshot. |
| PromptShield | planned | Not yet integrated in this repo snapshot. |
| NotInject | planned | Not yet integrated in this repo snapshot. |
Important scope note:
WASP currently appears as a malicious file-level slice inside WAInjectBench text evaluation.
Standalone runtime-level benchmark claims for WASP/ASB are not made yet.
## Roadmap

- BIPIA (`available`)
- PINT (`available`)
- WAInjectBench (text) (`available`, currently `partial_comparison`)
- PromptShield (`planned`)
- NotInject (`planned`)
- Continue rule-based hardening on distributed/context-required tails.

- AgentDojo (`planned`)
- LLMail-Inject (`planned`)
- BrowseSafe-Bench (`planned`)

- ASB (`planned`)
- WASP (`planned` as a standalone runtime-level benchmark stage)
- When rule-based gains plateau, transition primary effort to the trainable projector track.
If you believe you found a security issue, see SECURITY.md.
For partnership, integration, or product questions, contact us:

- Website: https://synqra.tech/
- LinkedIn: https://www.linkedin.com/in/anvifedotov/
- Email: anton.f@synqra.tech
Documentation index: docs/implementation/README.md
Apache-2.0 (LICENSE).