Releases: Shreyas582/WraithRun
v1.8.0
WraithRun v1.8.0
This release addresses 8 critical bugs discovered during a comprehensive 20-test live evaluation using Qwen2.5-0.5B and Llama-3.2-1B models. Every fix was verified against the original failing test scenarios.
Added
- Syslog analysis template — new `syslog-analysis` investigation template triggered by keywords like `log`, `syslog`, `journal`, `event`, `audit`. Runs `read_syslog` → `audit_account_changes` → `inspect_persistence_locations`. Use with `--task-template syslog-summary`. (#141)
- SSH key enumeration tool — new `enumerate_ssh_keys` tool that performs cross-platform scanning of `.ssh` directories for authorized_keys, private keys, and public keys. (#141)
Changed
- Severity calibration — raised listener thresholds (Info <50, Low 50–149, Medium 150–249, High ≥250), lowered account severity, and raised persistence thresholds. Normal desktops no longer trigger spurious high-severity findings. (#139)
- Richer finding titles — finding titles now include specifics (account names, persistence entry text, SSH directory info) instead of bare counts. (#140)
- Quantization-aware parameter estimation — the hardcoded 2.2 divisor is replaced with a format-aware divisor: Q4 → 0.55, Q8 → 1.1, FP16 → 2.2, FP32 → 4.4. Detected automatically from model filename conventions. (#138)
- Template tool ordering — `file-integrity-check` now leads with `hash_binary`; `ssh-key-investigation` now leads with `enumerate_ssh_keys`. (#141)
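The format-aware divisor from #138 maps naturally to a small lookup. A minimal sketch of the idea, with illustrative names (`WeightFormat`, `param_divisor`, `estimate_params` are not the project's actual identifiers) and assumed filename heuristics:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum WeightFormat {
    Q4,
    Q8,
    Fp16,
    Fp32,
}

/// Bytes-per-parameter divisor from the release notes:
/// Q4 -> 0.55, Q8 -> 1.1, FP16 -> 2.2, FP32 -> 4.4.
fn param_divisor(fmt: WeightFormat) -> f64 {
    match fmt {
        WeightFormat::Q4 => 0.55,
        WeightFormat::Q8 => 1.1,
        WeightFormat::Fp16 => 2.2,
        WeightFormat::Fp32 => 4.4,
    }
}

/// Infer the weight format from filename conventions (heuristics assumed).
fn detect_format(filename: &str) -> WeightFormat {
    let f = filename.to_ascii_lowercase();
    if f.contains("q4") || f.contains("int4") {
        WeightFormat::Q4
    } else if f.contains("q8") || f.contains("int8") {
        WeightFormat::Q8
    } else if f.contains("fp16") || f.contains("f16") {
        WeightFormat::Fp16
    } else {
        WeightFormat::Fp32
    }
}

/// Estimate the parameter count from the on-disk size.
fn estimate_params(file_bytes: u64, fmt: WeightFormat) -> u64 {
    (file_bytes as f64 / param_divisor(fmt)) as u64
}
```

The divisors read as bytes-per-parameter plus roughly 10% file overhead, which would explain why FP16 maps to 2.2 rather than 2.0.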
Fixed
- KV-cache attention mask crash — prefill attention length now accounts for forced cache padding when the model lacks a `use_cache` toggle, preventing shape broadcast errors on models like Qwen2.5 and Llama 3.2. (#136)
- ReAct hallucination guard — when the model produces a `<final>` tag at step 0 without calling any tools, the agent falls back to template-driven execution. The quality guard detects hallucinated `<call>` tags and `[observation]` markers and replaces them with a deterministic summary. (#137)
- EP reporting — `detect_execution_provider()` now recognises DirectML and CUDA backend overrides instead of always reporting CPU. (#142)
Stats
- 282 tests passing, 0 failures
- All 8 CI jobs green (Quality Gates, Cross-platform compile ×3, CLI stdin ×2, Live metrics benchmark, Live success e2e)
- Fixes validated live against both Qwen2.5-0.5B-Instruct and Llama-3.2-1B-Instruct
Full Changelog: v1.7.1...v1.8.0
v1.7.1: Dependency Updates
v1.7.1: Dependency Updates
A small patch release that brings all dependencies up to their latest major versions.
`toml` has been bumped from 0.8 to 1.1, picking up TOML spec 1.1 support. `thiserror` moved from 1.0 to 2.0 with no API changes needed on our side. `sha2` was already bumped to 0.11 in v1.7.0, but we had to fix a formatting issue where the new digest output type no longer implements `LowerHex` directly. The same fix was applied in both the CLI and `cyber_tools` hash functions.
All CI action dependencies were also updated: actions/checkout to v6, actions/upload-artifact to v7, actions/download-artifact to v8, actions/setup-python to v6, and release-drafter/release-drafter to v7.
282 tests passing, clean clippy.
Full Changelog: v1.7.0...v1.7.1
v1.7.0: Live Inference Fix
v1.7.0: Live Inference Fix
This release ships 16 improvements found during a thorough live-mode testing audit of v1.6.0 across CPU, NPU, and GPU backends.
Correctness
The KV-cache decode loop had an off-by-one error that could cause the attention mask length to drift during multi-token generation on the ONNX Vitis backend. That's fixed now. Model parameter estimation also wasn't accounting for external .onnx_data files, which meant some models were being classified a tier lower than they should have been. The CLI crate now properly forwards the directml, cuda, tensorrt, and qnn feature flags to inference_bridge, so building with those features actually works.
Dry-Run and Usability
Dry-run mode got a significant overhaul. The old keyword-matching approach for picking response templates has been replaced with a scored template routing system that picks the best match from 10 built-in templates. Dry-run also used to repeat the same tool in every iteration instead of rotating through the full template, which made multi-tool investigations look broken. That's fixed.
For live inference, the agent now extracts chain-of-thought reasoning from the LLM output before looking for tool-call tags, which means the model's reasoning is preserved in the turn history instead of being silently discarded. There's also a new stderr warning when the detected model is too small for its capability tier, so you'll know early if your 0.5B model is being asked to run a Moderate-tier ReAct loop.
Investigation Quality
Confidence scores now get a corroboration boost when multiple tools independently report related findings. Instead of each finding's confidence being purely formula-driven, findings backed by 2+ tools get a small bump, making the scores more reflective of actual evidence strength.
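The corroboration bump can be pictured as a small post-processing step applied on top of the formula-driven score. A sketch under assumed numbers: only the 2+-tool threshold comes from the release notes, while the 0.1 bump size and the 1.0 cap are illustrative choices.

```rust
/// Apply a small corroboration bump when a finding is independently backed
/// by two or more tools, capping the result at 1.0.
/// The bump size (0.1) is an assumption, not the project's actual constant.
fn corroborated_confidence(base: f64, supporting_tools: usize) -> f64 {
    if supporting_tools >= 2 {
        (base + 0.1).min(1.0)
    } else {
        base
    }
}
```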
The Basic-tier deterministic summary is now task-aware, so instead of generic "2 findings detected" output you'll see something like `Task "windows-triage" produced 3 findings across 2 tool(s)`.
Expanded Security Checks
The privilege escalation tool now checks 9 Windows token privileges (up from 4), queries for the AlwaysInstallElevated registry key, and scans for unquoted service paths. On Linux it also picks up setuid, setgid, and (root) indicators from sudo output.
Persistence scanning expanded significantly. On Windows it now covers RunOnce, Winlogon, Image File Execution Options, and AppInit_DLLs registry keys. On Linux it checks additional cron directories, /etc/xdg/autostart, user-level systemd units, and user crontab spools. The suspicious entry detection also got more markers including mshta, regsvr32, certutil, bitsadmin, and others commonly abused for persistence.
Tooling and Discovery
Tokenizer auto-discovery now searches the grandparent directory of the model file, which handles the common HuggingFace layout where model.onnx lives inside an onnx/ subdirectory.
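The grandparent search amounts to a short upward walk from the model file. This is an illustrative helper, not the project's actual discovery code:

```rust
use std::path::{Path, PathBuf};

/// Look for tokenizer.json next to the model, then one and two directories
/// up. The two-level walk covers the common HuggingFace layout where
/// model.onnx sits inside an onnx/ subdirectory of the model repo.
fn find_tokenizer(model_path: &Path) -> Option<PathBuf> {
    let mut dir = model_path.parent();
    for _ in 0..3 {
        let d = dir?;
        let candidate = d.join("tokenizer.json");
        if candidate.is_file() {
            return Some(candidate);
        }
        dir = d.parent();
    }
    None
}
```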
The --models-list command now auto-discovers .onnx files in the ./models directory that aren't already referenced by a configured profile, so local models show up without needing manual config entries.
Model download checksum verification was always failing because the manifest contained placeholder SHA-256 strings. Downloads now detect placeholder checksums and report the actual hash instead of a misleading mismatch error.
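Placeholder detection could be as simple as rejecting strings that don't look like real SHA-256 hex. The rule below is an assumption about what counts as a placeholder, not the manifest's actual convention:

```rust
/// Treat a manifest checksum as a placeholder if it is not 64 hex characters,
/// or if it is a single repeated character (e.g. 64 zeros).
/// This heuristic is an assumption for illustration.
fn is_placeholder_checksum(s: &str) -> bool {
    s.len() != 64
        || !s.chars().all(|c| c.is_ascii_hexdigit())
        || s.chars().all(|c| c == s.chars().next().unwrap())
}
```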
Each tool execution now records elapsed_ms on its AgentTurn, giving you per-tool timing visibility in the run report output.
Resolved Issues
#114, #115, #116, #117, #118, #119, #120, #121, #122, #123, #124, #125, #126, #127, #128, #129
Full Changelog: v1.6.0...v1.7.0
v1.6.0: Agentic Investigation Engine
v1.6.0: Agentic Investigation Engine
Highlights
WraithRun's agent can now reason iteratively about investigations using a full ReAct (Reason + Act) loop. Moderate and Strong-tier investigations call tools dynamically based on LLM reasoning rather than following a fixed template, producing deeper and more relevant findings.
Features
ReAct Agent Loop (#92)
- Moderate/Strong investigation tiers now use an LLM-guided ReAct loop with dynamic tool dispatch
- The agent reasons about which tool to call next based on observations so far
- Basic tier retains fast template-driven execution for simple tasks
- Automatic fallback to template synthesis if the LLM exhausts its step budget
Task-aware LLM Synthesis (#93)
- Synthesis prompts now include the verbatim investigation task for better context
- Structured output sections (Summary, Key Findings, Risk Assessment, Recommendations)
- Evidence budget increased from 1,500 to 3,000 chars per observation
Temperature-scaled Sampling (#66)
- Configurable temperature parameter for LLM token generation
- Softmax probability sampling when temperature > 0; greedy decoding when temperature ≤ 0
- Enables creative vs. deterministic output control per investigation
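The two decoding modes can be sketched as follows; `sample_token` and the explicit uniform draw are illustrative, not the project's API:

```rust
/// Temperature-scaled softmax over raw logits, computed with the usual
/// max-subtraction trick for numerical stability. Assumes temperature > 0.
fn softmax(logits: &[f32], temperature: f32) -> Vec<f32> {
    let scaled: Vec<f32> = logits.iter().map(|&l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Greedy argmax when temperature <= 0, otherwise inverse-CDF sampling
/// from the softmax distribution using a uniform [0,1) draw.
fn sample_token(logits: &[f32], temperature: f32, uniform_draw: f32) -> usize {
    if temperature <= 0.0 {
        // Greedy: deterministically pick the highest-logit token.
        return logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
    }
    let probs = softmax(logits, temperature);
    let mut cum = 0.0;
    for (i, p) in probs.iter().enumerate() {
        cum += p;
        if uniform_draw < cum {
            return i;
        }
    }
    probs.len() - 1 // guard against floating-point rounding at the tail
}
```

Higher temperatures flatten the distribution (more creative output); temperatures near zero concentrate mass on the top token, converging to greedy decoding.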
EP-aware Debug Logs (#67)
- All inference debug messages now include the active execution provider (DirectML, CoreML, CUDA, TensorRT, QNN, CPU)
- Replaces hardcoded "Vitis" labels with runtime-detected EP names
ONNX Session Caching (#64)
- `SessionCache` struct lazily initializes and reuses the ONNX session and tokenizer across investigation steps
- Eliminates per-step session rebuild overhead for multi-step investigations
KV-cache Prefix Reuse (#65)
- Prefix detection framework compares current prompt tokens against previous invocation
- Hit/miss metrics tracked per session for observability
- Scaffolded for full KV-state reuse pending upstream DynValue clonability
Model Pack Download (#94)
- `--model-download <NAME>` CLI command with curated model manifest
- `--model-download list` shows available packs (tinyllama-1.1b-chat, phi-2-2.7b, qwen2-0.5b)
- SHA-256 checksum verification after download; skips if model already present
Testing
- 281 tests passing (up from 274 in v1.5.0)
- New tests: ReAct parsing, tier dispatch, prompt formatting, unknown tool handling
What's Changed
- feat: complete v1.3.0: doctor diagnostics, --backend flag, conformance tests by @Shreyas582 in #111
- feat: v1.5.0: Concrete Hardware Backends, Model Formats, Quantization Awareness by @Shreyas582 in #112
- feat: v1.6.0: Agentic Investigation Engine by @Shreyas582 in #113
Full Changelog: v1.3.1...v1.6.0
v1.5.0: Concrete Hardware Backends, Model Formats, Quantization Awareness
v1.5.0: Concrete Hardware Backends, Model Formats, Quantization Awareness
This release ships six concrete hardware backend implementations and foundational support for multi-format models and quantization awareness, completing the v1.5.0 Concrete Hardware Backends milestone.
New Backends (Feature-Gated)
All backends implement ExecutionProviderBackend with runtime availability probing, diagnostics, config keys, and dry-run session support.
| Backend | Feature Flag | Priority | Platform |
|---|---|---|---|
| DirectML | `directml` | 100 | Windows (any DX12 GPU) |
| CoreML | `coreml` | 100 | macOS / Apple Silicon |
| CUDA | `cuda` | 200 | Linux/Windows (NVIDIA GPU) |
| TensorRT | `tensorrt` | 250 | Linux/Windows (NVIDIA + TRT SDK) |
| QNN | `qnn` | 280 | Windows ARM64 (Snapdragon X) |
Enable one or more at build time:
cargo build --features directml # Windows GPU
cargo build --features cuda # NVIDIA GPU
cargo build --features tensorrt # NVIDIA + TensorRT

Model Format Support (#60)
- `ModelFormat` enum: `Onnx`, `Gguf`, `SafeTensors`
- `ModelFormat::from_path()` auto-detects format from file extension
- `ExecutionProviderBackend::supported_formats()` trait method (default: `[Onnx]`)
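Extension-based detection like `ModelFormat::from_path()` can be sketched as follows; the exact mapping, case handling, and `Option` return type are assumptions about the real API:

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum ModelFormat {
    Onnx,
    Gguf,
    SafeTensors,
}

impl ModelFormat {
    /// Detect the model format from the file extension,
    /// comparing case-insensitively.
    fn from_path(path: &Path) -> Option<ModelFormat> {
        let ext = path.extension()?.to_str()?.to_ascii_lowercase();
        match ext.as_str() {
            "onnx" => Some(ModelFormat::Onnx),
            "gguf" => Some(ModelFormat::Gguf),
            "safetensors" => Some(ModelFormat::SafeTensors),
            _ => None,
        }
    }
}
```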
Quantization Awareness (#61)
- `QuantFormat` enum: `Fp32`, `Fp16`, `Int8`, `Int4`, `BlockQuantized(name)`, `Unknown`
- `QuantFormat::detect_from_path()` infers quantization from filename conventions
- `ExecutionProviderBackend::supported_quant_formats()`: each backend declares its efficient formats
- NPU backends (Vitis, QNN) support INT8/INT4; GPU backends support FP16/FP32/INT8
Testing
- 274 tests passing (up from 259)
- Backend conformance macro expanded with format/quant coverage tests
- All new backends have cfg-gated conformance suites ready
Closes
Closes #55, closes #56, closes #57, closes #58, closes #59, closes #60, closes #61
Full Changelog: v1.4.0...v1.5.0
What's Changed
- feat: complete v1.3.0: doctor diagnostics, --backend flag, conformance tests by @Shreyas582 in #111
- feat: v1.5.0: Concrete Hardware Backends, Model Formats, Quantization Awareness by @Shreyas582 in #112
- feat: v1.6.0: Agentic Investigation Engine by @Shreyas582 in #113
Full Changelog: v1.3.1...v1.5.0
v1.4.0: Doctor Diagnostics, Backend Selection, Conformance Tests
v1.4.0: Complete v1.3.0 Milestone: Doctor Diagnostics, Backend Selection, Conformance Tests
This release completes the v1.3.0 Multi-Backend Inference Abstraction milestone by shipping the final three issues: provider-aware doctor diagnostics, CLI backend selection, and a multi-backend conformance test harness.
New Features
Provider-Aware Doctor Diagnostics (#52)
wraithrun doctor now enumerates all registered inference backends and reports:
- Backend name, priority, and availability status
- Per-backend diagnostic entries (check name, status, detail message)
- Structured `backends` array in the JSON doctor report
CLI --backend Flag and Auto-Select (#53)
Choose your inference backend explicitly or let the engine pick:
- `--backend <NAME>` CLI flag
- `WRAITHRUN_BACKEND` environment variable
- `[inference] backend = "..."` in TOML config
- `"auto"` (default) selects the highest-priority available backend
- Helpful error messages list available backends if an invalid name is given
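Assuming the conventional precedence of CLI flag over environment variable over config file, the resolution chain might look like this (function and parameter names are illustrative):

```rust
/// Resolve the backend name from the three configuration sources,
/// falling back to "auto" when none is set. The flag > env > config
/// precedence is an assumption, not confirmed by the release notes.
fn resolve_backend(
    cli_flag: Option<&str>,
    env_var: Option<&str>,
    config_value: Option<&str>,
) -> String {
    cli_flag
        .or(env_var)
        .or(config_value)
        .unwrap_or("auto")
        .to_string()
}
```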
Integration Test Harness (#54)
A `backend_contract_tests!` macro generates 9 contract tests per backend:
- Name, priority, availability, config keys, diagnostics, dry-run session
- 5 registry-level tests verify discovery, ordering, and fallback behavior
- CPU conformance always runs; Vitis conformance is feature-gated behind `vitis`
- 14 new tests bringing the total to 259 passing tests
Other Changes
- `RunReport` now includes an optional `backend` field recording which backend was used
- `run-report.schema.json` and `doctor-introspection.schema.json` updated with new fields
- `wraithrun.example.toml` includes a new `[inference]` section
Milestone
Closes the v1.3.0 Multi-Backend Inference Abstraction milestone.
Full Changelog: v1.3.1...v1.4.0
What's Changed
- feat: complete v1.3.0: doctor diagnostics, --backend flag, conformance tests by @Shreyas582 in #111
- feat: v1.5.0: Concrete Hardware Backends, Model Formats, Quantization Awareness by @Shreyas582 in #112
- feat: v1.6.0: Agentic Investigation Engine by @Shreyas582 in #113
Full Changelog: v1.3.1...v1.4.0
v1.3.1
v1.3.1: Provider-Agnostic ModelConfig
This release refactors ModelConfig to use generic backend configuration fields, decoupling inference configuration from any specific hardware provider. This is the next step in the v1.3.0 Multi-Backend Inference Abstraction milestone.
What changed
Provider-agnostic ModelConfig (#49)
- `ModelConfig.vitis_config: Option<VitisEpConfig>` replaced with:
  - `backend_override: Option<String>`: optional backend name hint (e.g. `"vitis"`)
  - `backend_config: HashMap<String, String>`: generic key-value config map
- Both fields use `#[serde(default)]` for backward-compatible deserialization
- `VitisEpConfig` retained as a CLI-level helper with `into_backend_config()` / `from_backend_config()` conversion methods
Vitis EP reads generic config (#50)
- `discover_ort_dylib_path()`, `build_base_session_builder_with_provider()`, and `build_session_with_vitis_cascade()` now read from the generic `backend_config` map instead of the Vitis-specific struct
CPU EP decoupled from Vitis types (#51)
- All non-Vitis callers (`CpuBackend`, API server, tests) use `backend_override: None, backend_config: Default::default()`
- Zero coupling to Vitis-specific types
Testing
245 tests passing across all 5 crates.
Migration
See upgrade notes for before/after code examples.
v1.3.0
v1.3.0: CI/CD Pipeline Integration & Multi-Backend Inference Foundation
This release closes the v1.2.0 milestone (#103) and begins the v1.3.0 Multi-Backend Inference Abstraction work (#47, #48).
CI/CD Pipeline Integration (#103)
- GitHub composite Action (`action.yml`): first-party `Shreyas582/wraithrun-action@v1` with version resolution, binary caching (GitHub Actions cache), cross-platform install (Linux/macOS/Windows), scan execution, and JSON finding extraction via python3.
- Example workflow (`.github/workflows/wraithrun-scan.example.yml`): push/PR/schedule triggers, artifact upload, GitHub step summary.
- GitLab CI template (`ci-templates/gitlab-ci.yml`): ubuntu:22.04-based, configurable via CI variables.
- Generic shell script (`ci-templates/wraithrun-scan.sh`): environment-variable driven for Jenkins, CircleCI, and other platforms.
- CI integration guide (`docs/ci-integration.md`): step-by-step docs covering all platforms, exit code policy, output formats, scheduled scanning, and interpreting results.
ExecutionProviderBackend Trait (#47)
New inference_bridge::backend module introducing:
- `ExecutionProviderBackend` trait: `name()`, `is_available()`, `priority()`, `config_keys()`, `diagnose()`, `build_session()`
- `InferenceSession` trait for provider-created inference sessions
- `DiagnosticEntry` / `DiagnosticSeverity` types for doctor integration
- `BackendOptions` type alias for provider-specific config passthrough
Built-in implementations:
- `CpuBackend`: always available, priority 0, dry-run + ONNX CPU support
- `VitisBackend` (cfg-gated `vitis` feature): AMD Vitis AI NPU, priority 300, environment-based availability detection
Provider Registry (#48)
- `ProviderRegistry` with `discover()`, `best_available()`, `get()`, `list()`, `available_names()`
- `build_session_with_fallback()`: tries preferred backend first, then cascades by descending priority
- `ProviderInfo` struct for backend metadata listing
Infrastructure
- 12 new unit tests (245 total, all passing)
- CHANGELOG and upgrade notes updated
- Version bump to 1.3.0
Full Changelog: v1.2.0...v1.3.0
WraithRun v1.2.0
WraithRun v1.2.0
Dashboard UX Overhaul (#99)
- 5-tab layout (Runs, Findings, Cases, Compare, Health) for an organized investigation workflow
- SVG donut severity charts per-run for at-a-glance risk assessment
- Clickable evidence chains with toggle visibility for finding details
- Run comparison diff view showing new, resolved, and changed-severity findings
- JSON/CSV export for both aggregate findings and per-run data
- Real-time progress spinners for in-progress runs
- Cases tab with case list and detail panel
Tool Plugin API (#102)
Extend WraithRun with external tool plugins via `tool.toml` manifests and subprocess JSON I/O.
- `--tools-dir` and `--allowed-plugins` CLI flags for plugin discovery
- Automatic platform filtering, sandbox policy enforcement, and timeout support
- Plugin tools visible in `--doctor` output and `/api/v1/runtime/status` endpoint
- Example plugin: `examples/tools/hello_world/`
- Full documentation: `docs/plugin-api.md`
Security Professional Documentation (#100)
- 4 investigation playbooks: SSH key compromise, Windows triage, credential leak audit, persistence sweep
- MITRE ATT&CK mapping for all 8 built-in tools with tactic coverage analysis
- Threat model with attack surface, trust boundaries, and security controls
- 2 sample investigation reports (anonymized Linux persistence and Windows triage)
Other
- Added `io-util` feature to workspace tokio dependency for plugin subprocess I/O
- 233 tests passing across all crates
Full Changelog: v1.1.0...v1.2.0
What's Changed
- feat: Dashboard UX, Security Docs, Tool Plugin API (v1.2.0) by @Shreyas582 in #107
- chore: bump version to 1.2.0 by @Shreyas582 in #108
Full Changelog: v1.0.0...v1.2.0
v1.1.0: Professional Workflow Depth
v1.1.0: Professional Workflow Depth
Building on v1.0.0's Local API and Web UI foundation, this release adds three features that bring WraithRun closer to professional security workflow integration.
What's New
Structured JSON Audit Logging (#98)
Every API action is now logged as a structured JSON event, including authentication attempts, run lifecycle events, case operations, and server lifecycle. Events are written to a JSON lines file and held in a ring buffer for real-time querying.
- 12 event types: `AuthSuccess`, `AuthFailure`, `RunCreated`, `RunCompleted`, `RunFailed`, `RunCancelled`, `CaseCreated`, `CaseUpdated`, `ToolExecuted`, `ToolPolicyDenied`, `ServerStarted`, `ServerStopped`
- CLI flag: `--audit-log <PATH>` enables a file-based audit trail
- API endpoint: `GET /api/v1/audit/events?limit=N` returns recent events
Case Management API (#97)
Group related investigation runs under cases for tracking and organization.
- Create cases: `POST /api/v1/cases` with title and optional description
- List cases: `GET /api/v1/cases` with run count aggregates
- View case: `GET /api/v1/cases/{id}` with linked run statistics
- Update case: `PATCH /api/v1/cases/{id}` to change title, description, or status (open → investigating → closed)
- List case runs: `GET /api/v1/cases/{id}/runs`
- Link runs to cases: pass `case_id` in `POST /api/v1/runs` request body
- SQLite schema v2 migration runs automatically on existing databases
Evidence-Backed Narrative Report Format (#96)
New `--format narrative` output produces analyst-ready investigation reports with structured sections:
- Executive Summary: task, case reference, finding count, max severity, duration
- Risk Assessment: severity distribution table (Critical/High/Medium/Low/Info)
- Investigation Timeline: step-by-step tool execution log with observation summaries
- Detailed Findings: each finding with confidence level, evidence chain, and recommended action
- Supplementary Findings: lower-relevance observations
- Conclusion: final analysis answer
- Metadata: model tier, inference mode, live metrics
Testing
228 tests pass across all crates (23 API server, 66 core engine, 15 inference bridge, 79 CLI, 45 integration).
Upgrade Notes
- Existing SQLite databases will be automatically migrated to schema v2 (adds `cases` table and `case_id` column on `runs`)
- No breaking API changes. All v1.0.0 endpoints continue to work unchanged
- New `narrative` format option available alongside existing `json`, `summary`, and `markdown` formats