Releases: Shreyas582/WraithRun

v1.8.0

05 Apr 10:01
5955ea0


This release addresses 8 critical bugs discovered during a comprehensive 20-test live evaluation using Qwen2.5-0.5B and Llama-3.2-1B models. Every fix was verified against the original failing test scenarios.

Added

  • Syslog analysis template — new syslog-analysis investigation template triggered by keywords like log, syslog, journal, event, audit. Runs read_syslog → audit_account_changes → inspect_persistence_locations. Use with --task-template syslog-summary. (#141)
  • SSH key enumeration tool — new enumerate_ssh_keys tool performs cross-platform scanning of .ssh directories for authorized_keys, private keys, and public keys. (#141)

Changed

  • Severity calibration — raised listener thresholds (Info <50, Low 50–149, Medium 150–249, High ≥250), lowered account severity, and raised persistence thresholds. Normal desktops no longer trigger spurious high-severity findings. (#139)
  • Richer finding titles — finding titles now include specifics (account names, persistence entry text, SSH directory info) instead of bare counts. (#140)
  • Quantization-aware parameter estimation — the hardcoded 2.2 divisor is replaced with a format-aware divisor: Q4 → 0.55, Q8 → 1.1, FP16 → 2.2, FP32 → 4.4. Detected automatically from model filename conventions. (#138)
  • Template tool ordering — file-integrity-check now leads with hash_binary; ssh-key-investigation now leads with enumerate_ssh_keys. (#141)
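Since the divisor is effectively bytes per parameter, the estimate reduces to file size divided by it. A minimal sketch under that reading (the function names and the exact filename-matching rules are assumptions; only the divisor values come from the notes):

```rust
/// Hypothetical sketch of the format-aware divisor from #138: the divisor
/// approximates bytes per parameter, so params ≈ file_size / divisor.
/// Only the values (Q4 → 0.55, Q8 → 1.1, FP16 → 2.2, FP32 → 4.4) are from
/// the release notes; the matching heuristics below are illustrative.
fn bytes_per_param(filename: &str) -> f64 {
    let lower = filename.to_lowercase();
    if lower.contains("q4") || lower.contains("int4") {
        0.55
    } else if lower.contains("q8") || lower.contains("int8") {
        1.1
    } else if lower.contains("fp16") || lower.contains("f16") {
        2.2
    } else {
        4.4 // FP32 fallback (was the hardcoded behaviour, at 2.2, before #138)
    }
}

fn estimate_params(file_size_bytes: u64, filename: &str) -> u64 {
    (file_size_bytes as f64 / bytes_per_param(filename)) as u64
}
```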

Fixed

  • KV-cache attention mask crash — prefill attention length now accounts for forced cache padding when the model lacks a use_cache toggle, preventing shape broadcast errors on models like Qwen2.5 and Llama 3.2. (#136)
  • ReAct hallucination guard — when the model produces a <final> tag at step 0 without calling any tools, the agent falls back to template-driven execution. Quality guard detects hallucinated <call> tags and [observation] markers and replaces them with a deterministic summary. (#137)
  • EP reporting — detect_execution_provider() now recognises DirectML and CUDA backend overrides instead of always reporting CPU. (#142)

Stats

  • 282 tests passing, 0 failures
  • All 8 CI jobs green (Quality Gates, Cross-platform compile ×3, CLI stdin ×2, Live metrics benchmark, Live success e2e)
  • Fixes validated live against both Qwen2.5-0.5B-Instruct and Llama-3.2-1B-Instruct

Full Changelog: v1.7.1...v1.8.0

v1.7.1: Dependency Updates

05 Apr 08:28
311ad3d


A small patch release that brings all dependencies up to their latest major versions.

toml has been bumped from 0.8 to 1.1, picking up the TOML spec 1.1 support. thiserror moved from 1.0 to 2.0 with no API changes needed on our side. sha2 was already bumped to 0.11 in v1.7.0, but we had to fix a formatting issue where the new digest output type no longer implements LowerHex directly. The same fix was applied in both the CLI and cyber_tools hash functions.
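The LowerHex workaround amounts to formatting the digest's raw bytes explicitly rather than relying on `{:x}`. An illustrative helper in that spirit (the function name is hypothetical; only the "no longer implements LowerHex" constraint comes from the notes):

```rust
/// Hypothetical helper showing the style of fix: when the digest output
/// type stops implementing LowerHex, format each byte manually instead
/// of writing format!("{:x}", digest).
fn to_lower_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```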

All CI action dependencies were also updated: actions/checkout to v6, actions/upload-artifact to v7, actions/download-artifact to v8, actions/setup-python to v6, and release-drafter/release-drafter to v7.

282 tests passing, clean clippy.

Full Changelog: v1.7.0...v1.7.1

v1.7.0: Live Inference Fix

05 Apr 08:08
c7707fc


This release ships 16 improvements found during a thorough live-mode testing audit of v1.6.0 across CPU, NPU, and GPU backends.

Correctness

The KV-cache decode loop had an off-by-one error that could cause the attention mask length to drift during multi-token generation on the ONNX Vitis backend. That's fixed now. Model parameter estimation also wasn't accounting for external .onnx_data files, which meant some models were being classified a tier lower than they should have been. The CLI crate now properly forwards the directml, cuda, tensorrt, and qnn feature flags to inference_bridge, so building with those features actually works.

Dry-Run and Usability

Dry-run mode got a significant overhaul. The old keyword-matching approach for picking response templates has been replaced with a scored template routing system that picks the best match from 10 built-in templates. Dry-run also used to repeat the same tool in every iteration instead of rotating through the full template, which made multi-tool investigations look broken. That's fixed.

For live inference, the agent now extracts chain-of-thought reasoning from the LLM output before looking for tool-call tags, which means the model's reasoning is preserved in the turn history instead of being silently discarded. There's also a new stderr warning when the detected model is too small for its capability tier, so you'll know early if your 0.5B model is being asked to run a Moderate-tier ReAct loop.

Investigation Quality

Confidence scores now get a corroboration boost when multiple tools independently report related findings. Instead of each finding's confidence being purely formula-driven, findings backed by 2+ tools get a small bump, making the scores more reflective of actual evidence strength.

The Basic-tier deterministic summary is now task-aware, so instead of generic "2 findings detected" output you'll see something like Task "windows-triage" produced 3 findings across 2 tool(s).

Expanded Security Checks

The privilege escalation tool now checks 9 Windows token privileges (up from 4), queries for the AlwaysInstallElevated registry key, and scans for unquoted service paths. On Linux it also picks up setuid, setgid, and (root) indicators from sudo output.

Persistence scanning expanded significantly. On Windows it now covers RunOnce, Winlogon, Image File Execution Options, and AppInit_DLLs registry keys. On Linux it checks additional cron directories, /etc/xdg/autostart, user-level systemd units, and user crontab spools. The suspicious entry detection also got more markers including mshta, regsvr32, certutil, bitsadmin, and others commonly abused for persistence.
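The marker check is essentially a case-insensitive substring scan over each persistence entry. A sketch of that logic (function name and structure hypothetical; the marker strings are from the list above, which the notes say is not exhaustive):

```rust
/// Sketch of the suspicious-entry detection: flag persistence entries
/// that reference binaries commonly abused for persistence (a subset of
/// the markers named in the release notes).
const SUSPICIOUS_MARKERS: &[&str] = &["mshta", "regsvr32", "certutil", "bitsadmin"];

fn is_suspicious_entry(entry: &str) -> bool {
    let lower = entry.to_lowercase();
    SUSPICIOUS_MARKERS.iter().any(|m| lower.contains(*m))
}
```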

Tooling and Discovery

Tokenizer auto-discovery now searches the grandparent directory of the model file, which handles the common HuggingFace layout where model.onnx lives inside an onnx/ subdirectory.
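The widened search order can be sketched as: check the model's own directory first, then its parent (the grandparent of the model file). Names below are hypothetical, and the filesystem check is abstracted into a closure so the logic is testable; only the grandparent-hop behaviour comes from the notes:

```rust
use std::path::{Path, PathBuf};

/// Sketch of the discovery order: models/onnx/model.onnx checks
/// models/onnx/tokenizer.json, then models/tokenizer.json — covering the
/// HuggingFace layout where model.onnx sits inside an onnx/ subdirectory.
fn find_tokenizer(model_path: &Path, exists: &dyn Fn(&Path) -> bool) -> Option<PathBuf> {
    let parent = model_path.parent()?;
    for dir in [Some(parent), parent.parent()].into_iter().flatten() {
        let candidate = dir.join("tokenizer.json");
        if exists(&candidate) {
            return Some(candidate);
        }
    }
    None
}
```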

The --models-list command now auto-discovers .onnx files in the ./models directory that aren't already referenced by a configured profile, so local models show up without needing manual config entries.

Model download checksum verification was always failing because the manifest contained placeholder SHA-256 strings. Downloads now detect placeholder checksums and report the actual hash instead of a misleading mismatch error.

Each tool execution now records elapsed_ms on its AgentTurn, giving you per-tool timing visibility in the run report output.

Resolved Issues

#114, #115, #116, #117, #118, #119, #120, #121, #122, #123, #124, #125, #126, #127, #128, #129

Full Changelog: v1.6.0...v1.7.0

v1.6.0: Agentic Investigation Engine

05 Apr 05:43


Highlights

WraithRun's agent can now reason iteratively about investigations using a full ReAct (Reason + Act) loop. Moderate and Strong-tier investigations call tools dynamically based on LLM reasoning rather than following a fixed template, producing deeper and more relevant findings.

Features

ReAct Agent Loop (#92)

  • Moderate/Strong investigation tiers now use an LLM-guided ReAct loop with dynamic tool dispatch
  • The agent reasons about which tool to call next based on observations so far
  • Basic tier retains fast template-driven execution for simple tasks
  • Automatic fallback to template synthesis if the LLM exhausts its step budget

Task-aware LLM Synthesis (#93)

  • Synthesis prompts now include the verbatim investigation task for better context
  • Structured output sections (Summary, Key Findings, Risk Assessment, Recommendations)
  • Evidence budget increased from 1,500 to 3,000 chars per observation

Temperature-scaled Sampling (#66)

  • Configurable temperature parameter for LLM token generation
  • Softmax probability sampling when temperature > 0; greedy decoding when temperature ≤ 0
  • Enables creative vs. deterministic output control per investigation
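A self-contained sketch of that sampling rule (names and structure hypothetical; only the temperature > 0 / ≤ 0 split comes from the bullets above). The uniform random draw is passed in as a parameter so the function stays deterministic:

```rust
/// Temperature-scaled token selection: temperature <= 0 means greedy
/// argmax; otherwise divide logits by temperature and sample from the
/// softmax distribution via inverse-CDF using rand01 in [0, 1).
fn select_token(logits: &[f32], temperature: f32, rand01: f32) -> usize {
    if temperature <= 0.0 {
        // Greedy: index of the maximum logit.
        return logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i)
            .unwrap_or(0);
    }
    // Numerically stable softmax over temperature-scaled logits.
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut acc = 0.0;
    for (i, e) in exps.iter().enumerate() {
        acc += e / sum;
        if rand01 < acc {
            return i;
        }
    }
    exps.len() - 1
}
```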

EP-aware Debug Logs (#67)

  • All inference debug messages now include the active execution provider (DirectML, CoreML, CUDA, TensorRT, QNN, CPU)
  • Replaces hardcoded "Vitis" labels with runtime-detected EP names

ONNX Session Caching (#64)

  • SessionCache struct lazily initializes and reuses the ONNX session and tokenizer across investigation steps
  • Eliminates per-step session rebuild overhead for multi-step investigations

KV-cache Prefix Reuse (#65)

  • Prefix detection framework compares current prompt tokens against previous invocation
  • Hit/miss metrics tracked per session for observability
  • Scaffolded for full KV-state reuse pending upstream DynValue clonability

Model Pack Download (#94)

  • --model-download <NAME> CLI command with curated model manifest
  • --model-download list shows available packs (tinyllama-1.1b-chat, phi-2-2.7b, qwen2-0.5b)
  • SHA-256 checksum verification after download; skips if model already present

Testing

  • 281 tests passing (up from 274 in v1.5.0)
  • New tests: ReAct parsing, tier dispatch, prompt formatting, unknown tool handling

What's Changed

  • feat: complete v1.3.0: doctor diagnostics, --backend flag, conformance tests by @Shreyas582 in #111
  • feat: v1.5.0: Concrete Hardware Backends, Model Formats, Quantization Awareness by @Shreyas582 in #112
  • feat: v1.6.0: Agentic Investigation Engine by @Shreyas582 in #113

Full Changelog: v1.3.1...v1.6.0

v1.5.0: Concrete Hardware Backends, Model Formats, Quantization Awareness

05 Apr 04:59


This release ships six concrete hardware backend implementations and foundational support for multi-format models and quantization awareness, completing the v1.5.0 Concrete Hardware Backends milestone.

New Backends (Feature-Gated)

All backends implement ExecutionProviderBackend with runtime availability probing, diagnostics, config keys, and dry-run session support.

| Backend  | Feature Flag | Priority | Platform                         |
|----------|--------------|----------|----------------------------------|
| DirectML | directml     | 100      | Windows (any DX12 GPU)           |
| CoreML   | coreml       | 100      | macOS / Apple Silicon            |
| CUDA     | cuda         | 200      | Linux/Windows (NVIDIA GPU)       |
| TensorRT | tensorrt     | 250      | Linux/Windows (NVIDIA + TRT SDK) |
| QNN      | qnn          | 280      | Windows ARM64 (Snapdragon X)     |

Enable one or more at build time:

cargo build --features directml    # Windows GPU
cargo build --features cuda        # NVIDIA GPU
cargo build --features tensorrt    # NVIDIA + TensorRT

Model Format Support (#60)

  • ModelFormat enum: Onnx, Gguf, SafeTensors
  • ModelFormat::from_path() auto-detects format from file extension
  • ExecutionProviderBackend::supported_formats() trait method (default: [Onnx])
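Extension-based detection like `ModelFormat::from_path()` can be sketched as follows (the variant set comes from the bullets above; returning `Option` for unrecognised extensions is an assumption about the API shape):

```rust
/// Sketch of file-extension format detection in the spirit of
/// ModelFormat::from_path(); the Option return is illustrative.
#[derive(Debug, PartialEq)]
enum ModelFormat {
    Onnx,
    Gguf,
    SafeTensors,
}

fn format_from_path(path: &str) -> Option<ModelFormat> {
    match path.rsplit('.').next()?.to_lowercase().as_str() {
        "onnx" => Some(ModelFormat::Onnx),
        "gguf" => Some(ModelFormat::Gguf),
        "safetensors" => Some(ModelFormat::SafeTensors),
        _ => None,
    }
}
```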

Quantization Awareness (#61)

  • QuantFormat enum: Fp32, Fp16, Int8, Int4, BlockQuantized(name), Unknown
  • QuantFormat::detect_from_path() infers quantization from filename conventions
  • ExecutionProviderBackend::supported_quant_formats(): each backend declares its efficient formats
  • NPU backends (Vitis, QNN) support INT8/INT4; GPU backends support FP16/FP32/INT8

Testing

  • 274 tests passing (up from 259)
  • Backend conformance macro expanded with format/quant coverage tests
  • All new backends have cfg-gated conformance suites ready

Closes

Closes #55, closes #56, closes #57, closes #58, closes #59, closes #60, closes #61

Full Changelog: v1.4.0...v1.5.0

What's Changed

  • feat: complete v1.3.0: doctor diagnostics, --backend flag, conformance tests by @Shreyas582 in #111
  • feat: v1.5.0: Concrete Hardware Backends, Model Formats, Quantization Awareness by @Shreyas582 in #112
  • feat: v1.6.0: Agentic Investigation Engine by @Shreyas582 in #113

Full Changelog: v1.3.1...v1.5.0

v1.4.0: Doctor Diagnostics, Backend Selection, Conformance Tests

05 Apr 04:44


This release completes the v1.3.0 Multi-Backend Inference Abstraction milestone by shipping the final three issues: provider-aware doctor diagnostics, CLI backend selection, and a multi-backend conformance test harness.

New Features

Provider-Aware Doctor Diagnostics (#52)

wraithrun doctor now enumerates all registered inference backends and reports:

  • Backend name, priority, and availability status
  • Per-backend diagnostic entries (check name, status, detail message)
  • Structured backends array in the JSON doctor report

CLI --backend Flag and Auto-Select (#53)

Choose your inference backend explicitly or let the engine pick:

  • --backend <NAME> CLI flag
  • WRAITHRUN_BACKEND environment variable
  • [inference] backend = "..." in TOML config
  • "auto" (default) selects the highest-priority available backend
  • Helpful error messages list available backends if an invalid name is given
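The config-file route might look like this (the section and key come from the bullets above; the backend value is illustrative):

```toml
# wraithrun.toml — explicit backend selection; "auto" (the default)
# instead picks the highest-priority available backend.
[inference]
backend = "directml"
```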

Integration Test Harness (#54)

A backend_contract_tests! macro generates 9 contract tests per backend:

  • Name, priority, availability, config keys, diagnostics, dry-run session
  • 5 registry-level tests verify discovery, ordering, and fallback behavior
  • CPU conformance always runs; Vitis conformance is feature-gated behind vitis
  • 14 new tests bringing the total to 259 passing tests

Other Changes

  • RunReport now includes an optional backend field recording which backend was used
  • run-report.schema.json and doctor-introspection.schema.json updated with new fields
  • wraithrun.example.toml includes new [inference] section

Milestone

Closes the v1.3.0 Multi-Backend Inference Abstraction milestone.

Full Changelog: v1.3.1...v1.4.0


v1.3.1: Provider-Agnostic ModelConfig

05 Apr 03:52


This release refactors ModelConfig to use generic backend configuration fields, decoupling inference configuration from any specific hardware provider. This is the next step in the v1.3.0 Multi-Backend Inference Abstraction milestone.

What changed

Provider-agnostic ModelConfig (#49)

  • ModelConfig.vitis_config: Option<VitisEpConfig> replaced with:
    • backend_override: Option<String>: optional backend name hint (e.g. "vitis")
    • backend_config: HashMap<String, String>: generic key-value config map
  • Both fields use #[serde(default)] for backward-compatible deserialization
  • VitisEpConfig retained as a CLI-level helper with into_backend_config() / from_backend_config() conversion methods
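The before/after shape of the change can be sketched as follows. The struct is reduced to just the two new fields, and the config key is illustrative, not taken from the release notes:

```rust
use std::collections::HashMap;

/// Hypothetical reduced ModelConfig showing only the fields introduced
/// by #49; other fields omitted. Before, this held
/// vitis_config: Option<VitisEpConfig> instead.
struct ModelConfig {
    backend_override: Option<String>,
    backend_config: HashMap<String, String>,
}

fn vitis_style_config() -> ModelConfig {
    // After the refactor: an optional backend name hint plus a generic
    // key-value map (the "cache_dir" key here is made up for illustration).
    let mut backend_config = HashMap::new();
    backend_config.insert("cache_dir".to_string(), "/tmp/vitis".to_string());
    ModelConfig {
        backend_override: Some("vitis".to_string()),
        backend_config,
    }
}
```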

Vitis EP reads generic config (#50)

  • discover_ort_dylib_path(), build_base_session_builder_with_provider(), and build_session_with_vitis_cascade() now read from the generic backend_config map instead of the Vitis-specific struct

CPU EP decoupled from Vitis types (#51)

  • All non-Vitis callers (CpuBackend, API server, tests) use backend_override: None, backend_config: Default::default()
  • Zero coupling to Vitis-specific types

Testing

245 tests passing across all 5 crates.

Migration

See upgrade notes for before/after code examples.

v1.3.0: CI/CD Pipeline Integration & Multi-Backend Inference Foundation

05 Apr 02:38
28c9ca7


This release closes the v1.2.0 milestone (#103) and begins the v1.3.0 Multi-Backend Inference Abstraction work (#47, #48).

CI/CD Pipeline Integration (#103)

  • GitHub composite Action (action.yml): first-party Shreyas582/wraithrun-action@v1 with version resolution, binary caching (GitHub Actions cache), cross-platform install (Linux/macOS/Windows), scan execution, and JSON finding extraction via python3.
  • Example workflow (.github/workflows/wraithrun-scan.example.yml): push/PR/schedule triggers, artifact upload, GitHub step summary.
  • GitLab CI template (ci-templates/gitlab-ci.yml): ubuntu:22.04-based, configurable via CI variables.
  • Generic shell script (ci-templates/wraithrun-scan.sh): environment-variable driven for Jenkins, CircleCI, and other platforms.
  • CI integration guide (docs/ci-integration.md): step-by-step docs covering all platforms, exit code policy, output formats, scheduled scanning, and interpreting results.

ExecutionProviderBackend Trait (#47)

New inference_bridge::backend module introducing:

  • ExecutionProviderBackend trait: name(), is_available(), priority(), config_keys(), diagnose(), build_session()
  • InferenceSession trait for provider-created inference sessions
  • DiagnosticEntry / DiagnosticSeverity types for doctor integration
  • BackendOptions type alias for provider-specific config passthrough

Built-in implementations:

  • CpuBackend: always available, priority 0, dry-run + ONNX CPU support
  • VitisBackend (cfg-gated vitis feature): AMD Vitis AI NPU, priority 300, environment-based availability detection

Provider Registry (#48)

  • ProviderRegistry with discover(), best_available(), get(), list(), available_names()
  • build_session_with_fallback(): tries preferred backend first, then cascades by descending priority
  • ProviderInfo struct for backend metadata listing
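The selection logic behind `build_session_with_fallback()` can be sketched like this (types simplified to the point of only picking a backend rather than building a session; only the "preferred first, then descending priority" behaviour comes from the notes):

```rust
/// Simplified stand-in for a registered backend.
struct Backend {
    name: &'static str,
    priority: u32,
    available: bool,
}

/// Try the preferred backend if named and available; otherwise fall back
/// to the highest-priority available backend.
fn pick_backend<'a>(backends: &'a [Backend], preferred: Option<&str>) -> Option<&'a Backend> {
    if let Some(name) = preferred {
        if let Some(b) = backends.iter().find(|b| b.name == name && b.available) {
            return Some(b);
        }
    }
    backends.iter().filter(|b| b.available).max_by_key(|b| b.priority)
}
```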

Infrastructure

  • 12 new unit tests (245 total, all passing)
  • CHANGELOG and upgrade notes updated
  • Version bump to 1.3.0

Full Changelog: v1.2.0...v1.3.0

WraithRun v1.2.0

05 Apr 00:38
7172a61


Dashboard UX Overhaul (#99)

  • 5-tab layout (Runs, Findings, Cases, Compare, Health) for an organized investigation workflow
  • SVG donut severity charts per-run for at-a-glance risk assessment
  • Clickable evidence chains with toggle visibility for finding details
  • Run comparison diff view showing new, resolved, and changed-severity findings
  • JSON/CSV export for both aggregate findings and per-run data
  • Real-time progress spinners for in-progress runs
  • Cases tab with case list and detail panel

Tool Plugin API (#102)

Extend WraithRun with external tool plugins via tool.toml manifests and subprocess JSON I/O.

  • --tools-dir and --allowed-plugins CLI flags for plugin discovery
  • Automatic platform filtering, sandbox policy enforcement, and timeout support
  • Plugin tools visible in --doctor output and /api/v1/runtime/status endpoint
  • Example plugin: examples/tools/hello_world/
  • Full documentation: docs/plugin-api.md

Security Professional Documentation (#100)

  • 4 investigation playbooks: SSH key compromise, Windows triage, credential leak audit, persistence sweep
  • MITRE ATT&CK mapping for all 8 built-in tools with tactic coverage analysis
  • Threat model with attack surface, trust boundaries, and security controls
  • 2 sample investigation reports (anonymized Linux persistence and Windows triage)

Other

  • Added io-util feature to workspace tokio dependency for plugin subprocess I/O
  • 233 tests passing across all crates

Full Changelog: v1.1.0...v1.2.0


v1.1.0: Professional Workflow Depth

04 Apr 22:44
d9b573e


Building on v1.0.0's Local API and Web UI foundation, this release adds three features that bring WraithRun closer to professional security workflow integration.

What's New

Structured JSON Audit Logging (#98)

Every API action is now logged as a structured JSON event, including authentication attempts, run lifecycle events, case operations, and server lifecycle. Events are written to a JSON lines file and held in a ring buffer for real-time querying.

  • 12 event types: AuthSuccess, AuthFailure, RunCreated, RunCompleted, RunFailed, RunCancelled, CaseCreated, CaseUpdated, ToolExecuted, ToolPolicyDenied, ServerStarted, ServerStopped
  • CLI flag: --audit-log <PATH> enables file-based audit trail
  • API endpoint: GET /api/v1/audit/events?limit=N returns recent events
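A line in the JSON-lines audit file might look like this (the event type comes from the list above; field names and values are illustrative, not the actual schema):

```json
{"timestamp": "2025-04-04T22:44:00Z", "event": "RunCreated", "run_id": "a1b2c3", "case_id": null}
```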

Case Management API (#97)

Group related investigation runs under cases for tracking and organization.

  • Create cases: POST /api/v1/cases with title and optional description
  • List cases: GET /api/v1/cases with run count aggregates
  • View case: GET /api/v1/cases/{id} with linked run statistics
  • Update case: PATCH /api/v1/cases/{id} to change title, description, or status (open → investigating → closed)
  • List case runs: GET /api/v1/cases/{id}/runs
  • Link runs to cases: Pass case_id in POST /api/v1/runs request body
  • SQLite schema v2 migration runs automatically on existing databases

Evidence-Backed Narrative Report Format (#96)

New --format narrative output produces analyst-ready investigation reports with structured sections:

  • Executive Summary: task, case reference, finding count, max severity, duration
  • Risk Assessment: severity distribution table (Critical/High/Medium/Low/Info)
  • Investigation Timeline: step-by-step tool execution log with observation summaries
  • Detailed Findings: each finding with confidence level, evidence chain, and recommended action
  • Supplementary Findings: lower-relevance observations
  • Conclusion: final analysis answer
  • Metadata: model tier, inference mode, live metrics

Testing

228 tests pass across all crates (23 API server, 66 core engine, 15 inference bridge, 79 CLI, 45 integration).

Upgrade Notes

  • Existing SQLite databases will be automatically migrated to schema v2 (adds cases table and case_id column on runs)
  • No breaking API changes. All v1.0.0 endpoints continue to work unchanged
  • New narrative format option available alongside existing json, summary, and markdown formats