Technical terms and concepts used in Vigil Guard.
High-performance string matching algorithm used in Branch A. Scans the input for all 993 keywords in a single pass, so matching time grows linearly with input length (O(n)) rather than with the number of patterns.
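As an illustration of the single-pass idea, here is a minimal pure-Python sketch of the algorithm. It is not Branch A's actual code: the function names `build_automaton` and `find_all` are hypothetical, and the four keywords are the classic textbook example rather than entries from the real keyword list.

```python
from collections import deque

def build_automaton(keywords):
    """Build a keyword trie plus failure links (hypothetical helper)."""
    goto, fail, output = [{}], [0], [[]]   # state 0 is the root
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)
    # Breadth-first pass: each failure link points to the longest proper
    # suffix of the current path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output

def find_all(text, automaton):
    """Return (start_index, keyword) for every match, reading text once."""
    goto, fail, output = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((i - len(word) + 1, word))
    return matches

automaton = build_automaton(["he", "she", "his", "hers"])
matches = find_all("ushers", automaton)
```

The text is read once from left to right; adding more keywords enlarges the automaton but does not add passes over the input.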
The decision fusion component that combines scores from all three detection branches using weighted voting. Applies boost policies for high-confidence matches.
Decision status indicating content passed all checks. Score: 0-29 points.
Decision status indicating content was rejected. Score: 85+ points. The user receives an error message and the content is not forwarded to the LLM.
Probabilistic data structure for fast negative lookups. Quickly eliminates clean content before expensive pattern matching.
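A minimal sketch of such a structure follows, to show why a negative answer is definitive while a positive one is only "maybe". The class name, bit-array size, hash count, and the use of SHA-256 are all assumptions for illustration; the glossary does not say how Vigil Guard's filter is configured.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter (hypothetical; sizes chosen for readability)."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, item: str):
        # Derive num_hashes positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "possibly added".
        return all(self.bits[pos] for pos in self._positions(item))

bloom = BloomFilter()
bloom.add("ignore previous instructions")
```

If `might_contain` returns False, the expensive pattern matching can be skipped for that item; a True result still has to be confirmed by the real matcher.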
Score multiplier applied when a branch reports high confidence. Ensures dangerous content isn't diluted by other branch scores.
Pattern-based detection using Aho-Corasick prefilter and regex patterns. Port 5005. Weight: 0.30.
Embedding-based detection using sentence transformers. Measures cosine similarity to known threat categories. Port 5006. Weight: 0.40.
Machine learning classification using Meta Llama Guard 2 model. Detects novel attacks that bypass pattern matching. Port 8000. Weight: 0.30.
44 threat categories including: SQL_XSS_ATTACKS, PROMPT_INJECTION, JAILBREAK_ATTEMPT, SOCIAL_ENGINEERING, PII_LEAK, etc.
Column-oriented analytics database storing detection events. Optimized for time-series queries and aggregations.
Value from 0.0 to 1.0 indicating detection certainty. Higher values indicate a more confident detection.
PII detection feature that uses surrounding text to improve entity recognition accuracy.
Final determination: ALLOW, SANITIZE, or BLOCK. Based on Arbiter's combined score.
PII detection capability processing both Polish and English entities simultaneously. Uses separate Presidio models for each language.
Personal information detected by Presidio: EMAIL, PHONE_NUMBER, PESEL, NIP, CREDIT_CARD, etc.
HTTP header for concurrency control. Prevents configuration overwrites when multiple users edit simultaneously.
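The overwrite protection can be sketched as an optimistic concurrency check. This is an assumption-laden illustration: the glossary does not say how the tag is computed or which request header carries it, so hashing the serialized configuration and the names `compute_etag` and `save_config` are hypothetical.

```python
import hashlib
import json

def compute_etag(config: dict) -> str:
    """Fingerprint a configuration (assumption: hash of canonical JSON)."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def save_config(current: dict, updated: dict, client_etag: str) -> dict:
    """Accept the update only if the client edited the latest version."""
    if client_etag != compute_etag(current):
        # In HTTP terms this would correspond to 412 Precondition Failed.
        raise RuntimeError("configuration changed since it was loaded")
    return updated

stored = {"block_threshold": 85}
etag_seen_by_user = compute_etag(stored)
stored = save_config(stored, {"block_threshold": 90}, etag_seen_by_user)
```

A second user still holding the old tag would now be rejected instead of silently overwriting the first user's change.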
Primary ClickHouse table storing detection events. Schema includes: timestamp, event_id, branch scores, categories, PII flags.
Clean content incorrectly flagged as malicious. Reported via Feedback API for system improvement.
Combined detection score (0-100) after Arbiter fusion. Determines final decision.
Monitoring dashboard system. Displays detection metrics, trends, and alerts.
Aggressive content cleaning for scores 65-84. Removes all detected patterns, inserts placeholders.
Pattern-based detection approach. Fast and deterministic. Branch A uses heuristics.
Language identification using both statistical analysis (langdetect) and entity-based hints (PESEL → Polish).
Web UI component for searching and analyzing past detections. Shows score breakdowns and matched patterns.
Attack attempting to bypass LLM safety guidelines. Common category: JAILBREAK_ATTEMPT.
Authentication mechanism for Web UI and API. 24-hour expiration, signed with JWT_SECRET.
Minimal content cleaning for scores 30-64. Removes obvious threats, preserves content structure.
Meta's safety classification model. Powers Branch C for ML-based threat detection.
ClickHouse table engine. Optimized for high-volume inserts and aggregation queries.
Workflow automation platform hosting the detection pipeline. Contains 24 nodes for processing.
Polish tax identification number. 10-digit format with checksum validation.
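The checksum rule can be sketched as follows: the first nine digits are multiplied by fixed weights, and the sum modulo 11 must equal the tenth digit. The function name `is_valid_nip` is hypothetical, and this is not necessarily how Presidio's recognizer is written.

```python
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)

def is_valid_nip(nip: str) -> bool:
    """Check the length, digits, and control digit of a NIP string."""
    if len(nip) != 10 or not (nip.isascii() and nip.isdigit()):
        return False
    digits = [int(c) for c in nip]
    control = sum(w * d for w, d in zip(NIP_WEIGHTS, digits)) % 11
    # A remainder of 10 can never match a single digit, so it is invalid.
    return control == digits[9]
```

For example, `is_valid_nip("1234563218")` returns True, while changing the last digit makes the check fail.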
Open Web Application Security Project - AI Testing Guidelines. Framework for LLM security testing.
Polish national identification number. 11-digit format encoding birth date and gender.
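A minimal sketch of how the encoded fields can be read back, assuming the standard PESEL layout: digits 1-6 hold the birth date (with the century folded into the month), digit 10 encodes gender, and digit 11 is a control digit. The function name `parse_pesel` is hypothetical.

```python
from datetime import date

PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
# The century is encoded as an offset added to the month number.
CENTURY_BY_MONTH_OFFSET = {0: 1900, 20: 2000, 40: 2100, 60: 2200, 80: 1800}

def parse_pesel(pesel: str) -> tuple[date, str]:
    """Validate a PESEL and return (birth_date, gender)."""
    if len(pesel) != 11 or not (pesel.isascii() and pesel.isdigit()):
        raise ValueError("PESEL must be exactly 11 digits")
    digits = [int(c) for c in pesel]
    weighted = sum(w * d for w, d in zip(PESEL_WEIGHTS, digits))
    if (10 - weighted % 10) % 10 != digits[10]:
        raise ValueError("invalid PESEL control digit")
    year, month_raw, day = int(pesel[0:2]), int(pesel[2:4]), int(pesel[4:6])
    offset = (month_raw // 20) * 20
    # date() raises ValueError for impossible months or days.
    birth = date(CENTURY_BY_MONTH_OFFSET[offset] + year, month_raw - offset, day)
    gender = "male" if digits[9] % 2 else "female"
    return birth, gender

birth, gender = parse_pesel("44051401359")
```

The sample value is a commonly used test number; it decodes to 14 May 1944 and an odd tenth digit.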
Data that can identify an individual: names, emails, phone numbers, government IDs.
The 24-node detection workflow processing input from webhook to final decision.
Microsoft's PII detection framework. Supports 50+ entity types with ML and rule-based recognizers.
Attack where user input manipulates LLM behavior. Primary threat category.
Permission system: can_view_monitoring, can_view_configuration, can_manage_users.
Polish statistical number for businesses. 9 or 14 digits.
Pattern definition file. Contains 993 keywords across 44 categories. Edit via Web UI only.
Decision status indicating content was cleaned before forwarding. Light (30-64) or Heavy (65-84).
Per-branch scoring details showing which patterns matched and their individual contributions.
Cosine similarity between text embeddings. Used by Branch B to match threat categories.
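For reference, the cosine measure between two embedding vectors can be computed as below. This is a dependency-free sketch with a hypothetical function name; Branch B presumably relies on its embedding library for this.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1.0, orthogonal vectors score 0.0, and the value is independent of vector length.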
Unique identifier linking related requests. Used for conversation tracking.
Score boundary for decisions:
sanitize_light_threshold: 30 (default)
sanitize_heavy_threshold: 65 (default)
block_threshold: 50 (default, 85 for actual blocking)
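The mapping from a final score to a decision can be sketched as below. It follows the score ranges given elsewhere in this glossary (0-29 allow, 30-64 light sanitization, 65-84 heavy sanitization, 85+ block); the function name `decide` and the choice of 85 as the effective blocking boundary are assumptions for illustration.

```python
SANITIZE_LIGHT_THRESHOLD = 30
SANITIZE_HEAVY_THRESHOLD = 65
BLOCK_SCORE = 85  # assumption: the "actual blocking" value, not the 50 default

def decide(score: float) -> str:
    """Map a 0-100 final score to a decision label (hypothetical helper)."""
    if score >= BLOCK_SCORE:
        return "BLOCK"
    if score >= SANITIZE_HEAVY_THRESHOLD:
        return "SANITIZE (heavy)"
    if score >= SANITIZE_LIGHT_THRESHOLD:
        return "SANITIZE (light)"
    return "ALLOW"
```

Checking the highest boundary first keeps each branch a single comparison.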
Data retention period. Events expire after configured days (default: 90).
Main configuration file. Contains thresholds, weights, category settings, PII options.
The complete prompt injection detection and defense platform.
Docker network connecting all 11 services. Internal DNS resolution for container communication.
HTTP endpoint receiving prompts for analysis: /webhook/vigil-guard-2
Branch contribution to final score. A=0.30, B=0.40, C=0.30.
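One way to read these weights is as a weighted sum of per-branch scores. The sketch below assumes each branch reports a score on the same 0-100 scale as the final score and leaves out boost policies; the function name `fuse_scores` is hypothetical.

```python
# Branch weights as listed in this glossary.
WEIGHTS = {"A": 0.30, "B": 0.40, "C": 0.30}

def fuse_scores(branch_scores: dict[str, float]) -> float:
    """Weighted sum of branch scores (assumes a shared 0-100 scale)."""
    return sum(WEIGHTS[name] * branch_scores[name] for name in WEIGHTS)

final_score = fuse_scores({"A": 80, "B": 50, "C": 20})
```

With these inputs the contributions are 24, 20, and 6 points, for a combined score of 50. Because the weights sum to 1.0, the result stays within the range of the inputs.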
n8n automation containing the detection pipeline. File: Vigil Guard v2.1.0.json