= Red-Teaming with Llama Stack Garak (Inline)

This tutorial demonstrates how to perform comprehensive security testing of Large Language Models using the TrustyAI Garak provider in Llama Stack's inline mode, which makes it well suited to development and testing.

== What You'll Learn

* How to set up Garak inline scanning for LLM security testing
* Running predefined security benchmarks (OWASP LLM Top 10, AVID taxonomy)
* Creating custom security probes and scanning profiles
* Interpreting vulnerability scores and security reports
* Accessing scan reports and logs

== Prerequisites

Before starting this tutorial, ensure you have:

* Python 3.12+ installed
* A running OpenAI-compatible LLM inference endpoint (e.g., vLLM); an optional connectivity check follows this list
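
If you want to sanity-check the endpoint before installing anything, the short sketch below lists the models it serves. It uses only the Python standard library and assumes the endpoint follows the OpenAI convention of exposing `GET /models` under the base URL (the same value you will export as `VLLM_URL` in the next step):

[source,python]
----
import json
import os
import urllib.request

# Assumes an OpenAI-compatible endpoint; replace the fallback with your own URL
base_url = os.environ.get("VLLM_URL", "http://your-model-endpoint/v1")

# OpenAI-compatible servers list their models at GET <base_url>/models
with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
    payload = json.load(resp)

print("Models served:", [m["id"] for m in payload.get("data", [])])
----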

== Installation & Setup

. Clone the repository and install dependencies:
+
[source,bash]
----
git clone https://github.com/trustyai-explainability/llama-stack-provider-trustyai-garak.git
cd llama-stack-provider-trustyai-garak
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
----

. Configure your model endpoint:
+
[source,bash]
----
export VLLM_URL="http://your-model-endpoint/v1"
export INFERENCE_MODEL="your-model-name"
export BASE_URL="http://localhost:8321/v1" # Llama Stack server base URL
----

. Start the Llama Stack server with the Garak provider:
+
[source,bash]
----
llama stack run run.yaml --image-type venv
----

The server will start on `http://localhost:8321`.

== Step-by-Step Guide

=== Step 1: Initialize the Client

[source,python]
----
from llama_stack_client import LlamaStackClient
from rich.pretty import pprint

BASE_URL = "http://localhost:8321"
client = LlamaStackClient(base_url=BASE_URL)

# Verify the setup
print("Available providers:")
pprint(client.providers.list())

print("\nAvailable models:")
pprint(client.models.list())
----

=== Step 2: Explore Available Benchmarks

List the predefined security benchmarks. Note that all predefined TrustyAI Garak benchmarks are prefixed with `trustyai_garak::`.

[source,python]
----
benchmarks = client.benchmarks.list()
print("Available security benchmarks:")
for benchmark in benchmarks:
    # filter for trustyai garak benchmarks
    if "trustyai_garak" in benchmark.identifier:
        print(f"• Benchmark ID: {benchmark.identifier}")
        if hasattr(benchmark, 'metadata'):
            print(f"  Description: {benchmark.metadata.get('description', 'N/A')}")
            print(f"  Probes: {benchmark.metadata.get('probes', 'N/A')}")
            print(f"  Timeout: {benchmark.metadata.get('timeout', 0)} seconds\n")
----
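
If you already know which benchmark you need, you can fetch a single entry directly instead of filtering the whole list. A brief sketch, assuming the client exposes a `benchmarks.retrieve` call:

[source,python]
----
# Fetch one benchmark by its identifier (assumes benchmarks.retrieve is available)
quick_benchmark = client.benchmarks.retrieve(benchmark_id="trustyai_garak::quick")
pprint(quick_benchmark.metadata)
----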

=== Step 3: Run a Quick Security Scan

Start with a quick 5-minute security assessment:

[source,python]
----
# Run the quick security profile
quick_job = client.eval.run_eval(
    benchmark_id="trustyai_garak::quick",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-name",  # replace with your model name
            "sampling_params": {"max_tokens": 100}
        }
    }
)

print(f"Started quick security scan: {quick_job.job_id}")
print(f"Status: {quick_job.status}")
----

=== Step 4: Monitor Scan Progress

[source,python]
----
import time

def monitor_job(job_id, benchmark_id):
    """Monitor job progress with status updates"""
    while True:
        status = client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)
        print(f"Status: {status.status}")

        if status.status == "completed":
            print("✅ Scan completed successfully!")
            return status
        elif status.status == "failed":
            print("❌ Scan failed!")
            print(f"Error details: {status.metadata}")
            return status
        elif status.status == "cancelled":
            print(f"⚠️ Scan {status.status}")
            return status

        time.sleep(10)  # check every 10 seconds

# Monitor the quick scan
final_status = monitor_job(quick_job.job_id, "trustyai_garak::quick")
----

=== Step 5: Analyze Security Results

[source,python]
----
import random

if final_status.status == "completed":
    # Retrieve detailed results
    job_result = client.eval.jobs.retrieve(job_id=quick_job.job_id, benchmark_id="trustyai_garak::quick")
    results = job_result.scores
    aggregated_scores = {k: v.aggregated_results for k, v in results.items()}

    # Display aggregated vulnerability scores
    print("\n🔍 Security Assessment Results:")
    print("=" * 50)

    for probe, aggregated_result in aggregated_scores.items():
        print(f"\n Probe: {probe}")
        for agg_detector, agg_score in aggregated_result.items():
            # Interpret vulnerability level
            if agg_score >= 0.7:
                risk_level = "🔴 HIGH RISK"
            elif agg_score >= 0.5:
                risk_level = "🟠 MEDIUM RISK"
            elif agg_score >= 0.3:
                risk_level = "🟡 LOW RISK"
            else:
                risk_level = "🟢 SECURE"
            print(f"  Score: {agg_score:.3f}")
            print(f"  Risk Level: {risk_level}")

    # Show sample generations that triggered vulnerabilities
    print(f"\n📝 Sample Interactions ({len(job_result.generations)} total):")
    print("=" * 50)

    sample_interactions = random.sample(job_result.generations, min(3, len(job_result.generations)))

    for i, generation in enumerate(sample_interactions, 1):
        print(f"\nSample {i}:")
        print(f"  Prompt: {generation['prompt']}")
        print(f"  Response: {generation['response']}")
        print("%" * 100)
----
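
It is often useful to persist the aggregated scores alongside the job ID so that runs can be compared over time. A minimal sketch using the standard library, reusing the `aggregated_scores` dictionary built above (the output path is a hypothetical choice):

[source,python]
----
import json

# Persist the aggregated scores for later comparison (hypothetical file name)
with open(f"garak_scores_{quick_job.job_id}.json", "w") as f:
    json.dump(aggregated_scores, f, indent=2)
----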

=== Step 6: Create a Custom Security Benchmark

Register a custom benchmark focusing on specific vulnerabilities:

[source,python]
----
# Custom benchmark for prompt injection testing
custom_benchmark_id = "custom_prompt_injection"

client.benchmarks.register(
    benchmark_id=custom_benchmark_id,
    dataset_id="garak",  # placeholder
    scoring_functions=["garak_scoring"],  # placeholder
    provider_benchmark_id=custom_benchmark_id,
    provider_id="trustyai_garak",
    metadata={
        "probes": [
            "promptinject.HijackHateHumans",
            "promptinject.HijackKillHumans",
            "latentinjection.LatentJailbreak"
        ],
        "timeout": 900,  # 15 minutes
    }
)

print(f"✅ Registered custom benchmark: {custom_benchmark_id}")
----

[NOTE]
====
Refer to the Garak documentation for the full list of available probes: https://reference.garak.ai/en/latest/probes.html
====

=== Step 7: Run Custom Security Scan

[source,python]
----
# Execute the custom benchmark
custom_job = client.eval.run_eval(
    benchmark_id=custom_benchmark_id,
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-name",
            "sampling_params": {
                "max_tokens": 150
            }
        }
    }
)

print(f"Started custom prompt injection scan: {custom_job.job_id}")

# Monitor and analyze results
custom_status = monitor_job(custom_job.job_id, custom_benchmark_id)

if custom_status.status == "completed":
    custom_results = client.eval.jobs.retrieve(
        job_id=custom_job.job_id,
        benchmark_id=custom_benchmark_id
    )

    print("\n🎯 Custom Prompt Injection Results:")
    aggregated_scores = {k: v.aggregated_results for k, v in custom_results.scores.items()}
    pprint(aggregated_scores)
----

=== Step 8: Run Comprehensive OWASP Assessment

For production readiness, run the full OWASP LLM Top 10 assessment:

[source,python]
----
# Note: This scan takes ~10 hours, suitable for overnight runs
owasp_job = client.eval.run_eval(
    benchmark_id="trustyai_garak::owasp_llm_top10",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-name",
            "sampling_params": {"max_tokens": 200}
        }
    }
)
----
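
Because this job runs for hours, you typically won't keep the Step 4 monitoring loop alive for its whole duration. A sketch of checking back on a long-running job later, using the same status API from Step 4:

[source,python]
----
# Record the job ID so you can check on the scan later (e.g., from a new session)
print(f"OWASP scan job ID: {owasp_job.job_id}")

# Later: poll the job once instead of looping
status = client.eval.jobs.status(
    job_id=owasp_job.job_id,
    benchmark_id="trustyai_garak::owasp_llm_top10",
)
print(f"Current status: {status.status}")
----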

=== Step 9: Access Detailed Reports

Garak generates comprehensive reports in multiple formats. Four files are produced per scan:

* `scan.report.jsonl` - detailed report containing all attempts and their results
* `scan.hitlog.jsonl` - hit log containing only the vulnerable interactions
* `scan.log` - detailed log of the scan
* `scan.report.html` - human-readable report

You can access them using the Llama Stack `files` API. Here's an example that views the scan log:

[source,python]
----
# final_status (from Step 4) carries the report file IDs in its metadata
log_content = client.files.content(final_status.metadata['scan.log'])
log_lines = log_content.strip().split('\n')
print("\n📋 Scan Log (last 5 lines):")
for line in log_lines[-5:]:
    print(f"  {line}")
----
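
The same pattern works for the other report files. For example, a sketch that saves the human-readable report to disk, assuming the metadata keys mirror the file names listed above and that `files.content` returns the file body as text:

[source,python]
----
# Save the HTML report locally (assumes a 'scan.report.html' metadata key)
html_content = client.files.content(final_status.metadata['scan.report.html'])
with open("scan.report.html", "w") as f:
    f.write(html_content)
print("Saved scan.report.html; open it in a browser")
----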

== Best Practices

=== Security Testing Strategy

. *Development Phase*: Use `trustyai_garak::quick` for rapid iteration
. *Pre-production*: Run `trustyai_garak::standard` for broader coverage
. *Production Readiness*: Execute the full OWASP and AVID compliance scans
. *Continuous Monitoring*: Integrate security scans into CI/CD pipelines, e.g. with a gating script like the sketch below
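
One way to wire scans into a pipeline is a small gating script that fails the build when any aggregated score crosses a threshold. This is a minimal sketch rather than part of the provider; it reuses `client` from Step 1 and `monitor_job` from Step 4, and the 0.5 threshold is an assumption to tune:

[source,python]
----
import sys

THRESHOLD = 0.5  # hypothetical gating threshold; tune to your risk tolerance

job = client.eval.run_eval(
    benchmark_id="trustyai_garak::quick",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-name",
            "sampling_params": {"max_tokens": 100},
        }
    },
)
status = monitor_job(job.job_id, "trustyai_garak::quick")
if status.status != "completed":
    sys.exit("Security scan did not complete")

result = client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_garak::quick")
failing = [
    (probe, detector, value)
    for probe, score in result.scores.items()
    for detector, value in score.aggregated_results.items()
    if value >= THRESHOLD
]
if failing:
    for probe, detector, value in failing:
        print(f"FAIL {probe}/{detector}: {value:.3f}")
    sys.exit(1)  # non-zero exit fails the pipeline stage
print("✅ All probes below threshold")
----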

=== Performance Optimization

[source,python]
----
# Optimize scan performance with parallel execution
optimized_metadata = {
    "probes": ["dan", "promptinject", "encoding"],
    "parallel_attempts": 8,  # increase parallelism
    "timeout": 3600  # 1 hour timeout
}
----
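
This metadata is applied the same way as in Step 6: pass it when registering a benchmark. A brief sketch with a hypothetical benchmark ID:

[source,python]
----
# Register a tuned benchmark using the metadata above (hypothetical ID)
client.benchmarks.register(
    benchmark_id="fast_scan",
    dataset_id="garak",  # placeholder, as in Step 6
    scoring_functions=["garak_scoring"],  # placeholder
    provider_benchmark_id="fast_scan",
    provider_id="trustyai_garak",
    metadata=optimized_metadata,
)
----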

== Advanced Usage

You can pass any of the following Garak command-line arguments to the scan via the benchmark `metadata` parameter:

* `parallel_attempts`
* `generations`
* `seed`
* `deprefix`
* `eval_threshold`
* `probe_tags`
* `probe_options`
* `detectors`
* `extended_detectors`
* `detector_options`
* `buffs`
* `buff_options`
* `harness_options`
* `taxonomy`
* `generate_autodan`

Please refer to the Garak documentation for more details: https://reference.garak.ai/en/latest/cliref.html
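
For instance, a metadata dictionary that pins a seed, raises the number of generations per prompt, and selects probes by tag might look like the sketch below; the values are illustrative, and you would pass the dictionary to `client.benchmarks.register` exactly as in Step 6:

[source,python]
----
# Illustrative metadata forwarding Garak CLI arguments (adjust values for your scan)
advanced_metadata = {
    "probe_tags": ["owasp:llm01"],  # select probes by taxonomy tag
    "seed": 42,                     # make runs reproducible
    "generations": 5,               # responses generated per prompt
    "eval_threshold": 0.5,          # score threshold for flagging a hit
}
----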

== Troubleshooting

*Job stuck in 'scheduled' status:*

* Check that the inference endpoint is accessible
* Verify that the model name matches your deployment
* Review server logs for connection errors

*High memory usage during scans:*

* Reduce `parallel_attempts` in the benchmark metadata
* Lower `max_tokens` in the sampling parameters
* Monitor system resources during long-running scans

== Next Steps

Explore xref:garak-lls-shields.adoc[shield testing] for guardrail evaluation.