
Commit f90ea08

Merge pull request #72 from saichandrapandraju/garak-lls
docs: Add tutorials for Red-Teaming, Garak inline, remote execution, and shield evaluation
2 parents 3f23396 + 8a865e6 commit f90ea08

6 files changed (+1183, -0 lines changed)

docs/modules/ROOT/nav.adoc

Lines changed: 4 additions & 0 deletions
@@ -16,10 +16,14 @@
 *** xref:lm-eval-tutorial-toxicity.adoc[Toxicity Measurement]
 ** xref:gorch-tutorial.adoc[]
 *** xref:hf-serving-runtime-tutorial.adoc[Using Hugging Face models with GuardrailsOrchestrator]
+** xref:red-teaming-introduction.adoc[Introduction to Red-Teaming]
 ** xref:tutorials-llama-stack-section.adoc[]
 *** xref:lmeval-lls-tutorial.adoc[Getting Started with LM-Eval on Llama-Stack]
 *** xref:trustyai-fms-lls-tutorial.adoc[Getting started with trustyai_fms and llama-stack]
 *** xref:lmeval-lls-tutorial-custom-data.adoc[Running Custom Evaluations with LMEval Llama Stack External Eval Provider]
+*** xref:garak-lls-inline.adoc[Getting Started with Garak on Llama Stack]
+*** xref:garak-lls-shields.adoc[Shield/Guardrail Evaluation with Garak on Llama Stack]
+*** xref:garak-lls-remote.adoc[Garak Remote Execution with Llama Stack on Kubeflow Pipelines]
 ** xref:vllm-judge-tutorial.adoc[]
 *** xref:vllm-judge-installation.adoc[Installation]
 *** xref:vllm-judge-quickstart.adoc[Quick Start]
Lines changed: 344 additions & 0 deletions
@@ -0,0 +1,344 @@
= Red-Teaming with Llama Stack Garak (Inline)

This tutorial demonstrates how to perform comprehensive security testing of Large Language Models using the TrustyAI Garak provider in Llama Stack's inline mode, which makes it ideal for development and testing.

== What You'll Learn

* How to set up Garak inline scanning for LLM security testing
* Running predefined security benchmarks (OWASP LLM Top 10, AVID taxonomy)
* Creating custom security probes and scanning profiles
* Interpreting vulnerability scores and security reports
* Accessing scan reports and logs

== Prerequisites

Before starting this tutorial, ensure you have:

* Python 3.12+ installed
* A running OpenAI-compatible LLM inference endpoint (e.g., vLLM)

== Installation & Setup

. Clone the repository and install dependencies:
+
[source,bash]
----
git clone https://github.com/trustyai-explainability/llama-stack-provider-trustyai-garak.git
cd llama-stack-provider-trustyai-garak
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
----

. Configure your model endpoint:
+
[source,bash]
----
export VLLM_URL="http://your-model-endpoint/v1"
export INFERENCE_MODEL="your-model-name"
export BASE_URL="http://localhost:8321/v1" # Llama Stack server base url
----

. Start the Llama Stack server with Garak provider:
+
[source,bash]
----
llama stack run run.yaml --image-type venv
----

The server will start on `http://localhost:8321`.

== Step by Step Guide

=== Step 1: Initialize the Client

[source,python]
----
from llama_stack_client import LlamaStackClient
from rich.pretty import pprint

BASE_URL = "http://localhost:8321"
client = LlamaStackClient(base_url=BASE_URL)

# Verify the setup
print("Available providers:")
pprint(client.providers.list())

print("\nAvailable models:")
pprint(client.models.list())
----

=== Step 2: Explore Available Benchmarks

List the predefined security benchmarks. Note that all predefined TrustyAI Garak benchmarks are prefixed with `trustyai_garak::`.

[source,python]
----
benchmarks = client.benchmarks.list()
print("Available security benchmarks:")
for benchmark in benchmarks:
    # filter for trustyai garak benchmarks
    if "trustyai_garak" in benchmark.identifier:
        print(f"• Benchmark ID: {benchmark.identifier}")
        if hasattr(benchmark, 'metadata'):
            print(f"  Description: {benchmark.metadata.get('description', 'N/A')}")
            print(f"  Probes: {benchmark.metadata.get('probes', 'N/A')}")
            print(f"  Timeout: {benchmark.metadata.get('timeout', 0)} seconds\n")
----

=== Step 3: Run a Quick Security Scan

Start with a quick 5-minute security assessment:

[source,python]
----
# Run the quick security profile
quick_job = client.eval.run_eval(
    benchmark_id="trustyai_garak::quick",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-name",  # replace with your model name
            "sampling_params": {"max_tokens": 100}
        }
    }
)

print(f"Started quick security scan: {quick_job.job_id}")
print(f"Status: {quick_job.status}")
----

=== Step 4: Monitor Scan Progress

[source,python]
----
import time

def monitor_job(job_id, benchmark_id):
    """Monitor job progress with status updates"""
    while True:
        status = client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)
        print(f"Status: {status.status}")

        if status.status == "completed":
            print("✅ Scan completed successfully!")
            return status
        elif status.status == "failed":
            print("❌ Scan failed!")
            print(f"Error details: {status.metadata}")
            return status
        elif status.status in ["cancelled"]:
            print(f"⚠️ Scan {status.status}")
            return status

        time.sleep(10)  # check every 10 seconds

# Monitor the quick scan
final_status = monitor_job(quick_job.job_id, "trustyai_garak::quick")
----

=== Step 5: Analyze Security Results

[source,python]
----
if final_status.status == "completed":
    # Retrieve detailed results
    job_result = client.eval.jobs.retrieve(job_id=quick_job.job_id, benchmark_id="trustyai_garak::quick")
    results = job_result.scores
    aggregated_scores = {k: v.aggregated_results for k, v in results.items()}

    # Display aggregated vulnerability scores
    print("\n🔍 Security Assessment Results:")
    print("=" * 50)

    for probe, aggregated_result in aggregated_scores.items():
        print(f"\n Probe: {probe}")
        for agg_detector, agg_score in aggregated_result.items():
            # Interpret vulnerability level
            if agg_score >= 0.7:
                risk_level = "🔴 HIGH RISK"
            elif agg_score >= 0.5:
                risk_level = "🟠 MEDIUM RISK"
            elif agg_score >= 0.3:
                risk_level = "🟡 LOW RISK"
            else:
                risk_level = "🟢 SECURE"
            print(f"  Score: {agg_score:.3f}")
            print(f"  Risk Level: {risk_level}")

    # Show sample generations that triggered vulnerabilities
    print(f"\n📝 Sample Interactions ({len(job_result.generations)} total):")
    print("=" * 50)

    import random
    sample_interactions = random.sample(job_result.generations, min(3, len(job_result.generations)))

    for i, generation in enumerate(sample_interactions, 1):
        print(f"\nSample {i}:")
        print(f"  Prompt: {generation['prompt']}")
        print(f"  Response: {generation['response']}")
        print("%"*100)
----

=== Step 6: Create Custom Security Benchmark

Register a custom benchmark focusing on specific vulnerabilities:

[source,python]
----
# Custom benchmark for prompt injection testing
custom_benchmark_id = "custom_prompt_injection"

client.benchmarks.register(
    benchmark_id=custom_benchmark_id,
    dataset_id="garak",  # placeholder
    scoring_functions=["garak_scoring"],  # placeholder
    provider_benchmark_id=custom_benchmark_id,
    provider_id="trustyai_garak",
    metadata={
        "probes": [
            "promptinject.HijackHateHumans",
            "promptinject.HijackKillHumans",
            "latentinjection.LatentJailbreak"
        ],
        "timeout": 900,  # 15 minutes
    }
)

print(f"✅ Registered custom benchmark: {custom_benchmark_id}")
----

[NOTE]
====
Please refer to the Garak documentation for all the available probes: https://reference.garak.ai/en/latest/probes.html
====
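
To confirm the registration, you can list the benchmarks again and check that the new identifier appears (a quick sanity check reusing the `benchmarks.list()` call from Step 2):

[source,python]
----
# Sanity check (assumes the benchmark is listed under the ID used at registration)
registered_ids = [b.identifier for b in client.benchmarks.list()]
if custom_benchmark_id in registered_ids:
    print(f"✅ '{custom_benchmark_id}' is registered and ready to run")
else:
    print(f"⚠️ '{custom_benchmark_id}' not found; re-check the registration call")
----
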
=== Step 7: Run Custom Security Scan

[source,python]
----
# Execute the custom benchmark
custom_job = client.eval.run_eval(
    benchmark_id=custom_benchmark_id,
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-name",
            "sampling_params": {
                "max_tokens": 150
            }
        }
    }
)

print(f"Started custom prompt injection scan: {custom_job.job_id}")

# Monitor and analyze results
custom_status = monitor_job(custom_job.job_id, custom_benchmark_id)

if custom_status.status == "completed":
    custom_results = client.eval.jobs.retrieve(
        job_id=custom_job.job_id,
        benchmark_id=custom_benchmark_id
    )

    print("\n🎯 Custom Prompt Injection Results:")
    aggregated_scores = {k: v.aggregated_results for k, v in custom_results.scores.items()}
    pprint(aggregated_scores)
----

=== Step 8: Run Comprehensive OWASP Assessment

For production readiness, run the full OWASP LLM Top 10 assessment:

[source,python]
----
# Note: This scan takes ~10 hours, suitable for overnight runs
owasp_job = client.eval.run_eval(
    benchmark_id="trustyai_garak::owasp_llm_top10",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-name",
            "sampling_params": {"max_tokens": 200}
        }
    }
)
----
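
Because this scan runs for hours, you may prefer a one-off, non-blocking status check over the polling loop from Step 4:

[source,python]
----
# Check on the long-running OWASP scan without blocking
owasp_status = client.eval.jobs.status(
    job_id=owasp_job.job_id,
    benchmark_id="trustyai_garak::owasp_llm_top10"
)
print(f"OWASP scan status: {owasp_status.status}")
----
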
=== Step 9: Access Detailed Reports

Garak generates comprehensive reports in multiple formats. Four files are generated:

* `scan.report.jsonl` - detailed report containing all attempts and their results
* `scan.hitlog.jsonl` - hitlog containing only vulnerable interactions
* `scan.log` - detailed log of the scan
* `scan.report.html` - human-readable report

You can access them using the Llama Stack `files` API. Here's an example to view the scan log:

[source,python]
----
log_content = client.files.content(final_status.metadata['scan.log'])
log_lines = log_content.strip().split('\n')
print(f"\n📋 Scan Log (last 5 lines):")
for line in log_lines[-5:]:
    print(f"  {line}")
----
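
The same `files` API can be used for the other report files. For example, assuming the HTML report's file ID is exposed under a `scan.report.html` metadata key (mirroring the `scan.log` key above) and that `files.content` returns the file body as text, you could save it locally:

[source,python]
----
# Save the human-readable HTML report locally
# (assumes the same metadata key layout and text response as the scan.log example)
html_report = client.files.content(final_status.metadata['scan.report.html'])
with open("garak_scan_report.html", "w") as f:
    f.write(html_report)
print("Saved report to garak_scan_report.html")
----
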
== Best Practices

=== Security Testing Strategy

. *Development Phase*: Use `trustyai_garak::quick` for rapid iteration
. *Pre-production*: Run `trustyai_garak::standard` for broader coverage
. *Production Readiness*: Execute full OWASP and AVID compliance scans
. *Continuous Monitoring*: Integrate security scans into CI/CD pipelines (a minimal gate script is sketched below)
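
As a sketch of that CI/CD idea, a pipeline step could run the quick profile, reuse `monitor_job` from Step 4, and fail the build when any aggregated score reaches a chosen threshold (the 0.5 cut-off and model name below are illustrative):

[source,python]
----
# Hypothetical CI gate: exit non-zero if any aggregated score reaches the threshold
import sys

THRESHOLD = 0.5  # illustrative cut-off; tune to your risk tolerance

gate_job = client.eval.run_eval(
    benchmark_id="trustyai_garak::quick",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-name",
            "sampling_params": {"max_tokens": 100}
        }
    }
)
monitor_job(gate_job.job_id, "trustyai_garak::quick")
gate_result = client.eval.jobs.retrieve(job_id=gate_job.job_id, benchmark_id="trustyai_garak::quick")

# Find the worst (highest) aggregated detector score across all probes
worst = max(
    (
        score
        for detector_scores in (v.aggregated_results for v in gate_result.scores.values())
        for score in detector_scores.values()
    ),
    default=0.0
)
print(f"Worst aggregated score: {worst:.3f}")
sys.exit(1 if worst >= THRESHOLD else 0)
----
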
=== Performance Optimization

[source,python]
----
# Optimize scan performance with parallel execution
optimized_metadata = {
    "probes": ["dan", "promptinject", "encoding"],
    "parallel_attempts": 8,  # Increase parallelism
    "timeout": 3600  # 1 hour timeout
}
----
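
This metadata takes effect when you register a benchmark with it, exactly as in Step 6 (the benchmark ID below is illustrative):

[source,python]
----
# Register an illustrative benchmark carrying the tuned metadata
client.benchmarks.register(
    benchmark_id="custom_fast_scan",        # illustrative ID
    dataset_id="garak",                     # placeholder, as in Step 6
    scoring_functions=["garak_scoring"],    # placeholder, as in Step 6
    provider_benchmark_id="custom_fast_scan",
    provider_id="trustyai_garak",
    metadata=optimized_metadata
)
----
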
== Advanced Usage

You can pass any of the following Garak command line arguments to the scan via the benchmark `metadata` parameter:

* `parallel_attempts`
* `generations`
* `seed`
* `deprefix`
* `eval_threshold`
* `probe_tags`
* `probe_options`
* `detectors`
* `extended_detectors`
* `detector_options`
* `buffs`
* `buff_options`
* `harness_options`
* `taxonomy`
* `generate_autodan`

Please refer to the Garak documentation for more details: https://reference.garak.ai/en/latest/cliref.html
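
For example, metadata that fixes the random seed, generates several outputs per prompt, and caps parallelism might look like this (values are illustrative; it is passed to `client.benchmarks.register` exactly as in Step 6):

[source,python]
----
# Illustrative metadata combining several Garak CLI options
advanced_metadata = {
    "probes": ["promptinject"],
    "generations": 5,        # number of outputs generated per prompt
    "seed": 42,              # fix Garak's random seed for reproducibility
    "parallel_attempts": 4   # limit concurrent attempts
}
----
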
== Troubleshooting

*Job stuck in 'scheduled' status:*

* Check if the inference endpoint is accessible (see the snippet after this list)
* Verify model name matches your deployment
* Review server logs for connection errors
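
A quick way to check the endpoint is to query its OpenAI-compatible model listing route directly, for example:

[source,bash]
----
# Should return a JSON list of served models; an error or timeout points to a connectivity problem
curl -s "${VLLM_URL}/models"
----
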
*High memory usage during scans:*

* Reduce `parallel_attempts` in metadata
* Lower `max_tokens` in sampling parameters
* Monitor system resources during long-running scans

== Next Steps

Explore xref:garak-lls-shields.adoc[shield testing] for guardrail evaluation.
