
Commit d9a5ba6

k8s-bench | health probe troubleshooting eval task #145 (#163)
* feat: add health probe troubleshooting eval task
* fix: modify path for probes and use kubectl wait
* fix: refactor for consistency with other tasks, add probes exists check and decrease restarts to 1
* chore: decrease restart check to 1
* refactor: improve README & add missing flags for examples
1 parent 8ae3880 commit d9a5ba6

File tree (5 files changed, +174 -2 lines changed):

- k8s-bench/README.md
- k8s-bench/tasks/fix-probes/cleanup.sh
- k8s-bench/tasks/fix-probes/setup.sh
- k8s-bench/tasks/fix-probes/task.yaml
- k8s-bench/tasks/fix-probes/verify.sh

k8s-bench/README.md (+62 -2)
@@ -8,10 +8,54 @@
 ```sh
 # build the k8s-bench binary
 go build
+```
+
+#### Run Subcommand
+
+The `run` subcommand executes the benchmark evaluations.
+
+```sh
+# Basic usage with mandatory output directory
+./k8s-bench run --agent-bin <path/to/kubectl-ai/binary> --output-dir .build/k8sbench
 
 # Run evaluation for scale related tasks
-./k8s-bench run --agent-bin <path/to/kubectl-ai/binary> --task-pattern scale --kubeconfig <path/to/kubeconfig>
+./k8s-bench run --agent-bin <path/to/kubectl-ai/binary> --task-pattern scale --kubeconfig <path/to/kubeconfig> --output-dir .build/k8sbench
+
+# Run evaluation for a specific LLM provider and model with tool use shim disabled
+./k8s-bench run --llm-provider=grok --models=grok-3-beta --agent-bin ../kubectl-ai --task-pattern=fix-probes --enable-tool-use-shim=false --output-dir .build/k8sbench
+
+# Run evaluation with all available options
+./k8s-bench run \
+  --agent-bin <path/to/kubectl-ai/binary> \
+  --kubeconfig ~/.kube/config \
+  --tasks-dir ./tasks \
+  --task-pattern fix \
+  --llm-provider gemini \
+  --models gemini-2.5-pro-preview-03-25,gemini-1.5-pro-latest \
+  --enable-tool-use-shim true \
+  --quiet true \
+  --output-dir .build/k8sbench
+```
 
+#### Available flags for `run` subcommand:
+
+| Flag | Description | Default | Required |
+|------|-------------|---------|----------|
+| `--agent-bin` | Path to kubectl-ai binary | - | Yes |
+| `--output-dir` | Directory to write results to | - | Yes |
+| `--tasks-dir` | Directory containing evaluation tasks | ./tasks | No |
+| `--kubeconfig` | Path to kubeconfig file | ~/.kube/config | No |
+| `--task-pattern` | Pattern to filter tasks (e.g. 'pod' or 'redis') | - | No |
+| `--llm-provider` | Specific LLM provider to evaluate (e.g. 'gemini' or 'ollama') | gemini | No |
+| `--models` | Comma-separated list of models to evaluate | gemini-2.5-pro-preview-03-25 | No |
+| `--enable-tool-use-shim` | Enable tool use shim | true | No |
+| `--quiet` | Quiet mode (non-interactive mode) | true | No |
+
+#### Analyze Subcommand
+
+The `analyze` subcommand processes results from previous runs:
+
+```sh
 # Analyze previous evaluation results and output in markdown format (default)
 ./k8s-bench analyze --input-dir .build/k8sbench
 
@@ -20,8 +64,24 @@ go build
 
 # Save analysis results to a file
 ./k8s-bench analyze --input-dir .build/k8sbench --results-filepath ./results.md
+
+# Analyze with all available options
+./k8s-bench analyze \
+  --input-dir .build/k8sbench \
+  --output-format markdown \
+  --ignore-tool-use-shim true \
+  --results-filepath ./detailed-analysis.md
 ```
 
+#### Available flags for `analyze` subcommand:
+
+| Flag | Description | Default | Required |
+|------|-------------|---------|----------|
+| `--input-dir` | Directory containing evaluation results | - | Yes |
+| `--output-format` | Output format (markdown or json) | markdown | No |
+| `--ignore-tool-use-shim` | Ignore tool use shim in result grouping | true | No |
+| `--results-filepath` | Optional file path to write results to | - | No |
+
 Running the benchmark with the `run` subcommand will produce results as below:
 
 ```sh
@@ -37,4 +97,4 @@ Task: scale-down-deployment
 gemini-2.0-flash-thinking-exp-01-21: true
 ```
 
-The `analyze` subcommand will gather the results from previous runs and display them in a tabular format with emoji indicators for success (✅) and failure (❌).
+The `analyze` subcommand will gather the results from previous runs and display them in a tabular format with emoji indicators for success (✅) and failure (❌).

k8s-bench/tasks/fix-probes/cleanup.sh (new file, +6)

@@ -0,0 +1,6 @@
+#!/bin/bash
+
+# Delete the namespace which will remove all resources created for this task
+kubectl delete namespace health-check --ignore-not-found
+
+echo "Cleanup completed"

k8s-bench/tasks/fix-probes/setup.sh (new file, +49)

@@ -0,0 +1,49 @@
+#!/bin/bash
+
+# Delete namespace if exists and create a fresh one
+kubectl delete namespace health-check --ignore-not-found
+kubectl create namespace health-check
+
+# Create a deployment with problematic health checks
+cat <<YAML | kubectl apply -f -
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: webapp
+  namespace: health-check
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: webapp
+  template:
+    metadata:
+      labels:
+        app: webapp
+    spec:
+      containers:
+      - name: webapp
+        image: nginx:latest
+        ports:
+        - containerPort: 80
+        # The problem: incorrect health probes causing restarts
+        livenessProbe:
+          httpGet:
+            path: /get_status # Path doesn't exist
+            port: 80
+          initialDelaySeconds: 5
+          periodSeconds: 5
+        readinessProbe:
+          httpGet:
+            path: /is_ready # Path doesn't exist
+            port: 80
+          initialDelaySeconds: 5
+          periodSeconds: 5
+YAML
+
+# Create a service for the webapp
+kubectl create service clusterip webapp -n health-check --tcp=80:80
+
+# Wait for the pod to start and begin restarting due to failed probes
+echo "Waiting for pod to start and begin failing health checks..."
+kubectl wait --for=condition=Available=False --timeout=30s deployment/webapp -n health-check || true
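
For context, the stock `nginx` image only serves content at `/`, so the probe paths above (`/get_status`, `/is_ready`) always return 404 and the kubelet keeps restarting the container. One way an agent could repair this, shown purely as an illustrative sketch and not part of this commit, is to re-point both probes at `/`:

```sh
# Illustrative fix only (not part of this commit): point both probes at "/",
# which the stock nginx image does serve, while leaving the probes in place.
kubectl patch deployment webapp -n health-check --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/httpGet/path", "value": "/"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/httpGet/path", "value": "/"}
]'
```

Editing the pod template rolls out a fresh pod whose restart count starts at 0, which matters for the verifier further below.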

k8s-bench/tasks/fix-probes/task.yaml (new file, +6)

@@ -0,0 +1,6 @@
+script:
+  - prompt: "Please fix the health check issue with the deployment named 'webapp' in namespace 'health-check'"
+setup: "setup.sh"
+verifier: "verify.sh"
+cleanup: "cleanup.sh"
+difficulty: "medium"
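
The `setup`, `verifier`, and `cleanup` fields reference the scripts in this directory. To run only this task, the `--task-pattern` flag documented in the README changes above can select it by name; a minimal invocation (binary path illustrative) might be:

```sh
# Run just the fix-probes task with the default provider/model
./k8s-bench run --agent-bin <path/to/kubectl-ai/binary> --task-pattern fix-probes --output-dir .build/k8sbench
```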

k8s-bench/tasks/fix-probes/verify.sh (new file, +51)

@@ -0,0 +1,51 @@
+#!/bin/bash
+
+# Check if the pod is in Running state with Ready status
+echo "Checking if the pod is running and ready..."
+
+# Wait up to 30 seconds for pod to become ready using kubectl wait
+if kubectl wait --for=condition=Ready pod -l app=webapp -n health-check --timeout=30s; then
+  echo "Success: Pod is now Ready"
+
+  # Check if probes exist at all
+  LIVENESS_EXISTS=$(kubectl get deploy webapp -n health-check -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}')
+  READINESS_EXISTS=$(kubectl get deploy webapp -n health-check -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}')
+
+  if [ -z "$LIVENESS_EXISTS" ] || [ -z "$READINESS_EXISTS" ]; then
+    echo "Failure: One or both probes have been removed completely."
+    echo "Probes should be fixed, not removed."
+    exit 1
+  fi
+
+  # Get the current probe configurations
+  LIVENESS_PATH=$(kubectl get deploy webapp -n health-check -o jsonpath='{.spec.template.spec.containers[0].livenessProbe.httpGet.path}')
+  READINESS_PATH=$(kubectl get deploy webapp -n health-check -o jsonpath='{.spec.template.spec.containers[0].readinessProbe.httpGet.path}')
+
+  echo "Current liveness probe path: $LIVENESS_PATH"
+  echo "Current readiness probe path: $READINESS_PATH"
+
+  # Verify the probes are not using the nonexistent paths and have valid paths set
+  if [ "$LIVENESS_PATH" != "/get_status" ] && [ "$READINESS_PATH" != "/is_ready" ] && \
+     [ ! -z "$LIVENESS_PATH" ] && [ ! -z "$READINESS_PATH" ]; then
+    echo "Success: Both probe paths have been fixed"
+
+    # Check if pod is stable with no recent restarts
+    RESTARTS=$(kubectl get pods -n health-check -l app=webapp -o jsonpath='{.items[0].status.containerStatuses[0].restartCount}')
+    if [ "$RESTARTS" -lt 1 ]; then
+      echo "Success: Pod is stable with acceptable number of restarts"
+      exit 0
+    else
+      echo "Failure: Pod has too many restarts: $RESTARTS"
+      exit 1
+    fi
+  else
+    echo "Failure: One or both probe paths are still incorrect or missing:"
+    echo "Liveness path: $LIVENESS_PATH"
+    echo "Readiness path: $READINESS_PATH"
+    exit 1
+  fi
+else
+  echo "Failure: Pod is not Ready after waiting"
+  kubectl get pods -n health-check -l app=webapp
+  exit 1
+fi
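
Because the task scripts are plain bash wrappers around `kubectl`, they can also be exercised by hand against a disposable cluster to sanity-check the verifier, independent of the harness. A rough walk-through, assuming a reachable cluster and some fix such as the illustrative patch shown earlier:

```sh
cd k8s-bench/tasks/fix-probes

# Create the namespace and the deliberately broken deployment
bash setup.sh

# ...apply a fix to the webapp deployment here...

# Check whether the fix satisfies the task; exit code 0 means pass
bash verify.sh && echo "PASS" || echo "FAIL"

# Tear down the health-check namespace
bash cleanup.sh
```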
