k8s-bench | health probe troubleshooting eval task #145 (#163)
* feat: add health probe troubleshooting eval task
* fix: modify path for probes and use kubectl wait
* fix: refactor for consistency with other tasks, add probes exists check and decrease restarts to 1
* chore: decrease restart check to 1
* refactor: improve README & add missing flags for examples
+|`--output-format`| Output format (markdown or json) | markdown | No |
+|`--ignore-tool-use-shim`| Ignore tool use shim in result grouping | true | No |
+|`--results-filepath`| Optional file path to write results to | - | No |
+
Running the benchmark with the `run` subcommand will produce results as below:
```sh
@@ -37,4 +97,4 @@ Task: scale-down-deployment
gemini-2.0-flash-thinking-exp-01-21: true
```

-The `analyze` subcommand will gather the results from previous runs and display them in a tabular format with emoji indicators for success (✅) and failure (❌).
+The `analyze` subcommand will gather the results from previous runs and display them in a tabular format with emoji indicators for success (✅) and failure (❌).
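
To make the tabular output described above concrete, here is a minimal Python sketch of how per-task results might be rendered as a markdown table with emoji indicators, or as JSON. This is not k8s-bench's actual implementation: the function name `render_results`, the shape of the `results` mapping, and the column layout are assumptions made for illustration. Only the sample task and model names are taken from the run output shown earlier.

```python
import json

SUCCESS, FAILURE = "✅", "❌"


def render_results(results, output_format="markdown"):
    """Render benchmark results (hypothetical helper, not part of k8s-bench).

    `results` is assumed to map task name -> {model name -> bool}.
    """
    if output_format == "json":
        return json.dumps(results, indent=2, sort_keys=True)
    if output_format != "markdown":
        raise ValueError(f"unsupported output format: {output_format}")

    # One column per model seen in any task, in a stable order.
    models = sorted({m for per_task in results.values() for m in per_task})
    lines = [
        "| Task | " + " | ".join(models) + " |",
        "|---|" + "---|" * len(models),
    ]
    for task in sorted(results):
        # A model with no recorded result for a task is shown as a failure.
        cells = [SUCCESS if results[task].get(m) else FAILURE for m in models]
        lines.append("| " + task + " | " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Sample input mirroring the run output shown in the README.
results = {
    "scale-down-deployment": {"gemini-2.0-flash-thinking-exp-01-21": True},
}
print(render_results(results))
```

Passing `output_format="json"` returns the same mapping serialized as JSON, loosely mirroring the two values listed for `--output-format`.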