Help needed: We need more evals #145

droot · 2025-05-06T02:16:34Z

kubectl-ai has 10 eval tasks today, covering different areas. We want to improve the eval coverage and bring it closer to the realistic scenarios that users run into.

This is an area where we need input from the community. Share the scenarios that you or your team run into very regularly.

Adding an eval task to k8s-bench is very simple; see the existing tasks for examples.

Ask:

Share a scenario that we can add an eval for.
Better yet, send a PR with the eval task, following an example under the k8s-bench directory.

/cc @selimacerbas @rxinui @cwrau @hakman @mattn @DoctorLai @mschneider82 @tuannvm

droot · 2025-05-06T02:25:09Z

/cc @justinsb

tuannvm · 2025-05-07T15:13:04Z

@droot I can take on evals for OpenAI

#157

Dinesh-0813 · 2025-05-07T17:45:06Z

Hi @droot, I would like to contribute a new eval task: list-services-kube-system.
It will check if the AI correctly returns kubectl get svc -n kube-system.

Please let me know if I can go ahead. I will follow the task structure inside k8s-bench/tasks/.

Thanks!
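
A check like the one proposed above needs to decide when two kubectl commands count as "the same", since svc, service, and services all name the same resource and the namespace flag can be spelled several ways. The following is only a minimal sketch of that comparison. It is not how k8s-bench verifies tasks (the thread does not show the task format), and the names RESOURCE_ALIASES, normalize, and is_expected are hypothetical.

```python
import shlex

# Hypothetical alias table: only the spellings needed for this one example.
RESOURCE_ALIASES = {"svc": "services", "service": "services", "services": "services"}


def normalize(command):
    """Reduce a kubectl command string to (verb, resource, namespace).

    Returns None if the command does not start with "kubectl" or does not
    have exactly two positional arguments. Only -n NAME, --namespace NAME,
    and --namespace=NAME are recognized; any other flag is treated as a
    positional argument and therefore makes the command fail to normalize.
    """
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "kubectl":
        return None

    namespace = None
    positional = []
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-n", "--namespace") and i + 1 < len(tokens):
            namespace = tokens[i + 1]
            i += 2
        elif tok.startswith("--namespace="):
            namespace = tok.split("=", 1)[1]
            i += 1
        else:
            positional.append(tok)
            i += 1

    if len(positional) != 2:
        return None
    verb, resource = positional
    return (verb, RESOURCE_ALIASES.get(resource, resource), namespace)


def is_expected(command):
    """True if the command is equivalent to: kubectl get svc -n kube-system."""
    return normalize(command) == ("get", "services", "kube-system")
```

For example, is_expected("kubectl -n kube-system get service") is true under this sketch, while is_expected("kubectl get svc") is false because no namespace is given. Comparing command strings is only one possible design; a task could instead compare the model's output against the actual state of the cluster.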

tuannvm · 2025-05-08T05:44:20Z

It would be awesome if we could also add these features:

Automatically run an evaluation on GitHub Actions whenever new models are added, released, or updated.
Display the evaluation results back on k8s-bench.md.
Use GitHub Actions with Kind to run a Kubernetes cluster in the CI environment.

Ah it's here! https://github.com/GoogleCloudPlatform/kubectl-ai/pull/125/files

…m#145 (GoogleCloudPlatform#163)

* feat: add health probe troubleshooting eval task
* fix: modify path for probes and use kubectl wait
* fix: refactor for consistency with other tasks, add probes exists check and decrease restarts to 1
* chore: decrease restart check to 1
* refactor: improve README & add missing flags for examples

droot added the "good first issue" (Good for newcomers) label on May 6, 2025

tuannvm mentioned this issue May 7, 2025

Eval: Add OpenAI Benchmark #157

Open

This was referenced May 7, 2025

[Feature Request] goroutines for k8s-bench task execution #158

Closed

Feature: Allow all k8s-bench tasks be executed sequentially #159

Closed

tuannvm mentioned this issue May 8, 2025

feat(k8s-bench): add scripts for HPA, rolling update, StatefulSet tasks #170

Merged
