
Help needed: We need more evals #145


Open
droot opened this issue May 6, 2025 · 4 comments
Labels
good first issue Good for newcomers

Comments

@droot
Member

droot commented May 6, 2025

kubectl-ai has 10 eval tasks today, covering different areas. We want to improve eval coverage, with the goal of bringing it closer to the realistic scenarios that users run into.

This is an area where we need input from the community. Share the scenarios that you or your team run into very regularly.

Adding an eval task to k8s-bench is very simple; see the existing tasks for examples.

Ask:

  • Share a scenario that we can add an eval for.
  • Better yet, send a PR with the eval task, following an example under the k8s-bench directory.

/cc @selimacerbas @rxinui @cwrau @hakman @mattn @DoctorLai @mschneider82 @tuannvm

@droot
Member Author

droot commented May 6, 2025

/cc @justinsb

droot added the good first issue label May 6, 2025
@tuannvm
Contributor

tuannvm commented May 7, 2025

@droot I can take on evals for openai

#157

@Dinesh-0813

Hi @droot, I would like to contribute a new eval task: list-services-kube-system.
It will check if the AI correctly returns kubectl get svc -n kube-system.

Please let me know if I can go ahead. I will follow the task structure inside k8s-bench/tasks/.

Thanks!
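
For illustration, the check described above (did the AI return a command equivalent to kubectl get svc -n kube-system?) could be sketched roughly as follows. This is only a hypothetical Python helper, not the actual k8s-bench task format: the names normalize, matches_expected, SERVICE_ALIASES and EXPECTED are assumptions made for this example, and a real task should follow the structure of the existing tasks under k8s-bench/tasks/.

```python
import shlex

# Hypothetical helper, NOT part of k8s-bench.
# Names kubectl accepts for the Service resource type.
SERVICE_ALIASES = {"svc", "service", "services"}

# Assumed canonical form of the expected answer: (verb, resource, namespace).
EXPECTED = ("get", "services", "kube-system")


def normalize(command):
    """Reduce a kubectl command string to (verb, resource, namespace).

    Simplified on purpose: only -n/--namespace is parsed as a flag with a
    value; other flags are skipped, and positional arguments after the
    verb and resource are ignored.
    """
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "kubectl":
        return None
    verb, resource, namespace = None, None, None
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-n", "--namespace") and i + 1 < len(tokens):
            namespace = tokens[i + 1]
            i += 2
            continue
        if tok.startswith("--namespace="):
            namespace = tok.split("=", 1)[1]
        elif tok.startswith("-"):
            pass  # skip any other flag
        elif verb is None:
            verb = tok
        elif resource is None:
            resource = "services" if tok in SERVICE_ALIASES else tok
        i += 1
    return verb, resource, namespace


def matches_expected(command):
    """True if the command lists Services in the kube-system namespace."""
    return normalize(command) == EXPECTED
```

Comparing the normalized form rather than the raw string means equivalent spellings such as kubectl get services --namespace=kube-system are also accepted, while a command that omits the namespace is rejected.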

@tuannvm
Contributor

tuannvm commented May 8, 2025

It would be awesome if we could also add these features:

  • Automatically run an evaluation on GitHub Actions whenever new models are added, released, or updated.
  • Display the evaluation results back on k8s-bench.md.
  • Use GitHub Actions with Kind to run a Kubernetes cluster in the CI environment.

Ah it's here! https://github.com/GoogleCloudPlatform/kubectl-ai/pull/125/files

droot pushed a commit that referenced this issue May 8, 2025
* feat: add health probe troubleshooting eval task

* fix: modify path for probes and use kubectl wait

* fix: refactor for consistency with other tasks, add probes exists check and decrease restarts to 1

* chore: decrease restart check to 1

* refactor: improve README & add missing flags for examples
rxinui pushed a commit to rxinui/kubectl-ai that referenced this issue May 10, 2025
…m#145 (GoogleCloudPlatform#163)

* feat: add health probe troubleshooting eval task

* fix: modify path for probes and use kubectl wait

* fix: refactor for consistency with other tasks, add probes exists check and decrease restarts to 1

* chore: decrease restart check to 1

* refactor: improve README & add missing flags for examples