🐛 (go/v4): Fix flaky metrics e2e tests when webhooks are scaffolded #5204
Hi @mayuka-c, could you help review this one?
mayuka-c
left a comment
/lgtm
This looks good to me, since controller pod readiness guarantees that the webhook server is ready. The approach I had taken checks webhook readiness directly, which adds the overhead of defining and maintaining markers so that the webhook readiness check only runs for a PROJECT that has webhooks configured.
Thank you so much @camilamacedo86 :)
@mayuka-c: changing LGTM is restricted to collaborators.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: camilamacedo86, mayuka-c
        g.Expect(err).NotTo(HaveOccurred(), "Webhook endpoints should exist")
        g.Expect(output).ShouldNot(BeEmpty(), "Webhook endpoints not yet ready")
    }
    Eventually(verifyWebhookEndpointsReady, 3*time.Minute, time.Second).Should(Succeed())
@mayuka-c that is the code that we must inject.
Without it we still risk the flakes.
Why?
Without waiting for the webhook service to publish endpoints, the curl pod can hit the validating webhook while it’s still initializing, producing the same “connection refused” failure. The EndpointSlice check is what gives us a deterministic signal that the webhook server is actually accepting traffic before we launch the curl metrics pod.
Yes, this would be required. I meant the approach that was taken here (having the marker to check for webhook readiness).
Sorry if I caused any misunderstanding here :)
Hi @mayuka-c, sorry, I changed it to do more tests and forgot to clarify.
Projects with webhooks may experience flaky e2e tests because the webhook server is not yet ready when the metrics test creates the curl-metrics pod.
Co-authored-by: Mayuka Channankaiah <github@users.noreply.github.com>

Problem
When the default e2e suite is run against a project that scaffolds admission
webhooks, the "metrics" test can fail with errors such as:
The failure happens when the test launches the
curl-metrics pod to verify the metrics endpoint.
Root Cause
The original test assumed that, once the controller pod reported
Running, the webhook server was also ready to admit new pods. In reality:
… Running.
Checking the controller logs for "Serving metrics server" does not guarantee the webhook server is listening, so the race persisted.
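To illustrate the gap between a pod's phase and its readiness, here is a minimal sketch that decodes the status fields of a pod object (the shape returned by `kubectl get pod -o json`). The `podStatus` type, the `parsePod` helper, and the sample JSON are hypothetical and are not part of this PR; only the `status.phase` and `status.conditions` field names come from the Kubernetes Pod API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podStatus mirrors only the Pod status fields this sketch needs.
// It is a hypothetical helper type, not the full Kubernetes Pod schema.
type podStatus struct {
	Status struct {
		Phase      string `json:"phase"`
		Conditions []struct {
			Type   string `json:"type"`
			Status string `json:"status"`
		} `json:"conditions"`
	} `json:"status"`
}

// parsePod reports whether the pod phase is "Running" and, separately,
// whether the pod carries the condition Ready=True.
func parsePod(raw string) (running, ready bool, err error) {
	var p podStatus
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return false, false, fmt.Errorf("decoding pod JSON: %w", err)
	}
	running = p.Status.Phase == "Running"
	for _, c := range p.Status.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			ready = true
		}
	}
	return running, ready, nil
}

func main() {
	// Made-up sample: the pod is Running but has not passed readiness yet.
	raw := `{"status":{"phase":"Running","conditions":[{"type":"Ready","status":"False"}]}}`
	running, ready, err := parsePod(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("running=%t ready=%t\n", running, ready) // running=true ready=false
}
```

A test that only inspects the phase would treat this pod as usable, while a check on the Ready condition would keep waiting.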
Solution
Gate the curl pod creation on two readiness checks:
… Ready condition.
Controller readiness:
Webhook readiness (injected only when the project wires webhooks):
A new scaffold marker (+kubebuilder:scaffold:e2e-metrics-webhooks-readiness) adds the EndpointSlice check only for webhook-enabled projects, keeping the base template generic.
Why this works
The Ready condition requires all containers, volumes (including the webhook cert secret), and readiness probes to succeed, so the controller and its webhook server have finished initialising.
… Endpoints objects.
Notes on Kubernetes 1.33+
Kubernetes 1.33+ may exhibit a brief delay before the metrics endpoint becomes
available when controller-runtime's
WithAuthenticationAndAuthorization() is used with self-signed certificates. The readiness sequence above absorbs that
startup delay, so the curl pod is only launched after Kubernetes itself signals
that both the controller pod and webhook service are ready.
Closes: #5138 #5137