
Support building e2e Kind node images for Kubernetes patch releases#752

Open
yankay wants to merge 1 commit into kubernetes-sigs:main from yankay:fix-Failing-1.35

Conversation

@yankay
Member

@yankay yankay commented Feb 4, 2026

What type of PR is this?

/kind failing-test

What this PR does / why we need it

Adds a Makefile target (`kind-node-image-build`) and an `E2E_KIND_BUILD_NODE_IMAGE_VERSION` knob so that e2e jobs can build a Kind node image on demand via `kind build node-image`, mirroring Kueue's pattern.

This is needed because the StatefulSet `Parallel` pod management regression in v1.35.0/v1.35.1 (kubernetes/kubernetes#137409) is fixed in v1.35.4, but no `kindest/node:v1.35.4` image has been published (kubernetes-sigs/kind#4131). When the new variable is unset, behavior is unchanged and the image referenced by `E2E_KIND_VERSION` is pulled as before.
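The gating logic boils down to a small decision: build only when the knob is set and, per the commit message, only when the target image is not already present locally. A minimal Go sketch, assuming the target passes `E2E_KIND_BUILD_NODE_IMAGE_VERSION` and `E2E_KIND_VERSION` straight through to `kind build node-image <version> --image <image>`; `nodeImageBuildArgs` and the injected `imageExists` check are hypothetical names, not part of the PR's Makefile.

```go
package main

import "fmt"

// nodeImageBuildArgs is a hypothetical helper mirroring the described
// behavior of the kind-node-image-build target. It returns the kind
// command line to run, or nil when no build is needed.
//
// buildVersion plays the role of E2E_KIND_BUILD_NODE_IMAGE_VERSION and
// image the role of E2E_KIND_VERSION. imageExists stands in for whatever
// local-image check the real target performs (an assumption here).
func nodeImageBuildArgs(buildVersion, image string, imageExists func(string) bool) []string {
	// Knob unset: keep the old behavior and let the image be pulled.
	if buildVersion == "" {
		return nil
	}
	// Image already available locally: skip the (slow) build.
	if imageExists(image) {
		return nil
	}
	return []string{"kind", "build", "node-image", buildVersion, "--image", image}
}

func main() {
	absent := func(string) bool { return false }
	// prints: [kind build node-image v1.35.4 --image kindest/node:v1.35.4]
	fmt.Println(nodeImageBuildArgs("v1.35.4", "kindest/node:v1.35.4", absent))
	// prints: true (unset knob means no build)
	fmt.Println(nodeImageBuildArgs("", "kindest/node:v1.35.4", absent) == nil)
}
```

Injecting the existence check keeps the decision testable without a container runtime; the real target only needs to run the returned command when it is non-nil.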

Which issue(s) this PR fixes

Part of #751

Special notes for your reviewer

A follow-up kubernetes/test-infra PR can then set these env vars on the lws Prow jobs:

```yaml
- name: E2E_KIND_VERSION
  value: kindest/node:v1.35.4
- name: E2E_KIND_BUILD_NODE_IMAGE_VERSION
  value: v1.35.4
```

Validated with `make -n test-e2e` (root and `disaggregatedset`) and a local `kindest/node:v1.35.4` build.

Does this PR introduce a user-facing change?

```release-note
NONE
```

Copilot AI review requested due to automatic review settings February 4, 2026 12:21
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. labels Feb 4, 2026
@netlify

netlify Bot commented Feb 4, 2026

Deploy Preview for kubernetes-sigs-lws ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | ae7025f |
| 🔍 Latest deploy log | https://app.netlify.com/projects/kubernetes-sigs-lws/deploys/69f0a43b03b4c100085c638d |
| 😎 Deploy Preview | https://deploy-preview-752--kubernetes-sigs-lws.netlify.app |

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yankay
Once this PR has been reviewed and has the lgtm label, please assign kerthcet for approval. For more information see the Code Review Process.


Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 4, 2026

Copilot AI left a comment


Pull request overview

This PR updates Kubernetes-related Go module dependencies to address failing tests when running against Kubernetes 1.35.

Changes:

  • Bumps sigs.k8s.io/controller-runtime from v0.23.0 to v0.23.1.
  • Bumps sigs.k8s.io/structured-merge-diff/v6 from v6.3.1 to v6.3.2-0.20260122202528-d9cc6641c482.
  • Updates go.sum entries to align with the new dependency versions.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| go.mod | Updates controller-runtime and structured-merge-diff dependencies to newer versions compatible with Kubernetes 1.35. |
| go.sum | Synchronizes checksum entries with the updated dependency versions from go.mod. |


@yankay yankay force-pushed the fix-Failing-1.35 branch 2 times, most recently from 7c94439 to 58b849a Compare February 5, 2026 03:12
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 5, 2026
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 5, 2026
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 6, 2026
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 6, 2026
@yankay
Member Author

yankay commented Feb 9, 2026

/retest

@yankay yankay mentioned this pull request Apr 7, 2026
@yankay yankay force-pushed the fix-Failing-1.35 branch from d06f3e6 to abdc4cb Compare April 7, 2026 05:53
@yankay yankay changed the title from [WIP]Fix Failing Tests in kube 1.35 to Disable MaxUnavailableStatefulSet feature gate in e2e Kind cluster Apr 7, 2026
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 7, 2026
@yankay
Member Author

yankay commented Apr 7, 2026

Updated the approach: instead of downgrading to kindest/node:v1.34.0, this now keeps v1.35.0 and explicitly disables the MaxUnavailableStatefulSet feature gate in the Kind cluster config.

Root cause: kubernetes/kubernetes#137409. The MaxUnavailableStatefulSet feature gate (Beta, on by default in 1.35) broke StatefulSet Parallel pod management. Upstream fix: kubernetes/kubernetes#137904.
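For context, Kind's v1alpha4 `Cluster` config accepts a top-level `featureGates` map that Kind applies across the Kubernetes components. A minimal Go sketch that renders such a config with a gate turned off; `kindClusterConfig` is a hypothetical helper, and its output is not necessarily the exact file this revision of the PR used.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// kindClusterConfig (hypothetical) renders a minimal Kind v1alpha4 Cluster
// config that disables each named feature gate via the top-level
// featureGates map. Names are sorted so the output is deterministic.
func kindClusterConfig(disabled []string) string {
	gates := append([]string(nil), disabled...) // copy; don't mutate caller's slice
	sort.Strings(gates)

	var b strings.Builder
	b.WriteString("kind: Cluster\n")
	b.WriteString("apiVersion: kind.x-k8s.io/v1alpha4\n")
	if len(gates) > 0 {
		b.WriteString("featureGates:\n")
		for _, g := range gates {
			fmt.Fprintf(&b, "  %s: false\n", g)
		}
	}
	return b.String()
}

func main() {
	// Prints a config with "MaxUnavailableStatefulSet: false" under featureGates.
	fmt.Print(kindClusterConfig([]string{"MaxUnavailableStatefulSet"}))
}
```

The resulting file would be passed to `kind create cluster --config <file>`. Disabling the gate works around the regression on an unpatched v1.35.0 node image, whereas building a patched node image tests the upstream fix itself.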

/retest

@yankay yankay force-pushed the fix-Failing-1.35 branch 2 times, most recently from 0b5076f to 5066b8b Compare April 7, 2026 05:57
@yankay yankay changed the title from Disable MaxUnavailableStatefulSet feature gate in e2e Kind cluster to [WIP]Disable MaxUnavailableStatefulSet feature gate in e2e Kind cluster Apr 7, 2026
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 7, 2026
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 27, 2026
@yankay yankay changed the title from [WIP]Disable MaxUnavailableStatefulSet feature gate in e2e Kind cluster to Support building e2e Kind node images for Kubernetes patch releases Apr 27, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 27, 2026
Lets e2e jobs build a Kubernetes patch-release Kind node image on
demand before running tests, by adding a Makefile-driven
kind-node-image-build target controlled via E2E_KIND_BUILD_NODE_IMAGE_VERSION.

When E2E_KIND_BUILD_NODE_IMAGE_VERSION is unset, behavior is unchanged
and the image referenced by E2E_KIND_VERSION is pulled as before. When
set, the target invokes "kind build node-image <version> --image
<E2E_KIND_VERSION>" only if the target image is not already present
locally, which lets CI test against a patch release for which no
kindest/node image has been published yet (for example v1.35.4, see
kubernetes-sigs/kind#4131).

This is wired into the existing test-e2e, test-e2e-cert-manager,
test-e2e-gang-scheduling-volcano, and disaggregatedset/test-e2e
targets, mirroring the on-demand build pattern used by Kueue.

Part of kubernetes-sigs#751

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
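The commit message says the build runs only if the target image is not already present locally. One way to probe for that, sketched in Go under the assumption that Docker is the container runtime (Kind's default node provider), is to run `docker image inspect`, which exits non-zero for an unknown image. `imagePresent` is a hypothetical helper; the PR's Makefile may use a different check.

```go
package main

import (
	"fmt"
	"os/exec"
)

// imagePresent (hypothetical) reports whether ref exists in the local
// Docker image store. `docker image inspect` exits non-zero when the image
// is absent; any other failure (e.g. docker not installed) is also treated
// as "not present", so the caller falls back to building the image.
func imagePresent(ref string) bool {
	// Stdout/Stderr are left nil, so the inspect output is discarded.
	return exec.Command("docker", "image", "inspect", ref).Run() == nil
}

func main() {
	ref := "kindest/node:v1.35.4"
	if imagePresent(ref) {
		fmt.Println("skip build:", ref, "already present")
	} else {
		fmt.Println("would run: kind build node-image v1.35.4 --image", ref)
	}
}
```

Treating every error as "absent" errs on the side of rebuilding, which costs time but never runs the tests against a missing image.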
@yankay yankay force-pushed the fix-Failing-1.35 branch from 0351fc9 to ae7025f Compare April 28, 2026 12:12
@yankay
Member Author

yankay commented Apr 28, 2026

/retest

@k8s-ci-robot
Contributor

k8s-ci-robot commented Apr 28, 2026

@yankay: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-lws-test-e2e-main-1-35 | d06f3e6 | link | true | /test pull-lws-test-e2e-main-1-35 |


@yankay
Member Author

yankay commented Apr 29, 2026

/test pull-lws-test-e2e-main-1-34

@yankay
Member Author

yankay commented Apr 29, 2026

/test pull-lws-test-e2e-main-1-34

@yankay
Member Author

yankay commented Apr 29, 2026

Hi @Edwinhr716, could you take a look when you have a chance? This unblocks e2e against the v1.35 StatefulSet Parallel regression fix. Thanks!

@yankay
Member Author

yankay commented May 3, 2026

Friendly ping @ahg-g @Edwinhr716 — could one of you take a look when you have a moment?

All CI is green (1.32 / 1.33 / 1.34 e2e + integration + unit). This unblocks e2e against the upstream v1.35 StatefulSet Parallel regression by disabling the MaxUnavailableStatefulSet feature gate in the Kind cluster config (root cause: kubernetes/kubernetes#137409, fix: kubernetes/kubernetes#137904).

Thanks!


Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.


3 participants