From ae2991bcff81f00a56a3afed7a2419592ef187fb Mon Sep 17 00:00:00 2001 From: Brandon Salmon Date: Mon, 6 Oct 2025 19:19:53 +0000 Subject: [PATCH 1/2] First draft of KEP-5620: Node resizing using balloons. --- keps/sig-node/5620-resizing-balloon/README.md | 812 ++++++++++++++++++ keps/sig-node/5620-resizing-balloon/kep.yaml | 45 + 2 files changed, 857 insertions(+) create mode 100644 keps/sig-node/5620-resizing-balloon/README.md create mode 100644 keps/sig-node/5620-resizing-balloon/kep.yaml diff --git a/keps/sig-node/5620-resizing-balloon/README.md b/keps/sig-node/5620-resizing-balloon/README.md new file mode 100644 index 00000000000..70810256ad4 --- /dev/null +++ b/keps/sig-node/5620-resizing-balloon/README.md @@ -0,0 +1,812 @@ + +# KEP-5620: Node resizing using balloons + + + + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - 
[Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) within one minor version of promotion to GA +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## 
Summary + +We provide a resizable node through the use of balloon pods, which acquire, but do not use, resources to ensure that pods on +the node cannot consume more than the current size of the node. This allows us to use in-place pod resize (IPPR) to resize +the node, in addition to using it to resize pods. To decrease the size of a given node, an agent in the system can +upsize the balloon pod on the given node. To increase the size of a given node, an agent in the system can downsize +the balloon pod on the given node, allowing other pods to consume more resources on the node itself. + +## Motivation + +There are several use cases for "resizable nodes". These include dynamic resizing used on top of cloud VMs and testing use cases +where users would like to easily place multiple kubelets on the same physical host without overloading the host. By using a balloon pod +we make it easy to quickly increase and decrease the resources allocated to a node in real time. We also leverage our existing work on +pod resource resizing to handle node resizing; pod resizing already needs to handle the various races involved in resizing, so we can +simply leverage that work here rather than needing to reinvent a mechanism to handle it. + +Note that while this feature is superficially similar to hotplug, it is orthogonal and complementary to hotplug functionality. Hotplug provides the ability to physically add new hardware to an existing node, which this approach does not provide, but hotplug does not provide an easy way to downsize, since hot-unplug of memory in particular is not easily supportable. Balloons and hotplug can easily coexist; hotplug updates the actual resources available on the underlying machine, and should transparently update the size of a node that is using balloons. Similarly, upsizing and downsizing a node with fixed hardware using a balloon should work transparently in parallel with hotplug operations. 
+ +### Goals + + - Provide autoscalers the ability to upsize and downsize Kubernetes nodes in concert with the mechanisms provided by the underlying cloud provider. + - Provide an easy way for users to run multiple Kubelets on the same physical host without overloading the underlying host hardware. + +### Non-Goals + + - We do not attempt to provide a way to add underlying hardware in this KEP; this can be done either through hotplug or through mechanisms provided by the cloud provider itself. + +## Proposal + +We propose that Kubernetes provide an official "balloon pod" concept. Balloon pods are deployed to each node in the system using a DaemonSet controller, which can be enabled or disabled by administrators on a given cluster. Balloon pods claim resources using requests, but then do not consume them. This allows the underlying infrastructure to provide only the resources that are unused by the balloons, in whatever way is appropriate for that infrastructure. + +To resize a given node, a scaling component can simply invoke an in-place pod resize on the resources of the balloon pod. To "upsize" the node, the component would "downsize" the balloon pod to release more resources to the pods on the node. To "downsize" the node, the component would "upsize" the balloon pod to remove resources from the pool available to the pods on the node. + +The only distinctions between a balloon pod and a standard pod will be annotations and priority. Balloon pods should get annotations that can be read by system components and used to distinguish between resources "in-use" by normal pods and resources that "don't exist" (used by balloon pods). In addition, balloon pods will have a special priority which makes them as unpreemptable as system-node-critical pods, but unable to preempt other pods when upsized, to ensure that they do not disrupt running customer pods. 
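The inverse relationship between node size and balloon size described above can be sketched as simple arithmetic. The Go snippet below is only an illustration under stated assumptions, not part of the proposed implementation: the helper name `balloonRequestFor`, the use of plain integer units (for example CPU millicores), and the 8-CPU node are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// balloonRequestFor returns the resource request (in integer units, e.g. CPU
// millicores) that a balloon pod would need so that the node exposes
// targetSize units to ordinary pods. Hypothetical helper, for illustration only.
func balloonRequestFor(allocatable, targetSize int64) (int64, error) {
	if targetSize < 0 {
		return 0, errors.New("target size must be non-negative")
	}
	if targetSize > allocatable {
		return 0, errors.New("target size exceeds node allocatable")
	}
	// Whatever the balloon requests is unavailable to the other pods.
	return allocatable - targetSize, nil
}

func main() {
	const allocatable = 8000 // assumed node with 8 CPUs, in millicores

	// Downsizing the node to 2 CPUs means upsizing the balloon.
	small, _ := balloonRequestFor(allocatable, 2000)
	// Upsizing the node to 6 CPUs means downsizing the balloon.
	large, _ := balloonRequestFor(allocatable, 6000)

	fmt.Println(small, large)
}
```

A real scaling component would not print the value; it would apply it to the balloon pod's requests through an in-place resize, and it would have to handle the case where the resize cannot be satisfied.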
+ +### User Stories (Optional) + +#### Autoscaling driven VM resizing + +A cloud provider would like to provide dynamically resizable Kubernetes nodes. The cloud provider creates a way to manage the resources provided to a particular Kubelet. By enabling the balloon pods and linking the management of the balloon pod sizes with the underlying resources available to the Kubelet host, the cloud provider can upsize and downsize the nodes as seen by Kubernetes without having to involve Kubernetes in the specifics of the cloud resizing mechanism. + +#### Multiple Kubelets per host for testing + +A user who is testing some Kubernetes feature would like to run many Kubelets on the same host to decrease the amount of resources needed to test scenarios with large numbers of Kubelets. By enabling balloons and then resizing the balloons to ensure each Kubelet only consumes one Nth of the host, the user can place N Kubelets on the same host without overloading the host itself. + +### Risks and Mitigations + + + +## Design Details + +In general balloon pods will look just like any other pod. They will be scheduled by a daemon set controller and have requests and limits like any other pod. They will run in the kube-system namespace as system components. The two distinctions are as follows: + + - Balloon pods will be labelled so that monitoring tools can know that the space consumed by the balloon pod "doesn't exist" instead of being "space consumed by a workload". This will be a well-known label and may require updates to the statistics collection in the Kubernetes control plane. + - Balloon pods will run at a special priority level (system-balloon) which mostly acts like system-node-critical, but instead of preempting other pods on upsize, balloon pods will fail to upsize. 
This distinction is critical. A balloon pod should never be preempted, since the underlying capacity "doesn't exist". However, we upsize a balloon pod when we are reclaiming unused space, so if another pod starts using that space before we reclaim it, we do not want to preempt that pod, which would impact a running workload. We would rather fail the reclaim, since we were clearly wrong about the space not being needed. + +## Metrics details + +We will reach out the appropriate sigs for metrics to determine the pod labels should be consumed. We will start with sig-instruments + +## Priorities + +We will add a new priority (system-node-balloon?) for balloon pods and ensure the priority matches the use cases we care about for balloon pods. In general we would like: + + - For the balloon pods to be treated the same as system-node-critical pods in terms of preemption by other pods. + - For the balloon pods to not preempt other pods when we attempt to upsize them. + +### Test Plan + + + +[ ] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. 
+ +##### Prerequisite testing updates + + + +##### Unit tests + + + + + +- ``: `` - `` + +##### Integration tests + + + + + +- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/integration/...): [integration master](https://testgrid.k8s.io/sig-release-master-blocking#integration-master?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature) + +##### e2e tests + + + +- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/e2e/...): [SIG ...](https://testgrid.k8s.io/sig-...?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature) + +### Graduation Criteria + + + +### Upgrade / Downgrade Strategy + + + +### Version Skew Strategy + + + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [ ] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: + - Components depending on the feature gate: +- [ ] Other + - Describe the mechanism: + - Will enabling / disabling the feature require downtime of the control + plane? + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? + +###### Does enabling the feature change any default behavior? + + + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + + +###### What happens if we reenable the feature if it was previously rolled back? + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? 
Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +###### Will enabling / using this feature result in introducing new API types? + + + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? 
+ + + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + + + +## Alternatives + + + +## Infrastructure Needed (Optional) + + diff --git a/keps/sig-node/5620-resizing-balloon/kep.yaml b/keps/sig-node/5620-resizing-balloon/kep.yaml new file mode 100644 index 00000000000..79e42f484ad --- /dev/null +++ b/keps/sig-node/5620-resizing-balloon/kep.yaml @@ -0,0 +1,45 @@ +title: Node resizing via balloons +kep-number: 5620 +authors: + - "@bwsalmon" +owning-sig: sig-node +participating-sigs: +status: provisional +creation-date: 2025-10-06 +reviewers: + - "@liggitt" +approvers: + - "@dawnchen" + +see-also: + - keps/sig-node/1287-in-place-update-pod-resources + - keps/sig-node/3953-node-resource-hot-plug + +# The target maturity stage in the current dev cycle for this KEP. +# If the purpose of this KEP is to deprecate a user-visible feature +# and a Deprecated feature gates are added, they should be deprecated|disabled|removed. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.35" + +# The milestone at which this feature was, or is targeted to be, at each stage. 
+milestone: + alpha: "v1.36" + beta: "v1.37" + stable: "v1.38" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: MyFeature + components: + - kube-apiserver + - kube-controller-manager +disable-supported: true + +# The following PRR answers are required at beta release +metrics: + - my_feature_metric From 9f84d09d4fe840abae63aff4dff357a6cbc2dd2e Mon Sep 17 00:00:00 2001 From: Brandon Salmon Date: Mon, 6 Oct 2025 19:30:04 +0000 Subject: [PATCH 2/2] Fix typeo. --- keps/sig-node/5620-resizing-balloon/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/sig-node/5620-resizing-balloon/README.md b/keps/sig-node/5620-resizing-balloon/README.md index 70810256ad4..ecd9179a244 100644 --- a/keps/sig-node/5620-resizing-balloon/README.md +++ b/keps/sig-node/5620-resizing-balloon/README.md @@ -227,7 +227,7 @@ In general balloon pods will look just like any other pod. They will be schedule ## Metrics details -We will reach out the appropriate sigs for metrics to determine the pod labels should be consumed. We will start with sig-instruments +We will reach out the appropriate sigs for metrics to determine the pod labels should be consumed. We will start with sig-instrumentation. ## Priorities