From 6bbff4372ed473ff8bcaaea9e5cf75c1816083f3 Mon Sep 17 00:00:00 2001
From: Riccardo Ravaioli
Date: Thu, 9 Oct 2025 11:06:49 +0200
Subject: [PATCH] Enhancement for no-overlay mode in OpenShift

Signed-off-by: Riccardo Ravaioli
---
 enhancements/network/no_overlay_mode.md | 530 ++++++++++++++++++++++++
 1 file changed, 530 insertions(+)
 create mode 100644 enhancements/network/no_overlay_mode.md

diff --git a/enhancements/network/no_overlay_mode.md b/enhancements/network/no_overlay_mode.md
new file mode 100644
index 0000000000..74e81f01a4
--- /dev/null
+++ b/enhancements/network/no_overlay_mode.md
@@ -0,0 +1,530 @@
+---
+title: no-overlay-mode
+authors:
+  - Riccardo Ravaioli
+reviewers:
+  - Peng Liu
+approvers:
+  - TBD
+api-approvers: # In case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers). If there is no API change, use "None"
+  - TBD
+creation-date: 2025-07-21
+last-updated: 2025-07-21
+tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
+  - https://issues.redhat.com/browse/CORENET-6133
+see-also:
+  - https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5289
+---
+
+# No-overlay mode
+
+## Summary
+
+This enhancement describes how the no-overlay mode for OVN-Kubernetes integrates into OpenShift. The feature allows pods in selected networks to communicate over the underlay network, without the overhead of the Geneve encapsulation that is otherwise used to build the overlay network. The no-overlay mode is described in detail in an OVN-Kubernetes upstream enhancement: https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5289. This document outlines the necessary API changes, the interaction with the Cluster Network Operator (CNO) and our test plan for this feature.
+
+## Motivation
+
+The motivations for this feature can be found in the original upstream enhancement: https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5289
+
+### User Stories
+
+Replace the commit hash with "master" once the PR merges:
+https://github.com/ovn-kubernetes/ovn-kubernetes/blob/bf3ec6cc1288971a95d95ca22646249e5f19fb6b/docs/okeps/okep-5259-no-overlay.md#user-storiesuse-cases
+
+### Goals
+
+https://github.com/ovn-kubernetes/ovn-kubernetes/blob/bf3ec6cc1288971a95d95ca22646249e5f19fb6b/docs/okeps/okep-5259-no-overlay.md#goals
+
+Currently, BGP is only supported on bare metal clusters. Since no-overlay mode requires BGP, it shares this limitation for now. As BGP support expands to more platforms, so will support for no-overlay mode.
+
+### Non-Goals
+
+https://github.com/ovn-kubernetes/ovn-kubernetes/blob/bf3ec6cc1288971a95d95ca22646249e5f19fb6b/docs/okeps/okep-5259-no-overlay.md#non-goals
+
+## Proposal
+
+The no-overlay feature largely leverages the existing BGP functionality in OVN-Kubernetes and only needs a few API changes.
+The feature can be applied to:
+- the default network, at cluster installation time
+- Cluster User Defined Networks (CUDNs)
+
+For each network we need a transport parameter that takes "Geneve" (default) or "NoOverlay".
+If the transport is set to "NoOverlay", the following parameters configure the no-overlay mode:
+- outboundSNAT:
+  - "enable": apply source NAT to egress traffic, so that only the node IP is exposed, which is today's expected behaviour unless EgressIP is used;
+  - "disable": do not apply any SNAT to egress traffic, thus exposing the pod subnet outside the cluster.
+- routing:
+  - "managed": delegate to OCP the configuration of the BGP routers or reflectors necessary to advertise the network pod subnet to all cluster nodes;
+  - "unmanaged": use the FRRConfig and RouteAdvertisements provided by the cluster administrator to implement the no-overlay mode.
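+
+As an illustration, and assuming the upstream CUDN API exposes these parameters with names mirroring the ones above (the placement and field names below are hypothetical; the actual schema is defined in the upstream enhancement), a CUDN in no-overlay mode could look like this:
+
+```yaml
+apiVersion: k8s.ovn.org/v1
+kind: ClusterUserDefinedNetwork
+metadata:
+  name: tenant-blue
+spec:
+  namespaceSelector:
+    matchLabels:
+      tenant: blue
+  network:
+    topology: Layer3
+    layer3:
+      role: Primary
+      subnets:
+        - cidr: 10.100.0.0/16
+    # Hypothetical fields mirroring the per-network parameters above:
+    transport: NoOverlay      # "Geneve" (default) or "NoOverlay"
+    noOverlayOptions:
+      outboundSNAT: Enable    # "Enable" or "Disable"
+      routing: Managed        # "Managed" or "Unmanaged"
+```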
+
+For CUDNs these parameters will be added to the CUDN CRD and can be configured by the cluster administrator when creating a CUDN. For the default network, these parameters must be provided by the cluster administrator at installation time and passed over to ovn-kubernetes by the Cluster Network Operator.
+
+There are also two global parameters, specific to the way the no-overlay mode is implemented when NoOverlayOptions.routing="Managed", that affect the generated BGP configuration:
+- asNumber: the Autonomous System (AS) number to be used in the generated FRRConfig
+- bgpTopology:
+  - "fullMesh": every node deploys a BGP router, thus forming a BGP full mesh.
+
+The resulting FRRConfig and RouteAdvertisements will be generated by OVN-Kubernetes and are described in detail in the upstream enhancement.
+In this enhancement we define how the OpenShift API is to be extended to include these new parameters, which will be passed by CNO to OVN-Kubernetes at installation time.
+
+### Workflow Description
+
+No-overlay mode is a day-0 feature, configured at cluster installation time.
+The cluster administrator is expected to provide the OCP configuration and the necessary manifests to the OpenShift installer, which then deploys them as part of the cluster installation process.
+
+Specifically, in operator.openshift.io/v1 Network the cluster administrator should:
+- enable BGP for the cluster:
+  - spec.additionalRoutingCapabilities.providers: ["FRR"]
+  - spec.defaultNetwork.ovnKubernetesConfig.routeAdvertisements: "Enabled"
+- enable no-overlay mode for the default network (if desired) and configure it with the per-network parameters:
+  - outboundSNAT: "enable", "disable"
+  - routing: "managed", "unmanaged"
+- provide the necessary manifests for the "unmanaged" scenario, if that's the preferred routing mode for the default network (as configured in the previous step):
+  - FRRConfig CR
+  - RouteAdvertisements CR
+- configure parameters for the "managed" scenario, if that's the preferred routing mode for either the default network (as configured above) or for CUDNs (created later on, after cluster installation):
+  - asNumber
+  - bgpTopology: "fullMesh"
+
+It's important to note that networks in no-overlay "managed" mode and networks in no-overlay "unmanaged" mode can coexist in the same cluster. The manifests necessary for "unmanaged" mode, whether for the default network or for CUDNs, must be provided on day 0 by the cluster administrator. Similarly, the parameters for "managed" mode must be provided on day 0, even if the CUDNs to which no-overlay mode will be applied are only created later on.
+
+On day 1, during installation, CNO will propagate these configuration parameters to the ovn-kubernetes components (ovnkube-control-plane and ovnkube-node) via the existing configmap. OVN-Kubernetes will then be responsible for implementing no-overlay mode according to the provided parameters; in particular, for "managed" mode, OVN-Kubernetes will generate the necessary FRRConfig and RouteAdvertisements based on the provided parameters.
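+
+For reference, the day-0 manifests for the "unmanaged" scenario are the same BGP resources already used for route advertisements today. The sketch below is purely illustrative: the AS number and peer address are placeholders, and the exact schemas are owned by FRR-Kubernetes and OVN-Kubernetes rather than by this enhancement.
+
+```yaml
+apiVersion: frrk8s.metallb.io/v1beta1
+kind: FRRConfiguration
+metadata:
+  name: no-overlay-unmanaged
+  namespace: openshift-frr-k8s
+spec:
+  bgp:
+    routers:
+      - asn: 64512                # placeholder AS number
+        neighbors:
+          - address: 172.18.0.5   # placeholder peer address
+            asn: 64512
+            toReceive:
+              allowed:
+                mode: all         # accept routes advertised by the peer
+---
+apiVersion: k8s.ovn.org/v1
+kind: RouteAdvertisements
+metadata:
+  name: default
+spec:
+  advertisements:
+    - PodNetwork                  # advertise the pod subnets
+  networkSelectors:
+    - networkSelectionType: DefaultNetwork
+  nodeSelector: {}
+  frrConfigurationSelector: {}
+```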
+
+Extra care must be taken to ensure that the total time taken by CNO to deploy the network is not significantly increased by the additional steps necessary to configure no-overlay mode. The time taken by the network to converge should be well within the time window allocated to CNO by the Cluster Version Operator (CVO).
+
+Changes to the no-overlay mode configuration after the initial cluster installation are not currently supported and are therefore forbidden by the newly introduced API.
+
+As a consequence, when a network is created in no-overlay mode, be it the default network or a CUDN, it is not possible to revert it to overlay mode. The opposite is also true: a network created in overlay mode cannot be switched to no-overlay mode. In particular:
+- For the default network, the feature can only be enabled at cluster installation time and cannot be disabled afterwards.
+- For CUDNs, the cluster administrator can choose between overlay and no-overlay mode at creation time, in the CUDN spec. For an existing CUDN, the mode cannot be changed unless the CUDN is deleted and recreated with the desired overlay / no-overlay mode.
+
+### API Extensions
+
+The following changes are to be added to the operator network configuration that is read by CNO:
+
+```diff
+diff --git a/operator/v1/types_network.go b/operator/v1/types_network.go
+index 111240eec..f4314dcf4 100644
+--- a/operator/v1/types_network.go
++++ b/operator/v1/types_network.go
+@@ -399,6 +399,11 @@ type OpenShiftSDNConfig struct {
+ 
+ // ovnKubernetesConfig contains the configuration parameters for networks
+ // using the ovn-kubernetes network project
++// +kubebuilder:validation:XValidation:rule='self.defaultNetworkTransport == "NoOverlay" || !has(self.defaultNetworkNoOverlayOptions)',message="defaultNetworkNoOverlayOptions is only supported for no-overlay networks"
++// +kubebuilder:validation:XValidation:rule='!(self.defaultNetworkTransport == "NoOverlay" && has(self.defaultNetworkNoOverlayOptions) && self.defaultNetworkNoOverlayOptions.routing == "Managed") || has(self.bgpManagedConfig)',message="bgpManagedConfig is required when defaultNetworkTransport is NoOverlay and defaultNetworkNoOverlayOptions.routing is Managed"
++// +kubebuilder:validation:XValidation:rule='self.defaultNetworkTransport == oldSelf.defaultNetworkTransport',message="defaultNetworkTransport field is immutable"
++// +kubebuilder:validation:XValidation:rule='self.defaultNetworkNoOverlayOptions == oldSelf.defaultNetworkNoOverlayOptions',message="defaultNetworkNoOverlayOptions field is immutable"
++// +kubebuilder:validation:XValidation:rule='self.bgpManagedConfig == oldSelf.bgpManagedConfig',message="bgpManagedConfig field is immutable"
+ type OVNKubernetesConfig struct {
+ 	// mtu is the MTU to use for the tunnel interface. This must be 100
+ 	// bytes smaller than the uplink mtu.
+@@ -468,6 +473,44 @@ type OVNKubernetesConfig struct {
+ 	// +openshift:enable:FeatureGate=RouteAdvertisements
+ 	// +optional
+ 	RouteAdvertisements RouteAdvertisementsEnablement `json:"routeAdvertisements,omitempty"`
++
++	// DefaultNetworkTransport describes the transport protocol for east-west traffic for the default network.
++	// Allowed values are "NoOverlay" and "Geneve".
++	// - "NoOverlay": The default network operates in no-overlay mode.
++	// - "Geneve": The default network uses Geneve overlay.
++	// Defaults to "Geneve".
++	// +kubebuilder:validation:Enum=NoOverlay;Geneve
++	// +kubebuilder:default=Geneve
++	// +optional
++	DefaultNetworkTransport TransportOption `json:"defaultNetworkTransport,omitempty"`
++	// DefaultNetworkNoOverlayOptions contains configuration for no-overlay mode for the default network.
++	// It is required when DefaultNetworkTransport is "NoOverlay".
++	// +optional
++	DefaultNetworkNoOverlayOptions *NoOverlayOptions `json:"defaultNetworkNoOverlayOptions,omitempty"`
++
++	// BGPManagedConfig configures the BGP properties for networks (default network or CUDNs)
++	// in no-overlay mode that specify routing="managed" in their NoOverlayOptions.
++	// It is required when DefaultNetworkNoOverlayOptions.Routing is set to "Managed".
++	// +optional
++	BGPManagedConfig *BGPManagedConfig `json:"bgpManagedConfig,omitempty"`
++}
++
++// BGPManagedConfig contains configuration options for BGP when routing is "Managed".
++type BGPManagedConfig struct {
++	// ASNumber is the 2-byte or 4-byte Autonomous System Number (ASN)
++	// to be used in the generated FRR configuration. It is required
++	// when NoOverlayOptions.Routing is "Managed".
++	// +kubebuilder:validation:Minimum=1
++	// +kubebuilder:validation:Maximum=4294967295
++	ASNumber *uint32 `json:"asNumber,omitempty"`
++
++	// BGPTopology defines the BGP topology to be used. Allowed values
++	// are "fullMesh".
++	// - "fullMesh": Every node deploys a BGP router, forming a BGP full mesh.
++	// Defaults to "fullMesh".
++	// +kubebuilder:validation:Enum=fullMesh
++	// +optional
++	BGPTopology BGPTopology `json:"bgpTopology,omitempty"`
+ }
+ 
+ type IPv4OVNKubernetesConfig struct {
+@@ -898,3 +941,34 @@ type AdditionalRoutingCapabilities struct {
+ 	// +kubebuilder:validation:XValidation:rule="self.all(x, self.exists_one(y, x == y))"
+ 	Providers []RoutingCapabilitiesProvider `json:"providers"`
+ }
++
++type TransportOption string
++type SnatOption string
++type RoutingOption string
++type BGPTopology string
++
++const (
++	TransportOptionNoOverlay TransportOption = "NoOverlay"
++	TransportOptionGeneve    TransportOption = "Geneve"
++
++	SnatEnable  SnatOption = "Enable"
++	SnatDisable SnatOption = "Disable"
++
++	RoutingManaged   RoutingOption = "Managed"
++	RoutingUnmanaged RoutingOption = "Unmanaged"
++
++	// BGPTopologyRouteReflector BGPTopology = "routeReflector" // TODO: Enable when route reflector is implemented in FRR-Kubernetes (and OVN-Kubernetes)
++	BGPTopologyFullMesh BGPTopology = "fullMesh"
++)
++
++// NoOverlayOptions contains configuration options for networks operating in no-overlay mode.
++type NoOverlayOptions struct {
++	// OutboundSNAT defines the SNAT behavior for outbound traffic from pods.
++	// +kubebuilder:validation:Enum=Enable;Disable
++	// +required
++	OutboundSNAT SnatOption `json:"outboundSNAT,omitempty"`
++	// Routing specifies whether the pod network routing is managed by OVN-Kubernetes or by the user.
++	// +kubebuilder:validation:Enum=Managed;Unmanaged
++	// +required
++	Routing RoutingOption `json:"routing,omitempty"`
++}
+```
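+
+Putting the above together, an install-time operator configuration that enables no-overlay mode for the default network in "managed" mode would look roughly as follows (a sketch based on the fields introduced above; the AS number is a placeholder):
+
+```yaml
+apiVersion: operator.openshift.io/v1
+kind: Network
+metadata:
+  name: cluster
+spec:
+  additionalRoutingCapabilities:
+    providers:
+      - FRR                       # enables FRR-Kubernetes
+  defaultNetwork:
+    type: OVNKubernetes
+    ovnKubernetesConfig:
+      routeAdvertisements: Enabled
+      defaultNetworkTransport: NoOverlay
+      defaultNetworkNoOverlayOptions:
+        outboundSNAT: Enable
+        routing: Managed
+      bgpManagedConfig:
+        asNumber: 64512           # placeholder AS number
+        bgpTopology: fullMesh
+```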
+
+### Topology Considerations
+
+#### Hypershift / Hosted Control Planes
+
+**TO BE DISCUSSED**
+
+#### Standalone Clusters
+
+No special considerations for standalone clusters.
+
+#### Single-node Deployments or MicroShift
+
+Since the goal of this feature is to avoid encapsulating traffic between cluster nodes, there is no use case for it in true single-node deployments. For deployments with a single-node control plane and multiple worker nodes, the feature is usable in the same way as in standalone clusters.
+
+### Implementation Details/Notes/Constraints
+
+### Risks and Mitigations
+
+### Drawbacks
+
+## Alternatives (Not Implemented)
+
+N/A
+
+## Open Questions [optional]
+
+N/A
+
+## Test Plan
+
+E2E tests and QE tests should ensure that the no-overlay mode works as expected in all supported configurations and that we fully support:
+- Kubernetes conformance tests on the default network in no-overlay mode
+- OVN-Kubernetes E2E tests, excluding features not supported in no-overlay mode (e.g. EgressService, EgressIP, Multicast, IPsec), for the default network and for CUDNs in no-overlay mode.
+
+### E2E tests
+
+We already have two dual-stack CI lanes for BGP, one for shared gateway and one for local gateway:
+- e2e-metal-ipi-ovn-dualstack-bgp
+- e2e-metal-ipi-ovn-dualstack-bgp-local-gw
+
+These two lanes already cover the main OVN-Kubernetes configuration settings and will be extended to cover the no-overlay mode in the unmanaged scenario for CUDNs. Since a CUDN can be added after cluster installation, we can extend the two existing CI lanes to enable the no-overlay feature gate at installation time and then create a CUDN in no-overlay mode and test its east-west and north-south connectivity. CUDNs can be created on the fly with outboundSNAT=enable or outboundSNAT=disable, so both cases can be covered in the same CI lane.
+This gives us good coverage of the no-overlay mode for CUDNs in the unmanaged scenario for both gateway modes.
+
+Testing no-overlay mode in the two BGP lanes above requires the feature gate to be enabled at installation time. This is a reasonable trade-off that helps us minimize the number of CI lanes and reduce CI costs without affecting the existing CI coverage for the BGP feature.
+
+For the default network, we need a separate CI lane where we can enable the no-overlay mode at installation time. This lane will cover only the unmanaged scenario (with, say, outboundSNAT=enable) and will run upstream conformance tests on the default network:
+- e2e-metal-ipi-ovn-dualstack-no-overlay-unmanaged-techpreview
+
+For the managed scenario, we need two more CI lanes, where we can test no-overlay mode with the full-mesh topology, for both the default network and CUDNs. To keep the number of lanes manageable, one lane runs ovn-kubernetes in shared gateway mode and the other in local gateway mode. In both lanes the default network is in no-overlay mode and we can create CUDNs in no-overlay mode on the fly, with outboundSNAT=enable and then outboundSNAT=disable, thus covering both cases in the same lane.
+
+- e2e-metal-ipi-ovn-dualstack-shared-gw-no-overlay-managed-full-mesh-techpreview
+- e2e-metal-ipi-ovn-dualstack-local-gw-no-overlay-managed-full-mesh-techpreview
+
+TODO: if we can mix managed and unmanaged, we can reduce the number of CI lanes further. DISCUSS THIS.
+
+### QE Testing
+
+When testing a managed full-mesh topology, we should pay special attention to resource consumption as we scale up the cluster. The number of BGP sessions in a full-mesh topology grows as N*(N-1)/2, where N is the number of nodes in the cluster. We need to ensure that ovn-kubernetes and FRR can handle the number of BGP connections in a large cluster without excessive resource consumption.
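+
+To make the quadratic growth concrete, here is a small illustrative calculation (not part of the implementation) of full-mesh session counts at the cluster sizes used in our scale testing:
+
+```go
+package main
+
+import "fmt"
+
+func main() {
+	// In a full mesh, every pair of nodes maintains one BGP session:
+	// sessions = n * (n - 1) / 2, i.e. each node peers with n - 1 others.
+	for _, n := range []int{24, 120, 250, 500} {
+		fmt.Printf("%3d nodes -> %6d total sessions (%d per node)\n", n, n*(n-1)/2, n-1)
+	}
+	// Output:
+	//  24 nodes ->    276 total sessions (23 per node)
+	// 120 nodes ->   7140 total sessions (119 per node)
+	// 250 nodes ->  31125 total sessions (249 per node)
+	// 500 nodes -> 124750 total sessions (499 per node)
+}
+```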
+
+In our scale testing we should explicitly test two separate aspects:
+- cluster node scale-up, functionally: does adding a new node to a cluster already in no-overlay mode work as expected? Is the new node correctly configured and does it join the BGP mesh?
+- cluster scale, performance-wise: our scale testing has good coverage for clusters of size small (24 nodes), medium (120 nodes) and large (250 nodes); x-large (500 nodes) clusters are tested less frequently. We should aim to have coverage for no-overlay mode for at least medium-size clusters (120 nodes) and verify that a full-mesh topology is sustainable at this scale.
+
+## Graduation Criteria
+
+We are going to develop this feature in two phases:
+- phase 1: we will support no-overlay mode in the unmanaged configuration: an FRRConfig and a RouteAdvertisements CR that configure the BGP topology are provided by the cluster administrator.
+- phase 2: we will add support for the managed configuration: the FRRConfig and RouteAdvertisements are generated by OVN-Kubernetes based on the new API parameters.
+
+Until phase 2 is complete, the feature will be considered Tech Preview and will be gated behind a feature gate.
+
+An extra goal, not required for GA, is to add support for the managed network topology "routeReflector". In this managed topology, every node labeled with `k8s.ovn.org/internal-bgp-role=route-reflector` runs a BGP route reflector; all other nodes peer only with the route reflectors. This is particularly useful for large clusters, as it reduces the total number of BGP connections compared to a full-mesh topology. Given that it first requires native support for route reflectors in FRR-Kubernetes and that, in the context of no-overlay mode, it is an optimization rather than a requirement, we consider it a future goal to be addressed after GA.
+
+### Dev Preview -> Tech Preview
+
+- Unmanaged no-overlay mode implemented
+- Sufficient test coverage (E2E, QE)
+
+### Tech Preview -> GA
+
+- Unmanaged and managed no-overlay mode implemented
+- More testing (upgrade, downgrade, scale, end-to-end)
+- Sufficient time for feedback
+- Available by default
+- User-facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/)
+
+## Upgrade / Downgrade Strategy
+
+There are no special concerns about upgrades, since the feature can only be turned on for the default network at installation time, and for CUDNs at the time of creation of the affected CUDNs. Configuration changes should not be allowed by CNO during upgrades.
+QE coverage will include testing the upgrade of a cluster that already runs in no-overlay mode.
+
+## Version Skew Strategy
+
+## Operational Aspects of API Extensions
+
+## Support Procedures
+
+## Infrastructure Needed [optional]
+
+N/A