|
| 1 | +--- |
| 2 | +title: OVN-Kubernetes support for EVPN |
| 3 | +authors: |
| 4 | + - @jcaamano |
| 5 | +reviewers: |
| 6 | + - @arghosh93 |
| 7 | + - @asood-rh |
| 8 | + - @jechen0648 |
| 9 | + - @martinkennelly |
| 10 | + - @maiqueb |
| 11 | + - @pperiyasamy |
| 12 | + - @tssurya |
| 13 | + - @zhaozhanqi |
| 14 | +approvers: |
| 15 | + - @tssurya |
| 16 | +api-approvers: |
| 17 | + - None |
| 18 | +creation-date: 2025-10-14 |
| 19 | +last-updated: 2025-10-15 |
| 20 | +tracking-link: |
| 21 | + - https://issues.redhat.com/browse/CORENET-6429 |
| 22 | +see-also: |
| 23 | + - https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5089 |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +# OVN-Kubernetes support for EVPN |
| 28 | + |
| 29 | +## Summary |
| 30 | + |
| 31 | +This feature allows exposing primary Cluster User Defined Networks (P-CUDNs) |
| 32 | +externally via a VPN to other entities, either inside or outside the cluster. |
| 33 | +It uses BGP and EVPN as the common, native networking standard, which enables |
| 34 | +integration into user networks without SDN-specific network protocol |
| 35 | +integration and provides an industry-standard way to achieve network |
| 36 | +segmentation between sites. |
| 37 | + |
| 38 | +This enhancement is aligned with, and is being worked on in tandem with, a |
| 39 | +corresponding OVN-Kubernetes upstream [enhancement][1]. As such, much of the |
| 40 | +content refers to it. The intention of this enhancement is to outline the |
| 41 | +necessary changes to consume and integrate that functionality in OCP, including |
| 42 | +the interaction with the Cluster Network Operator (CNO) and our test plan for |
| 43 | +this feature. However, in case the upstream enhancement is found to be |
| 44 | +inadequate, one of the following outcomes is possible depending on the |
| 45 | +circumstances of the inadequacy: |
| 46 | +* Amend the upstream enhancement if still open. |
| 47 | +* Work on a follow up upstream enhancement while keeping this one open. |
| 48 | +* Work on a new downstream enhancement that either replaces or follows this one. |
| 49 | + |
| 50 | +[1]: https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5089 |
| 51 | + |
| 52 | + |
| 53 | +## Motivation |
| 54 | + |
| 55 | +The motivations for this feature are aligned to the ones described in the |
| 56 | +upstream [enhancement][2]. |
| 57 | + |
| 58 | +[2]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#introduction |
| 59 | + |
| 60 | + |
| 61 | +### User Stories |
| 62 | + |
| 63 | +The user stories for this feature are aligned to the ones described in the |
| 64 | +upstream [enhancement][3]. |
| 65 | + |
| 66 | +[3]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#user-storiesuse-cases |
| 67 | + |
| 68 | +### Goals |
| 69 | + |
| 70 | +The goals for this feature are aligned to the ones described in the upstream |
| 71 | +[enhancement][4]. |
| 72 | + |
| 73 | +[4]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#goals |
| 74 | + |
| 75 | +### Non-Goals |
| 76 | + |
| 77 | +The non-goals for this feature are aligned to the ones described in the upstream |
| 78 | +[enhancement][5]. |
| 79 | + |
| 80 | +[5]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#non-goals |
| 81 | + |
| 82 | +## Proposal |
| 83 | + |
| 84 | +This section requires a general understanding of the overall proposal described |
| 85 | +in the upstream [enhancement][6]. |
| 86 | + |
| 87 | +[6]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#proposed-solution |
| 88 | + |
| 89 | +The EVPN feature is mainly driven by the already existing OVN-Kubernetes |
| 90 | +RouteAdvertisements CRD. The `routeAdvertisements` configuration flag in the |
| 91 | +OVN-Kubernetes CNO configuration will need to be set to `Enabled` to be able to |
| 92 | +use the feature. |
| 93 | + |
| 94 | +Additionally, the EVPN feature is used through OVN-Kubernetes specific APIs in |
| 95 | +the form of new and updated CRDs. A new feature gate will be introduced for the |
| 96 | +EVPN feature and CNO will deploy these CRD updates only if the feature gate is |
| 97 | +enabled, making it impossible to use the feature if the feature gate is not |
| 98 | +enabled. |
| 99 | + |
| 100 | +The EVPN feature is only supported when `routingViaHost` is set to `true`, also |
| 101 | +known as local gateway mode. CNO will perform no validation; OVN-Kubernetes will |
| 102 | +[reject][13] invalid configurations in this regard. |
| 103 | + |
| 104 | +[13]: TODO add reference |
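
As a rough illustration of the gateway mode requirement, the sketch below reads `routingViaHost` from the operator configuration and reports the resulting gateway mode. It assumes that `true` corresponds to local gateway mode and that an unset field defaults to shared gateway mode. `gateway_mode` is a hypothetical helper name, not part of any existing tooling.

```shell
# Hypothetical helper: map a routingViaHost value to a gateway mode name.
# An empty value means the field is unset, which is treated as shared mode.
gateway_mode() {
  case "$1" in
    true) echo "local" ;;
    *)    echo "shared" ;;
  esac
}

# Only query the cluster when `oc` is available in this environment.
if command -v oc >/dev/null 2>&1; then
  rvh=$(oc get network.operator.openshift.io cluster \
    -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig.routingViaHost}')
  echo "gateway mode: $(gateway_mode "$rvh")"
fi
```

A result of `shared` would indicate a configuration in which EVPN is not supported by this proposal.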
| 105 | + |
| 106 | +The EVPN feature requires FRR-k8s. FRR-k8s is deployed by CNO when the |
| 107 | +`additionalRoutingCapabilities` `providers` includes `FRR` in the operator |
| 108 | +configuration, which is required to enable `routeAdvertisements`. |
| 109 | + |
| 110 | +The upstream enhancement may introduce changes to the FRR-k8s APIs. These API |
| 111 | +changes need not be gated by the feature gate introduced above. It is worth noting |
| 112 | +that FRR-k8s APIs allow injecting raw FRR configuration and that |
| 113 | +this capability might be used during the development process until proper |
| 114 | +structured APIs are introduced. |
| 115 | + |
| 116 | +Currently, CNO FRR-k8s includes FRR v8, but the EVPN feature needs FRR v9+, so it |
| 117 | +will need to be updated. The problem is that FRR-k8s consumes the FRR version |
| 118 | +provided by RHEL, specifically RHEL 9, which is the current version in use. RHEL 9 |
| 119 | +only provides FRR v8. RHEL 10 will provide FRR v10. RHEL 10 was to be |
| 120 | +[available][7] as tech preview in OCP 4.21 and fully available in OCP 4.22, but |
| 121 | +its actual timeline is currently uncertain. These are our options in order of |
| 122 | +preference: |
| 123 | + |
| 124 | +[7]: https://docs.google.com/spreadsheets/d/1VO00pWkWf8Fr30PHl8mZFTK9ZnJO51BGXH4FT6efwp4/edit?gid=1551125754#gid=1551125754 |
| 125 | + |
| 126 | +* Request RHEL to package FRR v10 for RHEL 9. |
| 127 | +* If RHEL 10 is available, build an FRR-k8s image against a RHEL 10 stream. |
| 128 | +* Consume FRR v10 from the FDP (Fast DataPath) project as we currently do with |
| 129 | + OVN and Libreswan until RHEL 10 is generally available for OCP. |
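
Whichever option is chosen, the resulting image has to meet the FRR v9+ requirement stated above. A minimal sketch of such a check follows. `frr_supports_evpn` is a hypothetical helper, and the version strings in the loop are illustrative examples rather than the versions of any specific build.

```shell
# Hypothetical helper: succeed when the FRR major version is at least 9,
# the minimum stated in this proposal for the EVPN feature.
frr_supports_evpn() {
  major=${1%%.*}   # keep everything before the first dot
  [ "$major" -ge 9 ]
}

# Illustrative version strings only.
for v in 8.5.3 10.0; do
  if frr_supports_evpn "$v"; then
    echo "FRR $v: meets the v9+ requirement"
  else
    echo "FRR $v: too old for EVPN"
  fi
done
```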
| 130 | + |
| 131 | +A user of the EVPN feature may leverage the internal BGP fabric introduced in |
| 132 | +the No-Overlay mode [enhancement][8], particularly for East/West L2 EVPN |
| 133 | +configurations. No additional changes with respect to what is described in that |
| 134 | +enhancement are required. |
| 135 | + |
| 136 | +[8]: https://github.com/openshift/enhancements/pull/1859 |
| 137 | + |
| 138 | + |
| 139 | +### Workflow Description |
| 140 | + |
| 141 | +In no particular order, a cluster administrator enables FRR and |
| 142 | +RouteAdvertisements, and the EVPN feature gate if not available through the |
| 143 | +default feature set: |
| 144 | + |
| 145 | +```shell |
| 146 | +oc patch featuregate cluster --type=merge -p='{"spec":{"featureSet":"TechPreviewNoUpgrade"}}' |
| 147 | +... |
| 148 | +oc patch Network.operator.openshift.io cluster --type=merge -p='{"spec":{"additionalRoutingCapabilities": {"providers": ["FRR"]}, "defaultNetwork":{"ovnKubernetesConfig":{"routeAdvertisements":"Enabled"}}}}' |
| 149 | +``` |
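
After applying the patches, the resulting configuration can be read back to confirm that the prerequisites are in place. The sketch below assumes the feature gate is enabled through the `TechPreviewNoUpgrade` feature set, as in the commands above. `evpn_prereqs_ok` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the feature set, the routing providers and
# the routeAdvertisements flag match what the patches above configure.
evpn_prereqs_ok() {
  feature_set=$1
  providers=$2
  route_adv=$3
  [ "$feature_set" = "TechPreviewNoUpgrade" ] || return 1
  # providers is a space-separated list, so look for FRR as a whole word.
  case " $providers " in
    *" FRR "*) ;;
    *) return 1 ;;
  esac
  [ "$route_adv" = "Enabled" ]
}

# Only query the cluster when `oc` is available in this environment.
if command -v oc >/dev/null 2>&1; then
  fs=$(oc get featuregate cluster -o jsonpath='{.spec.featureSet}')
  pr=$(oc get network.operator.openshift.io cluster \
    -o jsonpath='{.spec.additionalRoutingCapabilities.providers[*]}')
  ra=$(oc get network.operator.openshift.io cluster \
    -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.routeAdvertisements}')
  if evpn_prereqs_ok "$fs" "$pr" "$ra"; then
    echo "EVPN prerequisites are set"
  else
    echo "EVPN prerequisites are missing"
  fi
fi
```

If the EVPN feature gate later becomes part of the default feature set, the feature set comparison would no longer apply.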
| 150 | + |
| 151 | +Then, a cluster administrator enables the internal BGP fabric if intended to be |
| 152 | +used: |
| 153 | +```shell |
| 154 | +TODO |
| 155 | +``` |
| 156 | + |
| 157 | +Then, a cluster administrator follows the OVN-Kubernetes [workflow][9] to configure EVPN. |
| 158 | + |
| 159 | +[9]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#workflow-description |
| 160 | + |
| 161 | +TODO: add example workflow |
| 162 | + |
| 163 | + |
| 164 | +### API Extensions |
| 165 | + |
| 166 | +There are no changes required to OCP specific APIs. |
| 167 | + |
| 168 | + |
| 169 | +### Topology Considerations |
| 170 | + |
| 171 | +#### Hypershift / Hosted Control Planes |
| 172 | + |
| 173 | +No special considerations for hosted clusters. |
| 174 | + |
| 175 | +#### Standalone Clusters |
| 176 | + |
| 177 | +No special considerations for standalone clusters. |
| 178 | + |
| 179 | +#### Single-node Deployments or MicroShift |
| 180 | + |
| 181 | +No special considerations for single-node clusters. |
| 182 | + |
| 183 | +### Implementation Details/Notes/Constraints |
| 184 | + |
| 185 | +As a recap, these are the changes proposed by this enhancement: |
| 186 | + |
| 187 | +* Introduce all relevant updates of OVN-Kubernetes and FRR-k8s APIs. |
| 188 | +* Introduce a CNO feature gate for the EVPN feature that will be required to |
| 189 | + deploy the specific bits of the OVN-Kubernetes APIs that implement EVPN. |
| 190 | +* Introduce changes to the OCP FRR-k8s build process to consume FRR v10. |
| 191 | + |
| 192 | +### Risks and Mitigations |
| 193 | + |
| 194 | +There is a risk related to the availability of FRR v9+ that has been described |
| 195 | +in an earlier section of the enhancement. No mitigations are proposed other |
| 196 | +than the alternatives outlined there. |
| 197 | + |
| 198 | +Otherwise, the risks and mitigations for this feature are aligned to the ones |
| 199 | +described in the upstream [enhancement][15]. |
| 200 | + |
| 201 | +[15]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#risks-known-limitations-and-mitigations |
| 202 | + |
| 203 | +### Drawbacks |
| 204 | + |
| 205 | +The drawbacks of this feature are aligned to the ones described in the upstream |
| 206 | +[enhancement][15]. |
| 207 | + |
| 208 | +Worth mentioning is the lack of support for the `routingViaHost=false` |
| 209 | +configuration, also known as shared gateway mode. Support for this configuration |
| 210 | +will be introduced in a future enhancement. |
| 211 | + |
| 212 | +## Alternatives (Not Implemented) |
| 213 | + |
| 214 | +The alternatives of this feature are aligned to the ones described in the |
| 215 | +upstream [enhancement][16]. |
| 216 | + |
| 217 | +[16]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#alternatives |
| 218 | + |
| 219 | +## Open Questions [optional] |
| 220 | + |
| 221 | +N/A |
| 222 | + |
| 223 | + |
| 224 | +## Test Plan |
| 225 | + |
| 226 | +### E2E tests |
| 227 | + |
| 228 | +There already exists a dual stack CI lane for BGP in local gateway mode, |
| 229 | +`e2e-metal-ipi-ovn-dualstack-bgp-local-gw`, mostly defined with the appropriate |
| 230 | +configuration to run EVPN test cases. The job should be modified to: |
| 231 | +* Enable the EVPN feature gate. |
| 232 | +* Enable the internal BGP fabric. |
| 233 | + |
| 234 | +[OpenShift Testing Extensions (OTE)][14] will be used for the implementation |
| 235 | +combined with the use of the appropriate infrastructure provider so that |
| 236 | +upstream test cases can be used downstream as well. Ideally the coverage should |
| 237 | +be the same as the existing coverage for P-CUDNs when configured normally except |
| 238 | +for those features explicitly called out as not supported in the EVPN upstream |
| 239 | +[enhancement][10]. |
| 240 | + |
| 241 | +[10]: TODO add link |
| 242 | + |
| 243 | +[14]: https://github.com/openshift-eng/openshift-tests-extension |
| 244 | + |
| 245 | +To test with the internal BGP fabric it would be enough to run selected |
| 246 | +East/West test cases for an L2 EVPN as for the most part the internal BGP fabric |
| 247 | +is no different from an externally provided one in terms of data plane. |
| 248 | + |
| 249 | + |
| 250 | +### QE Testing |
| 251 | + |
| 252 | +In a similar vein to E2E tests, QE coverage should constitute a regression of |
| 253 | +the existing P-CUDN coverage under an EVPN configuration except for those features |
| 254 | +explicitly called out as not supported in the EVPN upstream [enhancement][10]. |
| 255 | + |
| 256 | +QE testing should include testing upgrades from a cluster already making use of |
| 257 | +the EVPN feature. |
| 258 | + |
| 259 | +QE testing will need to emulate a BGP/EVPN fabric. While any kind of custom and |
| 260 | +simplified setup is acceptable, there might be interest in using third party |
| 261 | +projects like [containerlab][11]. |
| 262 | + |
| 263 | +[11]: https://github.com/srl-labs/containerlab |
| 264 | + |
| 265 | +## Graduation Criteria |
| 266 | + |
| 267 | +The EVPN feature is planned to be provided with technical preview availability |
| 268 | +first and then with general availability in a later release. While the main |
| 269 | +development effort will take place in context of the upstream project, the |
| 270 | +following graduation criteria include suggestions to prioritize that |
| 271 | +development effort in case it can be accommodated. |
| 272 | + |
| 273 | +### Dev Preview -> Tech Preview |
| 274 | + |
| 275 | +- EVPN support for L2 and L3 P-CUDNs |
| 276 | +- Use of raw configuration API in FRR-k8s |
| 277 | +- Use of node IPs as VTEP IPs |
| 278 | +- Sufficient test coverage (E2E, QE) |
| 279 | + |
| 280 | +### Tech Preview -> GA |
| 281 | + |
| 282 | +- Formal FRR-k8s APIs introduced |
| 283 | +- Configurable VTEP IPs |
| 284 | +- Complete testing, including upgrades |
| 285 | +- User facing documentation available in [openshift-docs](https://github.com/openshift/openshift-docs/) |
| 286 | + |
| 287 | +## Upgrade / Downgrade Strategy |
| 288 | + |
| 289 | +This feature has no impact on upgradeability. |
| 290 | + |
| 291 | +## Version Skew Strategy |
| 292 | + |
| 293 | +N/A |
| 294 | + |
| 295 | +## Operational Aspects of API Extensions |
| 296 | + |
| 297 | +N/A |
| 298 | + |
| 299 | +## Support Procedures |
| 300 | + |
| 301 | +In general, support procedures will be based on the status reported on the |
| 302 | +resource instances involved in the EVPN feature which are similar to those |
| 303 | +already involved in the existing BGP features, including the status for |
| 304 | +`RouteAdvertisements`, `ClusterUserDefinedNetwork` and FRR-k8s resources: any |
| 305 | +invalid configuration shall be reported through appropriate status conditions on |
| 306 | +those resources. |
| 307 | + |
| 308 | +Those status conditions should have metrics associated, allowing the |
| 309 | +configuration of alerts based on those metrics. |
| 310 | + |
| 311 | +Other than that, and given the distributed nature of OVN-Kubernetes, the next |
| 312 | +best troubleshooting method relies on the use of tools like `iproute2`, `tcpdump`, |
| 313 | +`ovn-trace`, `ovs-trace` and `vtysh`, mixing existing knowledge with additional |
| 314 | +understanding of FRR and host configuration specific to EVPN that is detailed in |
| 315 | +the upstream [enhancement][12]. |
| 316 | + |
| 317 | +[12]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#implementation-details |
| 318 | + |
| 319 | +In the future, NetObserv might introduce insights into that host configuration |
| 320 | +and facilitate troubleshooting. |
| 321 | + |
| 322 | +## Infrastructure Needed [optional] |
| 323 | + |
| 324 | +The EVPN feature is only supported on bare metal platforms. |