Commit c62a1c9

CORENET-6429: OVN-Kubernetes support for EVPN
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
1 parent df1fada commit c62a1c9

File tree

1 file changed

+324
-0
lines changed

@@ -0,0 +1,324 @@
1+
---
2+
title: OVN-Kubernetes support for EVPN
3+
authors:
4+
- @jcaamano
5+
reviewers:
6+
- @arghosh93
7+
- @asood-rh
8+
- @jechen0648
9+
- @martinkennelly
10+
- @maiqueb
11+
- @pperiyasamy
12+
- @tssurya
13+
- @zhaozhanqi
14+
approvers:
15+
- @tssurya
16+
api-approvers:
17+
- None
18+
creation-date: 2025-10-14
19+
last-updated: 2025-10-15
20+
tracking-link:
21+
- https://issues.redhat.com/browse/CORENET-6429
22+
see-also:
23+
- https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5089
24+
25+
---
26+
27+
# OVN-Kubernetes support for EVPN
28+
29+
## Summary
30+
31+
This feature allows exposing primary Cluster User Defined Networks (P-CUDNs)
32+
externally via a VPN to other entities either inside or outside the cluster,
33+
using BGP and EVPN as the common and native networking standard that will enable
34+
integration into user networks without SDN specific network protocol
35+
integration, and providing an industry standardized way to achieve network
36+
segmentation between sites.
37+
38+
This enhancement is aligned with, and being worked on in tandem with, a corresponding
39+
OVN-Kubernetes upstream [enhancement][1]. As such, for much of the content there
40+
will be references to it. The intention of this enhancement is to outline the
41+
necessary changes to consume and integrate that functionality in OCP, including
42+
the interaction with the Cluster Network Operator (CNO) and our test plan for
43+
this feature. However, in case the upstream enhancement is found to be
44+
inadequate, one of the following outcomes is possible depending on the
45+
circumstances of the inadequacy:
46+
* Amend the upstream enhancement if still open.
47+
* Work on a follow-up upstream enhancement while keeping this one open.
48+
* Work on a new downstream enhancement that either replaces or follows this one.
49+
50+
[1]: https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5089
51+
52+
53+
## Motivation
54+
55+
The motivations for this feature are aligned to the ones described in the
56+
upstream [enhancement][2].
57+
58+
[2]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#introduction
59+
60+
61+
### User Stories
62+
63+
The user stories for this feature are aligned to the ones described in the
64+
upstream [enhancement][3].
65+
66+
[3]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#user-storiesuse-cases
67+
68+
### Goals
69+
70+
The goals for this feature are aligned to the ones described in the upstream
71+
[enhancement][4].
72+
73+
[4]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#goals
74+
75+
### Non-Goals
76+
77+
The non-goals for this feature are aligned to the ones described in the upstream
78+
[enhancement][5].
79+
80+
[5]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#non-goals
81+
82+
## Proposal
83+
84+
This section requires a general understanding of the overall proposal described
85+
in the upstream [enhancement][6].
86+
87+
[6]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#proposed-solution
88+
89+
The EVPN feature is mainly driven by the already existing OVN-Kubernetes
90+
RouteAdvertisements CRD. The `routeAdvertisements` configuration flag in the
91+
OVN-Kubernetes CNO configuration will need to be set to `Enabled` to be able to
92+
use the feature.
93+
94+
Additionally, the EVPN feature is used through OVN-Kubernetes specific APIs in
95+
the form of new and updated CRDs. A new feature gate will be introduced for the
96+
EVPN feature and CNO will deploy these CRD updates only if the feature gate is
97+
enabled making it impossible to use the feature if the feature gate is not
98+
enabled.
99+
100+
The EVPN feature is only supported when `routingViaHost` is set to `true`, also
101+
known as local gateway mode. CNO will perform no validation; OVN-Kubernetes will
102+
[reject][13] invalid configurations in this regard.
103+
104+
[13]: TODO add reference
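
Because CNO does not validate the gateway mode, an administrator may want to
check it before configuring EVPN. The following is a minimal sketch, assuming
the setting is read from
`spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig.routingViaHost` on the
cluster `Network.operator.openshift.io` resource; the helper name
`gateway_mode` is illustrative and not part of any tool:

```shell
# Map a routingViaHost value to a gateway mode name.
# An empty value means the field is unset, which defaults to shared gateway mode.
gateway_mode() {
  case "$1" in
    true) echo "local" ;;
    false|"") echo "shared" ;;
    *) echo "unknown" ;;
  esac
}

# Query the cluster only when oc is installed and logged in.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  value=$(oc get Network.operator.openshift.io cluster \
    -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig.routingViaHost}')
  echo "gateway mode: $(gateway_mode "$value")"
fi
```

A result of `local` corresponds to the mode in which EVPN is expected to work.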
105+
106+
The EVPN feature requires FRR-k8s. FRR-k8s is deployed by CNO when the
107+
`additionalRoutingCapabilities` `providers` includes `FRR` in the operator
108+
configuration, which is required to enable `routeAdvertisements`.
109+
110+
The upstream enhancement may introduce changes to the FRR-k8s APIs. These API
111+
changes need not be gated by the feature gate introduced above. It is worth noting
112+
that FRR-k8s APIs allow the possibility to inject raw FRR configuration and that
113+
this capability might be used during the development process until proper
114+
structured APIs are introduced.
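
As an illustration of the raw configuration capability, the sketch below
writes a hypothetical `FRRConfiguration` manifest to a local file. The resource
name, the `openshift-frr-k8s` namespace, the ASN `64512` and the FRR snippet
itself are assumptions chosen for the example; the configuration that
OVN-Kubernetes will actually require is defined by the upstream enhancement,
not by this snippet:

```shell
# Write a hypothetical FRRConfiguration that injects raw FRR configuration.
# The name, namespace, priority and ASN are placeholders, not values taken
# from the enhancement.
cat > evpn-raw-frrconfig.yaml <<'EOF'
apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  name: evpn-raw-example
  namespace: openshift-frr-k8s
spec:
  raw:
    priority: 10
    rawConfig: |
      router bgp 64512
       address-family l2vpn evpn
        advertise-all-vni
       exit-address-family
EOF

# Review the file, then apply it against a test cluster if appropriate:
#   oc apply -f evpn-raw-frrconfig.yaml
```

Raw configuration bypasses the validation that structured APIs provide, which
is one reason it is treated here as a development-time aid.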
115+
116+
Currently, CNO FRR-k8s includes FRR v8 but the EVPN feature needs FRR v9+ so it
117+
will need to be updated. The problem is that FRR-k8s consumes the FRR version
118+
provided by RHEL, specifically RHEL9 which is the current version in use. RHEL9
119+
only provides FRR v8. RHEL10 will provide FRR v10. RHEL10 was to be
120+
[available][7] as tech preview in OCP 4.21 and fully available in OCP 4.22 but
121+
currently its real timeline is uncertain. These are our options in order of
122+
preference:
123+
124+
[7]: https://docs.google.com/spreadsheets/d/1VO00pWkWf8Fr30PHl8mZFTK9ZnJO51BGXH4FT6efwp4/edit?gid=1551125754#gid=1551125754
125+
126+
* Request RHEL to package FRR v10 for RHEL 9.
127+
* If RHEL 10 is available, build an FRR-k8s image against a RHEL 10 stream.
128+
* Consume FRR v10 from the FDP (Fast DataPath) project as we currently do with
129+
OVN and Libreswan until RHEL 10 is generally available for OCP.
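
Whichever option is chosen, the FRR version actually shipped can be checked on
a running cluster. The sketch below parses the first line of
`vtysh -c 'show version'`; the helper names are illustrative, and the
`openshift-frr-k8s` namespace, `frr-k8s` DaemonSet and `frr` container names
are assumptions about the deployment:

```shell
# Extract the FRR major version from `vtysh -c 'show version'` output on stdin.
# The first line is expected to look like "FRRouting 8.5.3 (host) on Linux(...)".
frr_major_version() {
  sed -n '1s/^FRRouting \([0-9][0-9]*\)\..*$/\1/p'
}

# Succeed when the major version read on stdin is at least 9.
frr_meets_minimum() {
  major=$(frr_major_version)
  [ -n "$major" ] && [ "$major" -ge 9 ]
}

# Query the cluster only when oc is installed and logged in.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  if oc -n openshift-frr-k8s exec ds/frr-k8s -c frr -- vtysh -c 'show version' \
      | frr_meets_minimum; then
    echo "FRR version is v9 or later"
  else
    echo "FRR version is older than v9 or could not be determined"
  fi
fi
```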
130+
131+
A user of the EVPN feature may leverage the internal BGP fabric introduced in
132+
the No-Overlay mode [enhancement][8], particularly for East/West L2 EVPN
133+
configurations. No additional changes with respect to what is described in that
134+
enhancement are required.
135+
136+
[8]: https://github.com/openshift/enhancements/pull/1859
137+
138+
139+
### Workflow Description
140+
141+
In no particular order, a cluster administrator enables FRR and
142+
RouteAdvertisements, and the EVPN feature gate if not available through the
143+
default feature set:
144+
145+
```shell
146+
oc patch featuregate cluster --type=merge -p='{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
147+
...
148+
oc patch Network.operator.openshift.io cluster --type=merge -p='{"spec":{"additionalRoutingCapabilities": {"providers": ["FRR"]}, "defaultNetwork":{"ovnKubernetesConfig":{"routeAdvertisements":"Enabled"}}}}'
149+
```
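
After patching, the resulting settings can be read back to confirm they took
effect. This is a minimal sketch that reuses only the fields set by the patches
above; the helper name `check_setting` is illustrative:

```shell
# Compare an observed value with the expected one and report the result.
# $1: setting name, $2: expected value, $3: observed value
check_setting() {
  if [ "$2" = "$3" ]; then
    echo "OK   $1=$3"
  else
    echo "FAIL $1: expected '$2', got '$3'"
    return 1
  fi
}

# Query the cluster only when oc is installed and logged in.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  check_setting featureSet TechPreviewNoUpgrade \
    "$(oc get featuregate cluster -o jsonpath='{.spec.featureSet}')"
  check_setting providers FRR \
    "$(oc get Network.operator.openshift.io cluster \
      -o jsonpath='{.spec.additionalRoutingCapabilities.providers[*]}')"
  check_setting routeAdvertisements Enabled \
    "$(oc get Network.operator.openshift.io cluster \
      -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.routeAdvertisements}')"
fi
```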
150+
151+
Then, a cluster administrator enables the internal BGP fabric if intended to be
152+
used:
153+
```shell
154+
TODO
155+
```
156+
157+
Then, a cluster administrator follows the OVN-Kubernetes [workflow][9] to configure EVPN.
158+
159+
[9]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#workflow-description
160+
161+
TODO: add example workflow
162+
163+
164+
### API Extensions
165+
166+
There are no changes required to OCP specific APIs.
167+
168+
169+
### Topology Considerations
170+
171+
#### Hypershift / Hosted Control Planes
172+
173+
No special considerations for hosted clusters.
174+
175+
#### Standalone Clusters
176+
177+
No special considerations for standalone clusters.
178+
179+
#### Single-node Deployments or MicroShift
180+
181+
No special considerations for single-node clusters.
182+
183+
### Implementation Details/Notes/Constraints
184+
185+
As a recap, these are the changes proposed by this enhancement:
186+
187+
* Introduce all relevant updates of OVN-Kubernetes and FRR-k8s APIs.
188+
* Introduce a CNO feature gate for the EVPN feature that will be required to
189+
  deploy the specific bits of the OVN-Kubernetes APIs that implement EVPN.
190+
* Introduce changes to the OCP FRR-k8s build process to consume FRR v10.
191+
192+
### Risks and Mitigations
193+
194+
There is a risk related to the availability of FRR v9+ that has been described
195+
in an earlier section of the enhancement. No mitigations are proposed other
196+
than the alternatives outlined there.
197+
198+
Otherwise, the risk and mitigations for this feature are aligned to the ones
199+
described in the upstream [enhancement][15].
200+
201+
[15]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#risks-known-limitations-and-mitigations
202+
203+
### Drawbacks
204+
205+
The drawbacks of this feature are aligned to the ones described in the upstream
206+
[enhancement][15].
207+
208+
Worth mentioning is the lack of support for the `routingViaHost=false`
209+
configuration, also known as shared gateway mode. Support for this configuration
210+
will be introduced in a future enhancement.
211+
212+
## Alternatives (Not Implemented)
213+
214+
The alternatives of this feature are aligned to the ones described in the
215+
upstream [enhancement][16].
216+
217+
[16]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#alternatives
218+
219+
## Open Questions [optional]
220+
221+
N/A
222+
223+
224+
## Test Plan
225+
226+
### E2E tests
227+
228+
There already exists a dual stack CI lane for BGP in local gateway mode,
229+
`e2e-metal-ipi-ovn-dualstack-bgp-local-gw`, mostly defined with the appropriate
230+
configuration to run EVPN test cases. The job should be modified to:
231+
* Enable the EVPN feature gate.
232+
* Enable the internal BGP fabric.
233+
234+
[OpenShift Testing Extensions (OTE)][14] will be used for the implementation
235+
combined with the use of the appropriate infrastructure provider so that
236+
upstream test cases can be used downstream as well. Ideally the coverage should
237+
be the same as the existing coverage for P-CUDNs when configured normally except
238+
for those features explicitly called out as not supported in the EVPN upstream
239+
[enhancement][10].
240+
241+
[10]: TODO add link
242+
243+
[14]: https://github.com/openshift-eng/openshift-tests-extension
244+
245+
To test with the internal BGP fabric it would be enough to run selected
246+
East/West test cases for an L2 EVPN as for the most part the internal BGP fabric
247+
is no different from an externally provided one in terms of data plane.
248+
249+
250+
### QE Testing
251+
252+
In a similar vein to E2E tests, QE coverage should constitute a regression of
253+
the existing P-CUDN coverage under an EVPN configuration except for those features
254+
explicitly called out as not supported in the EVPN upstream [enhancement][10].
255+
256+
QE testing should include testing upgrades from a cluster already making use of
257+
the EVPN feature.
258+
259+
QE testing will need to emulate a BGP/EVPN fabric. While any kind of custom and
260+
simplified setup is acceptable, there might be interest in using third party
261+
projects like [containerlab][11].
262+
263+
[11]: https://github.com/srl-labs/containerlab
264+
265+
## Graduation Criteria
266+
267+
The EVPN feature is planned to be provided with technical preview availability
268+
first and then with general availability in a later release. While the main
269+
development effort will take place in context of the upstream project, the
270+
following graduation criteria include suggestions to prioritize that
271+
development effort in case it can be accommodated.
272+
273+
### Dev Preview -> Tech Preview
274+
275+
- EVPN support for L2 and L3 P-CUDNs
276+
- Use of raw configuration API in FRR-k8s
277+
- Use of node IPs as VTEP IPs
278+
- Sufficient test coverage (E2E, QE)
279+
280+
### Tech Preview -> GA
281+
282+
- Formal FRR-k8s APIs introduced
283+
- Configurable VTEP IPs
284+
- Complete testing, including upgrades
285+
- User facing documentation available in [openshift-docs](https://github.com/openshift/openshift-docs/)
286+
287+
## Upgrade / Downgrade Strategy
288+
289+
This feature has no impact on upgradeability.
290+
291+
## Version Skew Strategy
292+
293+
N/A
294+
295+
## Operational Aspects of API Extensions
296+
297+
N/A
298+
299+
## Support Procedures
300+
301+
In general, support procedures will be based on the status reported on the
302+
resource instances involved in the EVPN feature which are similar to those
303+
already involved in the existing BGP features, including the status for
304+
`RouteAdvertisements`, `ClusterUserDefinedNetwork` and FRR-k8s resources: any
305+
invalid configuration shall be reported through appropriate status conditions on
306+
those resources.
307+
308+
Those status conditions should have metrics associated, allowing the
309+
configuration of alerts based on those metrics.
310+
311+
Other than that, and given the distributed nature of OVN-Kubernetes, the next
312+
best troubleshooting method relies on the use of tools like `iproute2`, `tcpdump`,
313+
`ovn-trace`, `ovs-trace` and `vtysh`, mixing existing knowledge with additional
314+
understanding of FRR and host configuration specific to EVPN that is detailed in
315+
the upstream [enhancement][12].
316+
317+
[12]: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/8461a5526377488bc643bd4eb7024b0b735e8830/docs/okeps/okep-50088-evpn.md#implementation-details
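
A first troubleshooting pass along these lines could be sketched as follows.
The function names are illustrative, and the resource plurals, the
`openshift-frr-k8s` namespace, and the `frr-k8s` DaemonSet and `frr` container
names are assumptions that may need adjusting to the actual deployment:

```shell
# Print the status of the resources involved and the BGP/EVPN state seen by FRR.
# Resource, namespace, DaemonSet and container names are assumptions.
collect_evpn_status() {
  # Status conditions of the OVN-Kubernetes resources.
  oc get routeadvertisements -o yaml
  oc get clusteruserdefinednetworks -o yaml
  # FRR-k8s resources and the EVPN state reported by FRR on one node.
  oc -n openshift-frr-k8s get frrconfigurations,frrnodestates
  oc -n openshift-frr-k8s exec ds/frr-k8s -c frr -- vtysh -c 'show bgp l2vpn evpn summary'
  oc -n openshift-frr-k8s exec ds/frr-k8s -c frr -- vtysh -c 'show evpn vni'
}

# Host networking state; intended to be run on a node, for example from a
# debug shell on that node.
show_host_evpn_state() {
  ip -d link show type vxlan   # VXLAN devices and their VNIs
  ip -d link show type vrf     # VRF devices
  bridge fdb show              # bridge forwarding database entries
}

# Query the cluster only when oc is installed and logged in.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  collect_evpn_status
fi
```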
318+
319+
In the future, NetObserv might introduce insights into that host configuration
320+
and facilitate troubleshooting.
321+
322+
## Infrastructure Needed [optional]
323+
324+
The EVPN feature is only supported on bare metal platforms.
