Commit c5dec1c

MCO: update bootimage enhancement for marketplace images
Update the bootimage enhancement to account for AWS/GCP/Azure marketplace images, as well as ARO/ROSA offerings.
1 parent 23e4882 commit c5dec1c

File tree

1 file changed

+194
-1
lines changed

enhancements/machine-config/manage-boot-images.md

Lines changed: 194 additions & 1 deletion
@@ -37,6 +37,8 @@ For `MachineSet` managed clusters, the end goal is to create automated mechanism
3737

3838
For clusters that are not managed by `MachineSets`, the end goal is to create a document (KB or otherwise) that a cluster admin would follow to update their boot images to be compliant with the acceptable skew. In such cases, the admin will be expected to record their cluster's boot image in the skew enforcement API object.
3939

40+
Standard self-managed OpenShift clusters use ART-shipped builds from the CoreOS team. Other OpenShift offerings create their own boot media on supported cloud platforms (AWS, GCP, Azure): marketplace offerings (AWS Marketplace, GCP Marketplace, Azure Marketplace), managed services (ROSA, ARO), and specialized variants (OCP, OPP, OKE). The MCO will detect which "stream" a boot image belongs to and ensure that updates occur within the same stream. This requires platform-specific detection logic, plus coordination between the Installer, CoreOS, and Managed Services teams, to identify and track both historical and future boot images across all supported platforms.
41+
4042

4143
## Motivation
4244

@@ -65,6 +67,8 @@ This is also a soft pre-requisite for both dual-stream RHEL support in OpenShift
6567

6668
* As a cluster administrator, having to keep track of a "boot" vs "live" image for a given cluster is not intuitive or user friendly. In the worst case scenario, I will have to reset a cluster (or do a lot of manual steps with rh-support in recovering the node) simply to be able to scale up nodes after an upgrade. If I'm managing a `MachineSet` managed cluster, once opted in, this feature will be a "switch on and forget" mechanism for me. If I'm managing a non-`MachineSet`-managed cluster, this would provide me with documentation that I could follow after an upgrade to ensure my cluster has the latest bootimages.
6769

70+
* As a cluster administrator running OpenShift on marketplace offerings (AWS Marketplace, GCP Marketplace, or Azure Marketplace) or managed services (ROSA, ARO), I need my boot images to stay within the same billing and licensing stream when they're automatically updated. I should be able to opt into automatic boot image management with a simple configuration change, without worrying about whether my marketplace or managed service cluster will be correctly identified. The MCO should automatically detect my deployment type and update my boot images to the appropriate marketplace or managed service variant, ensuring I maintain compliance with my licensing agreements and billing arrangements.
71+
6872
### Goals
6973

7074
The MCO will take over management of the boot image references and the stub Ignition configuration. The installer is still responsible for creating the `MachineSet` at cluster bring-up, but once cluster installation is complete the MCO will ensure that boot images are in sync with the latest payload. From the user standpoint, this should cause fewer compatibility issues as nodes will no longer need to pivot to a substantially different version of RHCOS during node scaleup.
@@ -110,13 +114,14 @@ It is important to note that there would be two "opt-in" knobs while this featur
110114

111115
See the API extension section for examples of how this feature can be turned on and off.
112116

113-
#### Variation and form factor considerations [optional]
117+
#### Variation and form factor considerations
114118

115119
Any form factor using the MCO and `MachineSets` will be impacted by this proposal. So case by case:
116120
- Standalone OpenShift: Yes, this is the main target form factor.
117121
- microshift: No, as it does [not](https://github.com/openshift/microshift/blob/main/docs/contributor/enabled_apis.md) use `MachineSets`.
118122
- Hypershift: No, Hypershift does not have this issue.
119123
- Hive: Hive manages `MachineSets` via `MachinePools`. The MachinePool controller generates the `MachineSets` manifests (by invoking vendored installer code) which include the `providerSpec`. Once a `MachineSet` has been created on the spoke, the only things that will be reconciled on it are replicas, labels, and taints - [unless a backdoor is enabled](https://github.com/openshift/hive/blob/0d5507f91935701146f3615c990941f24bd42fe1/pkg/constants/constants.go#L518). If the `providerSpec` ever goes out of sync, a warning will be logged by the MachinePool controller but otherwise this discrepancy is ignored. In such cases, the MSBIC will not have any issue reconciling the `providerSpec` to the correct boot image. However, if the backdoor is enabled, both the MSBIC and the MachinePool Controller will attempt to reconcile the `providerSpec` field, causing churn. The Hive team has [updated the comment](https://github.com/openshift/hive/pull/2596/files) on the backdoor annotation to indicate that it is mutually exclusive with this feature.
124+
- Marketplace and Managed Services clusters: Yes. We will additionally need to introduce "streams" of metadata, such that each marketplace variant (OCP/OPP/OKE) and managed service (ROSA/ARO) can update to the right image for its cluster type. This will require us both to detect the origin stream/cluster type of an existing bootimage and to ship a manifest for the latest image reference per stream. See the [Boot Image Stream Management](#boot-image-stream-management) section for details.
120125

121126
##### Supported platforms
122127

@@ -319,6 +324,194 @@ When [MachineDeployments](https://cluster-api.sigs.k8s.io/developer/architecture
319324

320325
Much of the existing design regarding architecture & platform detection, opt-in, degradation and storing boot image history can remain the same.
321326

327+
### Boot Image Stream Management
328+
329+
To properly manage boot images across different deployment scenarios and platforms, a stream-based approach is necessary. Each stream represents a distinct boot image variant that serves different use cases (e.g., standard IPI installations, marketplace offerings, managed service deployments).
330+
331+
Each team that currently owns bootimage creation for these platforms will now have the additional responsibility of tagging the image with stream metadata, as well as updating the installer's bootimage data, preferably at least once per y-stream.
332+
333+
#### Stream Definitions
334+
335+
The following streams will need to be detectable and shipped with corresponding metadata such that the MCO can detect and find the latest image for the stream. This will require coordination between Installer, RHCOS, MCO, Marketplace and ARO/ROSA teams:
336+
337+
**AWS Streams:**
338+
- IPI - Standard installer-provisioned infrastructure on AWS
339+
- Marketplace - AWS Marketplace published images
340+
- ROSA - Red Hat OpenShift Service on AWS
341+
342+
**GCP Streams:**
343+
- IPI - Standard installer-provisioned infrastructure on GCP
344+
- Marketplace - GCP Marketplace published images (OCP, OPP, OKE variants)
345+
346+
**Azure Streams:**
347+
- IPI/ARO - Standard installations and Azure Red Hat OpenShift (both HyperV Gen1 and Gen2)
348+
- Marketplace - Azure Marketplace published images (paid offerings with OCP, OPP, OKE variants)
349+
350+
**OKD Streams:**
351+
- SCOS-OKD - Singular SCOS stream on AWS
352+
353+
The MCO examines the current MachineSet boot image, determines its stream, and then determines the desired image from the appropriate stream metadata. Once all known streams are handled, the MCO should enter a degraded state if it expects to manage a boot image but cannot determine its stream.
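The flow described above can be sketched as follows. This is a minimal illustration in Python, not the MCO's implementation: the names `resolve_target_image`, `UnknownStreamError`, and the shape of the detector callbacks and the `latest_by_stream` mapping are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


class UnknownStreamError(Exception):
    """Signals the case where the MCO would report a degraded state."""


# A detector inspects a boot image reference and returns a stream name,
# or None when it does not recognize the image. (Hypothetical interface.)
Detector = Callable[[str], Optional[str]]


def resolve_target_image(
    current_image: str,
    detectors: Iterable[Detector],
    latest_by_stream: Dict[str, str],
) -> Tuple[str, str]:
    """Return (stream, desired_image) for the MachineSet's current boot image."""
    for detect in detectors:
        stream = detect(current_image)
        if stream is not None:
            break
    else:
        # No detector recognized the image: the stream is unknown.
        raise UnknownStreamError(f"cannot determine stream for {current_image!r}")

    if stream not in latest_by_stream:
        # The stream was detected, but no metadata was shipped for it.
        raise UnknownStreamError(f"no stream metadata shipped for {stream!r}")
    return stream, latest_by_stream[stream]
```

Keeping detection separate from the per-stream lookup means the update always stays within the detected stream: the desired image is only ever read from that stream's own metadata.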
354+
355+
#### Platform-Specific Stream Detection and Update Strategy
356+
357+
##### AWS
358+
359+
###### History Bootimage Tracking (Existing Clusters)
360+
361+
For existing clusters that have been upgraded, determining the boot image source requires examining multiple data sources:
362+
363+
**AWS MachineSet backed installer-shipped image:**
364+
- Create a historical list of all installer-shipped AMIs from RHCOS.json files
365+
- Match current MachineSet AMI against this historical data
366+
367+
**ROSA:**
368+
- Create a list of all ROSA AMIs based on GitLab cluster image set metadata (see Red Hat internal GitLab instance - service/clusterimagesets)
369+
370+
**AWS Marketplace:**
371+
Two potential approaches:
372+
1. Filter out all non-ROSA marketplace published images and curate a list
373+
2. Default to not supported - require user to update to a "new" marketplace image once skew enforcement is in place
374+
375+
**Detection fallback:**
376+
If the boot image is not found in any historical list:
377+
- Use AWS SDK to check publisher and RHCOS version:
  - **Deregistered/Not Found**: We will estimate the skew based on the install version of the cluster, and have the skew enforcement rules apply to it as normal
  - **Marketplace-published**: Could be UPI or ROSA - apply normal skew enforcement rules, but require update to future images with proper metadata
  - **RHCOS**: Follow regular skew rules (consider raising warning since it should be in the historical list)
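The lookup order above (historical lists first, then the SDK fallback) could look roughly like this. It is a sketch under stated assumptions: the function and enum names are hypothetical, `describe_image` stands in for an AWS SDK call that returns an image description or `None` when the AMI is deregistered, and treating `ImageOwnerAlias == "aws-marketplace"` as the marketplace signal is an assumption rather than something this proposal specifies.

```python
from enum import Enum
from typing import Callable, Optional, Set


class AwsImageOrigin(Enum):
    INSTALLER = "installer-historical"
    ROSA = "rosa-historical"
    NOT_FOUND = "deregistered-or-not-found"
    MARKETPLACE_PUBLISHED = "marketplace-published"
    RHCOS_UNLISTED = "rhcos-unlisted"


def classify_aws_ami(
    ami_id: str,
    installer_amis: Set[str],
    rosa_amis: Set[str],
    describe_image: Callable[[str], Optional[dict]],
) -> AwsImageOrigin:
    """Classify an existing MachineSet AMI (hypothetical helper)."""
    # 1. Match against the curated historical lists.
    if ami_id in installer_amis:
        return AwsImageOrigin.INSTALLER
    if ami_id in rosa_amis:
        return AwsImageOrigin.ROSA

    # 2. Fallback: ask the cloud API about the image.
    image = describe_image(ami_id)
    if image is None:
        # Skew would be estimated from the cluster's install version.
        return AwsImageOrigin.NOT_FOUND
    if image.get("ImageOwnerAlias") == "aws-marketplace":  # assumed signal
        # Could be UPI or ROSA; normal skew enforcement applies.
        return AwsImageOrigin.MARKETPLACE_PUBLISHED
    # An RHCOS image missing from the historical list: worth a warning.
    return AwsImageOrigin.RHCOS_UNLISTED
```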
381+
382+
###### Future Bootimage Tracking (New Images)
383+
384+
**Installer changes:**
385+
386+
Create multiple JSON files of AMI variant metadata for different streams:
387+
- rhcos (standard IPI)
388+
- rosa
389+
- marketplace-ocp
390+
- marketplace-oke
391+
- marketplace-opp
392+
393+
**Image creation:**
394+
395+
- All variants are tagged with `variantType` metadata to enable deterministic stream detection
396+
- The tagging will be done with AWS's [tagSet](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_Image.html) field, with `variantType` as the key and the stream name as the value
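Once images carry this tag, stream detection reduces to a tag lookup. The sketch below assumes the image description is a dictionary shaped like one entry of an EC2 `DescribeImages` response, where tags appear as a `Tags` list of `Key`/`Value` pairs; the helper name `stream_from_tags` is hypothetical.

```python
from typing import Optional

VARIANT_TAG_KEY = "variantType"


def stream_from_tags(image: dict) -> Optional[str]:
    """Return the stream name stored in the variantType tag, if present."""
    for tag in image.get("Tags", []):
        if tag.get("Key") == VARIANT_TAG_KEY:
            return tag.get("Value")
    # Untagged image: fall back to historical detection.
    return None
```

For example, an image described as `{"ImageId": "ami-...", "Tags": [{"Key": "variantType", "Value": "rosa"}]}` would resolve to the `rosa` stream.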
397+
398+
##### GCP
399+
400+
###### History Bootimage Tracking (Existing Clusters)
401+
402+
GCP boot images use the format `projects/<project-name>/global/images/<image-name>`, which we can parse to determine the historical stream where possible.
403+
404+
**GCP IPI installs:**
405+
- Installer images have project name `rhcos-cloud`
406+
407+
**GCP Marketplace:**
408+
- Marketplace images have project name `redhat-marketplace-public`
409+
- The image name should contain the variant.
410+
411+
**GCP UPI:**
412+
- UPI installations typically upload their own images
413+
- These fall into the non-managed case and require manual updates
414+
415+
**Managed:**
416+
- No GCP managed service currently exists (see Open Questions regarding Dedicated)
417+
418+
**Detection fallback:**
419+
420+
If parsing the image project does not match any known stream or variant, we will default to requiring the user to update the boot image manually.
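A minimal sketch of this parsing, assuming the marketplace variant appears as a hyphen-separated token (`ocp`, `opp`, or `oke`) in the image name. That naming convention, the function name, and the returned stream names are assumptions for illustration; only the two project names come from the text above.

```python
import re
from typing import Optional

_GCP_IMAGE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/global/images/(?P<name>[^/]+)$"
)

IPI_PROJECT = "rhcos-cloud"
MARKETPLACE_PROJECT = "redhat-marketplace-public"
MARKETPLACE_VARIANTS = ("ocp", "opp", "oke")


def classify_gcp_image(image: str) -> Optional[str]:
    """Return a stream name, or None when a manual update is required."""
    match = _GCP_IMAGE_RE.match(image)
    if match is None:
        return None
    project, name = match.group("project"), match.group("name")

    if project == IPI_PROJECT:
        return "rhcos"
    if project == MARKETPLACE_PROJECT:
        tokens = name.split("-")
        for variant in MARKETPLACE_VARIANTS:
            if variant in tokens:  # assumed naming convention
                return f"marketplace-{variant}"
    # Unknown project (e.g. a UPI-uploaded image) or unknown variant.
    return None
```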
421+
422+
###### Future Bootimage Tracking (New Images)
423+
424+
**Installer changes:**
425+
426+
Create stream metadata files for:
427+
- rhcos (GCP IPI)
428+
- marketplace-ocp
429+
- marketplace-oke
430+
- marketplace-opp
431+
432+
**Image creation:**
433+
434+
While the image project can presently be used to detect stream, we should nevertheless do the same variantType tagging to make it explicit.
435+
436+
For GCP, we will leverage [labels](https://cloud.google.com/compute/docs/labeling-resources) with `varianttype` as key and the stream name as value.
437+
438+
###### Open Questions
439+
440+
While GCP does not have a managed service offering, there is OpenShift Dedicated on GCP. We should check if they have special boot media in use.
441+
442+
##### Azure
443+
444+
###### History Bootimage Tracking (Existing Clusters)
445+
446+
**Azure IPI installs and ARO (Azure Red Hat OpenShift)**
447+
448+
ARO and regular IPI installs use the same images.
449+
450+
Check the ProviderSpec `image` field structure:
451+
```yaml
image:
  offer: ''
  publisher: ''
  resourceID: /resourceGroups/...
  sku: ''
  version: ''
```
459+
460+
If `offer`, `publisher`, `sku`, and `version` are all set, the MachineSet is already using the new unpaid marketplace image for IPI. We can confirm this by checking `publisher` for `azureopenshift`.
461+
462+
If only `resourceID` is set (other fields empty), this is a pre-4.20 RHCOS IPI boot image.
463+
464+
We would also need to check HyperV generation. For images that use ResourceID, "" is gen1, and "gen2" is gen2, whereas for newer images, "gen1" indicates gen1, and otherwise we default to gen2.
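The checks above could be combined as in the sketch below. It is illustrative only: the function name and returned labels are hypothetical, and where the generation marker is searched for (the `resourceID` string for older images, the `sku` for newer ones) is an assumption, since the text does not name the field.

```python
from typing import Optional, Tuple

_MARKETPLACE_FIELDS = ("offer", "publisher", "sku", "version")


def classify_azure_ipi_image(image: dict) -> Optional[Tuple[str, str]]:
    """Return (image_kind, hyperv_generation) for an IPI/ARO image, else None."""
    resource_id = image.get("resourceID") or ""

    if all(image.get(field) for field in _MARKETPLACE_FIELDS):
        if image["publisher"].lower() != "azureopenshift":
            return None  # not the unpaid IPI/ARO marketplace image
        # Newer images: "gen1" marks gen1, otherwise default to gen2.
        generation = "gen1" if "gen1" in image["sku"].lower() else "gen2"  # assumed field
        return ("unpaid-marketplace", generation)

    if resource_id and not any(image.get(field) for field in _MARKETPLACE_FIELDS):
        # Pre-4.20 images: no marker means gen1, "gen2" means gen2.
        generation = "gen2" if "gen2" in resource_id.lower() else "gen1"  # assumed field
        return ("legacy-resource-id", generation)

    return None
```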
465+
466+
**Azure paid marketplace:**
467+
468+
Check the ProviderSpec for populated `offer` and `publisher` fields:
469+
```yaml
image:
  offer: 'rh-ocp-worker'
  publisher: 'redhat'
  resourceID:
  sku: 'rh-ocp-worker'
  version: '...'
```
477+
478+
This yields six `{publisher, offer}` combinations:
479+
{"redhat", "rh-ocp-worker"}
480+
{"redhat", "rh-opp-worker"}
481+
{"redhat", "rh-oke-worker"}
482+
{"redhat-limited", "rh-ocp-worker"}
483+
{"redhat-limited", "rh-opp-worker"}
484+
{"redhat-limited", "rh-oke-worker"}
485+
486+
The last 3 are for the EMEA region specifically.
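The six combinations map naturally onto a small lookup. In this sketch the publisher and offer strings come from the list above, while the function name and the returned stream labels are hypothetical.

```python
from typing import Optional

# publisher -> True when the publisher is the EMEA-specific one
PAID_PUBLISHERS = {"redhat": False, "redhat-limited": True}

# offer -> variant
PAID_OFFERS = {
    "rh-ocp-worker": "ocp",
    "rh-opp-worker": "opp",
    "rh-oke-worker": "oke",
}


def classify_azure_paid_marketplace(image: dict) -> Optional[str]:
    """Return a paid-marketplace stream label, or None if not recognized."""
    publisher = image.get("publisher") or ""
    offer = image.get("offer") or ""
    if publisher not in PAID_PUBLISHERS or offer not in PAID_OFFERS:
        return None
    prefix = "marketplace-emea" if PAID_PUBLISHERS[publisher] else "marketplace"
    return f"{prefix}-{PAID_OFFERS[offer]}"
```

The HyperV generation would still need to be determined separately to select one of the gen1/gen2 streams.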
487+
488+
###### Future Bootimage Tracking (New Images)
489+
490+
**Installer changes:**
491+
- There is already [implemented stream metadata for Azure](https://github.com/openshift/installer/pull/9329)
492+
- Azure IPI/ARO HyperV Gen1
493+
- Azure IPI/ARO HyperV Gen2
494+
- Marketplace ocp-gen1
495+
- Marketplace ocp-gen2
496+
- Marketplace opp-gen1
497+
- Marketplace opp-gen2
498+
- Marketplace oke-gen1
499+
- Marketplace oke-gen2
500+
- Marketplace EMEA ocp-gen1
501+
- Marketplace EMEA ocp-gen2
502+
- Marketplace EMEA opp-gen1
503+
- Marketplace EMEA opp-gen2
504+
- Marketplace EMEA oke-gen1
505+
- Marketplace EMEA oke-gen2
506+
507+
**Image creation:**
508+
509+
Images are already created with necessary metadata, via publisher, offer, sku, and HyperV generation metadata. No changes should be required.
510+
511+
##### OKD
512+
513+
OKD is a special case: only AWS images are updated via SCOS.json, and they are only supported in us-east-1 (OKD only publishes to us-east-1 and uses live replication for other regions).
514+
322515
### API Extensions
323516

324517
#### Opt-in Mechanism
