diff --git a/enhancements/monitoring/assets/optional-monitoring-capability-cluster-settings-am-page.png b/enhancements/monitoring/assets/optional-monitoring-capability-cluster-settings-am-page.png new file mode 100644 index 0000000000..5612160610 Binary files /dev/null and b/enhancements/monitoring/assets/optional-monitoring-capability-cluster-settings-am-page.png differ diff --git a/enhancements/monitoring/assets/optional-monitoring-capability-deployment-page.png b/enhancements/monitoring/assets/optional-monitoring-capability-deployment-page.png new file mode 100644 index 0000000000..6cfb85a305 Binary files /dev/null and b/enhancements/monitoring/assets/optional-monitoring-capability-deployment-page.png differ diff --git a/enhancements/monitoring/assets/optional-monitoring-capability-devconsole-observer-page.png b/enhancements/monitoring/assets/optional-monitoring-capability-devconsole-observer-page.png new file mode 100644 index 0000000000..061a723668 Binary files /dev/null and b/enhancements/monitoring/assets/optional-monitoring-capability-devconsole-observer-page.png differ diff --git a/enhancements/monitoring/assets/optional-monitoring-capability-devconsole-topology-page.png b/enhancements/monitoring/assets/optional-monitoring-capability-devconsole-topology-page.png new file mode 100644 index 0000000000..782930d10b Binary files /dev/null and b/enhancements/monitoring/assets/optional-monitoring-capability-devconsole-topology-page.png differ diff --git a/enhancements/monitoring/assets/optional-monitoring-capability-metrics-page.png b/enhancements/monitoring/assets/optional-monitoring-capability-metrics-page.png new file mode 100644 index 0000000000..5d00779b04 Binary files /dev/null and b/enhancements/monitoring/assets/optional-monitoring-capability-metrics-page.png differ diff --git a/enhancements/monitoring/assets/optional-monitoring-capability-overview-page.png b/enhancements/monitoring/assets/optional-monitoring-capability-overview-page.png new file mode 100644 index 0000000000..7e88a3eede Binary files /dev/null and b/enhancements/monitoring/assets/optional-monitoring-capability-overview-page.png differ diff --git a/enhancements/monitoring/assets/optional-monitoring-capability-pod-page.png b/enhancements/monitoring/assets/optional-monitoring-capability-pod-page.png new file mode 100644 index 0000000000..87b020b893 Binary files /dev/null and b/enhancements/monitoring/assets/optional-monitoring-capability-pod-page.png differ diff --git a/enhancements/monitoring/optional-monitoring-capability.md b/enhancements/monitoring/optional-monitoring-capability.md new file mode 100644 index 0000000000..209b48fc54 --- /dev/null +++ b/enhancements/monitoring/optional-monitoring-capability.md @@ -0,0 +1,967 @@ +--- +title: optional-monitoring-capability +authors: + - @rexagod +reviewers: # Include a comment about what domain expertise a reviewer is expected to bring and what area of the enhancement you expect them to focus on. 
For example: - "@networkguru, for networking aspects, please look at IP bootstrapping aspect" + - @simonpasquier # For monitoring aspects, please look at the overall design and the impact on existing monitoring stack + - @jan--f # For monitoring aspects, please look at the overall design and the impact on existing monitoring stack + - @wking # For CVO aspects, please look at the upgrade strategy and the overall impact and integration with CVO + - @everettraven # For API and enhancement process aspects, please look at the API design and the enhancement process followed +approvers: # A single approver is preferred, the role of the approver is to raise important questions, help ensure the enhancement receives reviews from all applicable areas/SMEs, and determine when consensus is achieved such that the EP can move forward to implementation. Having multiple approvers makes it difficult to determine who is responsible for the actual approval. + - @everettraven +api-approvers: # In case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers). If there is no API change, use "None" + - @simonpasquier # For CMO API aspects, please look at the API design and the impact on existing CMO API + - @jan--f # For CMO API aspects, please look at the API design and the impact on existing CMO API +creation-date: 2025-10-26 +last-updated: 2025-10-26 +tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement + - https://issues.redhat.com/browse/MON-4310 +see-also: + - https://issues.redhat.com/browse/MON-4311 # "telemetry" collection profile epic +--- + + + +# Optional Monitoring Capability + + + +## Summary + + + +This enhancement proposes introducing a [cluster capability] to +allow disabling optional components of the in-cluster monitoring +stack. Note that: +(a) by "optional" we mean components, or parts of components, that +are not required for [telemetry operations], and, +(b) the capability will be enabled by default (implicity enabled), +to preserve the historical UX. + +[cluster-capability]: /enhancements/installer/component-selection.md +[telemetry operations]: https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/support/remote-health-monitoring-with-connected-clusters + +## Motivation + + + +Cluster admins that do not require a full-fledged monitoring solution +can choose to save resources by disabling optional components of +the in-cluster monitoring stack. Components responsible for telemetry +data collection and forwarding (i.e., the non-optional components) +will remain enabled to ensure that telemetry operations are not +affected. + +Note that while it is possible to disable certain components of the +in-cluster monitoring stack through the configuration in a on-off +fashion, implementing this behavior through a cluster capability +will help us differentiate between a use-case that does not require +individual monitoring components and a use-case that requires +telemetry operations only. + +Based on this information, we can code certain behaviors in the +monitoring operator that wouldn't otherwise make sense, and would +help reduce the overall monitoring footprint not just across the +stack, but the cluster itself (since we'd be sure of the intent), +such as [setting the `METRICS_SET` for hypershift clusters] to +`Telemetry` or disabling support for the [PromQL cluster condition +type] in CVO, when the capability is disabled. 
+
+[setting the `METRICS_SET` for hypershift clusters]: https://hypershift-docs.netlify.app/how-to/metrics-sets/
+[PromQL cluster condition type]: enhancements/update/targeted-update-edge-blocking.md#cluster-condition-type-registry
+
+### User Stories
+
+> As a cluster administrator, I want to be able to disable as much
+of the monitoring footprint as possible without breaking the cluster,
+including any managed manifests, so that I can minimize the resource
+consumption of monitoring on my cluster.
+
+> As a cluster administrator, I want to be able to disable as much
+of the in-cluster monitoring stack as possible right from install
+time, so I don't have to manually configure the in-cluster
+monitoring stack, as well as other components that depend on it,
+after the cluster is up and running.
+
+### Goals
+
+* Allow reducing the monitoring footprint on clusters that do not
+  require a full-fledged monitoring solution, while ensuring that
+  telemetry operations are not affected (admins can still disable
+  telemetry if they want to).
+* Teach other components about the monitoring capability, so they
+  can adjust their behavior accordingly (e.g., CVO, hypershift).
+* Introduce the ability to harness this feature at install time,
+  so that admins don't have to manually configure the in-cluster
+  monitoring stack, as well as other components that depend on it,
+  after the cluster is up and running. This pattern additionally
+  allows more components to be "taught" about the capability and
+  become more resource-aware, from a monitoring perspective, from
+  the get-go.
+* Allow other components to have a clear signal about whether there's
+  a monitoring stack deployed, by building over cluster capabilities,
+  rather than relying on heuristics (e.g., checking for the presence
+  of certain resources).
+* Preserve the historical UX by enabling the capability by default
+  (implicitly enabled).
+* Ensure that disabling the capability does not affect telemetry
+  operations.
+* Owing to the reduced volume of monitoring data collected, move
+  away from the HA Prometheus configuration to a single-replica
+  configuration, when the capability is disabled.
+* Disable all optional components of the in-cluster monitoring stack,
+  when the capability is disabled. This includes:
+  - Alertmanager
+  - Thanos Ruler
+  - Thanos Querier
+  - The User Workload Monitoring Stack
+* Expose the capability's status through logs, an info metric, as
+  well as `ClusterOperator`'s status.
+
+### Non-Goals
+
+* Monitoring components that are not required for telemetry operations,
+  but are still valuable for monitoring use cases, will not be disabled
+  when the capability is disabled. For example, metrics-server will
+  remain enabled to ensure that the autoscaling pipelines are not
+  affected.
+* Monitoring components that other capabilities rely on, where
+  disabling them would essentially break those capabilities, will
+  not be disabled either. For example, the monitoring plugin is a
+  part of the `Console` capability, and disabling it would break the
+  `Observe` section of the console's UI. This non-goal is in line
+  with the [capability's expectations], and follows the same pattern
+  as monitoring dashboards, which are a part of the same capability,
+  and depend on the plugin to be functional.
+* Disabling the deployment of any CRDs from the monitoring operator
+  that may relate to one or more optional components of the in-cluster
+  monitoring stack. This is to ensure that the monitoring as well as
+  Prometheus operators do not break.
+
+[capability's expectations]: https://redhat-internal.slack.com/archives/C6A3NV5J9/p1753979917745059?thread_ts=1753965750.365569&cid=C6A3NV5J9
+
+## Proposal
+
+### Workflow Description
+
+The capability will be implicitly enabled to preserve the UX, i.e.,
+deploy all monitoring components, optional or otherwise, by default.
+
+There are two possible workflows for triggering the capability:
+
+1. At install time, the capability is explicitly disabled by
+setting the `baselineCapabilitySet` to `None` and adding only the
+desired capabilities (omitting this one) to the
+`additionalEnabledCapabilities` list in the `install-config.yaml`
+file (see the sketch under **API Extensions** below). The capability
+can still be enabled later on, at run-time.
+2. At install time, the capability is implicitly enabled by
+pointing the `baselineCapabilitySet` to `ClusterVersionCapabilitySet4_22`
+(or `ClusterVersionCapabilitySetCurrent` on 4.22+ clusters). Once
+done, the capability cannot be disabled.
+
+Once enabled, the workflow for the capability is as follows:
+
+1. CVO manages the lifecycle of all [manifests opted into the
+capability], housed under the monitoring operator, and deploys them.
+2. The monitoring operator detects the status of the capability
+through the `client-go/config` client-set and deploys all assets,
+including the ones opted into the capability.
+3. Components external to the monitoring stack, such as hypershift
+and CVO, detect the status of the capability and adjust their
+behavior accordingly (usually no change in behavior, as the UX is
+preserved when the capability is enabled).
+
+Once disabled, the workflow for the capability is as follows:
+
+1. CVO manages the lifecycle of all [manifests opted into the
+capability], housed under the monitoring operator, and removes them.
+2. The monitoring operator detects the status of the capability
+through the `client-go/config` client-set and skips deploying any
+assets that are opted into the capability.
+3. Components external to the monitoring stack, such as hypershift
+and CVO, detect the status of the capability and adjust their
+behavior accordingly, i.e., deploy only telemetry-specific assets
+(`PrometheusRules` and `ServiceMonitors`) and modify features that
+complement the capability's telemetry-only mindset ([setting
+`METRICS_SET`] to `Telemetry` for hypershift clusters, or disabling
+support for the [PromQL cluster condition type] in CVO).
+
+[manifests opted into the capability]: https://github.com/openshift/cluster-monitoring-operator/pull/2675/files#diff-26fe2f1c8593ae2dfb847204c98f43175784a23ea434ed197882645d294eeba3R5
+[setting `METRICS_SET`]: https://hypershift-docs.netlify.app/how-to/metrics-sets/
+[PromQL cluster condition type]: enhancements/update/targeted-update-edge-blocking.md#cluster-condition-type-registry
+
+### API Extensions
+
+Introduces a new cluster capability named `OptionalMonitoring` under
+`config/v1` (`ClusterVersionCapabilityOptionalMonitoring`).
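+
+To tie this back to the install-time workflow above, here is a hedged
+sketch of the `install-config.yaml` stanza that would explicitly
+enable the proposed capability (the `OptionalMonitoring` name is
+assumed; all other fields are omitted):
+
+```yaml
+# Sketch only: explicitly enable the proposed OptionalMonitoring
+# capability on top of an otherwise-empty baseline capability set.
+capabilities:
+  baselineCapabilitySet: None
+  additionalEnabledCapabilities:
+  - OptionalMonitoring
+  # ...plus any other capabilities the cluster requires (e.g., Ingress)
+```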
+
+### Topology Considerations
+
+#### Hypershift / Hosted Control Planes
+
+Disabling the capability translates to hypershift clusters setting
+the [`METRICS_SET` environment variable] to `Telemetry`, in order to
+minimize the monitoring footprint while ensuring that telemetry
+operations are not affected.
+
+[`METRICS_SET` environment variable]: https://github.com/openshift/hypershift/blob/9e76b0a736a85e29fd69a76bca2b2968aa8db8b9/hack/dev/aws/run-operator-locally-aws-dev.sh#L4
+
+#### Standalone Clusters
+
+For standalone clusters, the behavior is the same as described in the
+**Workflow Description** section.
+
+#### Single-node Deployments or MicroShift
+
+Since MicroShift does not directly rely on the components deployed
+by CMO for its monitoring needs (e.g., [metrics-server]), the
+capability will not have any direct effect on MicroShift deployments.
+MicroShift [remote-writes] its telemetry data to the `metrics/v1/receive`
+endpoint, where the data is validated against the [telemetry
+whitelist] rules, in addition to other operations. Note that these
+are the same rules used to draw its dashboards, which are limited to
+[grafana-dashboard-microshift-telemetry] at this time.
+
+[telemetry whitelist]: https://github.com/openshift/cluster-monitoring-operator/blob/main/manifests/0000_50_cluster-monitoring-operator_04-config.yaml
+[remote-writes]: https://github.com/openshift/microshift/blob/c35ae12248d1c94e45f73d81ceba79b3c1967bcf/docs/user/howto_config.md?plain=1#L281
+[metrics-server]: https://github.com/openshift/microshift/blob/c35ae12248d1c94e45f73d81ceba79b3c1967bcf/docs/user/howto_metrics_server.md
+[grafana-dashboard-microshift-telemetry]: https://github.com/openshift/microshift/blob/c35ae12248d1c94e45f73d81ceba79b3c1967bcf/dashboards/grafana-dashboard-microshift-telemetry.configmap.yaml
+
+### Implementation Details/Notes/Constraints
+
+The implementation details are as follows:
+* Annotate optional monitoring resources: CMO's metric rules and
+metrics exporters have not been opted in, so that the telemetry rules
+keep functioning. All annotated resources under `manifests/` will be
+added or dropped by CVO based on the capabilities that are applied on
+the cluster (currently gated on the `Console` and `OptionalMonitoring`
+capabilities only). The optional components include:
+  - Alertmanager
+  - Thanos Ruler
+  - Thanos Querier
+  - The User Workload Monitoring Stack
+* Add optional monitoring logic in CMO: Enabling the `OptionalMonitoring`
+capability translates to enabling all optional monitoring components
+under CMO, in addition to any change in behavior that components
+in the wider OpenShift ecosystem may exhibit based on that. Note
+that since capabilities cannot be disabled once enabled, **cleanup
+for optional monitoring resources is not necessary**.
+  * Expose capability status:
+    - CMO logs: Expose whether the capability is enabled or disabled
+      through logs.
+    - CMO info metric: Expose whether the capability is enabled or
+      disabled through an info metric.
+    - `ClusterOperator` status: Expose whether the capability is
+      enabled or disabled through a `ClusterOperator` condition.
+* Teach other components about the capability:
+  - Hypershift: Set the `METRICS_SET` environment variable
+    to `Telemetry` when the capability is disabled.
+  - Hypershift: Allow configuring the capability [from the CLI].
+  - CVO: Disable support for the [PromQL cluster condition type]
+    when the capability is disabled.
+* Validation webhook: Add validation webhook checks for the capability
+  in CMO, to ensure that irrelevant fields are not respected when
+  the capability is disabled. For example, the operator should
+  reject any user-workload monitoring configurations when the
+  capability is disabled, since all user-workload monitoring
+  components will be disabled in this case.
+
+[from the CLI]: https://github.com/openshift/hypershift/blob/main/cmd/cluster/core/create.go#L111-L112
+[PromQL cluster condition type]: enhancements/update/targeted-update-edge-blocking.md#cluster-condition-type-registry
+
+> Note that the capability may be accompanied by switching to the
+`telemetry` collection profile, which is responsible for narrowing
+down Prometheus' service discovery to only telemetry-related targets
+through dedicated `ServiceMonitor`s (as well as `PrometheusRule`s,
+so that rules referencing metrics that are no longer collected are
+not evaluated). See [MON-4311] for more info. Furthermore,
+`telemeter-client` may be dropped in light of the fact that telemetry
+data can be forwarded through remote write to the `metrics/v1/receive`
+endpoint directly. Users will still continue to update the set of
+allowed rules, ensuring that existing validation processes are still
+enforced server-side.
+
+[MON-4311]: https://issues.redhat.com/browse/MON-4311
+
+### Risks and Mitigations
+
+* On Hypershift, the outgoing CVO instance is not aware that it's
+  about to be updated. This leads to CVO not being able to
+  compare the incoming and outgoing manifests to gauge if any
+  capabilities should be [implicitly enabled]. See [OTA-823] for
+  more info.
+* Capabilities may rely on each other, leading to complex
+  inter-dependencies. For example, the `Console` capability relies
+  on the monitoring plugin to be functional, which was earlier
+  considered an optional component. However, since we do not want
+  the UI to break, and we want to keep supporting all dashboards
+  (targets are only narrowed down when the collection profile is
+  set to `telemetry`), we decided not to disable the monitoring
+  plugin when the `OptionalMonitoring` capability is disabled.
+* Introducing the capability requires "teaching" other components
+  about it.
  These components may fall into a range of categories:
+  * Operators
+    * [Insights Operator](https://github.com/openshift/insights-operator/blob/cd1b8582f62385d67eabc823547a44dce4a3d938/pkg/gatherers/clusterconfig/recent_metrics.go#L63)
+    * [Cluster Management Metrics Operator](https://github.com/project-koku/koku-metrics-operator/blob/1b11260ee000ae07ae57e8c0f3eb4896b8c70dab/api/v1beta1/defaults.go#L22)
+    * [Cluster Kube Controller Manager Operator](https://github.com/openshift/cluster-kube-controller-manager-operator/blob/5e9fe6765b8a4b363d8e355e8b68fbe2ed9c3fda/pkg/operator/gcwatchercontroller/gcwatcher_controller.go#L154-L192)
+    * [Cluster etcd Operator](https://github.com/openshift/cluster-etcd-operator/blob/main/pkg/operator/metriccontroller/client.go#L46)
+    * [Cluster Version Operator](https://github.com/openshift/cluster-version-operator/blob/420e6d07d80f501cbed3d5ce6f6596323d4fdce5/pkg/clusterconditions/clusterconditions.go#L123)
+    * [Cincinnati Graph Data](https://github.com/openshift/cincinnati-graph-data/blob/bab100b5a88ad22039da9c795e8a7c9a10ba1a63/blocked-edges/4.19.9-NoCloudConfConfigMap.yaml)
+    * [Cluster Logging Operator](https://github.com/openshift/cluster-logging-operator/blob/28c8b02e041fbe8c978ddf20cb06400f97f07b8e/test/e2e/flowcontrol/utils.go#L68)
+    * [Grafana Tempo Operator](https://github.com/openshift/grafana-tempo-operator/blob/main/internal/manifests/queryfrontend/query_frontend.go#L413)
+    * [Console Operator](https://github.com/openshift/console/blob/main/pkg/server/server.go#L423-L442)
+  * Tools (some of these may be outdated)
+    * [Cluster Health Analyzer](https://github.com/openshift/cluster-health-analyzer/blob/9b6518688ab76a219e838bf5de103661ba7ec74f/manifests/mcp/02_deployment.yaml#L29)
+    * [Incluster Anomaly Detection](https://github.com/openshift/incluster-anomaly-detection/blob/main/src/common/config.py)
+    * [Predictive VPA Recommenders](https://github.com/openshift/predictive-vpa-recommenders/blob/3d209317a76560d315acdf654a6c6e9e330eb271/recommender_config.yaml#L3)
+    * [oc](https://github.com/openshift/oc/blob/9ae657dff111d36d75300c4823b7aae4b504c7e4/pkg/cli/admin/inspectalerts/inspectalerts.go#L107)
+  * AI-ingested documentation (which may not be valid once the capability is in effect)
+    * [Lightspeed RAG Content](https://github.com/search?q=repo%3Aopenshift%2Flightspeed-rag-content%20thanos&type=code), etc.
+
+[implicitly enabled]: https://github.com/openshift/cluster-version-operator/blob/main/lib/manifest/manifest.go#L46
+[OTA-823]: https://issues.redhat.com/browse/OTA-823
+
+### Drawbacks
+
+* Introducing this capability requires "teaching" other components
+  about it, in order to handle their monitoring assets accordingly,
+  which may lead to a fragmented UX if not done consistently across
+  the board.
+* Telemetry will lose the `ALERTS` signal. All components querying
+  that signal (e.g., the [Insights operator]) will need to be
+  updated to no longer rely on it once the capability is in
+  effect.
+
+[Insights operator]: https://github.com/openshift/insights-operator/blob/cd1b8582f62385d67eabc823547a44dce4a3d938/pkg/gatherers/clusterconfig/recent_metrics.go#L81
+
+## Alternatives (Not Implemented)
+
+* One could argue that there's no need for a capability, and that
+  its enforcement could be done through the monitoring operator's
+  configuration itself.
  However, this approach would not allow other
+  components in the OpenShift ecosystem to adjust their behavior based
+  on whether the monitoring stack is deployed or not, leading to a
+  fragmented UX.
+
+## Open Questions [optional]
+
+1. Should the `Console` capability dashboards housed under CMO be
+opted into `OptionalMonitoring` as well?
+   > No: since exporters are not disabled and the targets are left
+   untouched, we can continue supporting all dashboards even under
+   optional monitoring (unless the collection profile is set to
+   `telemetry`). Note that this is in line with the [Console team's expectations].
+
+2. Do we want to make the plugin reliant on the monitoring capability?
+Doing so would avoid a degraded UI (the plugin would simply be dropped
+when the capability is disabled), but on the other hand, dashboards
+that would otherwise still be supported under optional monitoring
+won't be shown anymore.
+   > It might be a good idea to loop in the UI team and have the
+   plugin drop only the "degraded" UI elements (the Alertmanager
+   pages) once it detects the capability has been disabled. See below
+   for a non-exhaustive list of affected areas.
+
+   <details>
+   <summary>Console areas affected by disabling the monitoring plugin</summary>
+
+   - Overview
+   ![Overview](./assets/optional-monitoring-capability-overview-page.png)
+   - Pods
+   ![Pods](./assets/optional-monitoring-capability-pod-page.png)
+   - Deployments
+   ![Deployments](./assets/optional-monitoring-capability-deployment-page.png)
+   - Alertmanager
+   ![Alertmanager](./assets/optional-monitoring-capability-cluster-settings-am-page.png)
+   - Metrics
+   ![Metrics](./assets/optional-monitoring-capability-metrics-page.png)
+   - Developer Perspective
+     - Topology
+     ![Topology](./assets/optional-monitoring-capability-devconsole-topology-page.png)
+     - Dashboards
+     ![Monitoring Dashboards](./assets/optional-monitoring-capability-devconsole-observer-page.png)
+     - [More](https://github.com/openshift/cluster-monitoring-operator/blob/main/manifests/0000_90_cluster-monitoring-operator_01-dashboards.yaml) [Dashboards](https://github.com/openshift/cluster-monitoring-operator/blob/main/manifests/0000_90_cluster-monitoring-operator_02-dashboards.yaml)
+
+   </details>
+
+3. Should we make the "telemetry" collection profile (a) internal-only,
+and (b) enabled for optional monitoring?
+   > Since optional monitoring is indeed focused on telemetry operations
+   only, it makes sense to enable the "telemetry" collection profile
+   when the capability is disabled. This allows us to actually regulate
+   metrics ingestion from exporters that would otherwise feed
+   non-telemetry data into Prometheus.
+
+4. Should we downscale Prometheus to a single replica when the
+capability is disabled?
+   > Yes, since the monitoring footprint is reduced significantly
+   when the capability is disabled, moving away from an HA setup
+   to a single-replica setup makes sense from a resource consumption
+   perspective. All components across OpenShift that rely on Thanos
+   will need to be "taught" to query the single Prometheus replica
+   directly instead of going through Thanos Querier.
+
+5. When users opt out of telemetry while the capability is disabled,
+what should be the behavior?
+   > Since telemetry is orthogonal to the capability itself, opting
+   out of telemetry should lead to disabling all telemetry-related
+   components (e.g., exporters, telemetry-specific `PrometheusRules`
+   and `ServiceMonitors`), while keeping the optional monitoring
+   components disabled as well. This essentially leads to a
+   monitoring-less cluster. Note that `metrics-server` is an
+   exception, and will remain enabled to ensure that the autoscaling
+   pipelines are not affected.
+
+6. Should the capability be named `OptionalMonitoring` or just `Monitoring`?
+   > `OptionalMonitoring` makes it clear that the monitoring stack
+   is optional, and can be disabled to reduce the monitoring footprint.
+   Naming it just `Monitoring` may lead to confusion, as it may
+   imply that the monitoring stack is always deployed. Broadly,
+   this could also curb our ability to name future capabilities.
+   I believe a bare name works for `Console`, since it uses its
+   capability to mark all Console-related resources OpenShift-wide;
+   here, however, we would be using `Monitoring` to mark everything
+   except the telemetry-only resources, and we may want to do the
+   same for other X-only resources in the future.
+
+[Console team's expectations]: https://redhat-internal.slack.com/archives/C6A3NV5J9/p1753979917745059?thread_ts=1753965750.365569&cid=C6A3NV5J9
+
+## Test Plan
+
+* Ensure that the monitoring operator deploys or skips optional
+  monitoring components based on the capability's status.
+* Ensure the capability is applied correctly at install time,
+  both when enabled and disabled. This entails checking not just
+  the monitoring operator's relevant resources and signals, but
+  also all external resources that modify their behavior based on
+  the capability's status, so we can catch regressions in the
+  future.
+
+## Graduation Criteria
+
+### Dev Preview -> Tech Preview
+
+* Introduce a mirroring feature-gate for the capability, to
+  allow toggling the capability at run-time for testing
+  purposes.
+* Implement monitoring operator logic to deploy or skip
+  optional monitoring components based on the capability's
+  status.
+* Introduce or vendor the capability wherever needed (CVO, installer,
+  api).
+
+### Tech Preview -> GA
+
+* TBD
+
+### Removing a deprecated feature
+
+Dropping the capability from any component (manifest) should be
+straightforward, as long as the component is not relied upon by any
+other capability or component.
Essentially, the opposite of [adding
+a new manifest under the capability].
+
+[adding a new manifest under the capability]: https://github.com/openshift/enhancements/blob/23e4882a018c4cb07afbbc7a9bb5c92760f2f0d8/enhancements/installer/component-selection.md#:~:text=If%20you%20forget,implicitly%20enable%20it.
+
+## Upgrade / Downgrade Strategy
+
+The same upgrade and downgrade strategy as for other capabilities
+applies; see [component-selection.md#upgrade-downgrade-strategy].
+
+[component-selection.md#upgrade-downgrade-strategy]: https://github.com/openshift/enhancements/blob/23e4882a018c4cb07afbbc7a9bb5c92760f2f0d8/enhancements/installer/component-selection.md#upgrade--downgrade-strategy
+
+## Version Skew Strategy
+
+Behaviors for the capability will be added to other components
+gradually, so it's possible that users of a component that sees
+such an addition in a later release may ask for a backport; under
+general circumstances, backports will not be pursued, due to the
+complexity and scale of inter-dependencies that the capability may
+have.
+
+Other than that, a skew will not disrupt the capability's behavior
+or expectations.
+
+## Operational Aspects of API Extensions
+
+CMO will reflect the capability's status through logs, an info
+metric, as well as `ClusterOperator`'s status, which can be used
+by administrators or support to determine the health of the API
+extension.
+
+**More on this as we implement the capability.**
+
+## Support Procedures
+
+As noted above, CMO will reflect the capability's status through
+logs, an info metric, as well as `ClusterOperator`'s status, which
+can be used by administrators or support to determine the health of
+the API extension.
+
+Note that since capabilities cannot be disabled once enabled, any
+issue that arises from enabling the capability will need to be
+addressed manually, on a case-by-case basis, as reverting to the
+previous state is not possible.
+
+## Infrastructure Needed [optional]
+
+N/A