-
Notifications
You must be signed in to change notification settings - Fork 182
CNTRLPLANE-1616: add event-ttl config observer #1938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@tjungblu: This pull request references CNTRLPLANE-1616 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a feature-gated config observer that reads EventTTLMinutes from the KubeAPIServer operator CR, computes an event-ttl string ("m" when >0), updates or prunes Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
pkg/operator/configobservation/apiserver/observe_event_ttl.go (2)
59-63: Behavior on missing KubeAPIServer CR: confirm desired degradationGet("cluster") error bubbles up. Is operator supposed to degrade if the CR is temporarily missing, or should NotFound be tolerated by returning existing/pruned config without error?
If tolerating NotFound is desired, handle metav1.IsNotFound(err) and return existing config with no error.
64-72: Consider clamping minutes to allowed range (if API doesn’t validate)Tests mention “minimum/maximum,” but code accepts any positive value. If API validation doesn’t already enforce bounds, clamp here to prevent pathological retention.
Apply within this block:
- // Determine the event TTL value to use - var eventTTLValue string - if kubeAPIServer.Spec.EventTTLMinutes > 0 { - // Use the specified value, convert minutes to duration string (e.g., "180m" for 180 minutes) - eventTTLValue = fmt.Sprintf("%dm", kubeAPIServer.Spec.EventTTLMinutes) - } else { - // Use default value of 3h when EventTTLMinutes is 0 or not set - eventTTLValue = "3h" - } + // Determine the event TTL value to use + var eventTTLValue string + m := int(kubeAPIServer.Spec.EventTTLMinutes) + if m > 0 { + if m < minEventTTLMinutes { + m = minEventTTLMinutes + } + if m > maxEventTTLMinutes { + m = maxEventTTLMinutes + } + eventTTLValue = fmt.Sprintf("%dm", m) + } else { + eventTTLValue = defaultEventTTL + }Add near the top of the file:
const ( defaultEventTTL = "3h" minEventTTLMinutes = 5 maxEventTTLMinutes = 180 )pkg/operator/configobservation/apiserver/observe_event_ttl_test.go (1)
20-178: Good coverage; add a case for “feature gates not yet observed”Add a scenario where AreInitialFeatureGatesObserved() returns false and assert the function returns the pruned existing config unchanged. Stub FeatureGateAccess in-test to simulate this.
Example stub:
type stubFG struct{ featuregates.FeatureGateAccess } func (s stubFG) AreInitialFeatureGatesObserved() bool { return false } func TestObserveEventTTL_FeatureGatesNotObserved(t *testing.T) { eventRecorder := events.NewInMemoryRecorder("", clocktesting.NewFakePassiveClock(time.Now())) idx := cache.NewIndexer(cache.MetaNamespaceKeyFunc, cache.Indexers{}) _ = idx.Add(&operatorv1.KubeAPIServer{ObjectMeta: metav1.ObjectMeta{Name: "cluster"}}) listers := configobservation.Listers{KubeAPIServerOperatorLister_: operatorlistersv1.NewKubeAPIServerLister(idx)} existing := map[string]interface{}{"apiServerArguments": map[string]interface{}{"event-ttl": []interface{}{"120m"}}} observer := NewObserveEventTTL(stubFG{}) observed, errs := observer(listers, eventRecorder, existing) if len(errs) != 0 { t.Fatalf("unexpected errors: %v", errs) } if diff := cmp.Diff(configobserver.Pruned(existing, eventTTLPath), observed); diff != "" { t.Fatalf("unexpected config, diff = %s", diff) } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (54)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/api/.ci-operator.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/.golangci.go-validated.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/.golangci.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/AGENTS.mdis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/Dockerfile.ocpis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/Makefileis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/OWNERSis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/register.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/types_apiserver.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/types_authentication.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/types_cluster_operator.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/types_cluster_version.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/types_infrastructure.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/types_insights.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1alpha1/types_cluster_monitoring.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1alpha1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1alpha1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/console/v1/types_console_cli_download.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/console/v1/types_console_link.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/console/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/envtest-releases.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/features.mdis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/features/features.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/features/legacyfeaturegates.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/imageregistry/v1/types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/imageregistry/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/legacyconfig/v1/types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/legacyconfig/v1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machine/v1/types_controlplanemachineset.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machine/v1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machine/v1beta1/types_awsprovider.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machine/v1beta1/types_machinehealthcheck.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machine/v1beta1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machine/v1beta1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/types_etcd.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/types_ingress.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/types_kubeapiserver.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/types_machineconfiguration.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-CustomNoUpgrade.crd.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-Default.crd.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-TechPreviewNoUpgrade.crd.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_80_machine-config_01_machineconfigurations-CustomNoUpgrade.crd.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_80_machine-config_01_machineconfigurations-Default.crd.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_80_machine-config_01_machineconfigurations-DevPreviewNoUpgrade.crd.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_80_machine-config_01_machineconfigurations-TechPreviewNoUpgrade.crd.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/modules.txtis excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (4)
go.mod(1 hunks)pkg/operator/configobservation/apiserver/observe_event_ttl.go(1 hunks)pkg/operator/configobservation/apiserver/observe_event_ttl_test.go(1 hunks)pkg/operator/configobservation/configobservercontroller/observe_config_controller.go(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (5)
go.mod (1)
18-18: openshift/api bump: confirm feature and tidy/vendor updates
- Verify the bumped commit provides features.FeatureEventTTL and related types used by this PR.
- Ensure go.sum and vendor/ are updated (go mod tidy; go mod vendor) and CI passes with this pseudo-version.
pkg/operator/configobservation/apiserver/observe_event_ttl.go (3)
30-33: Prune to observed path only — goodReturning only apiServerArguments.event-ttl avoids clobbering unrelated config.
35-48: Feature-gate handling looks correctWaits for gates, short-circuits when disabled. Matches observer patterns here.
75-81: Only write on change and emit event — goodAvoids unnecessary churn; the event is helpful for traceability.
pkg/operator/configobservation/configobservercontroller/observe_config_controller.go (1)
128-129: Wiring looks correctObserver added to the chain; required informer/lister (KubeAPIServers) is already present above.
3b4fe8f to
54bdf32
Compare
|
@tjungblu: This pull request references CNTRLPLANE-1616 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
54bdf32 to
baca7fd
Compare
|
@tjungblu: This pull request references CNTRLPLANE-1616 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@tjungblu: This pull request references CNTRLPLANE-1616 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
pkg/operator/configobservation/apiserver/observe_event_ttl_test.go (4)
148-153: Check the error from indexer.Add().While failures are unlikely with in-memory test data, explicitly checking the error makes the test more robust and prevents silent failures.
Apply this diff to check the error:
// Add KubeAPIServer resource -_ = kubeAPIServerIndexer.Add(&operatorv1.KubeAPIServer{ +if err := kubeAPIServerIndexer.Add(&operatorv1.KubeAPIServer{ ObjectMeta: metav1.ObjectMeta{Name: "cluster"}, Spec: operatorv1.KubeAPIServerSpec{ EventTTLMinutes: scenario.eventTTLMinutes, }, -}) +}); err != nil { + t.Fatalf("failed to add KubeAPIServer to indexer: %v", err) +}
141-178: Consider verifying event emission in tests.The test checks the observed configuration but doesn't verify that the
ObserveEventTTLChangedevent (mentioned in the AI summary) is emitted when changes occur, or that no event is emitted for no-op cases.While the current test validates the core functionality, adding event emission checks would provide more comprehensive coverage of the observer's behavior.
Example pattern for verifying events:
// After calling observer if len(errs) > 0 { t.Fatalf("unexpected errors: %v", errs) } // Check events based on whether change was expected expectedEvents := 0 if !cmp.Equal(scenario.existingKubeAPIConfig, scenario.expectedKubeAPIConfig) { expectedEvents = 1 } if len(eventRecorder.Events()) != expectedEvents { t.Errorf("expected %d events, got %d", expectedEvents, len(eventRecorder.Events())) }
20-140: Consider adding test scenarios for error conditions.The current test coverage is solid for happy paths, but consider adding scenarios for:
- Missing KubeAPIServer resource: What happens when the "cluster" resource doesn't exist in the indexer?
- Edge cases: While validation may be handled by API admission, testing the observer's behavior with unexpected values (negative, very large) ensures defensive coding.
- Feature gate accessor errors: Though rare, testing error handling paths makes the observer more robust.
These additions would strengthen the test suite's resilience and documentation value.
Example additional scenarios:
{ name: "KubeAPIServer resource not found", existingKubeAPIConfig: map[string]interface{}{}, expectedKubeAPIConfig: map[string]interface{}{ "apiServerArguments": map[string]interface{}{ "event-ttl": []interface{}{"3h"}, }, }, eventTTLMinutes: 0, // Not used, resource doesn't exist featureOn: true, skipAddResource: true, // New field to skip adding resource },
58-78: Acknowledge CRD schema enforces min/max on EventTTLMinutes
The CRD in types_kubeapiserver.go uses kubebuilder:validation Minimum=5 and Maximum=180, so out-of-range values are rejected by API admission. No additional validation logic or edge-case tests are required in ObserveEventTTL. Consider adding a brief comment in ObserveEventTTL noting that bounds are enforced upstream.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
pkg/operator/configobservation/apiserver/observe_event_ttl.go(1 hunks)pkg/operator/configobservation/apiserver/observe_event_ttl_test.go(1 hunks)pkg/operator/configobservation/configobservercontroller/observe_config_controller.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/operator/configobservation/apiserver/observe_event_ttl.go
🔇 Additional comments (1)
pkg/operator/configobservation/configobservercontroller/observe_config_controller.go (1)
128-128: LGTM: Clean integration of the new observer.The new event TTL observer is correctly integrated into the observer chain, following the established pattern for feature-gated observers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
pkg/operator/configobservation/configobservercontroller/observe_config_controller.go (1)
128-129: Wiring looks good; add KubeAPIServer informer HasSynced to PreRunCachesSynced to avoid early flaps.Observer depends on the KubeAPIServer lister; include its HasSynced in PreRunCachesSynced for parity with others and to reduce transient NotFound before initial sync (it’s already in the informers list).
Apply:
PreRunCachesSynced: append(preRunCacheSynced, operatorClient.Informer().HasSynced, + operatorInformer.Operator().V1().KubeAPIServers().Informer().HasSynced, kubeInformersForNamespaces.InformersFor("openshift-etcd").Core().V1().ConfigMaps().Informer().HasSynced, kubeInformersForNamespaces.InformersFor(operatorclient.TargetNamespace).Core().V1().Secrets().Informer().HasSynced,Also applies to: 98-114
pkg/operator/configobservation/apiserver/observe_event_ttl.go (3)
58-63: Treat NotFound from KubeAPIServer lister as no-op to reduce startup noise.If the CR isn’t observed yet, return existing config without error instead of bubbling an error.
Apply:
@@ - kubeAPIServer, err := genericListers.(configobservation.Listers).KubeAPIServerOperatorLister().Get("cluster") + kubeAPIServer, err := genericListers.(configobservation.Listers).KubeAPIServerOperatorLister().Get("cluster") if err != nil { - return existingConfig, []error{err} + if apierrors.IsNotFound(err) { + return existingConfig, nil + } + return existingConfig, []error{err} }And add the import:
import ( "fmt" + apierrors "k8s.io/apimachinery/pkg/api/errors"
64-72: Defaulting is fine; consider consistent representation.Optional: use "180m" instead of "3h" for uniform minute-based formatting across values. Tests would need updating.
74-81: Include old value in the change event for better auditability.Improves observability during rollouts.
Apply:
- recorder.Eventf("ObserveEventTTLChanged", "event TTL changed to %s", eventTTLValue) + recorder.Eventf("ObserveEventTTLChanged", "event TTL changed from %s to %s", currentEventTTL, eventTTLValue)pkg/operator/configobservation/apiserver/observe_event_ttl_test.go (2)
141-158: Add a scenario for KubeAPIServer NotFound (and/or nil existing config).Helps validate graceful no-op behavior when the CR isn’t observed yet and that nil existingConfig is handled.
Example addition:
{ name: "feature gate enabled, KAS CR not found", existingKubeAPIConfig: nil, expectedKubeAPIConfig: map[string]interface{}{}, // no observed change eventTTLMinutes: 60, featureOn: true, // In setup: do NOT add the KubeAPIServer object to the indexer }
167-175: Optional: assert event emission only on change.You can check the in-memory recorder to ensure an event is recorded when TTL changes and none when it doesn’t.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
pkg/operator/configobservation/apiserver/observe_event_ttl.go(1 hunks)pkg/operator/configobservation/apiserver/observe_event_ttl_test.go(1 hunks)pkg/operator/configobservation/configobservercontroller/observe_config_controller.go(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (2)
pkg/operator/configobservation/apiserver/observe_event_ttl.go (1)
28-34: Good use of Pruned to scope returned config.Keeps observer output constrained to apiServerArguments.event-ttl as expected.
pkg/operator/configobservation/apiserver/observe_event_ttl_test.go (1)
20-139: Solid, table-driven coverage.Scenarios cover FG off/on, defaulting, updates, and no-ops.
baca7fd to
3123d67
Compare
|
@tjungblu: This pull request references CNTRLPLANE-1616 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
pkg/operator/configobservation/apiserver/observe_event_ttl_test.go (1)
141-177: Consider verifying event recording behavior.The tests are well-structured and comprehensive. However, since the PR objectives state the observer "emits an event on change and only updates on change," consider adding assertions to verify that:
- Events are recorded in scenarios where the TTL changes (scenarios 2-6, 8)
- No events are recorded in no-change scenarios (scenarios 7, 9)
You can check the event recorder's contents using
eventRecorder.Events()after calling the observer.Example assertion for scenario 6 (update case):
observedKubeAPIConfig, errs := observer(listers, eventRecorder, scenario.existingKubeAPIConfig) if len(errs) > 0 { t.Fatalf("unexpected errors: %v", errs) } // Verify event was recorded for change events := eventRecorder.Events() if len(events) == 0 { t.Error("expected event to be recorded when TTL changes") }For scenario 7 (no-change case):
observedKubeAPIConfig, errs := observer(listers, eventRecorder, scenario.existingKubeAPIConfig) if len(errs) > 0 { t.Fatalf("unexpected errors: %v", errs) } // Verify no event was recorded when no change occurred events := eventRecorder.Events() if len(events) > 0 { t.Error("expected no event when TTL unchanged") }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
pkg/operator/configobservation/apiserver/observe_event_ttl.go(1 hunks)pkg/operator/configobservation/apiserver/observe_event_ttl_test.go(1 hunks)pkg/operator/configobservation/configobservercontroller/observe_config_controller.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/operator/configobservation/apiserver/observe_event_ttl.go
🔇 Additional comments (2)
pkg/operator/configobservation/configobservercontroller/observe_config_controller.go (2)
100-100: LGTM! Informer sync correctly added.The
HasSyncedfor the KubeAPIServers informer is appropriately added toPreRunCachesSynced, ensuring the cache is synced before the observer runs.
129-129: LGTM! Observer correctly registered.The new EventTTL observer is appropriately placed among other apiserver observers and correctly receives the
featureGateAccessorfor feature-gated behavior.
3123d67 to
aae5999
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/operator/configobservation/apiserver/observe_event_ttl.go (2)
68-70: Document or validate the handling of edge cases for EventTTLMinutes.The condition
EventTTLMinutes > 0implicitly treats zero and negative values as "use default." While this may be intentional, it's not explicitly validated or documented. Consider adding a comment or validation to clarify this behavior.Add a comment to clarify the behavior:
// Determine the event TTL value to use var eventTTLValue string + // EventTTLMinutes <= 0 is treated as unset and defaults to 3h if kubeAPIServer.Spec.EventTTLMinutes > 0 { // Use the specified value, convert minutes to duration string (e.g., "180m" for 180 minutes) eventTTLValue = fmt.Sprintf("%dm", kubeAPIServer.Spec.EventTTLMinutes)
73-73: Extract the default TTL value to a constant.The default value "3h" is hardcoded here. Extracting it to a package-level constant would improve maintainability and make it easier to reference or change in the future.
Add a constant at the package level:
var eventTTLPath = []string{"apiServerArguments", "event-ttl"} + +const defaultEventTTL = "3h"Then use it:
} else { // Use default value of 3h when EventTTLMinutes is 0 or not set - eventTTLValue = "3h" + eventTTLValue = defaultEventTTL }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
pkg/operator/configobservation/apiserver/observe_event_ttl.go(1 hunks)pkg/operator/configobservation/apiserver/observe_event_ttl_test.go(1 hunks)pkg/operator/configobservation/configobservercontroller/observe_config_controller.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/configobservation/configobservercontroller/observe_config_controller.go
- pkg/operator/configobservation/apiserver/observe_event_ttl_test.go
🔇 Additional comments (1)
pkg/operator/configobservation/apiserver/observe_event_ttl.go (1)
77-85: LGTM - Change detection and update logic.The change detection logic correctly avoids unnecessary updates by comparing the current and target values. Event emission provides good observability, and error handling is appropriate. Returning
existingConfigwhen unchanged correctly preserves the current state.
|
/verified by test -#1941 |
|
@gangwgr: This PR has been marked as verified by In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| // Use default value of 3h when EventTTLMinutes is 0 or not set | ||
| eventTTLValue = "3h" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we clear it altogether from the observed config and allow the static value from defaultconfig.yaml to take effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes the unit testing a bit more difficult because multiple other paths also do that - I've updated it nevertheless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙇
| if len(currentEventTTLSlice) > 0 { | ||
| currentEventTTL = currentEventTTLSlice[0] | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we handle the case where a bug or rogue admin writes multiple values here and the first value is what the config observer expects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would you like to handle it? Error out when there is more than one that doesn't parse as a golang duration?
just for reference, goaway chance also doesn't handle that case either:
if len(currentGoawayChanceSlice) > 0 {
currentGoawayChance = currentGoawayChanceSlice[0]
}
same as those others:
if len(currentSendRetryAfterWhileNotReadyOnceSlice) > 0 {
currentSendRetryAfterWhileNotReadyOnce = currentSendRetryAfterWhileNotReadyOnceSlice[0]
}
if len(currentShutdownDelaySlice) > 0 {
currentShutdownDelayDuration = currentShutdownDelaySlice[0]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put a code suggestion down below, let me know if that fits your bill here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the last observed value is not a duration, a kube-apiserver using this config will fail during options validation. Repeated flags won't fail, but only the last one takes effect (IIUC). I think we want to do what is best for kube-apiserver in both cases and overwrite/clear the existing config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, so to summarize: any len != 1 and non-duration is just discarded and we return an empty result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order of decreasing preference, we want 1. value computed from the config API, or empty if we observe that the config API field is empty, 2. existing value if we were unable to read the config API and the existing value is valid, then 3. empty value.
So here, we can treat an invalid existing value as though there is no existing value, but we must still try to read the config API to choose the effective value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in which case, we can also just forego looking into the existing value and just take what's in the operator CRD and if 0 is set return an empty map - correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This probably becomes simpler if the API read happens first, because if that succeeds then we can ignore the existing value completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed it again, let me know if that makes more sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you.
| ResourceSync: resourceSyncer, | ||
| PreRunCachesSynced: append(preRunCacheSynced, | ||
| operatorClient.Informer().HasSynced, | ||
| operatorInformer.Operator().V1().KubeAPIServers().Informer().HasSynced, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same as the informer returned by operatorClient.Informer()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, also used in ObserveServiceAccountIssuer that way, which would indicate this one was missing before already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find!
aae5999 to
a2542db
Compare
|
@tjungblu: This pull request references CNTRLPLANE-1616 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/assign |
a2542db to
7d5a93c
Compare
|
@tjungblu: This pull request references CNTRLPLANE-1616 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Bug: Observer Returns Early on Config ErrorsThe observer returns early with errors when existing config has invalid duration or multiple values, preventing the clearing/overriding of bad configuration. This contradicts the established pattern in other observers (like observe_goaway_chance.go and observe_termination_duration.go) which collect errors but continue processing to allow overwriting bad config. Based on the PR discussion mentioning "overwrite/clear the existing config" and "discarded and we return an empty result", the intended behavior should be to add errors to an errors slice but continue processing, not return early with errors that prevent config updates. |
This updates the API to latest main and adds the logic to set the event-ttl accoringly to the newly introduced API field. Note to the reviewer, this has been largely AI generated by Cursor. Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
7d5a93c to
439a7b5
Compare
|
@tjungblu: This pull request references CNTRLPLANE-1616 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
pkg/operator/configobservation/apiserver/observe_event_ttl.go (1)
51-54: Type assertion should use two-value form for safety.The single-value type assertion will panic if
genericListersdoesn't implement the expected interface. While controller wiring likely guarantees this, using the two-value form would be more defensive.This issue was previously raised by coderabbitai[bot] and remains unaddressed.
🧹 Nitpick comments (1)
pkg/operator/configobservation/apiserver/observe_event_ttl.go (1)
56-69: Consider validating EventTTLMinutes is non-negative.The current logic treats any value ≤ 0 the same (use default), which means negative values are silently accepted. Depending on the API schema validation, this might be acceptable, but explicitly validating and returning an error for negative values would be more defensive.
Example:
// Determine the event TTL value to use var eventTTLValue string + if kubeAPIServer.Spec.EventTTLMinutes < 0 { + return existingConfig, []error{fmt.Errorf("EventTTLMinutes cannot be negative: %d", kubeAPIServer.Spec.EventTTLMinutes)} + } if kubeAPIServer.Spec.EventTTLMinutes > 0 { observedConfig := map[string]interface{}{}However, if the API schema already prevents negative values through validation, the current implementation is acceptable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
pkg/operator/configobservation/apiserver/observe_event_ttl.go(1 hunks)pkg/operator/configobservation/apiserver/observe_event_ttl_test.go(1 hunks)pkg/operator/configobservation/configobservercontroller/observe_config_controller.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/operator/configobservation/apiserver/observe_event_ttl_test.go
🔇 Additional comments (6)
pkg/operator/configobservation/configobservercontroller/observe_config_controller.go (2)
100-100: LGTM! Good catch on the missing cache sync.This ensures the KubeAPIServers informer cache is synced before observers run, which is essential since the new event TTL observer reads from this informer. As noted in past reviews, this was previously missing.
129-129: LGTM! Observer registration is correct.The new event TTL observer is properly wired into the controller with the feature gate accessor, following the established pattern of other feature-gated observers in the codebase.
pkg/operator/configobservation/apiserver/observe_event_ttl.go (4)
16-22: LGTM! Factory function follows established pattern.The factory function correctly creates and returns a feature-gated observer, consistent with other observers in the codebase.
30-33: LGTM! Clever use of defer for pruning.The deferred pruning ensures the observed config is always scoped to only the event-ttl path, regardless of which return path is taken. This is a clean pattern for maintaining consistency.
35-43: LGTM! Feature gate checks are correct.The logic properly waits for initial feature gate observation before proceeding and handles errors appropriately.
45-49: LGTM! Feature gate disabled case correctly prunes the value.Returning an empty map when the feature is disabled ensures any previously observed event-ttl value is removed from the configuration, allowing the system to fall back to defaults. The deferred pruning will scope this to the event-ttl path.
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: benluddy, tjungblu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@tjungblu: This pull request references CNTRLPLANE-1616 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| } | ||
|
|
||
| // ObserveEventTTL reads the eventTTLMinutes from the KubeAPIServer operator CRD | ||
| func (o *eventTTLObserver) ObserveEventTTL(genericListers configobserver.Listers, recorder events.Recorder, existingConfig map[string]interface{}) (ret map[string]interface{}, errs []error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The 'errs' variable is unused.
|
/retest-required |
|
@tjungblu: This PR has been marked as verified by In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@tjungblu: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
2634c93
into
openshift:main
This updates the API to latest main and adds the logic to set the event-ttl accoringly to the newly introduced API field.
Enhancement: openshift/enhancements#1857
Note to the reviewer, this has been largely AI generated by Cursor.
Note
Introduce a feature-gated observer that maps
KubeAPIServer.Spec.EventTTLMinutestoapiServerArguments.event-ttl, wire it into the controller, and add unit tests.apiserver/observe_event_ttl.goto observeKubeAPIServer.Spec.EventTTLMinutes(behindFeatureEventTTL) and setapiServerArguments.event-ttlas a duration (e.g.,60m); returns empty to use defaults when disabled or zero; prunes toevent-ttlpath.NewObserveEventTTL(featureGateAccessor)inobserve_config_controller.go.KubeAPIServers().Informer().HasSyncedto pre-run cache syncs.observe_event_ttl_test.gocovering feature on/off and various TTL values, including default behavior.Written by Cursor Bugbot for commit 439a7b5. This will update automatically on new commits. Configure here.