start collector when it got available and stop when unavailable #220
base: main
Conversation
Walkthrough
The changes introduce dynamic runtime availability management for collectors by adding an `UnavailableCollectors` map to the reconciler.
Changes
Sequence Diagram
sequenceDiagram
participant Reconciler
participant AvailChecker as Availability Checker
participant Factory as Collector Factory
participant UnavailTracker as Unavailable Collectors
participant Collector
Reconciler->>AvailChecker: checkCollectorAvailabilityChanges()
AvailChecker->>AvailChecker: Detect resource availability changes
alt Resource became available
AvailChecker->>UnavailTracker: Check if collector in unavailable map
AvailChecker->>Factory: createCollectorByType(type, config)
Factory->>Collector: Instantiate collector
AvailChecker->>Collector: Register & start collector
AvailChecker->>UnavailTracker: Remove from unavailable map
Note over AvailChecker,UnavailTracker: Log & emit telemetry for registration
end
alt Resource became unavailable
AvailChecker->>Collector: Deregister & stop collector
AvailChecker->>UnavailTracker: Track in unavailable map
Note over AvailChecker,UnavailTracker: Log & emit telemetry for deregistration
end
end
AvailChecker-->>Reconciler: Return result
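Read as code, the flow in the diagram might look roughly like the sketch below. Every name here (interfaces, methods, fields) is a placeholder chosen to mirror the diagram, not the repository's actual API.

```go
package sketch

import "log"

// Placeholder interfaces mirroring the roles in the diagram above.
type collector interface {
	Start() error
	Stop()
	IsAvailable() bool
}

type manager interface {
	Register(name string, c collector) error
	Deregister(name string)
	Get(name string) (collector, bool)
}

// checkAvailabilityChanges walks the known collector types, starts collectors
// whose resources became available again, and stops collectors whose resources
// disappeared, tracking the latter in the `unavailable` map.
func checkAvailabilityChanges(
	types []string,
	unavailable map[string]bool,
	mgr manager,
	newCollector func(name string) collector, // stand-in for the factory
) (started, stopped int) {
	for _, name := range types {
		if unavailable[name] {
			// Previously unavailable: probe again and register on success.
			c := newCollector(name)
			if c == nil || !c.IsAvailable() {
				continue
			}
			if err := mgr.Register(name, c); err != nil {
				continue
			}
			if err := c.Start(); err != nil {
				// Clean up the half-registered collector (see the review note below).
				mgr.Deregister(name)
				continue
			}
			delete(unavailable, name)
			started++
			continue
		}
		// Currently registered: stop and deregister if its resource disappeared.
		if c, ok := mgr.Get(name); ok && !c.IsAvailable() {
			c.Stop()
			mgr.Deregister(name)
			unavailable[name] = true
			stopped++
		}
	}
	if started > 0 || stopped > 0 {
		log.Printf("collector availability changed: started=%d stopped=%d", started, stopped)
	}
	return started, stopped
}
```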
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
internal/controller/collectionpolicy_controller.go (2)
78-81: Consider thread-safety for the `UnavailableCollectors` map.
If the controller is configured with multiple workers (via `MaxConcurrentReconciles`), concurrent map access to `UnavailableCollectors` could cause a data race. While controller-runtime defaults to a single worker, consider protecting this map with a mutex if concurrent reconciliation is anticipated, or document that this reconciler requires single-worker configuration.
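If concurrent workers were ever enabled, one way to make this safe is a small mutex-guarded wrapper around the map. A minimal sketch, with illustrative names rather than the controller's real fields:

```go
package sketch

import "sync"

// unavailableSet is a sketch of a mutex-guarded replacement for a bare
// UnavailableCollectors map; the type and method names are illustrative only.
type unavailableSet struct {
	mu sync.Mutex
	m  map[string]bool
}

func (s *unavailableSet) add(collectorType string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.m == nil {
		s.m = make(map[string]bool) // lazily initialized, as in the PR
	}
	s.m[collectorType] = true
}

func (s *unavailableSet) remove(collectorType string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, collectorType)
}

func (s *unavailableSet) contains(collectorType string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.m[collectorType]
}
```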
2702-3151: Missing collector types in factory; consider unifying with `registerResourceCollectors`.
The `createCollectorByType` factory doesn't handle all types from `collector.AllResourceTypes()`:
- `"node_resource"` - appears in `AllResourceTypes()` but has no corresponding collector
- `"cluster"` - handled specially via `setupClusterCollector`, so returning `nil` is correct
When `checkCollectorAvailabilityChanges` iterates `AllResourceTypes()`, these will silently return `nil`. Consider adding explicit cases that return `nil` with a comment explaining why, to make the intent clear.
Additionally, this switch statement largely duplicates the collector creation logic in `registerResourceCollectors`. Consider extracting the collectors list to a shared registry to reduce duplication and ensure consistency. For the missing types, add explicit cases:
+ case "node_resource": + // NodeResource metrics are collected via the Node collector + return nil + case "cluster": + // Cluster collector is handled separately via setupClusterCollector + return nil default: return nil
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/controller/collectionpolicy_controller.go (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/controller/collectionpolicy_controller.go (3)
internal/collector/manager.go (1)
- `CollectionManager` (44-58)
internal/collector/types.go (1)
- `AllResourceTypes` (4-17)
internal/collector/interface.go (1)
- `ResourceCollector` (331-349)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Run make test
- GitHub Check: Build Docker Image
- GitHub Check: Analyze (go)
🔇 Additional comments (5)
internal/controller/collectionpolicy_controller.go (5)
311-317: LGTM! Good approach to make the availability check non-fatal. The error is logged appropriately, and the comment clearly explains the purpose of handling dynamic CRD availability.
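For readers skimming the PR, the pattern being praised is roughly the following sketch; the `checkAvailability` callback and logger wiring are assumptions, not the controller's real signatures:

```go
package sketch

import (
	"context"

	"github.com/go-logr/logr"
)

// reconcileAvailability sketches the non-fatal pattern: a failed availability
// check (for example, a CRD that is not installed yet) is logged and skipped,
// so it never aborts the surrounding Reconcile call.
func reconcileAvailability(ctx context.Context, logger logr.Logger, checkAvailability func(context.Context) error) {
	if err := checkAvailability(ctx); err != nil {
		// Log only; do not propagate, so reconciliation continues.
		logger.Error(err, "collector availability check failed; continuing reconcile")
	}
}
```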
1049-1055: LGTM! Good refactoring to use the centralized `createCollectorByType` factory, improving maintainability and ensuring consistency in collector creation across the codebase.
2187-2254: LGTM! Good improvements:
- Lazy initialization of the `UnavailableCollectors` map
- Consistent use of the `collectorTypeName` variable
- Tracking unavailable collectors enables the dynamic availability feature
2379-2384: LGTM! Consistent with the refactoring to use the centralized `createCollectorByType` factory.
2692-2698: LGTM! Good practice to only log the summary when there are actual changes (`collectorsStarted > 0 || collectorsStopped > 0`), keeping logs clean during normal operation.
```go
for _, resourceType := range allResourceTypes {
	collectorType := resourceType.String()
	if disabledCollectorsMap[collectorType] {
		continue
	}

	tempCollector := r.createCollectorByType(collectorType, config, logger, metricsClient)
	if tempCollector == nil {
		continue
	}
```
Performance and potential resource concern with temporary collector creation.
This loop creates a new collector instance for every resource type (~40+) on every reconciliation cycle (every 5 minutes), even for collectors that are already running. Two concerns:
- Performance overhead: Creating collector objects just to call `IsAvailable()` is wasteful.
- Potential resource leak: If collector constructors allocate resources (channels, goroutines, etc.) that require explicit cleanup, these temporary collectors won't be properly cleaned up.
Consider optimizing by:
- Only checking availability for collectors in `UnavailableCollectors` and running collectors (not all types)
- For running collectors, call `IsAvailable()` on the existing instance via `CollectionManager`
```diff
- for _, resourceType := range allResourceTypes {
+ // Only check unavailable collectors (for becoming available) and running collectors (for becoming unavailable)
+ collectorsToCheck := make(map[string]bool)
+ for collectorType := range r.UnavailableCollectors {
+ 	collectorsToCheck[collectorType] = true
+ }
+ for _, collectorType := range r.CollectionManager.GetCollectorTypes() {
+ 	collectorsToCheck[collectorType] = true
+ }
+
+ for collectorType := range collectorsToCheck {
- 	collectorType := resourceType.String()
  	if disabledCollectorsMap[collectorType] {
  		continue
  	}
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In internal/controller/collectionpolicy_controller.go around lines 2567-2576,
the loop currently instantiates a new collector for every resource type on each
reconcile which wastes CPU and risks leaking resources; instead, restrict
availability checks to only those types listed in UnavailableCollectors plus
types that are currently running, obtain running collector instances from the
CollectionManager and call IsAvailable() on those instances, and avoid calling
createCollectorByType for every type — if you must create an instance for a type
not currently tracked, replace it with a lightweight static/registry
availability check or ensure the constructor is non-allocating and that any
allocated resources are cleaned up immediately after the check; update the loop
logic to query CollectionManager for running collectors, check
UnavailableCollectors directly, and remove unconditional createCollectorByType
usage.
```go
// Deregister on failure
r.CollectionManager.DeregisterCollector(collectorType)
continue
```
Log error from cleanup DeregisterCollector call.
When collector start fails and deregistration is attempted as cleanup, the error is silently discarded. Consider logging it for debugging purposes.
```diff
- // Deregister on failure
- r.CollectionManager.DeregisterCollector(collectorType)
+ // Deregister on failure
+ if deregErr := r.CollectionManager.DeregisterCollector(collectorType); deregErr != nil {
+ 	logger.Error(deregErr, "Failed to deregister collector after start failure", "type", collectorType)
+ }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
- // Deregister on failure
- r.CollectionManager.DeregisterCollector(collectorType)
- continue
+ // Deregister on failure
+ if deregErr := r.CollectionManager.DeregisterCollector(collectorType); deregErr != nil {
+ 	logger.Error(deregErr, "Failed to deregister collector after start failure", "type", collectorType)
+ }
+ continue
```
🤖 Prompt for AI Agents
internal/controller/collectionpolicy_controller.go around lines 2630-2632: the
call to r.CollectionManager.DeregisterCollector(collectorType) on startup
failure discards any returned error; update the cleanup to capture the error and
log it via the controller's logger (e.g., r.Log.WithError(err).Errorf or
equivalent) including context about which collectorType failed to deregister so
the failure is visible for debugging; do not change control flow—just log the
error before continuing.
Summary by CodeRabbit
New Features
Refactor