
Conversation

@Parthiba-Hazra (Collaborator) commented Dec 17, 2025

Summary by CodeRabbit

  • New Features

    • Improved dynamic management of resource collectors with automatic availability detection and recovery.
  • Refactor

    • Centralized collector creation logic for consistency across registration and restart workflows.


coderabbitai bot commented Dec 17, 2025

Walkthrough

The changes introduce dynamic runtime availability management for collectors by adding an UnavailableCollectors tracker, a new checkCollectorAvailabilityChanges method to proactively register/deregister collectors as resources become available or disappear, a centralized createCollectorByType factory method, and refactored registration logic to consolidate collector instantiation across multiple code paths.

Changes

  • Availability tracking and management (internal/controller/collectionpolicy_controller.go): Added an UnavailableCollectors field (map[string]bool) to track collectors whose resources are unavailable; introduced the checkCollectorAvailabilityChanges method to detect availability changes, register new collectors, and deregister collectors when resources disappear, with telemetry and logging integration.
  • Collector factory pattern (internal/controller/collectionpolicy_controller.go): Introduced a createCollectorByType factory method supporting 25+ collector types (endpoints, service_account, pod, deployment, stateful_set, daemon_set, namespace, job, ingress, network_policy, role, node, storage_class, karpenter, datadog, argo_rollouts, keda_scaled_job, volcano_job, and others); centralizes collector instantiation.
  • Registration and handler refactoring (internal/controller/collectionpolicy_controller.go): Refactored registerResourceCollectors to use the createCollectorByType factory and track unavailable collectors; enhanced handleDisabledCollectorsChange to leverage the factory for creating newly enabled collectors; updated logging for consistency with the collectorTypeName variable; integrated checkCollectorAvailabilityChanges into the reconciliation loop.
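
To make the factory pattern concrete, here is a minimal, self-contained Go sketch of the idea; the ResourceCollector interface, the concrete collector types, and the switch body are illustrative assumptions and do not reflect the repository's actual signatures:

```go
package main

import "fmt"

// Minimal stand-ins for the repository's collector abstractions; the interface
// and type names here are assumptions for illustration, not the actual API.
type ResourceCollector interface {
	Name() string
	IsAvailable() bool
}

type podCollector struct{}

func (podCollector) Name() string      { return "pod" }
func (podCollector) IsAvailable() bool { return true }

type karpenterCollector struct{}

func (karpenterCollector) Name() string      { return "karpenter" }
func (karpenterCollector) IsAvailable() bool { return false } // e.g. Karpenter CRDs not installed

// createCollectorByType centralizes construction: a single switch replaces
// duplicated instantiation logic in the registration and restart paths.
func createCollectorByType(collectorType string) ResourceCollector {
	switch collectorType {
	case "pod":
		return podCollector{}
	case "karpenter":
		return karpenterCollector{}
	// ... the remaining 25+ collector types would be listed here ...
	default:
		return nil // unknown or specially handled types (e.g. "cluster")
	}
}

func main() {
	for _, t := range []string{"pod", "karpenter", "cluster"} {
		if c := createCollectorByType(t); c != nil {
			fmt.Printf("%s: available=%v\n", t, c.IsAvailable())
		} else {
			fmt.Printf("%s: no collector (unknown or handled elsewhere)\n", t)
		}
	}
}
```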

Sequence Diagram

sequenceDiagram
    participant Reconciler
    participant AvailChecker as Availability Checker
    participant Factory as Collector Factory
    participant UnavailTracker as Unavailable Collectors
    participant Collector
    
    Reconciler->>AvailChecker: checkCollectorAvailabilityChanges()
    AvailChecker->>AvailChecker: Detect resource availability changes
    
    alt Resource became available
        AvailChecker->>UnavailTracker: Check if collector in unavailable map
        AvailChecker->>Factory: createCollectorByType(type, config)
        Factory->>Collector: Instantiate collector
        AvailChecker->>Collector: Register & start collector
        AvailChecker->>UnavailTracker: Remove from unavailable map
        Note over AvailChecker,UnavailTracker: Log & emit telemetry for registration
    end
    
    alt Resource became unavailable
        AvailChecker->>Collector: Deregister & stop collector
        AvailChecker->>UnavailTracker: Track in unavailable map
        Note over AvailChecker,UnavailTracker: Log & emit telemetry for deregistration
    end
    
    AvailChecker-->>Reconciler: Return result
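
Expressed as code, the flow in the diagram reduces to roughly the following toy sketch; the maps and the availability probe are simplified stand-ins for the CollectionManager's registered collectors, the UnavailableCollectors map, and per-collector IsAvailable() checks:

```go
package main

import "fmt"

// allTypes stands in for collector.AllResourceTypes(); the real list is longer.
var allTypes = []string{"pod", "deployment", "karpenter"}

// running and unavailable model the manager's registered collectors and the
// UnavailableCollectors map described in the walkthrough.
var (
	running     = map[string]bool{"pod": true, "deployment": true}
	unavailable = map[string]bool{"karpenter": true}
)

// isAvailable stands in for probing a collector's IsAvailable(); in the
// controller this would consult the Kubernetes discovery API or CRD presence.
func isAvailable(collectorType string) bool {
	// Pretend the Karpenter CRDs were just installed and the Deployment API
	// (hypothetically) went away, to exercise both branches.
	return collectorType != "deployment"
}

// checkCollectorAvailabilityChanges registers collectors whose resources have
// appeared and deregisters collectors whose resources have disappeared.
func checkCollectorAvailabilityChanges() (started, stopped int) {
	for _, collectorType := range allTypes {
		available := isAvailable(collectorType)
		switch {
		case available && unavailable[collectorType]:
			running[collectorType] = true // register & start
			delete(unavailable, collectorType)
			started++
		case !available && running[collectorType]:
			delete(running, collectorType) // deregister & stop
			unavailable[collectorType] = true
			stopped++
		}
	}
	return started, stopped
}

func main() {
	started, stopped := checkCollectorAvailabilityChanges()
	fmt.Printf("started=%d stopped=%d running=%v unavailable=%v\n", started, stopped, running, unavailable)
}
```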

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Carefully review the availability detection logic in checkCollectorAvailabilityChanges for correctness and edge cases
  • Verify that UnavailableCollectors access is thread-safe and handles concurrent reconciliation cycles
  • Validate the factory method createCollectorByType comprehensively covers all 25+ collector type mappings
  • Inspect integration points where the availability checker is invoked in the reconciliation loop for potential deadlocks or performance impacts
  • Check telemetry and logging statements for consistency and observability

Poem

🐰 A rabbit hops through collectors bright,
Tracking their dance—when unavailable, come in at night!
The factory spins threads of types galore,
Dynamic registration opens each door.
hop hop No more tangled switches to mend,
One unified path from start to the end! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title directly and clearly describes the main change: dynamic collector lifecycle management based on resource availability)
  • Docstring Coverage: ✅ Passed (no functions found in the changed files to evaluate docstring coverage, so the check was skipped)


coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
internal/controller/collectionpolicy_controller.go (2)

78-81: Consider thread-safety for UnavailableCollectors map.

If the controller is configured with multiple workers (via MaxConcurrentReconciles), concurrent map access to UnavailableCollectors could cause a data race. While controller-runtime defaults to a single worker, consider protecting this map with a mutex if concurrent reconciliation is anticipated, or document that this reconciler requires single-worker configuration.
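
If concurrent reconciles are ever enabled, the simplest mitigation is to guard the map with a mutex; a minimal sketch with assumed field and method names:

```go
package controller

import "sync"

// Sketch of one way to make the map safe for concurrent reconciles; the field
// and method names are illustrative, not the reconciler's actual layout.
type reconciler struct {
	unavailableMu         sync.Mutex
	UnavailableCollectors map[string]bool
}

func (r *reconciler) markUnavailable(collectorType string) {
	r.unavailableMu.Lock()
	defer r.unavailableMu.Unlock()
	if r.UnavailableCollectors == nil {
		r.UnavailableCollectors = make(map[string]bool)
	}
	r.UnavailableCollectors[collectorType] = true
}

func (r *reconciler) isUnavailable(collectorType string) bool {
	r.unavailableMu.Lock()
	defer r.unavailableMu.Unlock()
	return r.UnavailableCollectors[collectorType]
}
```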


2702-3151: Missing collector types in factory; consider unifying with registerResourceCollectors.

The createCollectorByType factory doesn't handle all types from collector.AllResourceTypes():

  • "node_resource" - appears in AllResourceTypes() but has no corresponding collector
  • "cluster" - handled specially via setupClusterCollector, so returning nil is correct

When checkCollectorAvailabilityChanges iterates AllResourceTypes(), these will silently return nil. Consider adding explicit cases that return nil with a comment explaining why, to make the intent clear.

Additionally, this switch statement largely duplicates the collector creation logic in registerResourceCollectors. Consider extracting the collectors list to a shared registry to reduce duplication and ensure consistency.

For the missing types, add explicit cases:

+	case "node_resource":
+		// NodeResource metrics are collected via the Node collector
+		return nil
+	case "cluster":
+		// Cluster collector is handled separately via setupClusterCollector
+		return nil
 	default:
 		return nil
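
One way to realize the suggested shared registry is a package-level map from type name to constructor that both registerResourceCollectors and the availability check read from; a sketch under assumed names, types, and signatures:

```go
package collector

// Sketch of the suggested shared registry; the Config type, constructor
// signatures, and collector stubs are placeholders, not the repository's API.
type Config struct{}

type ResourceCollector interface {
	IsAvailable() bool
}

type podCollector struct{ cfg Config }

func (podCollector) IsAvailable() bool { return true }

type deploymentCollector struct{ cfg Config }

func (deploymentCollector) IsAvailable() bool { return true }

// registry is the single source of truth mapping type names to constructors,
// shared by the registration, restart, and availability-check paths.
var registry = map[string]func(Config) ResourceCollector{
	"pod":        func(cfg Config) ResourceCollector { return podCollector{cfg: cfg} },
	"deployment": func(cfg Config) ResourceCollector { return deploymentCollector{cfg: cfg} },
	// ... remaining collector types registered here ...
}

// NewByType replaces the long switch: callers look up the constructor instead
// of duplicating instantiation logic.
func NewByType(collectorType string, cfg Config) ResourceCollector {
	if ctor, ok := registry[collectorType]; ok {
		return ctor(cfg)
	}
	return nil // unknown or specially handled types (e.g. "cluster")
}
```
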
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c037ef6 and c821d63.

📒 Files selected for processing (1)
  • internal/controller/collectionpolicy_controller.go (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/controller/collectionpolicy_controller.go (3)
internal/collector/manager.go (1)
  • CollectionManager (44-58)
internal/collector/types.go (1)
  • AllResourceTypes (4-17)
internal/collector/interface.go (1)
  • ResourceCollector (331-349)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run make test
  • GitHub Check: Build Docker Image
  • GitHub Check: Analyze (go)
🔇 Additional comments (5)
internal/controller/collectionpolicy_controller.go (5)

311-317: LGTM!

Good approach to make the availability check non-fatal. The error is logged appropriately, and the comment clearly explains the purpose of handling dynamic CRD availability.


1049-1055: LGTM!

Good refactoring to use the centralized createCollectorByType factory, improving maintainability and ensuring consistency in collector creation across the codebase.


2187-2254: LGTM!

Good improvements:

  • Lazy initialization of UnavailableCollectors map
  • Consistent use of collectorTypeName variable
  • Tracking unavailable collectors enables the dynamic availability feature

2379-2384: LGTM!

Consistent with the refactoring to use the centralized createCollectorByType factory.


2692-2698: LGTM!

Good practice to only log the summary when there are actual changes (collectorsStarted > 0 || collectorsStopped > 0), keeping logs clean during normal operation.

Comment on lines +2567 to +2576
for _, resourceType := range allResourceTypes {
	collectorType := resourceType.String()
	if disabledCollectorsMap[collectorType] {
		continue
	}

	tempCollector := r.createCollectorByType(collectorType, config, logger, metricsClient)
	if tempCollector == nil {
		continue
	}

⚠️ Potential issue | 🟠 Major

Performance and potential resource concern with temporary collector creation.

This loop creates a new collector instance for every resource type (~40+) on every reconciliation cycle (every 5 minutes), even for collectors that are already running. Two concerns:

  1. Performance overhead: Creating collector objects just to call IsAvailable() is wasteful.
  2. Potential resource leak: If collector constructors allocate resources (channels, goroutines, etc.) that require explicit cleanup, these temporary collectors won't be properly cleaned up.

Consider optimizing by:

  • Only checking availability for collectors in UnavailableCollectors and running collectors (not all types)
  • For running collectors, call IsAvailable() on the existing instance via CollectionManager
-	for _, resourceType := range allResourceTypes {
+	// Only check unavailable collectors (for becoming available) and running collectors (for becoming unavailable)
+	collectorsToCheck := make(map[string]bool)
+	for collectorType := range r.UnavailableCollectors {
+		collectorsToCheck[collectorType] = true
+	}
+	for _, collectorType := range r.CollectionManager.GetCollectorTypes() {
+		collectorsToCheck[collectorType] = true
+	}
+
+	for collectorType := range collectorsToCheck {
-		collectorType := resourceType.String()
 		if disabledCollectorsMap[collectorType] {
 			continue
 		}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In internal/controller/collectionpolicy_controller.go around lines 2567-2576,
the loop currently instantiates a new collector for every resource type on each
reconcile which wastes CPU and risks leaking resources; instead, restrict
availability checks to only those types listed in UnavailableCollectors plus
types that are currently running, obtain running collector instances from the
CollectionManager and call IsAvailable() on those instances, and avoid calling
createCollectorByType for every type — if you must create an instance for a type
not currently tracked, replace it with a lightweight static/registry
availability check or ensure the constructor is non-allocating and that any
allocated resources are cleaned up immediately after the check; update the loop
logic to query CollectionManager for running collectors, check
UnavailableCollectors directly, and remove unconditional createCollectorByType
usage.

Comment on lines +2630 to +2632
// Deregister on failure
r.CollectionManager.DeregisterCollector(collectorType)
continue

⚠️ Potential issue | 🟡 Minor

Log error from cleanup DeregisterCollector call.

When collector start fails and deregistration is attempted as cleanup, the error is silently discarded. Consider logging it for debugging purposes.

-				// Deregister on failure
-				r.CollectionManager.DeregisterCollector(collectorType)
+				// Deregister on failure
+				if deregErr := r.CollectionManager.DeregisterCollector(collectorType); deregErr != nil {
+					logger.Error(deregErr, "Failed to deregister collector after start failure", "type", collectorType)
+				}
🤖 Prompt for AI Agents
internal/controller/collectionpolicy_controller.go around lines 2630-2632: the
call to r.CollectionManager.DeregisterCollector(collectorType) on startup
failure discards any returned error; update the cleanup to capture the error and
log it via the controller's logger (e.g., r.Log.WithError(err).Errorf or
equivalent) including context about which collectorType failed to deregister so
the failure is visible for debugging; do not change control flow—just log the
error before continuing.
