
Conversation

@Parthiba-Hazra (Collaborator) commented Dec 17, 2025

Summary by CodeRabbit

  • New Features

    • Improved dynamic management of resource collectors with automatic availability detection and recovery.
  • Refactor

    • Centralized collector creation logic for consistency across registration and restart workflows.


coderabbitai bot commented Dec 17, 2025

Walkthrough

The changes introduce dynamic runtime availability management for collectors by adding an UnavailableCollectors tracker, a new checkCollectorAvailabilityChanges method to proactively register/deregister collectors as resources become available or disappear, a centralized createCollectorByType factory method, and refactored registration logic to consolidate collector instantiation across multiple code paths.

Changes

  • Availability tracking and management (internal/controller/collectionpolicy_controller.go): Added an UnavailableCollectors field (map[string]bool) to track collectors whose resources are unavailable; introduced the checkCollectorAvailabilityChanges method to detect availability changes, register new collectors, and deregister collectors when resources disappear, with telemetry and logging integration.
  • Collector factory pattern (internal/controller/collectionpolicy_controller.go): Introduced a createCollectorByType factory method supporting 25+ collector types (endpoints, service_account, pod, deployment, stateful_set, daemon_set, namespace, job, ingress, network_policy, role, node, storage_class, karpenter, datadog, argo_rollouts, keda_scaled_job, volcano_job, and others); centralizes collector instantiation.
  • Registration and handler refactoring (internal/controller/collectionpolicy_controller.go): Refactored registerResourceCollectors to use the createCollectorByType factory and track unavailable collectors; enhanced handleDisabledCollectorsChange to leverage the factory for creating newly enabled collectors; updated logging for consistency with the collectorTypeName variable; integrated checkCollectorAvailabilityChanges into the reconciliation loop.
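
To make the factory pattern concrete, here is a minimal, self-contained Go sketch of the idea; the ResourceCollector interface, the concrete collector types, and the switch body are illustrative assumptions and do not reflect the repository's actual signatures:

```go
package main

import "fmt"

// Minimal stand-ins for the repository's collector abstractions; the interface
// and type names here are assumptions for illustration, not the actual API.
type ResourceCollector interface {
	Name() string
	IsAvailable() bool
}

type podCollector struct{}

func (podCollector) Name() string      { return "pod" }
func (podCollector) IsAvailable() bool { return true }

type karpenterCollector struct{}

func (karpenterCollector) Name() string      { return "karpenter" }
func (karpenterCollector) IsAvailable() bool { return false } // e.g. Karpenter CRDs not installed

// createCollectorByType centralizes construction: a single switch replaces
// duplicated instantiation logic in the registration and restart paths.
func createCollectorByType(collectorType string) ResourceCollector {
	switch collectorType {
	case "pod":
		return podCollector{}
	case "karpenter":
		return karpenterCollector{}
	// ... the remaining 25+ collector types would be listed here ...
	default:
		return nil // unknown or specially handled types (e.g. "cluster")
	}
}

func main() {
	for _, t := range []string{"pod", "karpenter", "cluster"} {
		if c := createCollectorByType(t); c != nil {
			fmt.Printf("%s: available=%v\n", t, c.IsAvailable())
		} else {
			fmt.Printf("%s: no collector (unknown or handled elsewhere)\n", t)
		}
	}
}
```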

Sequence Diagram

sequenceDiagram
    participant Reconciler
    participant AvailChecker as Availability Checker
    participant Factory as Collector Factory
    participant UnavailTracker as Unavailable Collectors
    participant Collector
    
    Reconciler->>AvailChecker: checkCollectorAvailabilityChanges()
    AvailChecker->>AvailChecker: Detect resource availability changes
    
    alt Resource became available
        AvailChecker->>UnavailTracker: Check if collector in unavailable map
        AvailChecker->>Factory: createCollectorByType(type, config)
        Factory->>Collector: Instantiate collector
        AvailChecker->>Collector: Register & start collector
        AvailChecker->>UnavailTracker: Remove from unavailable map
        Note over AvailChecker,UnavailTracker: Log & emit telemetry for registration
    end
    
    alt Resource became unavailable
        AvailChecker->>Collector: Deregister & stop collector
        AvailChecker->>UnavailTracker: Track in unavailable map
        Note over AvailChecker,UnavailTracker: Log & emit telemetry for deregistration
    end
    
    AvailChecker-->>Reconciler: Return result
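
Expressed as code, the flow in the diagram reduces to roughly the following toy sketch; the maps and the availability probe are simplified stand-ins for the CollectionManager's registered collectors, the UnavailableCollectors map, and per-collector IsAvailable() checks:

```go
package main

import "fmt"

// allTypes stands in for collector.AllResourceTypes(); the real list is longer.
var allTypes = []string{"pod", "deployment", "karpenter"}

// running and unavailable model the manager's registered collectors and the
// UnavailableCollectors map described in the walkthrough.
var (
	running     = map[string]bool{"pod": true, "deployment": true}
	unavailable = map[string]bool{"karpenter": true}
)

// isAvailable stands in for probing a collector's IsAvailable(); in the
// controller this would consult the Kubernetes discovery API or CRD presence.
func isAvailable(collectorType string) bool {
	// Pretend the Karpenter CRDs were just installed and the Deployment API
	// (hypothetically) went away, to exercise both branches.
	return collectorType != "deployment"
}

// checkCollectorAvailabilityChanges registers collectors whose resources have
// appeared and deregisters collectors whose resources have disappeared.
func checkCollectorAvailabilityChanges() (started, stopped int) {
	for _, collectorType := range allTypes {
		available := isAvailable(collectorType)
		switch {
		case available && unavailable[collectorType]:
			running[collectorType] = true // register & start
			delete(unavailable, collectorType)
			started++
		case !available && running[collectorType]:
			delete(running, collectorType) // deregister & stop
			unavailable[collectorType] = true
			stopped++
		}
	}
	return started, stopped
}

func main() {
	started, stopped := checkCollectorAvailabilityChanges()
	fmt.Printf("started=%d stopped=%d running=%v unavailable=%v\n", started, stopped, running, unavailable)
}
```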

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Carefully review the availability detection logic in checkCollectorAvailabilityChanges for correctness and edge cases
  • Verify that UnavailableCollectors access is thread-safe and handles concurrent reconciliation cycles
  • Validate the factory method createCollectorByType comprehensively covers all 25+ collector type mappings
  • Inspect integration points where the availability checker is invoked in the reconciliation loop for potential deadlocks or performance impacts
  • Check telemetry and logging statements for consistency and observability

Poem

🐰 A rabbit hops through collectors bright,
Tracking their dance—when unavailable, come in at night!
The factory spins threads of types galore,
Dynamic registration opens each door.
hop hop No more tangled switches to mend,
One unified path from start to the end! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title directly and clearly describes the main change: dynamic collector lifecycle management based on resource availability)
  • Docstring Coverage: ✅ Passed (no functions found in the changed files to evaluate docstring coverage, so the check was skipped)


coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
internal/controller/collectionpolicy_controller.go (2)

78-81: Consider thread-safety for UnavailableCollectors map.

If the controller is configured with multiple workers (via MaxConcurrentReconciles), concurrent map access to UnavailableCollectors could cause a data race. While controller-runtime defaults to a single worker, consider protecting this map with a mutex if concurrent reconciliation is anticipated, or document that this reconciler requires single-worker configuration.
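
If concurrent reconciles are ever enabled, the simplest mitigation is to guard the map with a mutex; a minimal sketch with assumed field and method names:

```go
package controller

import "sync"

// Sketch of one way to make the map safe for concurrent reconciles; the field
// and method names are illustrative, not the reconciler's actual layout.
type reconciler struct {
	unavailableMu         sync.Mutex
	UnavailableCollectors map[string]bool
}

func (r *reconciler) markUnavailable(collectorType string) {
	r.unavailableMu.Lock()
	defer r.unavailableMu.Unlock()
	if r.UnavailableCollectors == nil {
		r.UnavailableCollectors = make(map[string]bool)
	}
	r.UnavailableCollectors[collectorType] = true
}

func (r *reconciler) isUnavailable(collectorType string) bool {
	r.unavailableMu.Lock()
	defer r.unavailableMu.Unlock()
	return r.UnavailableCollectors[collectorType]
}
```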


2702-3151: Missing collector types in factory; consider unifying with registerResourceCollectors.

The createCollectorByType factory doesn't handle all types from collector.AllResourceTypes():

  • "node_resource" - appears in AllResourceTypes() but has no corresponding collector
  • "cluster" - handled specially via setupClusterCollector, so returning nil is correct

When checkCollectorAvailabilityChanges iterates AllResourceTypes(), these will silently return nil. Consider adding explicit cases that return nil with a comment explaining why, to make the intent clear.

Additionally, this switch statement largely duplicates the collector creation logic in registerResourceCollectors. Consider extracting the collectors list to a shared registry to reduce duplication and ensure consistency.

For the missing types, add explicit cases:

+	case "node_resource":
+		// NodeResource metrics are collected via the Node collector
+		return nil
+	case "cluster":
+		// Cluster collector is handled separately via setupClusterCollector
+		return nil
 	default:
 		return nil
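
One way to realize the suggested shared registry is a package-level map from type name to constructor that both registerResourceCollectors and the availability check read from; a sketch under assumed names, types, and signatures:

```go
package collector

// Sketch of the suggested shared registry; the Config type, constructor
// signatures, and collector stubs are placeholders, not the repository's API.
type Config struct{}

type ResourceCollector interface {
	IsAvailable() bool
}

type podCollector struct{ cfg Config }

func (podCollector) IsAvailable() bool { return true }

type deploymentCollector struct{ cfg Config }

func (deploymentCollector) IsAvailable() bool { return true }

// registry is the single source of truth mapping type names to constructors,
// shared by the registration, restart, and availability-check paths.
var registry = map[string]func(Config) ResourceCollector{
	"pod":        func(cfg Config) ResourceCollector { return podCollector{cfg: cfg} },
	"deployment": func(cfg Config) ResourceCollector { return deploymentCollector{cfg: cfg} },
	// ... remaining collector types registered here ...
}

// NewByType replaces the long switch: callers look up the constructor instead
// of duplicating instantiation logic.
func NewByType(collectorType string, cfg Config) ResourceCollector {
	if ctor, ok := registry[collectorType]; ok {
		return ctor(cfg)
	}
	return nil // unknown or specially handled types (e.g. "cluster")
}
```
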
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c037ef6 and c821d63.

📒 Files selected for processing (1)
  • internal/controller/collectionpolicy_controller.go (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/controller/collectionpolicy_controller.go (3)
internal/collector/manager.go (1)
  • CollectionManager (44-58)
internal/collector/types.go (1)
  • AllResourceTypes (4-17)
internal/collector/interface.go (1)
  • ResourceCollector (331-349)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run make test
  • GitHub Check: Build Docker Image
  • GitHub Check: Analyze (go)
🔇 Additional comments (5)
internal/controller/collectionpolicy_controller.go (5)

311-317: LGTM!

Good approach to make the availability check non-fatal. The error is logged appropriately, and the comment clearly explains the purpose of handling dynamic CRD availability.


1049-1055: LGTM!

Good refactoring to use the centralized createCollectorByType factory, improving maintainability and ensuring consistency in collector creation across the codebase.


2187-2254: LGTM!

Good improvements:

  • Lazy initialization of UnavailableCollectors map
  • Consistent use of collectorTypeName variable
  • Tracking unavailable collectors enables the dynamic availability feature

2379-2384: LGTM!

Consistent with the refactoring to use the centralized createCollectorByType factory.


2692-2698: LGTM!

Good practice to only log the summary when there are actual changes (collectorsStarted > 0 || collectorsStopped > 0), keeping logs clean during normal operation.

Comment on lines +2567 to +2576
for _, resourceType := range allResourceTypes {
	collectorType := resourceType.String()
	if disabledCollectorsMap[collectorType] {
		continue
	}

	tempCollector := r.createCollectorByType(collectorType, config, logger, metricsClient)
	if tempCollector == nil {
		continue
	}

⚠️ Potential issue | 🟠 Major

Performance and potential resource concern with temporary collector creation.

This loop creates a new collector instance for every resource type (~40+) on every reconciliation cycle (every 5 minutes), even for collectors that are already running. Two concerns:

  1. Performance overhead: Creating collector objects just to call IsAvailable() is wasteful.
  2. Potential resource leak: If collector constructors allocate resources (channels, goroutines, etc.) that require explicit cleanup, these temporary collectors won't be properly cleaned up.

Consider optimizing by:

  • Only checking availability for collectors in UnavailableCollectors and running collectors (not all types)
  • For running collectors, call IsAvailable() on the existing instance via CollectionManager
-	for _, resourceType := range allResourceTypes {
+	// Only check unavailable collectors (for becoming available) and running collectors (for becoming unavailable)
+	collectorsToCheck := make(map[string]bool)
+	for collectorType := range r.UnavailableCollectors {
+		collectorsToCheck[collectorType] = true
+	}
+	for _, collectorType := range r.CollectionManager.GetCollectorTypes() {
+		collectorsToCheck[collectorType] = true
+	}
+
+	for collectorType := range collectorsToCheck {
-		collectorType := resourceType.String()
 		if disabledCollectorsMap[collectorType] {
 			continue
 		}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In internal/controller/collectionpolicy_controller.go around lines 2567-2576,
the loop currently instantiates a new collector for every resource type on each
reconcile which wastes CPU and risks leaking resources; instead, restrict
availability checks to only those types listed in UnavailableCollectors plus
types that are currently running, obtain running collector instances from the
CollectionManager and call IsAvailable() on those instances, and avoid calling
createCollectorByType for every type — if you must create an instance for a type
not currently tracked, replace it with a lightweight static/registry
availability check or ensure the constructor is non-allocating and that any
allocated resources are cleaned up immediately after the check; update the loop
logic to query CollectionManager for running collectors, check
UnavailableCollectors directly, and remove unconditional createCollectorByType
usage.

Comment on lines +2630 to +2632
// Deregister on failure
r.CollectionManager.DeregisterCollector(collectorType)
continue

⚠️ Potential issue | 🟡 Minor

Log error from cleanup DeregisterCollector call.

When collector start fails and deregistration is attempted as cleanup, the error is silently discarded. Consider logging it for debugging purposes.

-				// Deregister on failure
-				r.CollectionManager.DeregisterCollector(collectorType)
+				// Deregister on failure
+				if deregErr := r.CollectionManager.DeregisterCollector(collectorType); deregErr != nil {
+					logger.Error(deregErr, "Failed to deregister collector after start failure", "type", collectorType)
+				}
🤖 Prompt for AI Agents
internal/controller/collectionpolicy_controller.go around lines 2630-2632: the
call to r.CollectionManager.DeregisterCollector(collectorType) on startup
failure discards any returned error; update the cleanup to capture the error and
log it via the controller's logger (e.g., r.Log.WithError(err).Errorf or
equivalent) including context about which collectorType failed to deregister so
the failure is visible for debugging; do not change control flow—just log the
error before continuing.
