
(fleet) Add experiment rollout to fleet daemon#2842

Draft
zhuminyi wants to merge 1 commit into main from minyi/fleet-experiment-rollout

@zhuminyi

Fill the fleet daemon experiment stubs with real logic to start, stop, and promote FA experiments by patching the DatadogAgent (DDA) resource.

Fleet daemon (pkg/fleet/)

  • Add K8s client to Daemon struct for DDA operations
  • Add fleetManagementOperation type per installer config RFC
  • Add Operations field to installerConfig (alongside legacy FileOperations)
  • startDatadogAgentExperiment: extract the DDA patch from the config, apply it to the DDA spec as a JSON merge patch, then set the experiment status using the post-patch generation
  • stopDatadogAgentExperiment: set phase=stopped (with ID validation)
  • promoteDatadogAgentExperiment: set phase=promoted (with ID validation)
  • Guards: reject start while an experiment is active, allow start after a terminal phase (aborted/timeout/promoted), and silently drop stale stop/promote requests
  • setExperimentStatus with conflict retry (re-fetches the DDA, up to 3 times)
  • testing.go: test-friendly exports for CLI and integration tests

CRD types

  • ExperimentStatus: phase, id, generation (aligned with Caroline RFC)
  • ExperimentPhase enum: running, stopped, rollback, timeout, promoted, aborted
  • Add Experiment field to DatadogAgentStatus
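One way to express these CRD fields together with the start/stop/promote guards, as a sketch. The phase values and the three status fields come from this PR; the JSON tags, the hypothetical canStart and transition helpers, and the choice to treat stopped/rollback as non-terminal are assumptions. The sketch also assumes a mismatched ID is dropped like any other stale request, which may differ from what stopDatadogAgentExperiment actually returns.

```go
package main

import "fmt"

// ExperimentPhase mirrors the enum listed in this PR.
type ExperimentPhase string

const (
	ExperimentPhaseRunning  ExperimentPhase = "running"
	ExperimentPhaseStopped  ExperimentPhase = "stopped"
	ExperimentPhaseRollback ExperimentPhase = "rollback"
	ExperimentPhaseTimeout  ExperimentPhase = "timeout"
	ExperimentPhasePromoted ExperimentPhase = "promoted"
	ExperimentPhaseAborted  ExperimentPhase = "aborted"
)

// ExperimentStatus carries phase, id, and generation; the JSON tags are assumed.
type ExperimentStatus struct {
	Phase      ExperimentPhase `json:"phase,omitempty"`
	ID         string          `json:"id,omitempty"`
	Generation int64           `json:"generation,omitempty"`
}

// canStart (hypothetical) allows a new experiment when none is recorded or the
// last one reached a terminal phase (aborted/timeout/promoted).
func canStart(cur *ExperimentStatus) bool {
	if cur == nil || cur.Phase == "" {
		return true
	}
	switch cur.Phase {
	case ExperimentPhaseAborted, ExperimentPhaseTimeout, ExperimentPhasePromoted:
		return true
	}
	return false // running, stopped, rollback: treated as still in progress (assumption)
}

// transition (hypothetical) applies stop/promote only to the running experiment
// with a matching ID; anything else is treated as stale and dropped without an error.
func transition(cur *ExperimentStatus, id string, to ExperimentPhase) (*ExperimentStatus, bool) {
	if cur == nil || cur.Phase != ExperimentPhaseRunning || cur.ID != id {
		return cur, false
	}
	next := *cur // copy so the caller's status is not mutated
	next.Phase = to
	return &next, true
}

func main() {
	cur := &ExperimentStatus{Phase: ExperimentPhaseRunning, ID: "exp-1", Generation: 4}
	fmt.Println(canStart(cur)) // false
	next, ok := transition(cur, "exp-1", ExperimentPhasePromoted)
	fmt.Println(ok, next.Phase, canStart(next)) // true promoted true
}
```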

Reconciler compatibility

  • generateNewStatusFromDDA preserves Experiment via DeepCopy
  • IsEqualStatus compares Experiment field
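A sketch of why the reconciler needs both changes, using a trimmed-down status type. agentStatus, newStatusFrom, and experimentEqual are hypothetical stand-ins for DatadogAgentStatus, generateNewStatusFromDDA, and the Experiment part of IsEqualStatus. Building a fresh status without copying Experiment would erase the daemon-owned field on every reconcile, and comparing without it would treat an experiment-only change as no change.

```go
package main

import "fmt"

type ExperimentStatus struct {
	Phase      string
	ID         string
	Generation int64
}

// DeepCopy returns an independent copy; a nil receiver yields nil.
func (in *ExperimentStatus) DeepCopy() *ExperimentStatus {
	if in == nil {
		return nil
	}
	out := *in
	return &out
}

// agentStatus is a hypothetical stand-in for DatadogAgentStatus.
type agentStatus struct {
	AgentSummary string // hypothetical reconciler-managed field, rebuilt each pass
	Experiment   *ExperimentStatus
}

// newStatusFrom (hypothetical) builds a fresh status but carries the
// daemon-owned Experiment over, deep-copied so later edits do not alias.
func newStatusFrom(old agentStatus) agentStatus {
	return agentStatus{Experiment: old.Experiment.DeepCopy()}
}

// experimentEqual compares Experiment by value; two nils are equal.
func experimentEqual(a, b *ExperimentStatus) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

func main() {
	old := agentStatus{AgentSummary: "stale", Experiment: &ExperimentStatus{Phase: "running", ID: "exp-1", Generation: 7}}
	fresh := newStatusFrom(old)
	fresh.Experiment.Phase = "stopped" // mutating the copy must not touch old
	fmt.Println(old.Experiment.Phase, experimentEqual(old.Experiment, fresh.Experiment)) // running false
}
```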

Test CLI (cmd/fleet-test/)

  • Exercises daemon code against real K8s API server
  • Supports: start, stop, promote, status actions
  • Verified on workspace Kind cluster

Unit tests (13 new)

  • extractDDAPatch: success, no match
  • start: success, missing config, missing ID, DDA not found, already running, after aborted, merge patch preserves fields
  • stop: running, no running, ID mismatch
  • promote: running, no running
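The "merge patch preserves fields" case follows from JSON merge patch semantics (RFC 7386): objects merge key by key, null deletes a key, and any other value, including an array, replaces the target value outright. The sketch below implements those rules on decoded JSON with only the standard library; mergePatch and applyJSON are hypothetical helpers and the spec fragment is illustrative. It is not the PR's code path: a Kubernetes client would more likely send the patch with patch type types.MergePatchType and let the API server apply it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch applies an RFC 7386 JSON merge patch to target (mutating it) and returns the result.
func mergePatch(target, patch any) any {
	p, ok := patch.(map[string]any)
	if !ok {
		return patch // a non-object patch (scalar, array) replaces the target wholesale
	}
	t, ok := target.(map[string]any)
	if !ok {
		t = map[string]any{} // a non-object target is discarded
	}
	for k, v := range p {
		if v == nil {
			delete(t, k) // JSON null removes the key
			continue
		}
		t[k] = mergePatch(t[k], v) // recurse; keys absent from the patch are untouched
	}
	return t
}

// applyJSON decodes both documents, merges, and re-encodes (map keys come out sorted).
func applyJSON(doc, patch string) (string, error) {
	var t, p any
	if err := json.Unmarshal([]byte(doc), &t); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(patch), &p); err != nil {
		return "", err
	}
	out, err := json.Marshal(mergePatch(t, p))
	return string(out), err
}

func main() {
	spec := `{"features":{"apm":{"enabled":false},"logCollection":{"enabled":true}},"global":{"site":"datadoghq.com"}}`
	out, err := applyJSON(spec, `{"features":{"apm":{"enabled":true}}}`)
	if err != nil {
		panic(err)
	}
	// {"features":{"apm":{"enabled":true},"logCollection":{"enabled":true}},"global":{"site":"datadoghq.com"}}
	fmt.Println(out)
}
```

Only features.apm.enabled changes; logCollection and global survive because the patch never mentions them.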

What does this PR do?

A brief description of the change being made with this pull request.

Motivation

What inspired you to submit this pull request?

Additional Notes

Anything else we should know when reviewing?

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

Write here any instructions and details you have for testing your PR.

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>