Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2838 +/- ##
==========================================
+ Coverage 38.94% 40.15% +1.21%
==========================================
Files 313 315 +2
Lines 27139 27956 +817
==========================================
+ Hits 10570 11227 +657
- Misses 15780 15919 +139
- Partials 789 810 +21
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 14 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d358c9c2bd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| ctx = ctrl.LoggerInto(ctx, ctrl.LoggerFrom(ctx).WithValues("experimentID", experiment.ID)) | ||
|
|
||
| if err := r.handleRollback(ctx, instance, newStatus, now, revList); err != nil { |
There was a problem hiding this comment.
nit: It is better to call abortExperiment first to detects user edit. If there is a user edit, phase will be changed to aborted and user's edit will be preserved (this is a narrow window of race condtion).
There was a problem hiding this comment.
I initially had it called first as an early return, but calling abortExperiment first right now causes the operator logs to look like it's aborting when it's actually in timeout. I ended up deciding to reorder over complicating the function
| if err := r.manageExperiment(ctx, instance, newDDAStatus, now, revList); err != nil { | ||
| return r.updateStatusIfNeededV2(logger, instance, newDDAStatus, result, err, now) | ||
| } | ||
| if err := r.manageRevision(ctx, instance, revList, newDDAStatus); err != nil { | ||
| return r.updateStatusIfNeededV2(logger, instance, newDDAStatus, result, err, now) | ||
| } |
There was a problem hiding this comment.
Is this safe to do in two steps ? What if the second steps fails after the first step succeeded ?
There was a problem hiding this comment.
Like won't this prevent rollbacks after apply ?
There was a problem hiding this comment.
It should be fine to do in two steps. The actual rollback is handled in manageExperiment so it won't prevent experiment rollbacks. We don't allow user-initiated manual rollbacks so no issues there. There is one bug though in that after a rollback, if the user tries to apply the same change again, the operator will immediately roll back so it looks like there was no change. I'll add a fix for that
What does this PR do?
Adds rollback functionality for DDAs with fleet experiments. Includes:
Motivation
https://datadoghq.atlassian.net/browse/CONTP-1404
Additional Notes
Anything else we should know when reviewing?
Minimum Agent Versions
Are there minimum versions of the Datadog Agent and/or Cluster Agent required?
Describe your test plan
TBA. Test commands for now:
Stop signal
Timeout
Abort
Checklist
bug,enhancement,refactoring,documentation,tooling, and/ordependenciesqa/skip-qalabel