Add backoff period to prevent rapid deployment updates #183
Conversation
pkg/core/backoff.go
Outdated
state.backoffLevel++
// Cap at level 6 (64s)
if state.backoffLevel > 6 {
    state.backoffLevel = 6
This is redundant with the MaxBackoff check above. Either way works, but having both does not make it better.
pkg/core/backoff.go
Outdated
    level = 6
}
backoff := MinBackoff * (1 << level) // 2^level seconds
if backoff > MaxBackoff {
Either this check or the level check is needed, not both. Claude tried to be safe and added three checks ;-)
pkg/core/handler.go
Outdated
// Record successful update
h.backoffTracker.RecordUpdate(instanceName)
} else {
// No changes detected - system is stable
I am not sure this is sound. If Wave is triggered without a hash change (e.g. due to an annotation on the Deployment), that would reset the backoff. I guess we would have to check the backoff here as well.
Even after reading this code and asking the LLM to add a clarifying comment, this is the part where I wanted to do the most testing :-D
I will have time to test this probably only next week.
These models have no notion of concurrency, so don't expect them to produce race-free solutions. They rarely do. I have seen too many LLM implementations with races, wrong ordering, or deadlocks ;-). Most of the time the models are not even able to fix the issue if you explain it to them.
I saw a Mutex somewhere, but that also requires thorough review.
That should be fine. It's isolated to one method with a deferred release.
The real race here is that reconciles can (and will) happen for multiple unrelated reasons. What needs to change is that the backoff check must happen in both if branches. I would move it up and simply bail out early (maybe with two different log messages).
This definitely needs a test. The tests Cursor generated look fine but arbitrary to me (their focus is a bit unclear). However, they do not test the e2e behaviour at all.
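The early bail-out suggested above could look roughly like this. This is a sketch, not the PR's code: `tracker`, `ShouldBackoff`, `RecordUpdate`, and `handleReconcile` are assumed names for illustration, and the real handler in pkg/core/handler.go has more context:

```go
package main

import "fmt"

// tracker is a stand-in for Wave's backoff tracker; the method names
// are assumptions for illustration.
type tracker interface {
	ShouldBackoff(instance string) bool
	RecordUpdate(instance string)
}

// fakeTracker enters backoff permanently after the first recorded update,
// just to make the control flow observable.
type fakeTracker struct{ inBackoff bool }

func (f *fakeTracker) ShouldBackoff(string) bool { return f.inBackoff }
func (f *fakeTracker) RecordUpdate(string)       { f.inBackoff = true }

// handleReconcile performs the backoff check once, up front, so that
// reconciles triggered for unrelated reasons (annotation changes,
// resyncs) can neither bypass nor reset the backoff.
func handleReconcile(t tracker, instance string, hashChanged bool) (updated bool) {
	if t.ShouldBackoff(instance) {
		if hashChanged {
			fmt.Printf("%s: hash changed, but in backoff; skipping update\n", instance)
		} else {
			fmt.Printf("%s: no changes and in backoff; nothing to do\n", instance)
		}
		return false
	}
	if !hashChanged {
		return false // stable: nothing to apply, nothing to record
	}
	// ... apply the deployment update here ...
	t.RecordUpdate(instance)
	return true
}

func main() {
	tr := &fakeTracker{}
	fmt.Println(handleReconcile(tr, "app", true)) // first update goes through
	fmt.Println(handleReconcile(tr, "app", true)) // now throttled
}
```

The point of the restructure is that the "stable" branch no longer touches the backoff state at all, so an annotation-only reconcile cannot reset it.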
Force-pushed from c16f915 to 3dc7cc5
I did not want to close this, I just wanted to rename the branch :-(
Add update throttling to prevent rapid deployment churn
Implements a minimum interval between updates (default: 10s, configurable) to prevent Wave from updating deployments too frequently when secrets or configmaps change rapidly.
This prevents scenarios where a buggy controller rapidly updating secrets causes Wave to rapidly update deployments, which can overwhelm the Kubernetes API server.
Key features:
- Fixed minimum interval between updates (default: 10s)
- Configurable via --min-update-interval flag
- Configurable via Helm chart (minUpdateInterval value)
- State tracked in-memory within the operator
- Thread-safe implementation with mutex protection
- Applies to ALL updates, even when config hashes change
Fixes #182
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>