
@milanchovatiya-boop commented Jan 14, 2026

Description

Fixes an issue where, with HPA enabled, old deployments were never scaled down during rolling updates (including vertical scaling).

Problem

When performing vertical resource upgrades (CPU/memory) or rolling image/version updates on components with HPA enabled (replicas: -1), the operator would:

  • Create and bootstrap the new deployment
  • Allow the HPA to scale up the new deployment
  • Never scale down the old deployment

Result: the rollout gets stuck while both deployments keep running indefinitely, wasting resources.

The Milvus deployment itself also never finishes rolling out; it keeps reporting:

rollout not finished, ...., reason, last deploy not scale to 0

This leaves the rollout permanently stuck.
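A minimal sketch of why the rollout can never complete, assuming the completion check works the way the log message suggests: the rollout only counts as finished once the last (old) deployment is at 0 replicas. `deploymentState` and `rolloutFinished` are hypothetical names for illustration, not the operator's actual code, and the "current deploy not ready" reason string is invented.

```go
package main

import "fmt"

// deploymentState is a hypothetical, trimmed-down view of a Deployment's
// replica fields; the operator itself works with appsv1.Deployment.
type deploymentState struct {
	Replicas      int32 // spec.replicas
	ReadyReplicas int32 // status.readyReplicas
}

// rolloutFinished is a hypothetical completion check: the new deployment
// must be fully ready AND the last (old) deployment must be scaled to 0.
func rolloutFinished(current, last deploymentState) (bool, string) {
	if current.ReadyReplicas < current.Replicas {
		return false, "current deploy not ready" // hypothetical reason string
	}
	if last.Replicas != 0 {
		return false, "last deploy not scale to 0"
	}
	return true, ""
}

func main() {
	// With HPA enabled and no code path that scales the old deployment
	// down, last.Replicas stays > 0, so the check never reports finished.
	current := deploymentState{Replicas: 5, ReadyReplicas: 5}
	last := deploymentState{Replicas: 3, ReadyReplicas: 3}
	done, reason := rolloutFinished(current, last)
	fmt.Println(done, reason) // false last deploy not scale to 0
}
```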

Solution

Simplified approach:

  1. Wait for the new deployment to have ReadyReplicas >= oldDeployment.Replicas
  2. Scale the old deployment down to 0 immediately
  3. Scaling down in a single step (rather than gradually) leaves the HPA no window to fight back
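The two steps above can be sketched as a pure decision function. This is a hypothetical illustration of the rule, not the code in this PR: `deploymentState`, `shouldScaleDownOld` and `nextOldReplicas` are made-up names, and the real controller reads these values from `appsv1.Deployment` objects.

```go
package main

import "fmt"

// deploymentState is a hypothetical, trimmed-down view of the fields the
// check needs; the operator itself works with appsv1.Deployment.
type deploymentState struct {
	Replicas      int32 // desired replicas (spec.replicas)
	ReadyReplicas int32 // status.readyReplicas
}

// shouldScaleDownOld reports whether the old deployment can be scaled to 0:
// the new deployment must have at least as many ready replicas as the old
// one currently requests (step 1).
func shouldScaleDownOld(current, old deploymentState) bool {
	if old.Replicas == 0 {
		return false // already scaled down; nothing to do
	}
	return current.ReadyReplicas >= old.Replicas
}

// nextOldReplicas returns the replica count to set on the old deployment.
func nextOldReplicas(current, old deploymentState) int32 {
	if shouldScaleDownOld(current, old) {
		return 0 // step 2: go straight to 0 instead of stepping down gradually
	}
	return old.Replicas
}

func main() {
	old := deploymentState{Replicas: 3, ReadyReplicas: 3}
	fmt.Println(nextOldReplicas(deploymentState{Replicas: 3, ReadyReplicas: 2}, old)) // 3
	fmt.Println(nextOldReplicas(deploymentState{Replicas: 3, ReadyReplicas: 3}, old)) // 0
}
```

Comparing against the old deployment's replica count (rather than a fixed target) means the check still works when the HPA has resized either deployment in the meantime.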

Testing

go test ./pkg/controllers/... -v

All tests pass, including the new HPA rolling update tests.

Also tested manually on a cluster-mode deployment with HPAs enabled.

Related Issues

#416 HPAs on QueryNodes prevent green/blue QueryNode deployments from completing

Checklist

  • Code follows project style guidelines
  • Tests added/updated
  • All tests pass
  • Documentation updated (if needed)

@sre-ci-robot (Collaborator)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: milanchovatiya-boop
To complete the pull request process, please assign alintalu after the PR has been reviewed.
You can assign the PR to them by writing /assign @alintalu in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot (Collaborator)

Welcome @milanchovatiya-boop! It looks like this is your first PR to zilliztech/milvus-operator 🎉

- Wait for current deployment ready replicas >= old deployment replicas
- Scale down old deployment to 0 immediately to avoid HPA conflict
- Add tests for HPA rolling update behavior
- Remove HPA query dependency for simpler logic

Fixes issue where old deployments were never scaled down during
rolling updates when HPA is enabled (replicas: -1).

Signed-off-by: Milan Chovatiya <milan.chovatiya@reddit.com>
@milanchovatiya-boop force-pushed the fix_last_deployments_on_hpa branch from 0dafe3c to 7e46d88 on January 14, 2026 16:02
@milanchovatiya-boop milanchovatiya-boop marked this pull request as ready for review January 14, 2026 16:07