CLOUDP-317886 - block removing cluster from MC Sharded deployment #495
MCK 1.5.0 Release Notes
New Features
Bug Fixes
require.NoError(t, err)
addKubernetesTlsResources(ctx, client, rs)
kubeClient, omConnectionFactory := mock.NewDefaultFakeClient(rs)
reconciler := newReplicaSetReconciler(ctx, kubeClient, nil, "", "", false, false, omConnectionFactory.GetConnectionFunc)
This test was previously using shardedClusterReconciler instead of the replica set reconciler.
* User decides to remove `read-analytics` cluster, by removing the `clusterSpecItem` completely
* Operator scales down members from `read-analytics` cluster one by one
* Because the configuration does not have voting options specified anymore and by default `priority` is set to 1, the operator will remove one member, but the other two members will be reconfigured as voting members
* `replicaset` contains now 9 voting members, which is not [supported by MongoDB](https://www.mongodb.com/docs/manual/reference/limits/#mongodb-limit-Number-of-Voting-Members-of-a-Replica-Set)
In practice, there is code in the operator that limits the number of voting members to 7 and marks the rest as non-voting. But it is still a valid case of losing configuration, resulting in a non-deterministic configuration of voting members.
https://github.com/mongodb/mongodb-kubernetes/blob/master/controllers/om/deployment.go#L1206-L1218
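To illustrate the kind of capping described in this comment, here is a minimal sketch. It is not the operator's code from the linked lines: the `Member` type, `capVotingMembers`, and the "first seven win" ordering rule are assumptions made purely for illustration.

```go
package main

import "fmt"

// maxVotingMembers is MongoDB's documented limit on voting members in a replica set.
const maxVotingMembers = 7

// Member is a simplified, hypothetical replica set member (not the operator's type).
type Member struct {
	Host     string
	Votes    int
	Priority float64
}

// capVotingMembers leaves the first maxVotingMembers voting members untouched and
// marks every later voting member as non-voting (votes=0, priority=0).
// It returns how many members were demoted.
func capVotingMembers(members []Member) int {
	voting, demoted := 0, 0
	for i := range members {
		if members[i].Votes == 0 {
			continue // already non-voting
		}
		if voting < maxVotingMembers {
			voting++
			continue
		}
		members[i].Votes = 0
		members[i].Priority = 0
		demoted++
	}
	return demoted
}

func main() {
	// 7 members from "main" plus the 2 remaining "read-analytics" members,
	// all with votes=1 and priority=1 because the explicit settings were lost.
	var members []Member
	for i := 0; i < 9; i++ {
		members = append(members, Member{Host: fmt.Sprintf("member-%d", i), Votes: 1, Priority: 1})
	}
	fmt.Println("demoted:", capVotingMembers(members)) // prints "demoted: 2"
}
```

In this sketch, which members end up non-voting depends only on their position in the slice, not on what the user originally configured. That is the loss of intent the comment points at.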
Great investigation and the fix! 👏 LGTM!
changelog/20251006_fix_block_removing_non_zero_member_cluster_from.md
LGTM!
…from.md Co-authored-by: Vivek Singh <vsingh.ggits.2010@gmail.com>
Summary
During investigation of https://jira.mongodb.org/browse/CLOUDP-317886 we found that the operator panics when a cluster is removed from `clusterSpecList`. Initially we thought about allowing that, but after I discussed the issue with @lsierant, a better approach was chosen:
mongodb-kubernetes/controllers/operator/mongodbshardedcluster_controller.go
Lines 840 to 846 in 31c346e
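The referenced lines are not reproduced on this page. As a rough sketch of what blocking such a removal could look like, the example below rejects a desired `clusterSpecList` that drops a cluster which still has members deployed. The types, function names, and error text are hypothetical and are not taken from the operator.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterSpecItem is a simplified, hypothetical stand-in for one clusterSpecList entry.
type ClusterSpecItem struct {
	ClusterName string
	Members     int
}

// removedNonEmptyClusters returns the names of clusters that still have members
// deployed but no longer appear in the desired clusterSpecList.
func removedNonEmptyClusters(deployed map[string]int, desired []ClusterSpecItem) []string {
	inSpec := make(map[string]bool, len(desired))
	for _, item := range desired {
		inSpec[item.ClusterName] = true
	}
	var removed []string
	for name, members := range deployed {
		if members > 0 && !inSpec[name] {
			removed = append(removed, name)
		}
	}
	sort.Strings(removed) // map iteration order is random; sort for a stable message
	return removed
}

// validateClusterSpecList returns an error when a non-empty cluster was removed,
// so the caller can mark the reconciliation as Failed instead of crashing.
func validateClusterSpecList(deployed map[string]int, desired []ClusterSpecItem) error {
	if removed := removedNonEmptyClusters(deployed, desired); len(removed) > 0 {
		return fmt.Errorf("clusters %v still have members and cannot be removed from clusterSpecList", removed)
	}
	return nil
}

func main() {
	deployed := map[string]int{"main": 7, "read-analytics": 3}
	desired := []ClusterSpecItem{{ClusterName: "main", Members: 7}}
	if err := validateClusterSpecList(deployed, desired); err != nil {
		fmt.Println("reconciliation failed:", err)
	}
}
```

Under these assumptions, removing an entry whose cluster already has zero deployed members passes the check, which matches the "non-zero member cluster" wording of the PR title.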
...and in the changelog:
MultiClusterSharded: Block removing non-zero member cluster from MongoDB resource. This prevents scaling down a member cluster without its current configuration available, which can lead to unexpected issues. Previously the operator crashed in that scenario; after the fix it marks the reconciliation as `Failed` with an appropriate message. Example unsafe scenario that is now blocked:
* `main` is used for application traffic, `read-analytics` is used for read-only analytics
* `main` cluster has 7 voting members
* `read-analytics` cluster has 3 non-voting members
* User decides to remove `read-analytics` cluster, by removing the `clusterSpecItem` completely
* Operator scales down members from `read-analytics` cluster one by one
* Because the configuration does not have voting options specified anymore and by default `priority` is set to 1, the operator will remove one member, but the other two members will be reconfigured as voting members
* `replicaset` now contains 9 voting members, which is not [supported by MongoDB](https://www.mongodb.com/docs/manual/reference/limits/#mongodb-limit-Number-of-Voting-Members-of-a-Replica-Set)
Proof of Work
Passing CI + new unit tests for `blockNonEmptyClusterSpecItemRemoval` function.
Checklist
`skip-changelog` label if not needed