Fix: Select another pod if all hosts in the pod become unavailable #8085
Codecov Report
@@ Coverage Diff @@
## 4.18 #8085 +/- ##
============================================
+ Coverage 13.02% 13.10% +0.07%
- Complexity 9032 9123 +91
============================================
Files 2720 2720
Lines 257080 257598 +518
Branches 40088 40158 +70
============================================
+ Hits 33476 33748 +272
- Misses 219400 219587 +187
- Partials 4204 4263 +59
... and 15 files with indirect coverage changes
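The counts in the report are consistent with Lines = Hits + Misses + Partials and with coverage computed as hits / (hits + misses + partials), so a partially covered line counts against coverage. The small check below recomputes both headline percentages from the raw counts. It is a sanity check written for this page, not part of the PR or of Codecov's tooling; the class and method names are illustrative.

```java
import java.util.Locale;

// Recomputes the headline coverage figures from the raw counts in the report.
// Assumption: coverage = hits / (hits + misses + partials).
public class CoverageCheck {

    // Coverage as a percentage; partials count as "not covered".
    static double coveragePercent(long hits, long misses, long partials) {
        long lines = hits + misses + partials;
        return 100.0 * hits / lines;
    }

    public static void main(String[] args) {
        // Base branch (4.18) and PR #8085 counts, copied from the report.
        double base = coveragePercent(33476, 219400, 4204);
        double head = coveragePercent(33748, 219587, 4263);
        System.out.printf(Locale.ROOT, "base: %.2f%%  head: %.2f%%%n", base, head);

        // Every counted line is a hit, a miss, or a partial.
        System.out.println(33476 + 219400 + 4204 == 257080);
        System.out.println(33748 + 219587 + 4263 == 257598);
    }
}
```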
@blueorangutan package
@vishesh92 a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7350
@blueorangutan test matrix
@rohityadavcloud a [SF] Trillian-Jenkins matrix job (centos7 mgmt + xenserver71, rocky8 mgmt + vmware67u3, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests
[SF] Trillian test result (tid-7955)
[SF] Trillian test result (tid-7953)
@blueorangutan package
@vishesh92 a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7412
@blueorangutan test
@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
[SF] Trillian test result (tid-8070)
Branch updated from b0057f7 to cad6412.
@blueorangutan package
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Branch updated from cad6412 to db22bdf.
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7506
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7508
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7520
DaanHoogland left a comment:
clgtm, I'll be testing it manually to simulate the right conditions.
Branch updated from 8af2720 to 07339be.
Branch updated from 07339be to a425b07.
@blueorangutan package
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7611
@blueorangutan test alma9 kvm-alma9
@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma9) has been kicked to run smoke tests
[SF] Trillian test result (tid-8220)
Not sure if the errors are related.
@blueorangutan test alma9 kvm-alma9
@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma9) has been kicked to run smoke tests
JFYI @DaanHoogland @vishesh92: I had some issues using AlmaLinux (due to a repo/mirror issue), but OL8/OL9 seem to work fine with the backend CI/CD.
@blueorangutan test alma9 kvm-alma9
This didn't work 🤯, so I started it manually.
@DaanHoogland [SL] unsupported parameters provided. Supported mgmt server os are:
results:
These errors are all over the place at the moment and are not specific to this issue.
Tested according to the spec in the description.
Description
When a VM deployment fails, we reset the failed VM's host_id to null but not its pod_id. As a result, deployment fails when the current pod lacks capacity, even if another pod has enough.
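A minimal sketch of the problem and the fix, under a heavily simplified model: VmRecord, Host, resetPlacementAfterFailure() and plan() are hypothetical names, host capacity is reduced to a yes/no flag, and the toy planner only searches the VM's recorded pod when one is set. It illustrates the idea in the description and is not CloudStack's actual deployment planner or cleanup code.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of the fix; not CloudStack's actual code.
public class DeploymentRetrySketch {

    // Stand-in for the placement columns of a vm_instance row.
    static final class VmRecord {
        Long hostId;
        Long podId;
    }

    // Stand-in for a host; capacity is reduced to a yes/no flag (assumption).
    record Host(long id, long podId, boolean hasCapacity) {}

    // Cleanup after a failed deployment. Before the fix only hostId was cleared;
    // clearing podId too lets the next attempt consider other pods.
    static void resetPlacementAfterFailure(VmRecord vm) {
        vm.hostId = null;
        vm.podId = null;
    }

    // Toy planner: when the VM already has a podId, only hosts in that pod are considered.
    static Optional<Host> plan(VmRecord vm, List<Host> hosts) {
        return hosts.stream()
                .filter(h -> vm.podId == null || h.podId() == vm.podId)
                .filter(Host::hasCapacity)
                .findFirst();
    }

    public static void main(String[] args) {
        // Pod 1 has no usable capacity left; pod 2 does.
        List<Host> hosts = List.of(new Host(1, 1, false), new Host(2, 2, true));

        VmRecord vm = new VmRecord();
        vm.hostId = 1L;
        vm.podId = 1L;

        // Old behaviour: host cleared, pod kept, so the retry is confined to pod 1.
        vm.hostId = null;
        System.out.println("pod kept:    " + plan(vm, hosts));

        // New behaviour: both cleared, so the retry can pick the host in pod 2.
        resetPlacementAfterFailure(vm);
        System.out.println("pod cleared: " + plan(vm, hosts));
    }
}
```

With only hostId cleared, plan() stays restricted to pod 1 and finds nothing; after resetPlacementAfterFailure() it can return the host in pod 2. Records require Java 16 or later.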
How Has This Been Tested?
How did you try to break this feature and the system with this change?
This needs an environment with 2 pods to reproduce the issue and test the fix.
cloudstack/server/src/main/java/com/cloud/capacity/CapacityManagerImpl.java (line 383 in 9df580c)
SELECT id, state, pod_id, host_id, last_host_id FROM vm_instance ORDER BY id DESC LIMIT 1; on the cloud database.
UPDATE host_pod_ref SET allocation_state = 'Disabled' WHERE id = <pod id>.
hostHasCpuCapability = false in the debugger to throw an error in the first run.
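The two-pod test can be modelled in a few lines to show why a stale pod_id matters once the original pod is disabled. This sketch assumes one plausible ordering of the steps above (the first attempt lands in pod 1 and fails, pod 1 is then disabled, and the VM is deployed again) and a toy planner that skips Disabled pods and honours a recorded pod_id. TwoPodRetryModel, choosePod() and runScenario() are hypothetical names, not CloudStack code.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the manual two-pod test; not CloudStack code.
public class TwoPodRetryModel {

    enum AllocationState { Enabled, Disabled }

    record Pod(long id, AllocationState state) {}

    // Mutable stand-in for the pod_id column of the VM's vm_instance row.
    static final class Vm {
        Long podId;
    }

    // Toy planner: skips Disabled pods and honours a pod already recorded on the VM.
    static Optional<Pod> choosePod(Vm vm, List<Pod> pods) {
        return pods.stream()
                .filter(p -> p.state() == AllocationState.Enabled)
                .filter(p -> vm.podId == null || p.id() == vm.podId)
                .findFirst();
    }

    // Runs the scenario; clearPodOnFailure toggles the behaviour the PR adds.
    static Optional<Pod> runScenario(boolean clearPodOnFailure) {
        Vm vm = new Vm();
        Pod pod1 = new Pod(1, AllocationState.Enabled);
        Pod pod2 = new Pod(2, AllocationState.Enabled);

        // First attempt picks pod 1 and is assumed to fail on the host
        // (what forcing hostHasCpuCapability = false in the debugger simulates).
        vm.podId = choosePod(vm, List.of(pod1, pod2)).orElseThrow().id();
        if (clearPodOnFailure) {
            vm.podId = null;
        }

        // Models: UPDATE host_pod_ref SET allocation_state = 'Disabled' WHERE id = 1
        List<Pod> pods = List.of(new Pod(1, AllocationState.Disabled), pod2);
        return choosePod(vm, pods);
    }

    public static void main(String[] args) {
        System.out.println("pod_id kept:    " + runScenario(false));
        System.out.println("pod_id cleared: " + runScenario(true));
    }
}
```

Without clearing pod_id, the stale value still points at the disabled pod and no placement is found; with pod_id cleared, the retry lands in pod 2.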