VMware: Improve error messaging / logs when starting non-user VMs, and secondary storage not available or doesn't have enough capacity#9207

Merged
vishesh92 merged 4 commits into apache:4.19 from shapeblue:vmware-sec-storage-check-improvements
Jun 25, 2024

Conversation

@sureshanaparti
Contributor

@sureshanaparti sureshanaparti commented Jun 10, 2024

Description

This PR improves the error messages and logs produced when starting non-user VMs on VMware while secondary storage is not available or doesn't have enough capacity. It addresses parts 2 and 3 of the inappropriate messages reported in #8390.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Set the config secondary.storage.capacity.threshold to a value below the current secondary storage usage, then stop the SSVM and start it again. The SSVM fails to start with the logs below.

SSVMStartFailed_NoSecStorageCapacity
2024-06-24 21:24:59,152 WARN  [c.c.s.StatsCollector] (DirectAgent-35:ctx-1a4ece48 10.0.35.32, job-65/job-66, cmd: StartCommand) (logid:1c0be4ca) Image storage [1] has not enough capacity. Capacity: total=[1 TB], used=[1 TB], threshold=[20.000000298023224%].
2024-06-24 21:24:59,152 INFO  [c.c.h.v.m.VmwareManagerImpl] (DirectAgent-35:ctx-1a4ece48 10.0.35.32, job-65/job-66, cmd: StartCommand) (logid:1c0be4ca) Secondary storage is either not having free capacity or not NFS, then use cache/staging storage instead
2024-06-24 21:24:59,154 WARN  [c.c.h.v.m.VmwareManagerImpl] (DirectAgent-35:ctx-1a4ece48 10.0.35.32, job-65/job-66, cmd: StartCommand) (logid:1c0be4ca) No cache/staging storage found when NFS secondary storage with free capacity not available or non-NFS secondary storage is used
2024-06-24 21:24:59,155 INFO  [c.c.h.v.u.VmwareHelper] (DirectAgent-35:ctx-1a4ece48 10.0.35.32, job-65/job-66, cmd: StartCommand) (logid:1c0be4ca) [ignored]failed to get message for exception: NFS secondary or cache storage of dc 1 either doesn't have enough capacity (has reached 20% usage threshold) or not ready yet, or non-NFS secondary storage is used
2024-06-24 21:24:59,155 ERROR [c.c.h.v.r.VmwareResource] (DirectAgent-35:ctx-1a4ece48 10.0.35.32, job-65/job-66, cmd: StartCommand) (logid:1c0be4ca) StartCommand failed due to [Exception: java.lang.Exception
Message: NFS secondary or cache storage of dc 1 either doesn't have enough capacity (has reached 20% usage threshold) or not ready yet, or non-NFS secondary storage is used
].
java.lang.Exception: NFS secondary or cache storage of dc 1 either doesn't have enough capacity (has reached 20% usage threshold) or not ready yet, or non-NFS secondary storage is used
	at com.cloud.hypervisor.vmware.resource.VmwareResource.execute(VmwareResource.java:2286)
	at com.cloud.hypervisor.vmware.resource.VmwareResource.executeRequest(VmwareResource.java:566)
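
For readers who want to see the kind of check these logs describe, here is a minimal sketch in Java. It is not the code from this PR: the class name, both method names, and the exact comparison (usage strictly below the threshold counts as enough capacity) are assumptions made for illustration. Only the wording of the failure message is taken from the log above.

```java
// Hypothetical illustration only; not the CloudStack implementation.
class SecondaryStorageCapacityCheck {

    // Assumed rule: the store has enough capacity while used/total stays below the threshold.
    static boolean hasEnoughCapacity(long totalBytes, long usedBytes, double threshold) {
        if (totalBytes <= 0) {
            return false; // no capacity stats: treat the store as not ready
        }
        double usedFraction = (double) usedBytes / totalBytes;
        return usedFraction < threshold;
    }

    // Builds a message worded like the one in the StartCommand failure log above.
    static String failureMessage(long dcId, double threshold) {
        return String.format(
            "NFS secondary or cache storage of dc %d either doesn't have enough capacity "
                + "(has reached %d%% usage threshold) or not ready yet, "
                + "or non-NFS secondary storage is used",
            dcId, Math.round(threshold * 100));
    }

    public static void main(String[] args) {
        long oneTb = 1L << 40;   // total=[1 TB], used=[1 TB] as in the log
        double threshold = 0.2;  // 20% usage threshold as in the log
        if (!hasEnoughCapacity(oneTb, oneTb, threshold)) {
            System.out.println(failureMessage(1, threshold));
        }
    }
}
```

With the values from the test run (1 TB total, 1 TB used, 20% threshold), the sketch treats the store as out of capacity and builds the message that carries the data center id and the threshold percentage.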

How did you try to break this feature and the system with this change?

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov bot commented Jun 10, 2024

Codecov Report

Attention: Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

Project coverage is 14.95%. Comparing base (0f77019) to head (fd46dac).
Report is 71 commits behind head on 4.19.

Files Patch % Lines
...oud/hypervisor/vmware/resource/VmwareResource.java 0.00% 9 Missing ⚠️
...d/hypervisor/vmware/manager/VmwareManagerImpl.java 0.00% 5 Missing ⚠️
...e/image/manager/ImageStoreProviderManagerImpl.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##              4.19    #9207       +/-   ##
============================================
+ Coverage     4.30%   14.95%   +10.64%     
- Complexity       0    11013    +11013     
============================================
  Files          363     5387     +5024     
  Lines        29312   470348   +441036     
  Branches      5118    61109    +55991     
============================================
+ Hits          1261    70319    +69058     
- Misses       27908   392237   +364329     
- Partials       143     7792     +7649     
Flag Coverage Δ
uitests 4.28% <ø> (-0.02%) ⬇️
unittests 15.66% <0.00%> (?)

Flags with carried forward coverage won't be shown.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9849

@sureshanaparti
Contributor Author

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Member

@yadvr yadvr left a comment

LGTM - didn't verify it

@blueorangutan

[SF] Trillian test result (tid-10408)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 51619 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9207-t10408-kvm-centos7.zip
Smoke tests completed. 128 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_trigger_shutdown Failure 341.56 test_safe_shutdown.py
test_01_verify_ipv6_vpc Failure 569.16 test_vpc_ipv6.py
test_02_redundant_VPC_default_routes Failure 361.38 test_vpc_redundant.py
test_05_rvpc_multi_tiers Failure 712.59 test_vpc_redundant.py
test_05_rvpc_multi_tiers Error 712.60 test_vpc_redundant.py

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm

@sureshanaparti sureshanaparti force-pushed the vmware-sec-storage-check-improvements branch from 68ffea9 to 6557da6 June 12, 2024 09:09
@sureshanaparti sureshanaparti marked this pull request as ready for review June 12, 2024 09:10
@sureshanaparti
Contributor Author

@blueorangutan package

@sureshanaparti sureshanaparti added this to the 4.19.1.0 milestone Jun 12, 2024
@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9891

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Contributor

@andrijapanicsb andrijapanicsb left a comment

LGTM

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9932

@DaanHoogland
Contributor

@blueorangutan test alma9 vmware-70u2

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + vmware-70u2) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-10437)

@DaanHoogland
Contributor

@blueorangutan test alma9 vmware-70u3

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + vmware-70u3) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-10439)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server a9
Total time taken: 53804 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9207-t10439-vmware-70u3.zip
Smoke tests completed. 129 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_balanced_drs_algorithm Failure 132.20 test_cluster_drs.py
test_05_rvpc_multi_tiers Error 580.79 test_vpc_redundant.py
test_05_rvpc_multi_tiers Error 580.82 test_vpc_redundant.py

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10109

@vishesh92 vishesh92 merged commit 620ed16 into apache:4.19 Jun 25, 2024
@vishesh92 vishesh92 deleted the vmware-sec-storage-check-improvements branch June 25, 2024 06:55
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jul 2, 2024
…d secondary storage not available or doesn't have enough capacity (apache#9207)
Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

VMware: System VMs can't be started when no secondary storage with enough capacity + wrong messages around secondary storage

6 participants