Skip to content

Improve template download error message#8029

Merged
DaanHoogland merged 3 commits intoapache:4.18from
oscaralvaro:improve_download_template_errors_message
Oct 9, 2023
Merged

Improve template download error message#8029
DaanHoogland merged 3 commits intoapache:4.18from
oscaralvaro:improve_download_template_errors_message

Conversation

@oscaralvaro
Copy link
Copy Markdown
Contributor

@oscaralvaro oscaralvaro commented Oct 3, 2023

Description

This PR...
Improves the error messages to more accurately describe the reasons for failure to download a template. This is done via surfacing the underlying reason for the failure in the error message. This will facilitate troubleshooting.

This improved error messages are only shown to the admin role, for the rest of the users there is no change in error message details (tested with non-admin account).

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

Manual testing via setting up a development environment and triggering failures. Two scenarios were tested with below resulting error messages. Several other failure reasons will be surfaced as well.

-Example: Template checksum mismatch
Before:
Unable to orchestrate start VM instance {id: "3112", name: "i-2-3112-VM", uuid: "db19b8af-00e0-4f7a-803b-a6c3b022ba5d", type="User"} due to [Template 203 could not be downloaded on pool 1, failing after trying on several hosts].

Now:
Unable to orchestrate start VM instance {id: "3112", name: "i-2-3112-VM", uuid: "db19b8af-00e0-4f7a-803b-a6c3b022ba5d", type="User"} due to [Template 203 could not be downloaded on pool 1, failing after trying on several hosts Details: Checksum validation failed].

UI:

Screenshot 2023-09-28 at 5 20 39 PM

-Example: Wrong template url

Before:
Unable to orchestrate start VM instance {id:
"3110", name: "¡-2-3110-VM", uuid:
"40478bcf-5ffa-4a18-9d22-
739d149df76", type="User") due to [Template 203 could not be downloaded on pool 1, failing after trying on several hosts].

Now:
Unable to orchestrate start VM instance {id:
"3110", name: "¡-2-3110-VM", uuid:
"40478bcf-5ffa-4a18-9d22-
739d149df76", type="User") due to [Template 203 could not be downloaded on pool 1, failing after trying on several hosts Details: Unable to download template: Error on HTTPS response].

UI:

Screenshot 2023-09-28 at 5 19 26 PM

@yadvr yadvr added this to the 4.18.2.0 milestone Oct 4, 2023
Copy link
Copy Markdown
Member

@yadvr yadvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Oct 4, 2023

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@rohityadavcloud a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7219

@codecov
Copy link
Copy Markdown

codecov bot commented Oct 4, 2023

Codecov Report

Merging #8029 (01a5ca1) into 4.18 (3d8cc63) will decrease coverage by 0.01%.
Report is 3 commits behind head on 4.18.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##               4.18    #8029      +/-   ##
============================================
- Coverage     13.07%   13.07%   -0.01%     
  Complexity     9107     9107              
============================================
  Files          2720     2720              
  Lines        257513   257520       +7     
  Branches      40151    40153       +2     
============================================
  Hits          33658    33658              
- Misses       219624   219630       +6     
- Partials       4231     4232       +1     
Files Coverage Δ
...ack/direct/download/DirectDownloadManagerImpl.java 7.94% <0.00%> (-0.15%) ⬇️

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@DaanHoogland
Copy link
Copy Markdown
Contributor

DaanHoogland commented Oct 4, 2023

thanks @oscaralvaro , I can imagine this would be an improvement in some cases but as @rohityadavcloud mentions, it might be exposing info to unprivileged users as well. Can you make an assessment of that?

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Oct 4, 2023

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@rohityadavcloud a [SF] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@mlsorensen
Copy link
Copy Markdown
Contributor

mlsorensen commented Oct 4, 2023

thanks @oscaralvaro , I can imagine this would be an improvement in some cases but as @rohityadavcloud mentions, it might be exposing info to unprivileged users as well. Can you make an assessment of that?

Yes, this may depend on the error. Certainly if users are able to register templates, they would also like the error feedback if they have a bad checksum configured, but there may be other errors leaked, like being able to indirectly probe what servers can be reached by the hosts. One could infer that just from failure to download templates as it is today, but this may give more detail.

@oscaralvaro
Copy link
Copy Markdown
Contributor Author

thanks for the feedback @rohityadavcloud @DaanHoogland @mlsorensen. I will filter the details to be shown just to admin users

@blueorangutan
Copy link
Copy Markdown

[SF] Trillian test result (tid-7836)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 40250 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8029-t7836-kvm-centos7.zip
Smoke tests completed. 108 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@oscaralvaro
Copy link
Copy Markdown
Contributor Author

@rohityadavcloud @DaanHoogland @mlsorensen I've made changes to show error details only to admin role

I was wondering if other roles as DOMAIN_ADMIN, RESOURCE_DOMAIN_ADMIN would benefit from this information as well

Also, it feels that a middle ground in the error message detail is missing, because currently either the errors are too generic or too internal. A more granular error messaging (without leaking internals) would allow for a faster troubleshooting and better self-support

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 7237

@DaanHoogland
Copy link
Copy Markdown
Contributor

you've got a compile error @oscaralvaro:

10:45:48 [ERROR] /jenkins/workspace/acs-centos7-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.18.2.0-SNAPSHOT/server/src/main/java/org/apache/cloudstack/direct/download/DirectDownloadManagerImpl.java:[371,65] error: cannot find symbol
10:45:48   symbol:   variable ACCOUNT_TYPE_ADMIN
10:45:48   location: interface Account

Did you forget to commit something?

@oscaralvaro
Copy link
Copy Markdown
Contributor Author

thanks for the commit @DaanHoogland , yes, there is a difference for Account type between 4.16 and 4.18, thanks for the fix

@DaanHoogland DaanHoogland reopened this Oct 6, 2023
Copy link
Copy Markdown
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7251

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SF] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Copy link
Copy Markdown
Contributor

@stephankruggg stephankruggg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLGTM, not manually tested

@blueorangutan
Copy link
Copy Markdown

[SF] Trillian test result (tid-7870)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42955 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8029-t7870-kvm-centos7.zip
Smoke tests completed. 108 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@DaanHoogland
Copy link
Copy Markdown
Contributor

@rohityadavcloud @DaanHoogland @mlsorensen I've made changes to show error details only to admin role

I was wondering if other roles as DOMAIN_ADMIN, RESOURCE_DOMAIN_ADMIN would benefit from this information as well

Also, it feels that a middle ground in the error message detail is missing, because currently either the errors are too generic or too internal. A more granular error messaging (without leaking internals) would allow for a faster troubleshooting and better self-support

i agree @oscaralvaro , but would leave it to a next iteration.

@DaanHoogland DaanHoogland merged commit c0128e2 into apache:4.18 Oct 9, 2023
shwstppr pushed a commit to shapeblue/cloudstack that referenced this pull request Oct 12, 2023
(cherry picked from commit c0128e2)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants