-
Notifications
You must be signed in to change notification settings - Fork 6
Collect logs in case of test failure #362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Emmanuel Varagnat <emmanuel.varagnat@vates.tech>
2d37684 to
28d7eed
Compare
dinhngtu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed up to "lib: use option to put ssh in background". Could you squash the console capture commits, as it's a bit difficult to review otherwise?
| for fixturename in getattr(request, 'fixturenames', []): | ||
| try: | ||
| fixture_value = request.getfixturevalue(fixturename) | ||
| # Check if fixture is a Host | ||
| if isinstance(fixture_value, Host): | ||
| test_hosts.add(fixture_value) | ||
| # Check if fixture is a VM | ||
| elif isinstance(fixture_value, VM): | ||
| test_vms.add(fixture_value) | ||
| test_hosts.add(fixture_value.host) | ||
| # Check if fixture is a list | ||
| elif isinstance(fixture_value, list): | ||
| for item in fixture_value: | ||
| if isinstance(item, Host): | ||
| test_hosts.add(item) | ||
| elif isinstance(item, VM): | ||
| test_vms.add(item) | ||
| test_hosts.add(item.host) | ||
| # Check if fixture is a dict with Host or VM values | ||
| elif isinstance(fixture_value, dict): | ||
| for value in fixture_value.values(): | ||
| if isinstance(value, Host): | ||
| test_hosts.add(value) | ||
| elif isinstance(value, VM): | ||
| test_vms.add(value) | ||
| test_hosts.add(value.host) | ||
| # Check if fixture has a 'host' attribute (like SR, Pool objects) | ||
| elif hasattr(fixture_value, 'host') and isinstance(fixture_value.host, Host): | ||
| test_hosts.add(fixture_value.host) | ||
| # Check if fixture is a Pool | ||
| elif hasattr(fixture_value, 'hosts') and isinstance(fixture_value.hosts, list): | ||
| test_hosts.update(h for h in fixture_value.hosts if isinstance(h, Host)) | ||
| except Exception: | ||
| # Some fixtures may not be available yet or may fail to load | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check seems a bit "magical" and brittle to me. Perhaps there's a clearer way to manage the list of hosts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to find something easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is a bit magical, it is the solution with the minimal friction with existing code.
Other solutions require adding code to tests, or will make maintenance harder, or even add coupling between Host and VM class and the .
Maybe @ydirson might come up with a better solution ?
In the meantime I tried to make it a bit more readable and do introspection only in error case.
Sure, I could do that. |
I found https://github.com/sibson/vncdotool which looks quite feature-rich. |
I saw it, but it seems to handle the connection itself. In our case the VNC connection is handled by XAPI via websocket. |
Maybe could just do a proxy that creates a websocket and expose a local port that could be used by an external tool or library. I could investigate that. |
6da69d7 to
550cd61
Compare
|
I change VM console capture and removed the VNC client implementation. Now the script mainly act as a proxy and uses a library for VNC capture. |
550cd61 to
6216473
Compare
|
"lib: check presence of xen-bugtool reports" has become empty. Could this commit be dropped, or did something get lost during the rebase? |
Yes I remember seeing that. It seems that I forgot to remove the commit, but it wasn't lost it was merged into another. And it looks like it was merged into the wrong one, it should be in 10f4f1f and not 61d2e2f. |
Signed-off-by: Emmanuel Varagnat <emmanuel.varagnat@vates.tech>
6216473 to
1ae1d60
Compare
|
Similarly to what I said regarding the follow-up commit, such big change needs to come with an explanation for the reviewers/maintainers regarding what the goal is, how it is meant to be used and when, how it is supposed to integrate in CI jobs, why this is the appropriate solution, etc. Here I'm not sure that @xcp-ng/os-platform-release as maintainers have been provided with these answers. |
Description was updated |
glehmann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the types in the signature of the new methods in conftest.py?
conftest.py
Outdated
| test_hosts.add(fixture_value) | ||
| # Check if fixture is a list of hosts | ||
| elif isinstance(fixture_value, list) and all(isinstance(h, Host) for h in fixture_value): | ||
| test_hosts.update(fixture_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the element if it's a Host, to be consistent with what is done in Check if fixture is a Pool?
conftest.py
Outdated
| logging.debug(f"Record timestamp (end): {class_name}") | ||
| host.ssh(f"echo end {class_name} $(date '+%s') >> {HOST_TIMESTAMPS_FILE}") | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit is mostly empty and doesn't match the description
conftest.py
Outdated
|
|
||
| # Helper function for immediate console capture | ||
| def capture_vm_console(vm: VM, log_dir: str) -> str: | ||
| import sys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imports should be at the top of the file, unless required by a circular import
| parser = argparse.ArgumentParser( | ||
| description='VNC WebSocket Console Capture', | ||
| formatter_class=argparse.RawDescriptionHelpFormatter, | ||
| usage=""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't argparse able to generate that by itself?
scripts/capture-console.py
Outdated
|
|
||
| else: | ||
| # Neither mode specified - show nice usage | ||
| print("VNC WebSocket Console Capture") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also probably be generated by argparse
1ae1d60 to
0257aef
Compare
Signed-off-by: Emmanuel Varagnat <emmanuel.varagnat@vates.tech>
And not solely rely on subprocess and early quit. Signed-off-by: Emmanuel Varagnat <emmanuel.varagnat@vates.tech>
0257aef to
911bee5
Compare
Screenshot of the VM screen will be saved for failed tests, if marked with pytest.mark.capture_console. This is based on new script capture-console.py that connect to XO-lite for capture. Signed-off-by: Emmanuel Varagnat <emmanuel.varagnat@vates.tech>
Signed-off-by: Emmanuel Varagnat <emmanuel.varagnat@vates.tech>
Signed-off-by: Emmanuel Varagnat <emmanuel.varagnat@vates.tech>
Extract from syslog type logs based on epoch timestamps. Signed-off-by: Emmanuel Varagnat <emmanuel.varagnat@vates.tech>
Test for log collection need to fail in order to trigger the log collection; so pytest.mark.xfail can't be used. But tests need to by referenced by a job, otherwise './jobs.py check' will fail. Signed-off-by: Emmanuel Varagnat <emmanuel.varagnat@vates.tech>
911bee5 to
8ebf7af
Compare
ydirson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The review below is only for the host log collection feature.
The VM console capture and others should IMO live a separate PRs, there seems to be no reason one would block the other.
Same for the "smaller general improvements" like bumping pyright and ssh fix, they could have their own PR which could be merged faster.
| name = "pyright" | ||
| version = "1.1.402" | ||
| version = "1.1.407" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this kind of change we should remember to get the branch rebased before merging (to catch anything new in master that would not pass 1.1.407)
| assert CACHE_IMPORTED_VM in [True, False] | ||
|
|
||
| # Session-level tracking for failed tests and their associated hosts | ||
| FAILED_TESTS_HOSTS = {} # {test_nodeid: set(Host)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment essentially looks like a good candidate for being a type hint?
|
|
||
| # Session-level tracking for failed tests and their associated hosts | ||
| FAILED_TESTS_HOSTS = {} # {test_nodeid: set(Host)} | ||
| SESSION_HAS_FAILURES = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need SESSION_HAS_FAILURES? It looks like it just reflects that FAILED_TESTS_HOSTS has contents or not?
| parser.addoption( | ||
| "--collect-logs-on-failure", | ||
| action="store_true", | ||
| default=True, | ||
| help="Automatically collect xen-bugtool logs from hosts when tests fail (default: True)" | ||
| ) | ||
| parser.addoption( | ||
| "--no-collect-logs-on-failure", | ||
| action="store_false", | ||
| dest="collect_logs_on_failure", | ||
| help="Disable automatic log collection on test failure" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may not be necessary to have a flag for expliciting the default behaviour?
| # Session-scoped fixture to create a shared log directory for all artifacts | ||
| @pytest.fixture(scope='session') | ||
| def session_log_dir(): | ||
| """ | ||
| Create and return a session-wide log directory for storing all test artifacts. | ||
| The directory is created at session start and shared by all fixtures. | ||
| Directory naming includes BUILD_NUMBER if running in Jenkins CI environment. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doc says this fixtures creates the dir, but it is actually created by a different one.
| # Record test timestamps to easily extract logs | ||
| @pytest.fixture(scope='class', autouse=True) | ||
| def bugreport_timestamp(request, host): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Record test start and end timestamps"? Could make sense to be a docstring?
The "to easily extract logs" could have more details, as is it is no obvious how those timestamps help that.
That indeed seems to be related to "scripts: add log extraction script for investigation", woudn't it make sense to regroup it into that other commit?
| # Check all fixtures used by this test | ||
| # XXX: this is a bit hard to understand but this introspection of fixtures | ||
| # is the best way to gather VMs and Hosts. Other alternatives require, | ||
| # maintenance, boilerplate and are more prone to miss some cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's "hard to understand" maybe an overview of how it works would be useful (it's not that hard to understand actually 😉)
I'm a bit unsure this is a good approach: it still requires to enumerate the contexts in which a host can be found, so it seems possible that we miss some, or some other contexts get added in the future and we would need (or forget) to add them here.
Can't we instead get the Host ctor to cheaply register all hosts used (somewhere in the request), so we can just pick them when we see the test fail?
(sidenote: if we keep it, the function name is a bit too generic and describe how it does things, not what it does)
|
|
||
|
|
||
| # Session-scoped fixture to create a shared log directory for all artifacts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is no single obvious entrypoint, it could be useful to have a little write-up first, on the overall architecture of the feature, and how the fixtures work together towards the goal. Right here before defining them could be a good-enough place?
Maybe session_log_dir could come last in definition order, as it looks more of an implementation detail, and collect_logs_on_session_failure could come first?
| # Path to timestamp file on remote hosts for log extraction | ||
| HOST_TIMESTAMPS_FILE = "/tmp/pytest-timestamps.log" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would likely want to use a per-session temporary file, or there might be issues when several sessions run simultaneously.
| @pytest.fixture(scope='session', autouse=True) | ||
| def check_bug_reports(host): | ||
| """ | ||
| That could be the sign of interrupted tests and some cleanup might be | ||
| required. We don't want to accumulate too much of these files. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring should start with a one-line explaining what the fixture does
For tests that fails rarely or only on CI we need a way to investigate the failure.
This PR is perform a log collection (only using xen-bugtool for the moment)
xen-bugtool reports can be very big (~100MB or more) so we might change that
later, or filter report.
Timestamps for each tests are recorded and the start and stop of tests, so that corresponding
logs can be extracted for analysis (see extract_logs.py). Timestamps are recorded in XCP-ng to avoid difference in
clocks.
No change is required to enable log collection. But console capture is only enabled
on demand, using @pytest.mark.capture_console.
Log collection do not happen at each test, but at the end of the test session.
For Windows (mainly), a screenshot can be done for each failing test; this uses the VNC console
through XO-lite.