Add option --cloud-init-coverage to gather code coverage data #706
Add option --cloud-init-coverage to gather code coverage data #706zhaohuijuan wants to merge 1 commit intovirt-s1:masterfrom
Conversation
Reviewer's GuideIntroduce an optional cloud-init code coverage flow to os-tests, wired from CLI flags through the runner to per‑test VM handling, including SSH-based .coverage retrieval, incremental combination, and report generation, while making the feature fully opt‑in and robust to deleted working directories. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 10 security issues, and left some high level feedback:
Security issues:
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'Popen' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
General comments:
- In html_runner.py you call utils_lib.pull_cloudinit_coverage_from_vm / incremental_combine_cloudinit_coverage / generate_cloudinit_coverage_report_on_vm but never import utils_lib, so this module will raise a NameError at runtime unless you add the appropriate import.
- The per-test coverage hook in html_runner mutates the test method on the TestCase instance without restoring the original, which can surprise other tooling or re-runs of the same instance; consider using a local wrapper (e.g. calling a helper around ts(result)) or restoring the original method after execution instead of overwriting it.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In html_runner.py you call utils_lib.pull_cloudinit_coverage_from_vm / incremental_combine_cloudinit_coverage / generate_cloudinit_coverage_report_on_vm but never import utils_lib, so this module will raise a NameError at runtime unless you add the appropriate import.
- The per-test coverage hook in html_runner mutates the test method on the TestCase instance without restoring the original, which can surprise other tooling or re-runs of the same instance; consider using a local wrapper (e.g. calling a helper around ts(result)) or restoring the original method after execution instead of overwriting it.
## Individual Comments
### Comment 1
<location path="os_tests/libs/utils_lib.py" line_range="2541" />
<code_context>
ret = subprocess.run(cmd, capture_output=True, timeout=15, env=env)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 2
<location path="os_tests/libs/utils_lib.py" line_range="2557" />
<code_context>
ret = subprocess.run(cmd, capture_output=True, timeout=15, env=env)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 3
<location path="os_tests/libs/utils_lib.py" line_range="2582" />
<code_context>
ret = subprocess.run(cmd, capture_output=True, timeout=30, env=env)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 4
<location path="os_tests/libs/utils_lib.py" line_range="2604" />
<code_context>
ret = subprocess.run(cmd, stdin=f, capture_output=True, timeout=60, env=env)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 5
<location path="os_tests/libs/utils_lib.py" line_range="2624" />
<code_context>
ret = subprocess.run(cmd, capture_output=True, timeout=120, env=env)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 6
<location path="os_tests/libs/utils_lib.py" line_range="2648" />
<code_context>
ret = subprocess.run(cmd, capture_output=True, timeout=120, env=env)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 7
<location path="os_tests/libs/utils_lib.py" line_range="2670" />
<code_context>
ret = subprocess.run(ssh_cmd, stdout=f, stderr=subprocess.PIPE, timeout=60, env=env)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 8
<location path="os_tests/libs/utils_lib.py" line_range="2691-2693" />
<code_context>
proc = subprocess.Popen(
ssh_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env
)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'Popen' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 9
<location path="os_tests/libs/utils_lib.py" line_range="2830" />
<code_context>
ret = subprocess.run(cmd, cwd=work_dir, capture_output=True, timeout=60)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 10
<location path="os_tests/libs/utils_lib.py" line_range="2971-2976" />
<code_context>
ret = subprocess.run(
combine_cmd,
cwd=work_dir,
capture_output=True,
timeout=60,
)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| # 1) Verify SSH auth with a simple command | ||
| try: | ||
| cmd = base + ['whoami'] | ||
| ret = subprocess.run(cmd, capture_output=True, timeout=15, env=env) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| for remote_path in ('/var/tmp/.coverage', '/.coverage'): | ||
| try: | ||
| cmd = base + ['ls -la {} 2>&1'.format(remote_path)] | ||
| ret = subprocess.run(cmd, capture_output=True, timeout=15, env=env) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| for remote_path in ('/var/tmp/.coverage', '/.coverage'): | ||
| try: | ||
| cmd = base + ['cat {} 2>/dev/null'.format(remote_path)] | ||
| ret = subprocess.run(cmd, capture_output=True, timeout=30, env=env) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| try: | ||
| with open(local_path, 'rb') as f: | ||
| cmd = base + ['sudo tee {} > /dev/null'.format(remote_path)] | ||
| ret = subprocess.run(cmd, stdin=f, capture_output=True, timeout=60, env=env) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| '--data-file={} --directory={} --ignore-errors' | ||
| ).format(data_file, html_dir) | ||
| cmd = base + [remote_cmd] | ||
| ret = subprocess.run(cmd, capture_output=True, timeout=120, env=env) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| '--data-file={} -o {} --ignore-errors' | ||
| ).format(data_file, xml_path) | ||
| cmd = base + [remote_cmd] | ||
| ret = subprocess.run(cmd, capture_output=True, timeout=120, env=env) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| os.makedirs(os.path.dirname(local_path) or '.', exist_ok=True) | ||
| ssh_cmd = base + ['sudo cat {}'.format(remote_path)] | ||
| with open(local_path, 'wb') as f: | ||
| ret = subprocess.run(ssh_cmd, stdout=f, stderr=subprocess.PIPE, timeout=60, env=env) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| proc = subprocess.Popen( | ||
| ssh_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env | ||
| ) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'Popen' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| ] | ||
| else: | ||
| cmd = [sys.executable, '-m', 'coverage', 'combine', '--keep', new_file_path] | ||
| ret = subprocess.run(cmd, cwd=work_dir, capture_output=True, timeout=60) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| ret = subprocess.run( | ||
| combine_cmd, | ||
| cwd=work_dir, | ||
| capture_output=True, | ||
| timeout=60, | ||
| ) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive cloud-init code coverage collection functionality. It adds new configuration options and command-line arguments for coverage, implements robust logic within the test runner to pull coverage data from VMs, incrementally combine it, and generate HTML/XML reports. The pushd context manager was also improved to handle directory changes more gracefully. Feedback from the review points out a potential deadlock and error handling issue in the _pull_report_dir_from_vm function, suggests logging exceptions in the pushd context manager, notes a redundant _coverage_vm_user function, and identifies a resource leak concerning uncleaned temporary coverage files.
| try: | ||
| os.makedirs(local_dir, exist_ok=True) | ||
| parent = os.path.dirname(remote_dir.rstrip('/')) | ||
| name = os.path.basename(remote_dir.rstrip('/')) | ||
| ssh_cmd = base + ['sudo tar cf - -C {} {}'.format(parent, name)] | ||
| proc = subprocess.Popen( | ||
| ssh_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env | ||
| ) | ||
| extract = subprocess.Popen( | ||
| ['tar', 'xf', '-', '-C', local_dir, '--strip-components=1'], | ||
| stdin=proc.stdout, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, | ||
| ) | ||
| proc.stdout.close() | ||
| _, err = extract.communicate(timeout=90) | ||
| proc.wait(timeout=10) | ||
| if extract.returncode != 0 and err: | ||
| log.debug("tar extract: %s", err) | ||
| return extract.returncode == 0 | ||
| except Exception as e: | ||
| log.debug("Pull report dir from VM failed: %s", e) | ||
| return False |
There was a problem hiding this comment.
This implementation has two issues:
- Potential deadlock: The
stderrof thesshprocess (proc) is piped but never read. If thesshcommand produces a large amount of output to stderr, its buffer could fill up, causing the process to block indefinitely. - Error handling: The return code of the
sshprocess (proc) is not checked. If thesshcommand fails (e.g.,sudo tarfails), this function will still returnTrueas long as the localtar extractprocess succeeds (which it might, on empty input). Also,proc.stderris never logged.
A more robust implementation should handle proc.stderr and check the return codes of both processes.
| try: | ||
| os.chdir(previous_dir) | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
Catching a broad Exception and silently passing is generally discouraged as it can hide unexpected errors. It would be better to at least log the exception to aid in debugging potential issues with changing directories.
| try: | |
| os.chdir(previous_dir) | |
| except Exception: | |
| pass | |
| try: | |
| os.chdir(previous_dir) | |
| except Exception as e: | |
| logging.warning("Failed to change directory to %s: %s", previous_dir, e) | |
| def _coverage_vm_user(params, vm): | ||
| """Return SSH user for coverage operations: VM-specific or params default.""" | ||
| return ( | ||
| getattr(vm, 'vm_username', None) | ||
| or (params.get('VM') or {}).get('username') | ||
| or params.get('remote_user') | ||
| or 'cloud-user' | ||
| ) |
| if vms and not collected: | ||
| os.makedirs(results_dir, exist_ok=True) | ||
| log.info("Cloud-init coverage: no pre-collected files, trying pull from %d remaining VM(s)", len(vms)) | ||
| for i, vm in enumerate(vms): | ||
| try: | ||
| ip = getattr(vm, 'floating_ip', None) | ||
| if not ip: | ||
| log.info("Cloud-init coverage: remaining VM[%d] has no floating_ip, skip", i) | ||
| continue | ||
| u, kf, pw = _get_vm_ssh_auth(params, vm) | ||
| if not kf and not pw: | ||
| log.info("Cloud-init coverage: remaining VM[%d] %s has no keyfile/password, skip", i, ip) | ||
| continue | ||
| local_f = os.path.join( | ||
| pre_collected_dir or tempfile.gettempdir(), | ||
| '.coverage.vm_remaining_{}'.format(i) | ||
| ) | ||
| if pre_collected_dir: | ||
| os.makedirs(pre_collected_dir, exist_ok=True) | ||
| log.info("Cloud-init coverage: pulling from remaining VM %s (auth: keyfile=%s password=%s)", ip, bool(kf), bool(pw)) | ||
| if _pull_coverage_via_ssh(kf, pw, u, ip, local_f, log): | ||
| collected.append(local_f) | ||
| log.info("Cloud-init coverage: collected from remaining VM %s", ip) | ||
| else: | ||
| log.info("Cloud-init coverage: pull from remaining VM %s failed", ip) | ||
| except Exception as e: | ||
| log.info("Cloud-init coverage: pull from VM %s failed: %s", getattr(vm, 'floating_ip', vm), e) |
There was a problem hiding this comment.
When pre_collected_dir is not provided, this function creates temporary coverage files (.coverage.vm_remaining_*) in the system's temporary directory (tempfile.gettempdir()). These files are not cleaned up after they are used for the coverage combine operation, leading to a resource leak.
Consider tracking the paths of these temporary files and deleting them in a finally block to ensure they are always cleaned up, even if errors occur.
Add option --cloud-init-coverage in os-tests.
This function is mainly used to gather code coverage data within test cases or VMs.
When this option is included in the command, code coverage data is collected after the case finishes.
If the option is missing, the process remains unchanged—doing nothing extra—to guarantee that existing functionality remains intact.
The usage example:
$ os-tests -p cloudinit_tier1,cloudinit_tier2,cloud_utils_growpart --filter_by case_tag
--user cloud-user --keyfile /root/.ssh/id_rsa
--platform_profile /home/cloud-user/huzhao/openstack.yaml --result /home/cloud-user/huzhao/debug2/
--cloud-init-coverage
Summary by Sourcery
Add optional cloud-init code coverage collection for os-tests, including per-test data retrieval from VMs and end-of-run aggregation, while hardening result directory and working-directory handling.
New Features:
Enhancements: