test: add script to generate test data from scratch without DockerHub #491
andaaron wants to merge 1 commit into project-zot:main
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff:

| Metric | main | #491 |
|---|---|---|
| Coverage | 80.35% | 80.35% |
| Files | 62 | 62 |
| Lines | 1680 | 1680 |
| Branches | 218 | 218 |
| Hits | 1350 | 1350 |
| Misses | 266 | 266 |
| Partials | 64 | 64 |
Introduce new test data generation system that creates container images from scratch using Podman/Docker/Buildah, eliminating DockerHub dependencies. Supports cross-platform builds and multi-architecture images with embedded CVE data for Trivy scanning.

- Add create_test_data.py and image_utils.py
- Add cross-platform documentation
- Update Makefile to use new script (legacy script still available)

Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
Pull request overview
This pull request introduces a new test data generation system that creates container images from scratch using Podman/Docker/Buildah, eliminating dependencies on DockerHub. The system supports cross-platform builds (Linux, macOS, Windows via WSL2) and multi-architecture images with embedded CVE data for Trivy scanning.
Changes:
- Added create_test_data.py as the main orchestration script with argument parsing and metadata handling
- Added image_utils.py as a comprehensive library (1363 lines) containing platform detection, build tool abstraction, root filesystem creation, and image management utilities
- Added README_CROSS_PLATFORM.md documenting installation and usage across platforms
- Updated Makefile with new create-test-data and clean-test-data targets while preserving the legacy test-data target
- Added .github/workflows/test-data-build-tools.yml to test all three build tools (buildah, podman, docker) in CI
- Updated .github/workflows/end-to-end-test.yml with version bumps for skopeo (v1.20.0) and Go (1.25.x)
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.
| File | Description |
|---|---|
| tests/scripts/image_utils.py | Core library providing build tool abstractions, platform detection, root filesystem creation for Alpine/Debian/Ubuntu/generic systems, OCI reference handling, and image creation/signing/scanning functionality |
| tests/scripts/create_test_data.py | Main script orchestrating image creation with command-line argument parsing, prerequisite verification, and metadata aggregation |
| tests/scripts/README_CROSS_PLATFORM.md | Comprehensive cross-platform documentation covering installation instructions, usage examples, platform-specific notes, and troubleshooting for Linux/macOS/Windows |
| Makefile | Added create-test-data target that invokes the new script with optional BUILD_TOOL parameter, and clean-test-data target for cleanup |
| .github/workflows/test-data-build-tools.yml | New CI workflow that tests the script with all three build tools (buildah, podman, docker) using a matrix strategy |
| .github/workflows/end-to-end-test.yml | Updated skopeo version from v1.9.0 to v1.20.0 and Go version from 1.22.x to 1.25.x |
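The build tool abstraction described above has to settle on one of the three supported tools before any image is built. The following is a minimal sketch of how such a selection might look; it is not taken from image_utils.py. The function name `detect_build_tool`, the preference order, and the injectable `which` parameter are all assumptions made for illustration.

```python
import shutil
from typing import Callable, Optional, Sequence

# Assumed preference order; the real script may choose differently.
SUPPORTED_TOOLS: Sequence[str] = ("podman", "docker", "buildah")


def detect_build_tool(
    requested: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the build tool to use, or raise if none is usable.

    `requested` mirrors an explicit choice such as the Makefile's BUILD_TOOL
    parameter; `which` is injectable so the lookup can be faked in tests.
    """
    if requested:
        if requested not in SUPPORTED_TOOLS:
            raise ValueError(f"unsupported build tool: {requested!r}")
        if which(requested) is None:
            raise RuntimeError(f"{requested} was requested but is not on PATH")
        return requested
    # No explicit choice: fall back to the first supported tool found on PATH.
    for tool in SUPPORTED_TOOLS:
        if which(tool) is not None:
            return tool
    raise RuntimeError("none of podman, docker, or buildah was found on PATH")
```

Failing early with a clear message when a requested tool is absent is probably more useful in a CI matrix than silently falling back to another tool, since each matrix entry is meant to exercise one specific tool.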
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """ | ||
| return { | ||
| "org.opencontainers.image.title": image, | ||
| "org.opencontainers.image.description": f" {description}", |
There is an extra leading space in the description value: the f-string should be 'f"{description}"' instead of 'f" {description}"'. As written, every image description in the OCI metadata begins with an unwanted space.
| "org.opencontainers.image.description": f" {description}", | |
| "org.opencontainers.image.description": f"{description}", |
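For illustration, a label builder could also trim the description so that stray whitespace in the input never reaches the metadata. This is only a sketch: the function name `build_oci_labels` is hypothetical, and it includes just the two keys visible in the excerpt, whereas the real function may return more.

```python
from typing import Dict


def build_oci_labels(image: str, description: str) -> Dict[str, str]:
    """Return OCI annotation labels with a whitespace-trimmed description."""
    return {
        "org.opencontainers.image.title": image,
        # strip() removes leading and trailing whitespace from the input text.
        "org.opencontainers.image.description": description.strip(),
    }
```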
| if not os.path.isdir(docker_docs_dir): | ||
| run_command(["git", "clone", "https://github.com/docker-library/docs.git", docker_docs_dir], check=True) |
The check at line 169 only verifies that the directory exists, not that it is a valid git repository. If the directory exists but is not a git repository, the clone is skipped and later steps that read from it may fail. If the path exists but is not a directory, or there are permission issues, the git clone command at line 170 will fail. Consider adding error handling or checking whether the path is a valid git repository before cloning.
| if not os.path.isdir(docker_docs_dir): | |
| run_command(["git", "clone", "https://github.com/docker-library/docs.git", docker_docs_dir], check=True) | |
| if os.path.exists(docker_docs_dir) and not os.path.isdir(docker_docs_dir): | |
| logger.error("Path for docker docs exists but is not a directory: %s", docker_docs_dir) | |
| raise RuntimeError(f"Invalid docker docs path: {docker_docs_dir}") | |
| elif os.path.isdir(docker_docs_dir): | |
| # Directory already exists; optionally verify it's a git repository | |
| git_dir = os.path.join(docker_docs_dir, ".git") | |
| if not os.path.isdir(git_dir): | |
| logger.warning( | |
| "Docker docs directory '%s' exists but does not appear to be a git repository.", | |
| docker_docs_dir, | |
| ) | |
| else: | |
| try: | |
| run_command( | |
| ["git", "clone", "https://github.com/docker-library/docs.git", docker_docs_dir], | |
| check=True, | |
| ) | |
| except subprocess.CalledProcessError as exc: | |
| logger.error("Failed to clone docker docs repository into '%s': %s", docker_docs_dir, exc) | |
| raise |
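The branching in the suggestion above can be factored into a small helper that classifies the target path first, which keeps the clone logic itself short. A minimal sketch follows; the name `checkout_state` and its return values are hypothetical and not part of the PR.

```python
import os


def checkout_state(path: str) -> str:
    """Classify a checkout path.

    Returns one of 'missing', 'not_a_directory', 'not_a_repo', or 'repo'.
    """
    if not os.path.lexists(path):
        return "missing"
    if not os.path.isdir(path):
        return "not_a_directory"
    # A regular clone has a .git directory; worktrees and submodules use a .git file.
    if os.path.exists(os.path.join(path, ".git")):
        return "repo"
    return "not_a_repo"
```

A caller would then clone only when the state is 'missing', reuse the directory when it is 'repo', and raise or warn for the other two states.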
| # Simple pattern replacement (s|old|new|g format) | ||
| match = re.match(r's\|([^|]+)\|([^|]+)\|g?', pattern) | ||
| if match: | ||
| old, new = match.groups() |
The sed_inplace function performs string replacement without proper escaping or sanitization. If the 'old' or 'new' values in the pattern contain special characters or come from untrusted input, this could lead to unexpected behavior or security issues. Consider using proper escaping for the replacement strings or validating/sanitizing the pattern input before use.
| old, new = match.groups() | |
| old, new = match.groups() | |
| # Validate that the parsed pattern components do not contain potentially unsafe | |
| # characters that could lead to confusing or unintended behavior. We keep this | |
| # conservative to match the simple semantics of this helper. | |
| # Disallow newline characters and backslashes in the replacement parts. | |
| if any(ch in old for ch in ("\n", "\r", "\\")) or any(ch in new for ch in ("\n", "\r", "\\")): | |
| raise ValueError(f"sed_inplace: unsupported or unsafe pattern components in '{pattern}'") |
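One way to sidestep the escaping concern is to treat both parts of the expression as literal strings and to reject anything that does not match the expected shape exactly. The sketch below assumes literal (non-regex) replacement semantics, which may differ from what sed_inplace actually does. The name `apply_sed_expression` is hypothetical. Unlike the pattern in the excerpt, it uses `fullmatch` and allows an empty replacement.

```python
import re

# Matches the whole expression s|old|new| with an optional trailing g.
_SED_PATTERN = re.compile(r"s\|([^|]+)\|([^|]*)\|(g?)")


def apply_sed_expression(expression: str, text: str) -> str:
    """Apply a literal s|old|new|[g] substitution to text."""
    match = _SED_PATTERN.fullmatch(expression)
    if match is None:
        raise ValueError(f"unsupported sed expression: {expression!r}")
    old, new, global_flag = match.groups()
    # str.replace treats both parts literally, so no regex escaping is needed.
    # A count of -1 replaces every occurrence; 1 replaces only the first.
    count = -1 if global_flag else 1
    return text.replace(old, new, count)
```

Because `str.replace` never interprets its arguments as patterns, characters such as `.` or `\` in either part are matched and inserted verbatim.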
| .PHONY: clean-test-data | ||
| clean-test-data: | ||
| rm -f tests/data/image_metadata.json | ||
| rm -rf tests/data/images |
The clean-test-data target uses 'rm -rf', which could be dangerous if the paths are not properly validated. The paths are currently hardcoded (tests/data/image_metadata.json and tests/data/images), but if they are ever made configurable, this could pose a risk. Consider adding safety checks or using more specific cleanup commands.
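If the cleanup were ever moved into the Python scripts, a guard along the following lines could refuse to delete anything outside an expected root. This is a sketch under that assumption only; the PR performs cleanup in the Makefile, and the function name `remove_tree_within` is hypothetical.

```python
import shutil
from pathlib import Path


def remove_tree_within(target: str, allowed_root: str) -> bool:
    """Delete target only if it resolves to a path strictly inside allowed_root.

    Returns True if something was removed, False if target did not exist.
    """
    root = Path(allowed_root).resolve()
    path = Path(target).resolve()
    # Refuse the root itself and anything that resolves outside of it.
    if path == root or root not in path.parents:
        raise ValueError(f"refusing to remove {path}: not inside {root}")
    if not path.exists():
        return False
    if path.is_dir():
        shutil.rmtree(path)
    else:
        path.unlink()
    return True
```

Resolving both paths before comparing them means that inputs such as `tests/data/../..` are evaluated by where they actually point, not by how they are spelled.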
| with open(config_file, "r") as f: | ||
| config = yaml.load(f, Loader=yaml.SafeLoader) |
The config file is opened without error handling. If the file doesn't exist, is not readable, or contains invalid YAML, the script will crash with an unhandled exception. Consider adding a try-except block with a more user-friendly error message to help users troubleshoot configuration file issues.
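A minimal sketch of the error handling suggested here, assuming PyYAML is the YAML library in use (as the `yaml.SafeLoader` reference in the excerpt implies). The `ConfigError` class and the `load_config` name are hypothetical, not part of the PR.

```python
from typing import Any


class ConfigError(Exception):
    """Raised when the configuration file cannot be read or parsed."""


def load_config(config_file: str) -> Any:
    """Load a YAML config file, translating failures into ConfigError."""
    try:
        with open(config_file, "r", encoding="utf-8") as f:
            text = f.read()
    except OSError as exc:
        # Covers missing files, directories, and permission problems.
        raise ConfigError(f"cannot read config file {config_file!r}: {exc}") from exc

    # PyYAML is a third-party package. It is imported here so that the
    # file-access errors above are reported even when it is not installed.
    import yaml

    try:
        # safe_load is equivalent to yaml.load(..., Loader=yaml.SafeLoader).
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ConfigError(f"invalid YAML in {config_file!r}: {exc}") from exc
```

The script's entry point could then catch `ConfigError`, print the message, and exit with a non-zero status, so users never see a raw traceback for a mistyped path.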
| # Build each platform separately and load into Docker daemon | ||
| for platform in platforms: | ||
| os_platform, arch, variant = parse_platform(platform) |
Variable arch is not used.
| os_platform, arch, variant = parse_platform(platform) | |
| parse_platform(platform) |
| # Build each platform separately and load into Docker daemon | ||
| for platform in platforms: | ||
| os_platform, arch, variant = parse_platform(platform) |
Variable variant is not used.
| os_platform, arch, variant = parse_platform(platform) | |
| os_platform, arch, _ = parse_platform(platform) |
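Both unused-variable comments come down to the same idiom: bind only the values the loop body needs, and use `_` for the rest. The sketch below uses a hypothetical stand-in for `parse_platform`, assuming the `os/arch[/variant]` format; the real implementation in image_utils.py may behave differently.

```python
from typing import Optional, Tuple


def parse_platform(platform: str) -> Tuple[str, str, Optional[str]]:
    """Split 'os/arch[/variant]' into its parts (hypothetical stand-in)."""
    parts = platform.split("/")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"invalid platform string: {platform!r}")
    os_name, arch = parts[0], parts[1]
    variant = parts[2] if len(parts) == 3 else None
    return os_name, arch, variant


# Bind only what the loop body uses; "_" marks deliberately ignored values.
for platform in ("linux/amd64", "linux/arm/v7"):
    os_platform, _, _ = parse_platform(platform)
    print(os_platform)
```

If none of the three values is used, dropping the assignment entirely is only appropriate when the call is still wanted for its validation side effect; otherwise the call itself can go.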
| # Build each platform image and push to OCI individually | ||
| for platform in platforms: | ||
| os_platform, arch, variant = parse_platform(platform) |
Introduce new test data generation system that creates container images from scratch using Podman/Docker/Buildah, eliminating DockerHub dependencies. Supports cross-platform builds and multi-architecture images with embedded CVE data for Trivy scanning.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.