test: add script to generate test data from scratch without DockerHub #491

Draft
andaaron wants to merge 1 commit into project-zot:main from andaaron:refactor2

@andaaron
Contributor

Introduce new test data generation system that creates container images from scratch using Podman/Docker/Buildah, eliminating DockerHub dependencies. Supports cross-platform builds and multi-architecture images with embedded CVE data for Trivy scanning.

  • Add create_test_data.py and image_utils.py
  • Add cross-platform documentation
  • Update Makefile to use new script (legacy script still available)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov
codecov bot commented Nov 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.35%. Comparing base (4d28d88) to head (eb02ae3).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #491   +/-   ##
=======================================
  Coverage   80.35%   80.35%           
=======================================
  Files          62       62           
  Lines        1680     1680           
  Branches      218      218           
=======================================
  Hits         1350     1350           
  Misses        266      266           
  Partials       64       64           

@andaaron force-pushed the refactor2 branch 6 times, most recently from 6860012 to 74d1cf1 (November 18, 2025 22:03)
@andaaron force-pushed the refactor2 branch 2 times, most recently from 2700712 to 64bee8e (January 31, 2026 11:00)
Introduce new test data generation system that creates container images
from scratch using Podman/Docker/Buildah, eliminating DockerHub dependencies.
Supports cross-platform builds and multi-architecture images with embedded
CVE data for Trivy scanning.

- Add create_test_data.py and image_utils.py
- Add cross-platform documentation
- Update Makefile to use new script (legacy script still available)

Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>

Copilot AI left a comment

Pull request overview

This pull request introduces a new test data generation system that creates container images from scratch using Podman/Docker/Buildah, eliminating dependencies on DockerHub. The system supports cross-platform builds (Linux, macOS, Windows via WSL2) and multi-architecture images with embedded CVE data for Trivy scanning.

Changes:

  • Added create_test_data.py as the main orchestration script with argument parsing and metadata handling
  • Added image_utils.py as a comprehensive library (1363 lines) containing platform detection, build tool abstraction, root filesystem creation, and image management utilities
  • Added README_CROSS_PLATFORM.md documenting installation and usage across platforms
  • Updated Makefile with new create-test-data and clean-test-data targets while preserving the legacy test-data target
  • Added .github/workflows/test-data-build-tools.yml to test all three build tools (buildah, podman, docker) in CI
  • Updated .github/workflows/end-to-end-test.yml with version bumps for skopeo (v1.20.0) and Go (1.25.x)
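The build-tool abstraction mentioned above has to pick Podman, Docker or Buildah at runtime. A minimal sketch of that choice, assuming the tool is simply looked up on `PATH` with `shutil.which` in a fixed preference order; `detect_build_tool` and the order are hypothetical, not taken from image_utils.py:

```python
import shutil
from typing import Optional, Sequence

# Hypothetical preference order; the real image_utils.py may differ.
DEFAULT_TOOLS = ("podman", "docker", "buildah")


def detect_build_tool(preferred: Optional[str] = None,
                      candidates: Sequence[str] = DEFAULT_TOOLS) -> str:
    """Return the first container build tool found on PATH.

    When `preferred` is set, only that tool is considered.
    """
    tools = (preferred,) if preferred else tuple(candidates)
    for tool in tools:
        # shutil.which returns the full path, or None if the tool is not on PATH.
        if shutil.which(tool):
            return tool
    raise RuntimeError(f"no build tool found on PATH (looked for: {', '.join(tools)})")
```

Restricting the search to `preferred` when it is set means an explicit request fails loudly instead of silently falling back to a different tool.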

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.

Summary per file:

  • tests/scripts/image_utils.py: Core library providing build tool abstractions, platform detection, root filesystem creation for Alpine/Debian/Ubuntu/generic systems, OCI reference handling, and image creation/signing/scanning functionality
  • tests/scripts/create_test_data.py: Main script orchestrating image creation with command-line argument parsing, prerequisite verification, and metadata aggregation
  • tests/scripts/README_CROSS_PLATFORM.md: Comprehensive cross-platform documentation covering installation instructions, usage examples, platform-specific notes, and troubleshooting for Linux/macOS/Windows
  • Makefile: Added create-test-data target that invokes the new script with optional BUILD_TOOL parameter, and clean-test-data target for cleanup
  • .github/workflows/test-data-build-tools.yml: New CI workflow that tests the script with all three build tools (buildah, podman, docker) using a matrix strategy
  • .github/workflows/end-to-end-test.yml: Updated skopeo version from v1.9.0 to v1.20.0 and Go version from 1.22.x to 1.25.x
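For the Makefile's optional BUILD_TOOL parameter, one way the script could accept it is a command-line flag with the environment variable as a fallback. This is a sketch only: the `--build-tool` flag name and the environment fallback are assumptions, not the script's confirmed interface.

```python
import argparse
import os
from typing import Optional, Sequence

SUPPORTED_TOOLS = ("podman", "docker", "buildah")


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generate test images (sketch)")
    parser.add_argument(
        "--build-tool",
        choices=SUPPORTED_TOOLS,
        # Hypothetical fallback: BUILD_TOOL from the environment, else None (auto-detect).
        default=os.environ.get("BUILD_TOOL") or None,
        help="container build tool (default: $BUILD_TOOL, else auto-detect)",
    )
    args = parser.parse_args(argv)
    # argparse checks `choices` only for values given on the command line,
    # not for defaults, so validate the environment fallback explicitly.
    if args.build_tool is not None and args.build_tool not in SUPPORTED_TOOLS:
        parser.error(f"unsupported BUILD_TOOL {args.build_tool!r}")
    return args
```

An explicit `--build-tool` always wins over the environment, and an invalid `BUILD_TOOL` value exits with a usage error instead of reaching the build step.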

"""
return {
    "org.opencontainers.image.title": image,
    "org.opencontainers.image.description": f" {description}",

Copilot AI Feb 3, 2026

There is an extra leading space in the description value. The f-string should be 'f"{description}"' instead of 'f" {description}"'. This will add an unwanted space at the beginning of every image description in the OCI metadata.

Suggested change
-    "org.opencontainers.image.description": f" {description}",
+    "org.opencontainers.image.description": f"{description}",

Copilot uses AI. Check for mistakes.
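A minimal sketch of the annotation helper with the fix applied. The function name `image_annotations` is hypothetical; the two keys are the standard OCI image-spec annotation names from the hunk. Stripping surrounding whitespace is an extra defensive choice, not part of the suggestion.

```python
from typing import Dict


def image_annotations(image: str, description: str) -> Dict[str, str]:
    """Return OCI annotations for a test image (hypothetical helper)."""
    return {
        "org.opencontainers.image.title": image,
        # No leading space; surrounding whitespace is stripped defensively.
        "org.opencontainers.image.description": description.strip(),
    }
```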
Comment on lines +169 to +170
if not os.path.isdir(docker_docs_dir):
    run_command(["git", "clone", "https://github.com/docker-library/docs.git", docker_docs_dir], check=True)

Copilot AI Feb 3, 2026

The git clone command at line 170 will fail if the docs directory already exists but is not a git repository, or if there are permission issues. The check at line 169 only verifies if the directory exists but doesn't verify if it's a valid git repository. Consider adding error handling or checking if it's a valid git repo before cloning.

Suggested change
-if not os.path.isdir(docker_docs_dir):
-    run_command(["git", "clone", "https://github.com/docker-library/docs.git", docker_docs_dir], check=True)
+if os.path.exists(docker_docs_dir) and not os.path.isdir(docker_docs_dir):
+    logger.error("Path for docker docs exists but is not a directory: %s", docker_docs_dir)
+    raise RuntimeError(f"Invalid docker docs path: {docker_docs_dir}")
+elif os.path.isdir(docker_docs_dir):
+    # Directory already exists; optionally verify it's a git repository
+    git_dir = os.path.join(docker_docs_dir, ".git")
+    if not os.path.isdir(git_dir):
+        logger.warning(
+            "Docker docs directory '%s' exists but does not appear to be a git repository.",
+            docker_docs_dir,
+        )
+else:
+    try:
+        run_command(
+            ["git", "clone", "https://github.com/docker-library/docs.git", docker_docs_dir],
+            check=True,
+        )
+    except subprocess.CalledProcessError as exc:
+        logger.error("Failed to clone docker docs repository into '%s': %s", docker_docs_dir, exc)
+        raise

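A sketch of the same check that treats an existing non-git directory as an error rather than a warning, assuming a stale directory should be removed by the user before retrying. `ensure_docs_checkout` is a hypothetical name, and `run` is injected (defaulting to `subprocess.run`) so the logic can be exercised without network access. `.git` is tested with `os.path.exists` because in worktrees and submodules it can be a file rather than a directory.

```python
import os
import subprocess
from typing import Any, Callable

DOCS_URL = "https://github.com/docker-library/docs.git"


def ensure_docs_checkout(path: str, url: str = DOCS_URL,
                         run: Callable[..., Any] = subprocess.run) -> None:
    """Ensure `path` holds a git checkout of `url`, cloning it if missing."""
    if os.path.exists(path) and not os.path.isdir(path):
        # A file is in the way, so `git clone` would fail anyway.
        raise RuntimeError(f"{path} exists but is not a directory")
    if os.path.isdir(path):
        # .git may be a file (worktrees, submodules), so test existence only.
        if not os.path.exists(os.path.join(path, ".git")):
            raise RuntimeError(f"{path} exists but is not a git checkout; remove it and retry")
        return  # reuse the existing checkout
    # check=True makes subprocess.run raise CalledProcessError if git fails.
    run(["git", "clone", url, path], check=True)
```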
# Simple pattern replacement (s|old|new|g format)
match = re.match(r's\|([^|]+)\|([^|]+)\|g?', pattern)
if match:
    old, new = match.groups()

Copilot AI Feb 3, 2026

The sed_inplace function performs string replacement without proper escaping or sanitization. If the 'old' or 'new' values in the pattern contain special characters or come from untrusted input, this could lead to unexpected behavior or security issues. Consider using proper escaping for the replacement strings or validating/sanitizing the pattern input before use.

Suggested change
-    old, new = match.groups()
+    old, new = match.groups()
+    # Validate that the parsed pattern components do not contain potentially unsafe
+    # characters that could lead to confusing or unintended behavior. We keep this
+    # conservative to match the simple semantics of this helper.
+    # Disallow newline characters and backslashes in the replacement parts.
+    if any(ch in old for ch in ("\n", "\r", "\\")) or any(ch in new for ch in ("\n", "\r", "\\")):
+        raise ValueError(f"sed_inplace: unsupported or unsafe pattern components in '{pattern}'")

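An alternative to rejecting characters is to treat `old` and `new` as literal strings via `str.replace`, so regex metacharacters and backslashes lose any special meaning. The sketch below assumes the helper only needs the `s|old|new|` and `s|old|new|g` forms; it mirrors sed by replacing only the first match per line unless `g` is given. The signature and return value are hypothetical.

```python
import re
from pathlib import Path

# Only "s|old|new|" and "s|old|new|g" are accepted; old must be non-empty.
_SED_PATTERN = re.compile(r"s\|([^|\r\n]+)\|([^|\r\n]*)\|(g?)")


def sed_inplace(path: str, pattern: str) -> int:
    """Replace `old` with `new` in the file at `path`, treating both literally.

    Returns the number of replacements made.
    """
    match = _SED_PATTERN.fullmatch(pattern)
    if match is None:
        raise ValueError(f"unsupported sed pattern: {pattern!r}")
    old, new, global_flag = match.groups()
    file = Path(path)
    total = 0
    out = []
    for line in file.read_text().splitlines(keepends=True):
        # Like sed: first match per line, or every match with the g flag.
        n = line.count(old) if global_flag else min(1, line.count(old))
        out.append(line.replace(old, new, n))
        total += n
    file.write_text("".join(out))
    return total
```

Because `str.replace` never interprets `new` either, backslashes need no special rejection; only newlines are excluded, by the pattern itself.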
Comment on lines +48 to +51
.PHONY: clean-test-data
clean-test-data:
	rm -f tests/data/image_metadata.json
	rm -rf tests/data/images

Copilot AI Feb 3, 2026

The clean-test-data target uses 'rm -rf' which could be dangerous if the paths are not properly validated. While the paths are hardcoded in this case (tests/data/images), if these variables are ever made configurable in the future, this could pose a risk. Consider adding safety checks or using more specific cleanup commands.

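If the cleanup were ever driven by configurable paths, a guard could confirm that each target resolves to a location strictly inside a known root before deleting it. A minimal Python sketch, assuming the cleanup lived in the script rather than the Makefile; `remove_under` is a hypothetical helper:

```python
import shutil
from pathlib import Path


def remove_under(root: str, target: str) -> bool:
    """Delete `target` only if it resolves to a path strictly inside `root`.

    Returns True if something was removed, False if `target` did not exist.
    """
    root_path = Path(root).resolve()
    target_path = Path(target)
    # resolve() follows symlinks and "..", so escapes from root are caught.
    resolved = target_path.resolve()
    if resolved == root_path or root_path not in resolved.parents:
        raise ValueError(f"refusing to delete {resolved}: not inside {root_path}")
    if target_path.is_symlink() or target_path.is_file():
        target_path.unlink()  # a regular file, or the link itself (not what it points to)
    elif target_path.is_dir():
        shutil.rmtree(target_path)
    else:
        return False
    return True
```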
Comment on lines +91 to +92
with open(config_file, "r") as f:
    config = yaml.load(f, Loader=yaml.SafeLoader)

Copilot AI Feb 3, 2026

The config file is opened without error handling. If the file doesn't exist, is not readable, or contains invalid YAML, the script will crash with an unhandled exception. Consider adding a try-except block with a more user-friendly error message to help users troubleshoot configuration file issues.

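A sketch of the error handling the comment asks for, assuming the config is a YAML mapping at the top level and that PyYAML (already used by the script) is available. `yaml.safe_load` is equivalent to `yaml.load(f, Loader=yaml.SafeLoader)`; `ConfigError` and `load_config` are hypothetical names.

```python
from typing import Any, Dict

import yaml


class ConfigError(Exception):
    """Raised with a user-facing message when the config file cannot be used."""


def load_config(config_file: str) -> Dict[str, Any]:
    try:
        with open(config_file, "r", encoding="utf-8") as f:
            config = yaml.safe_load(f)
    except OSError as exc:  # missing file, permission denied, ...
        raise ConfigError(f"cannot read config file {config_file!r}: {exc}") from exc
    except yaml.YAMLError as exc:
        raise ConfigError(f"invalid YAML in {config_file!r}: {exc}") from exc
    if not isinstance(config, dict):
        # An empty file yields None; a bare list or scalar is also unusable here.
        raise ConfigError(f"{config_file!r} must contain a YAML mapping at the top level")
    return config
```

The caller can catch `ConfigError` once and print its message, instead of showing a raw traceback.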

# Build each platform separately and load into Docker daemon
for platform in platforms:
    os_platform, arch, variant = parse_platform(platform)

Copilot AI Feb 3, 2026

Variable arch is not used.

Suggested change
-    os_platform, arch, variant = parse_platform(platform)
+    parse_platform(platform)

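The unused-variable comments on these loops come down to one idiom: unpack the tuple once and bind the fields you do not need to `_`. A sketch, assuming `parse_platform` splits an `os/arch[/variant]` string and returns `None` when there is no variant; this reimplementation and `architectures` are hypothetical:

```python
from typing import Iterable, List, Optional, Tuple


def parse_platform(platform: str) -> Tuple[str, str, Optional[str]]:
    """Split an "os/arch[/variant]" string such as "linux/arm/v7"."""
    parts = platform.split("/")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"invalid platform {platform!r}, expected os/arch[/variant]")
    variant = parts[2] if len(parts) == 3 else None
    return parts[0], parts[1], variant


def architectures(platforms: Iterable[str]) -> List[str]:
    """Collect the distinct architectures; OS and variant are bound to "_"."""
    return sorted({arch for _, arch, _ in map(parse_platform, platforms)})
```

Naming the OS field `os_platform` rather than `os` also keeps it from shadowing the `os` module.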

# Build each platform separately and load into Docker daemon
for platform in platforms:
    os_platform, arch, variant = parse_platform(platform)

Copilot AI Feb 3, 2026

Variable variant is not used.

Suggested change
-    os_platform, arch, variant = parse_platform(platform)
+    os_platform, arch, _ = parse_platform(platform)

Comment on lines +950 to +951
os_platform, arch, variant = parse_platform(platform)


Copilot AI Feb 3, 2026

This assignment to 'os_platform' is unnecessary as it is redefined before this value is used.

Suggested change
-os_platform, arch, variant = parse_platform(platform)


# Build each platform image and push to OCI individually
for platform in platforms:
    os_platform, arch, variant = parse_platform(platform)

Copilot AI Feb 3, 2026

This assignment to 'arch' is unnecessary as it is redefined before this value is used.

Suggested change
-    os_platform, arch, variant = parse_platform(platform)


# Build each platform image and push to OCI individually
for platform in platforms:
    os_platform, arch, variant = parse_platform(platform)

Copilot AI Feb 3, 2026

This assignment to 'variant' is unnecessary as it is redefined before this value is used.

Suggested change
-    os_platform, arch, variant = parse_platform(platform)


Labels

None yet

Projects

None yet

2 participants