test: add script to generate test data from scratch without DockerHub by andaaron · Pull Request #491 · project-zot/zui

andaaron · 2025-11-13T17:35:02Z

Introduce new test data generation system that creates container images from scratch using Podman/Docker/Buildah, eliminating DockerHub dependencies. Supports cross-platform builds and multi-architecture images with embedded CVE data for Trivy scanning.

Add create_test_data.py and image_utils.py
Add cross-platform documentation
Update Makefile to use new script (legacy script still available)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov · 2025-11-13T17:45:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.35%. Comparing base (4d28d88) to head (eb02ae3).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #491   +/-   ##
=======================================
  Coverage   80.35%   80.35%           
=======================================
  Files          62       62           
  Lines        1680     1680           
  Branches      218      218           
=======================================
  Hits         1350     1350           
  Misses        266      266           
  Partials       64       64


Introduce new test data generation system that creates container images from scratch using Podman/Docker/Buildah, eliminating DockerHub dependencies. Supports cross-platform builds and multi-architecture images with embedded CVE data for Trivy scanning.

- Add create_test_data.py and image_utils.py
- Add cross-platform documentation
- Update Makefile to use new script (legacy script still available)

Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>

Copilot

Pull request overview

This pull request introduces a new test data generation system that creates container images from scratch using Podman/Docker/Buildah, eliminating dependencies on DockerHub. The system supports cross-platform builds (Linux, macOS, Windows via WSL2) and multi-architecture images with embedded CVE data for Trivy scanning.

Changes:

Added create_test_data.py as the main orchestration script with argument parsing and metadata handling
Added image_utils.py as a comprehensive library (1363 lines) containing platform detection, build tool abstraction, root filesystem creation, and image management utilities
Added README_CROSS_PLATFORM.md documenting installation and usage across platforms
Updated Makefile with new create-test-data and clean-test-data targets while preserving the legacy test-data target
Added .github/workflows/test-data-build-tools.yml to test all three build tools (buildah, podman, docker) in CI
Updated .github/workflows/end-to-end-test.yml with version bumps for skopeo (v1.20.0) and Go (1.25.x)

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.

File	Description
tests/scripts/image_utils.py	Core library providing build tool abstractions, platform detection, root filesystem creation for Alpine/Debian/Ubuntu/generic systems, OCI reference handling, and image creation/signing/scanning functionality
tests/scripts/create_test_data.py	Main script orchestrating image creation with command-line argument parsing, prerequisite verification, and metadata aggregation
tests/scripts/README_CROSS_PLATFORM.md	Comprehensive cross-platform documentation covering installation instructions, usage examples, platform-specific notes, and troubleshooting for Linux/macOS/Windows
Makefile	Added `create-test-data` target that invokes the new script with optional `BUILD_TOOL` parameter, and `clean-test-data` target for cleanup
.github/workflows/test-data-build-tools.yml	New CI workflow that tests the script with all three build tools (buildah, podman, docker) using a matrix strategy
.github/workflows/end-to-end-test.yml	Updated skopeo version from v1.9.0 to v1.20.0 and Go version from 1.22.x to 1.25.x

Copilot · 2026-02-03T07:33:31Z

tests/scripts/image_utils.py

+    """
+    return {
+        "org.opencontainers.image.title": image,
+        "org.opencontainers.image.description": f" {description}",


There is an extra leading space in the description value. The f-string should be 'f"{description}"' instead of 'f" {description}"'. This will add an unwanted space at the beginning of every image description in the OCI metadata.

Suggested change

-        "org.opencontainers.image.description": f" {description}",
+        "org.opencontainers.image.description": f"{description}",

Copilot · 2026-02-03T07:33:31Z

tests/scripts/image_utils.py

+    if not os.path.isdir(docker_docs_dir):
+        run_command(["git", "clone", "https://github.com/docker-library/docs.git", docker_docs_dir], check=True)


The git clone command at line 170 will fail if the docs directory already exists but is not a git repository, or if there are permission issues. The check at line 169 only verifies if the directory exists but doesn't verify if it's a valid git repository. Consider adding error handling or checking if it's a valid git repo before cloning.

Suggested change

-    if not os.path.isdir(docker_docs_dir):
-        run_command(["git", "clone", "https://github.com/docker-library/docs.git", docker_docs_dir], check=True)
+    if os.path.exists(docker_docs_dir) and not os.path.isdir(docker_docs_dir):
+        logger.error("Path for docker docs exists but is not a directory: %s", docker_docs_dir)
+        raise RuntimeError(f"Invalid docker docs path: {docker_docs_dir}")
+    elif os.path.isdir(docker_docs_dir):
+        # Directory already exists; optionally verify it's a git repository
+        git_dir = os.path.join(docker_docs_dir, ".git")
+        if not os.path.isdir(git_dir):
+            logger.warning(
+                "Docker docs directory '%s' exists but does not appear to be a git repository.",
+                docker_docs_dir,
+            )
+    else:
+        try:
+            run_command(
+                ["git", "clone", "https://github.com/docker-library/docs.git", docker_docs_dir],
+                check=True,
+            )
+        except subprocess.CalledProcessError as exc:
+            logger.error("Failed to clone docker docs repository into '%s': %s", docker_docs_dir, exc)
+            raise
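
The cases this comment distinguishes (path missing, path present but not a directory, directory without git metadata) could be separated into a small helper before deciding whether to clone. The following is a minimal sketch, not code from the PR: `classify_checkout` and its return values are hypothetical names chosen for illustration.

```python
import os
import tempfile


def classify_checkout(path: str) -> str:
    """Classify path as 'missing', 'not_a_directory', 'not_a_repo', or 'repo'."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isdir(path):
        return "not_a_directory"
    # A plain clone stores its metadata in a `.git` directory; worktrees and
    # submodules use a `.git` file instead, so accept either form.
    if os.path.exists(os.path.join(path, ".git")):
        return "repo"
    return "not_a_repo"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A path that has not been created yet.
        print(classify_checkout(os.path.join(tmp, "docs")))
        # An existing directory that has no .git entry.
        print(classify_checkout(tmp))
```

A caller would then clone only for `"missing"`, fail fast for `"not_a_directory"`, and warn or re-clone for `"not_a_repo"`, which keeps the policy decision separate from the filesystem checks.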

Copilot · 2026-02-03T07:33:32Z

tests/scripts/image_utils.py

+    # Simple pattern replacement (s|old|new|g format)
+    match = re.match(r's\|([^|]+)\|([^|]+)\|g?', pattern)
+    if match:
+        old, new = match.groups()


The sed_inplace function performs string replacement without proper escaping or sanitization. If the 'old' or 'new' values in the pattern contain special characters or come from untrusted input, this could lead to unexpected behavior or security issues. Consider using proper escaping for the replacement strings or validating/sanitizing the pattern input before use.

Suggested change

-        old, new = match.groups()
+        old, new = match.groups()
+        # Validate that the parsed pattern components do not contain potentially unsafe
+        # characters that could lead to confusing or unintended behavior. We keep this
+        # conservative to match the simple semantics of this helper.
+        # Disallow newline characters and backslashes in the replacement parts.
+        if any(ch in old for ch in ("\n", "\r", "\\")) or any(ch in new for ch in ("\n", "\r", "\\")):
+            raise ValueError(f"sed_inplace: unsupported or unsafe pattern components in '{pattern}'")
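
An alternative to rejecting characters is to treat both parts of the pattern as literal text, so that no escaping question arises. The sketch below assumes that literal, per-line replacement is the intended behavior; the PR excerpt does not show how `sed_inplace` applies the match, and `parse_sed_pattern` and `apply_sed` are hypothetical names. It also uses `fullmatch` so trailing junk after the optional `g` flag is rejected, and it permits an empty replacement.

```python
import re

# s|old|new| with an optional trailing g; `new` may be empty.
_SED_PATTERN = re.compile(r"s\|([^|]+)\|([^|]*)\|(g?)")


def parse_sed_pattern(pattern: str):
    """Return (old, new, replace_all) or raise ValueError for unsupported input."""
    match = _SED_PATTERN.fullmatch(pattern)
    if match is None:
        raise ValueError(f"unsupported sed pattern: {pattern!r}")
    old, new, flag = match.groups()
    return old, new, flag == "g"


def apply_sed(text: str, pattern: str) -> str:
    """Apply a literal s|old|new|[g] substitution to each line of text."""
    old, new, replace_all = parse_sed_pattern(pattern)
    # str.replace treats both operands literally, so regex metacharacters and
    # backslashes in `old` or `new` have no special meaning.
    count = -1 if replace_all else 1
    lines = text.splitlines(keepends=True)
    return "".join(line.replace(old, new, count) for line in lines)


if __name__ == "__main__":
    print(apply_sed("FROM old:1 old:2\n", "s|old|new|g"), end="")
```

Because the replacement is literal, a pattern such as `s|a.b|x|g` only rewrites the exact text `a.b` and leaves `axb` untouched.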

Copilot · 2026-02-03T07:33:32Z

Makefile

+.PHONY: clean-test-data
+clean-test-data:
+	rm -f tests/data/image_metadata.json
+	rm -rf tests/data/images


The clean-test-data target uses 'rm -rf' which could be dangerous if the paths are not properly validated. While the paths are hardcoded in this case (tests/data/images), if these variables are ever made configurable in the future, this could pose a risk. Consider adding safety checks or using more specific cleanup commands.
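
If the cleanup paths ever do become configurable, one possible safety check is to confirm that each target resolves to a location inside an expected base directory before deleting it. The sketch below is illustrative only: `remove_within` is a hypothetical helper, not part of the PR, and the Makefile target could equally keep using hardcoded paths.

```python
import shutil
import tempfile
from pathlib import Path


def remove_within(base: Path, target: Path) -> bool:
    """Delete target only if it resolves to a path strictly inside base.

    Returns True if something was deleted, False if target did not exist.
    """
    base = base.resolve()
    resolved = target.resolve()
    if resolved == base or base not in resolved.parents:
        raise ValueError(f"refusing to delete {resolved}: not inside {base}")
    if resolved.is_dir():
        shutil.rmtree(resolved)
        return True
    if resolved.exists():
        resolved.unlink()
        return True
    return False  # nothing to delete, similar to `rm -f` on a missing file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        images = root / "tests" / "data" / "images"
        images.mkdir(parents=True)
        print(remove_within(root, images))
```

Resolving both paths first means that `..` segments and symlinks are taken into account before the containment check runs.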

Copilot · 2026-02-03T07:33:32Z

tests/scripts/create_test_data.py

+    with open(config_file, "r") as f:
+        config = yaml.load(f, Loader=yaml.SafeLoader)


The config file is opened without error handling. If the file doesn't exist, is not readable, or contains invalid YAML, the script will crash with an unhandled exception. Consider adding a try-except block with a more user-friendly error message to help users troubleshoot configuration file issues.

Copilot · 2026-02-03T07:33:35Z

tests/scripts/image_utils.py

+
+        # Build each platform separately and load into Docker daemon
+        for platform in platforms:
+            os_platform, arch, variant = parse_platform(platform)


Variable arch is not used.

Suggested change

-            os_platform, arch, variant = parse_platform(platform)
+            parse_platform(platform)

Copilot · 2026-02-03T07:33:35Z

tests/scripts/image_utils.py

+
+        # Build each platform separately and load into Docker daemon
+        for platform in platforms:
+            os_platform, arch, variant = parse_platform(platform)


Variable variant is not used.

Suggested change

-            os_platform, arch, variant = parse_platform(platform)
+            os_platform, arch, _ = parse_platform(platform)
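
The underscore convention in this suggestion can be shown with a stand-in for `parse_platform`. The PR's actual implementation is not visible in this excerpt, so the version below is an assumption: it splits an `os/arch[/variant]` string and returns `None` when no variant is given.

```python
from typing import Optional, Tuple


def parse_platform(platform: str) -> Tuple[str, str, Optional[str]]:
    """Split 'os/arch[/variant]' into its parts (hypothetical stand-in)."""
    parts = platform.split("/")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected os/arch[/variant], got {platform!r}")
    variant = parts[2] if len(parts) == 3 else None
    return parts[0], parts[1], variant


if __name__ == "__main__":
    for platform in ("linux/amd64", "linux/arm/v7"):
        # Only the OS is needed here, so the other fields are bound to `_`,
        # which linters treat as intentionally unused.
        os_platform, _, _ = parse_platform(platform)
        print(platform, "->", os_platform)
```

Binding unused fields to `_` keeps the call's validation side effect while silencing unused-variable warnings, whereas dropping the assignment entirely also discards any value the loop body still needs.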

Copilot · 2026-02-03T07:33:35Z

tests/scripts/image_utils.py

+            os_platform, arch, variant = parse_platform(platform)
+


This assignment to 'os_platform' is unnecessary as it is redefined before this value is used.

Suggested change

-            os_platform, arch, variant = parse_platform(platform)

Copilot · 2026-02-03T07:33:36Z

tests/scripts/image_utils.py

+
+        # Build each platform image and push to OCI individually
+        for platform in platforms:
+            os_platform, arch, variant = parse_platform(platform)


This assignment to 'arch' is unnecessary as it is redefined before this value is used.

Suggested change

-            os_platform, arch, variant = parse_platform(platform)

Copilot · 2026-02-03T07:33:36Z

tests/scripts/image_utils.py

+
+        # Build each platform image and push to OCI individually
+        for platform in platforms:
+            os_platform, arch, variant = parse_platform(platform)


This assignment to 'variant' is unnecessary as it is redefined before this value is used.

Suggested change

-            os_platform, arch, variant = parse_platform(platform)

andaaron force-pushed the refactor2 branch 6 times, most recently from 6860012 to 74d1cf1 Compare November 18, 2025 22:03

andaaron force-pushed the refactor2 branch 2 times, most recently from 2700712 to 64bee8e Compare January 31, 2026 11:00

andaaron force-pushed the refactor2 branch from 64bee8e to eb02ae3 Compare February 3, 2026 07:25

andaaron requested a review from Copilot February 3, 2026 07:25

Copilot started reviewing on behalf of andaaron February 3, 2026 07:26

Copilot AI reviewed Feb 3, 2026

andaaron wants to merge 1 commit into project-zot:main from andaaron:refactor2
