-
Notifications
You must be signed in to change notification settings - Fork 13k
chore(ci): track docker image size changes #37527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Looks like this PR is ready to merge! 🎉 |
|
WalkthroughThe PR replaces digest-based artifact storage with manifest-based JSON files in the Docker build pipeline, adds a new image size tracking workflow that measures and compares Docker image sizes across services against baselines and historical data, and integrates this tracking into the CI workflow. Changes
Sequence Diagram(s)sequenceDiagram
participant CI as CI Workflow
participant BDA as Build Docker Action
participant SRT as Size Tracker Action
participant GH as GitHub API
CI->>BDA: Build & Push Images
BDA->>BDA: Inspect Manifest for Each Arch
BDA->>BDA: Save manifest.json to /tmp/manifests
BDA-->>CI: Upload manifests artifact
CI->>SRT: Download manifests artifact
SRT->>SRT: Parse manifest.json files
SRT->>SRT: Calculate per-service sizes
SRT->>SRT: Fetch baseline sizes (if exists)
SRT->>SRT: Load 30-day history
SRT->>SRT: Generate comparison report
SRT->>SRT: Create Mermaid trend chart
SRT->>GH: Comment/Update PR with report
SRT-->>CI: Output size metrics
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
066ad73 to
40778f9
Compare
b3c3103 to
d272b19
Compare
d272b19 to
59da571
Compare
8f76869 to
dbf9442
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #37527 +/- ##
===========================================
+ Coverage 68.97% 68.99% +0.02%
===========================================
Files 3358 3358
Lines 114240 114241 +1
Branches 20537 20537
===========================================
+ Hits 78792 78817 +25
+ Misses 33359 33335 -24
Partials 2089 2089
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
09a696f to
7084c5b
Compare
7084c5b to
2ec7e92
Compare
5109646 to
5441568
Compare
5441568 to
b767474
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (1)
.github/actions/docker-image-size-tracker/action.yml (1)
92-109: Use jq to generate JSON instead of manual echo concatenation.The manual string concatenation for JSON generation (lines 92–109) is fragile and error-prone. If service names contain quotes or special characters, output formatting could break. Additionally, the comma-insertion logic (lines 98–106) is duplicated in the baseline measurement step (lines 187–195).
Use jq to construct JSON from the associative array to ensure valid output and reduce duplication:
# After the loop populating sizes array, build JSON with jq jq -n \ --arg timestamp "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \ --arg tag "${{ inputs.tag }}" \ --argjson total "$total" \ --argjson services "$(printf '%s\n' "${!sizes[@]}" | jq -R . | jq -s 'map(. as $k | {($k): '"'"'${sizes[$k]}'"'"'}) | add')" \ '{timestamp: $timestamp, tag: $tag, total: $total, services: $services}' > current-sizes.jsonAlternatively, use printf-based JSON generation with proper escaping.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
.github/actions/build-docker/action.yml(1 hunks).github/actions/docker-image-size-tracker/action.yml(1 hunks).github/workflows/ci.yml(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: 🔨 Test API (EE) / MongoDB 8.2 coverage (1/1)
- GitHub Check: 🔨 Test API (EE) / MongoDB 8.2 coverage (1/1)
- GitHub Check: 🔨 Test API (EE) / MongoDB 8.2 coverage (1/1)
- GitHub Check: 🔨 Test API (EE) / MongoDB 8.2 coverage (1/1)
- GitHub Check: 🔨 Test API (EE) / MongoDB 8.2 coverage (1/1)
- GitHub Check: 🔨 Test API (EE) / MongoDB 8.2 coverage (1/1)
- GitHub Check: 🔨 Test API (EE) / MongoDB 8.2 coverage (1/1)
- GitHub Check: 🔨 Test API (EE) / MongoDB 8.2 coverage (1/1)
- GitHub Check: 🔨 Test API (EE) / MongoDB 8.2 coverage (1/1)
- GitHub Check: 🔨 Test API (EE) / MongoDB 8.2 coverage (1/1)
🔇 Additional comments (4)
.github/actions/docker-image-size-tracker/action.yml (1)
149-173: Clarify manifest inspection format for baseline vs. current measurements.The baseline step inspects published images without
-vflag (line 149), while the current measurement (build-docker action) uses-v. These produce different output structures:
docker manifest inspect→ manifest list or SchemaV2Manifestdocker manifest inspect -v→ SchemaV2Manifest with full size detailsConfirm that the parsing logic correctly handles both formats and that the jq queries at lines 159–160 and 169–170 extract sizes from the non-verbose output correctly.
.github/workflows/ci.yml (2)
440-460: Track-image-sizes job placement and dependency chain looks correct.The new job is positioned after build-gh-docker-publish, runs on PR and develop branches, has appropriate permissions (pull-requests:write, contents:write for git operations), and receives needed context from release-versions. Configuration appears sound, assuming upstream jobs produce the expected manifests artifacts.
415-419: Now I need to find where these manifest.json files are created in the workflow.Now let me search for where the manifests are created in the workflow:
Based on my investigation, I've confirmed that the
.Descriptorand.Reffields do exist indocker manifest inspect --verboseoutput for manifest lists. The jq query structure on line 418 is technically valid for that format.However, I cannot definitively verify the actual manifest.json structure saved in your CI workflow, as it's generated at runtime and not present in the repository. The review comment's core concern—ensuring the saved manifest.json matches the jq query expectations—remains valid but unconfirmed.
The jq query at line 418 assumes manifest.json has
.Descriptor.digestand.Reffields.To resolve this, verify:
- What action/script generates the manifest.json files (search for
upload-artifactwith "manifests-*" pattern in your workflow)- Confirm the actual structure by logging a sample manifest.json during a workflow run
- If the structure doesn't match
.Descriptor.digestand.Ref, the query will fail with a jq error.github/actions/build-docker/action.yml (1)
118-129: Original review concern is invalid—downstream parsing is compatible with manifest structure.Verification confirms that the manifest extraction and downstream parsing are correctly aligned:
- Lines 75-76 correctly parse
.SchemaV2Manifest.config.sizeand.SchemaV2Manifest.layers[]?.sizefrom the saved manifest.json- The manifest saved at line 126 (build-docker action) contains this exact SchemaV2Manifest structure from
docker manifest inspect -v- The meta.json extraction at line 119 produces the correct format for
docker buildx bake --metadata-fileoutput- All jq queries use defensive fallbacks (
// 0) to handle missing fieldsNo fixes required—the code is functioning as intended.
| jq -s '.' /tmp/history-worktree/history/*.json 2>/dev/null | jq ' | ||
| sort_by(.timestamp) | | ||
| group_by(.timestamp[0:16]) | | ||
| map(.[-1]) | | ||
| .[-30:] | ||
| ' > history-data.json || echo "[]" > history-data.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify timestamp grouping for historical trend data.
The jq query groups by timestamp[0:16] (YYYY-MM-DDTHH), which keeps one measurement per hour-of-day, not one per calendar day. If multiple builds run on the same day across different hours, you may get multiple entries for a single day. Did you intend timestamp[0:10] (YYYY-MM-DD) for one per calendar day?
🏁 Script executed:
# Example to verify current behavior:
echo '[
{"timestamp": "2025-11-16T10:00:00Z"},
{"timestamp": "2025-11-16T15:00:00Z"},
{"timestamp": "2025-11-17T10:00:00Z"}
]' | jq 'group_by(.timestamp[0:16]) | map(.[-1])'
# Currently keeps 2 entries (10:00 and 15:00 on Nov 16)
# With [0:10] would keep 1 entry per dayLength of output: 149
Confirm grouping by hour, not calendar day—clarify intended behavior and consider changing to [0:10].
The verification confirms the current behavior: grouping by timestamp[0:16] creates separate groups for different hours on the same day. On 2025-11-16 at 10:00 and 15:00, both entries are retained (not merged into a single daily entry). If the intent is truly one measurement per calendar day, change the grouping to timestamp[0:10] (YYYY-MM-DD). If hourly tracking is intentional, update the comment or documentation to clarify this.
🤖 Prompt for AI Agents
.github/actions/docker-image-size-tracker/action.yml around lines 237 to 242:
the jq grouping currently uses timestamp[0:16] which groups by year-month-day
and hour (so multiple entries from different hours on the same day are kept); if
you want one measurement per calendar day change the grouping key to
timestamp[0:10] (YYYY-MM-DD), otherwise leave the key but update the surrounding
comment/docs to explicitly state hourly grouping is intentional; implement the
change by editing the jq group_by slice accordingly and update the comment in
this file or repository README to reflect the intended granularity.
| cat > report.md << 'EOF' | ||
| # 📦 Docker Image Size Report | ||
|
|
||
| **Status:** First measurement - no baseline for comparison | ||
|
|
||
| **Total Size:** $(numfmt --to=iec-i --suffix=B $current_total) | ||
| EOF | ||
| exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shell variables won't interpolate in single-quoted heredoc.
Line 277 uses a single-quoted heredoc (<< 'EOF'), which prevents variable expansion. The $current_total and numfmt command on line 282 will render as literal text in the generated report, not as computed values. Use double quotes for the heredoc delimiter instead:
cat > report.md << 'EOF'
# 📦 Docker Image Size Report
**Status:** First measurement - no baseline for comparison
- **Total Size:** $(numfmt --to=iec-i --suffix=B $current_total)
+ **Total Size:** $(numfmt --to=iec-i --suffix=B $current_total 2>/dev/null || echo "${current_total} bytes")
EOFActually, switch the heredoc to allow variable expansion:
- cat > report.md << 'EOF'
+ cat > report.md << EOF
# 📦 Docker Image Size Report
**Status:** First measurement - no baseline for comparison
**Total Size:** $(numfmt --to=iec-i --suffix=B $current_total 2>/dev/null || echo "${current_total} bytes")
EOF📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cat > report.md << 'EOF' | |
| # 📦 Docker Image Size Report | |
| **Status:** First measurement - no baseline for comparison | |
| **Total Size:** $(numfmt --to=iec-i --suffix=B $current_total) | |
| EOF | |
| exit 0 | |
| cat > report.md << EOF | |
| # 📦 Docker Image Size Report | |
| **Status:** First measurement - no baseline for comparison | |
| **Total Size:** $(numfmt --to=iec-i --suffix=B $current_total 2>/dev/null || echo "${current_total} bytes") | |
| EOF | |
| exit 0 |
🤖 Prompt for AI Agents
.github/actions/docker-image-size-tracker/action.yml around lines 277 to 284:
the heredoc is quoted (<< 'EOF') which prevents shell variable and command
substitution so $current_total and the numfmt output will be written literally;
change the heredoc delimiter to an unquoted form (e.g., <<EOF) so variables and
command substitutions are expanded when writing report.md, and keep the rest of
the content unchanged.
| # Process historical data | ||
| while IFS= read -r line; do | ||
| timestamp=$(echo "$line" | jq -r '.timestamp') | ||
| date_label=$(date -d "$timestamp" +"%m/%d %H:%M") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
date -d is not portable across Unix systems.
Line 388 uses date -d "$timestamp", which is a GNU extension not available on macOS or BSD. For cross-platform compatibility, consider using date -f or gdate if available, or parse the timestamp differently:
# Option 1: Parse ISO8601 timestamp portably (truncate and use as-is)
# date_label=$(echo "$timestamp" | sed 's/T/ /; s/Z//')
# Option 2: Use gdate if available, fall back to date
# date_label=$((gdate || date) -d "$timestamp" +"%m/%d %H:%M" 2>/dev/null || echo "$timestamp")
# Option 3: Use bash string manipulation (most portable)
# date_label="${timestamp:5:2}/${timestamp:8:2} ${timestamp:11:5}"I'll verify the review comment by examining the file context and confirming the portability claim.
Additionally, let me search for information about date command portability:
🏁 Script executed:
#!/bin/bash
# Examine the action.yml file around line 388 to understand context
cat .github/actions/docker-image-size-tracker/action.yml | head -400 | tail -50Length of output: 1884
🏁 Script executed:
#!/bin/bash
# Also search for date command usage in the entire file
rg -n "date" .github/actions/docker-image-size-tracker/action.yml -C 5Length of output: 3840
🏁 Script executed:
#!/bin/bash
# Check if this is inside a run script or shell context
rg -n "run:" .github/actions/docker-image-size-tracker/action.yml -A 20 | head -100Length of output: 4073
🏁 Script executed:
#!/bin/bash
# Find what shell is being used in this action
rg -n "shell:" .github/actions/docker-image-size-tracker/action.ymlLength of output: 152
🏁 Script executed:
#!/bin/bash
# Check the entire structure around docker/container references
rg -n "runs:" .github/actions/docker-image-size-tracker/action.yml -A 10Length of output: 287
🏁 Script executed:
#!/bin/bash
# List the first 50 lines to see if this is composite action or other type
head -50 .github/actions/docker-image-size-tracker/action.ymlLength of output: 1335
🌐 Web query:
date -d GNU extension portability macOS BSD
💡 Result:
Short answer
- date -d is a GNU coreutils extension; BSD/macOS date does not support -d and is not portable. [1][2][5]
Portability options (pick one)
- Use BSD/macOS date syntax (portable on BSD & macOS):
- relative: date -v -1d (yesterday). [2]
- parse string → epoch: date -j -f '%Y-%m-%d %H:%M:%S' '2022-03-09 12:34:56' +%s. [6]
- Install GNU coreutils on macOS and call gdate (brew install coreutils → gdate behaves like date -d). [1][2]
- Use a language runtime (python3/perl/awk) for robust, portable date math and parsing (recommended for scripts that must run everywhere). [0][6]
Common translations
- GNU: date -d 'yesterday' +%Y-%m-%d
BSD/macOS: date -v -1d +%Y-%m-%d or use gdate -d 'yesterday' if coreutils installed. [1][2][0]
Notes
- Avoid hacks like subtracting 86400 seconds (breaks around DST). Prefer BSD -v, gdate, or language libraries for correctness. [0][1][6]
Sources
[1] GNU vs BSD date differences / installing coreutils on macOS.
[2] macOS/BSD date -v and examples.
[5] Stack Overflow: macOS uses BSD date (no -d), use -j -f / -v.
[6] Examples converting/parsing timestamps with BSD and GNU date.
🌐 Web query:
POSIX date command portable alternative to date -d
💡 Result:
Short answer: POSIX date has no portable equivalent to GNU date -d. Use one of these portable approaches instead:
-
Use shell/builtin epoch formatting (bash printf with %(...)T) when available. Example: printf "%(%F)T\n" "$(( $(date +%s) + 86400 ))". [1]
-
Do arithmetic on seconds since epoch (date +%s or
$EPOCHSECONDS) and convert with strftime via awk/gawk or printf/strftime bindings. Example pattern: seconds=$ ((now + 86400)); gawk 'BEGIN{print strftime("%Y-%m-%d", seconds)}' (or use awk implementations with strftime). [1][2] -
Use a small portable script in Perl/Python to parse/format relative dates (works on systems without GNU date). Example: perl -MTime::Local -e 'print scalar localtime(time + 86400),"\n"'. Or Python: python3 -c 'import time,datetime; print((datetime.datetime.now()+datetime.timedelta(days=1)).strftime("%F"))'. [3][4]
-
Detect platform and fall back to GNU date when installed (gdate on macOS via coreutils) or implement OS-specific date flags. Many projects use an if uname=Darwin then use BSD date -v +1d else GNU date -d. [5]
References:
[1] Shell tips — date differences and portable workarounds.
[2] Full Bash FAQ — using epoch seconds and conversions.
[3] Stack Overflow — awk/perl/python alternatives for date -d.
[4] GNU coreutils / macOS discussion re: gdate and BSD date.
date -d is not portable and will fail on macOS/BSD or non-GNU self-hosted runners.
Line 388 uses date -d "$timestamp", which is a GNU coreutils extension unavailable on macOS, BSD, or self-hosted runners without GNU date. Although this works on standard GitHub Actions Linux runners, the composite action has no platform constraints and could fail elsewhere.
Since the timestamp is ISO8601 format (YYYY-MM-DDTHH:MM:SSZ from line 93), use platform-agnostic bash string manipulation:
# Parse ISO8601 timestamp without external date command
date_label="${timestamp:5:2}/${timestamp:8:2} ${timestamp:11:5}"This extracts month/day and time directly from the string, eliminating the portability issue entirely.
🤖 Prompt for AI Agents
.github/actions/docker-image-size-tracker/action.yml around line 388: the script
uses GNU-specific `date -d "$timestamp"` which is not portable to macOS/BSD or
non-GNU runners; replace that call with bash string slicing to parse the ISO8601
timestamp (YYYY-MM-DDTHH:MM:SSZ) directly and build the label as MM/DD HH:MM
from the fixed character positions so it works on all shells without relying on
GNU date.
|
|
||
| if (botComment) { | ||
| await github.rest.issues.updateComment({ | ||
| owner: context.repo.owner, | ||
| repo: context.repo.repo, | ||
| comment_id: botComment.id, | ||
| body: commentBody | ||
| }); | ||
| console.log('Updated existing comment'); | ||
| } else { | ||
| await github.rest.issues.createComment({ | ||
| owner: context.repo.owner, | ||
| repo: context.repo.repo, | ||
| issue_number: context.issue.number, | ||
| body: commentBody | ||
| }); | ||
| console.log('Created new comment'); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate service name handling in report generation to prevent injection.
Lines 437 and similar locations echo service names directly into the Markdown report without quoting. If a service name contains pipe characters (|) or other Markdown special characters, the table formatting could break. Apply proper escaping or validation:
# Escape pipe characters in service names for Markdown tables
escaped_service=$(echo "$service" | sed 's/|/\\|/g')
echo "| $escaped_service | ... |" >> report.md🤖 Prompt for AI Agents
.github/actions/docker-image-size-tracker/action.yml around lines 509 to 526:
the code inserts service names directly into a Markdown table when
creating/updating the comment, which can break table formatting or allow
injection if a service name contains pipe characters or other Markdown special
characters; update the report-generation code to either validate service names
against an allowlist (reject/normalize any names with unsafe characters) or
escape Markdown-sensitive characters (at minimum replace '|' with '\|' and also
escape backticks, square brackets and backslashes) before concatenating into
table cells, and apply the same escaping/validation to all other places where
service names are interpolated into Markdown (e.g., the occurrence around line
437).
Proposed changes (including videos or screenshots)
Issue(s)
Steps to test or reproduce
Further comments
Summary by CodeRabbit