BUILD-10724: Import GitHub cache to S3 (migration fallback) #45

Draft
julien-carsique-sonarsource wants to merge 1 commit into master from feat/jcarsique/BUILD-10724-migrationGh2s3

@julien-carsique-sonarsource julien-carsique-sonarsource commented Mar 18, 2026

Summary

When migrating from GitHub cache to S3, the S3 bucket starts empty. This causes all runners to re-download dependencies from scratch until their first S3 cache save completes. This PR addresses that by automatically importing existing GitHub cache entries into S3 during the migration window.

Changes

action.yml — new import-github-cache input + migration steps

New input: import-github-cache (default: enabled)

Resolution order (mirrors the existing CACHE_BACKEND / backend pattern):

  1. Action input import-github-cache
  2. Environment variable CACHE_IMPORT_GITHUB (can be set from a repo variable via ${{ vars.CACHE_IMPORT_GITHUB }})
  3. Default: true
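
The resolution order above can be sketched as a small shell helper. This is an illustration only, not the code from action.yml: the function name `resolve_import_mode` is hypothetical, and it assumes an unset input or variable reaches the script as an empty string.

```shell
# Hypothetical helper (not taken from action.yml).
# $1 = value of the import-github-cache input, assumed empty when not set
# $2 = value of CACHE_IMPORT_GITHUB, assumed empty when not set
resolve_import_mode() {
  if [ -n "${1:-}" ]; then
    # 1. The action input takes precedence.
    printf '%s\n' "$1"
  elif [ -n "${2:-}" ]; then
    # 2. Otherwise fall back to the environment variable.
    printf '%s\n' "$2"
  else
    # 3. Otherwise the import is enabled by default.
    printf '%s\n' "true"
  fi
}

# Possible call site (the first variable name is an assumption):
#   resolve_import_mode "${INPUT_IMPORT_GITHUB_CACHE:-}" "${CACHE_IMPORT_GITHUB:-}"
```

With this ordering, an explicit input always wins over the repository variable, so a single workflow could keep the fallback on after the variable has been set to false.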

New steps (S3 path only):

| Step | Description |
| --- | --- |
| Determine GitHub cache import mode | Resolves the effective import mode from input → env var → default |
| Import GitHub cache to S3 (migration fallback) | Runs actions/cache/restore with the original unprefixed key when S3 misses and import mode is active. The S3 post-job save then persists the restored content to S3. |
| Enforce fail-on-cache-miss after GitHub import fallback | Fails explicitly only after both S3 and GitHub have been tried, so fail-on-cache-miss: true is still respected. |

fail-on-cache-miss and lookup-only are correctly propagated to the GitHub fallback step.

.github/workflows/check-cache-migration.yml — migration completion detector

Manually triggered (workflow_dispatch) workflow to determine when the migration is complete and automatically opt out of the import fallback.

It:

  1. Lists GitHub cache entries (paginated), filtering to long-lived branches (main, master, branch-*, dogfood-on-*, feature/long/*) and excluding transient keys (build-number-*, mise-*)
  2. Lists S3 objects (paginated via AWS CLI)
  3. Compares: for each included GitHub entry, checks for {ref}/{key} in S3
  4. If 100% of entries are present in S3: creates or updates repository variable CACHE_IMPORT_GITHUB=false, disabling the fallback automatically
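
The filtering and comparison in steps 1 to 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the workflow's actual script: the function names are hypothetical, GitHub entries are assumed to be available as tab-separated `ref<TAB>key` lines with short branch names as refs, and the S3 listing is assumed to be a file with one object key per line.

```shell
# Hypothetical helpers (names and input formats are assumptions).

# Succeeds when the ref is one of the long-lived branch patterns.
is_long_lived_ref() {
  case "$1" in
    main|master|branch-*|dogfood-on-*|feature/long/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the key is transient and should be ignored.
is_transient_key() {
  case "$1" in
    build-number-*|mise-*) return 0 ;;
    *) return 1 ;;
  esac
}

# count_missing GH_FILE S3_FILE
# Prints how many included GitHub entries have no {ref}/{key} object in S3.
count_missing() {
  gh_file="$1"
  s3_file="$2"
  missing=0
  while IFS="$(printf '\t')" read -r ref key; do
    [ -n "$ref" ] || continue
    is_long_lived_ref "$ref" || continue
    is_transient_key "$key" && continue
    # -F: fixed string, -x: whole line, -q: quiet
    grep -Fxq -- "$ref/$key" "$s3_file" || missing=$((missing + 1))
  done < "$gh_file"
  printf '%s\n' "$missing"
}
```

A result of 0 corresponds to the 100% condition in step 4; any other value means the fallback should stay enabled.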

Behaviour matrix

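| S3 hit | Import mode | GH hit | Result | fail-on-cache-miss |
| --- | --- | --- | --- | --- |
| yes | active | n/a | S3 content used, no GH attempt | per input |
| no | active | yes | GH content restored → saved to S3 post-job | pass |
| no | active | no | miss | fail if flag set |
| yes | inactive | n/a | S3 content used | per input |
| no | inactive | n/a | miss | per input (unchanged) |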
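
The matrix can also be read as a single decision function. The sketch below is illustrative only: `decide_outcome` is a hypothetical name, all four arguments are assumed to be the strings `true` or `false`, and the real action spreads this logic across several steps rather than one function.

```shell
# Hypothetical helper (not taken from action.yml).
# decide_outcome S3_HIT IMPORT_MODE GH_HIT FAIL_ON_CACHE_MISS
# Prints one of: s3, github-import, fail, miss
decide_outcome() {
  s3_hit="$1"
  import_mode="$2"
  gh_hit="$3"
  fail_on_miss="$4"
  if [ "$s3_hit" = "true" ]; then
    # S3 content is used and GitHub is never consulted.
    echo "s3"
  elif [ "$import_mode" = "true" ] && [ "$gh_hit" = "true" ]; then
    # Restored from GitHub; the post-job save persists it to S3.
    echo "github-import"
  elif [ "$fail_on_miss" = "true" ]; then
    # Every enabled backend missed and the flag is set.
    echo "fail"
  else
    echo "miss"
  fi
}
```

Note that the failure branch is only reached after the GitHub check, which is the deferral the "Enforce fail-on-cache-miss" step provides.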

Testing

Dogfood PR: SonarSource/sonar-dummy-js#125

Jira

BUILD-10724 — child of BUILD-10684

…tion mode)

When the S3 backend is used and the S3 cache misses, automatically attempt
to restore the cache from GitHub using the original unprefixed key. The S3
post-job step will then save the restored content to S3, pre-provisioning it
for subsequent runs.

The feature is enabled by default. Resolution order to disable it:
  1. Action input `import-github-cache: 'false'`
  2. Environment variable `CACHE_IMPORT_GITHUB=false`
     (can be set from a repository variable via ${{ vars.CACHE_IMPORT_GITHUB }})
  3. Default: true

`fail-on-cache-miss` and `lookup-only` are correctly propagated to the
GitHub fallback step. When `fail-on-cache-miss` is set and import mode is
active, failure is deferred until both S3 and GitHub have been tried.

Also adds `.github/workflows/check-cache-migration.yml`: a manually-triggered
workflow that compares GitHub cache entries to S3 objects across target branches
(main, master, branch-*, dogfood-on-*, feature/long/*), ignoring transient
keys (build-number-*, mise-*). When 100% of entries are found in S3, it
automatically sets the CACHE_IMPORT_GITHUB=false repository variable to
disable the import fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@julien-carsique-sonarsource julien-carsique-sonarsource requested a review from a team as a code owner March 18, 2026 17:13
sonar-review-alpha bot commented Mar 18, 2026

Summary

Implements a GitHub-to-S3 cache migration fallback for the actions/cache wrapper. When the S3 cache backend is active but empty (during the migration window), the action now attempts to restore from GitHub cache as a fallback. The PR adds the import logic to action.yml and a separate manually-triggered workflow to detect when migration is complete and disable the fallback automatically.

The key behavior: S3 miss → (if import enabled) try GitHub → persist to S3 on post-job save. The fail-on-cache-miss flag is deferred until both backends have been tried.

What reviewers should know

Start with action.yml (the core logic):

  • New import-github-cache input with resolution order: input → env var → default true
  • "Determine GitHub cache import mode" step sets this up
  • Note the S3 cache step now conditionally suppresses fail-on-cache-miss when import is active (deferred to a later step)
  • The GitHub import step only runs when S3 misses AND import mode is active — it uses the original unprefixed key (not the ref-prefixed S3 format)
  • "Enforce fail-on-cache-miss after GitHub import fallback" is the deferral mechanism: only fails if both S3 and GitHub miss AND the flag is set
  • The cache-hit output now chains three potential sources

Then review the migration workflow (.github/workflows/check-cache-migration.yml):

  • Manually triggered (workflow_dispatch) — not automatic
  • Filters GitHub caches to long-lived branches and excludes transient patterns (build-number, mise)
  • Compares {ref}/{key} in S3 for each included GitHub entry
  • Auto-disables fallback by setting CACHE_IMPORT_GITHUB=false when 100% migrated

Non-obvious decisions:

  • GitHub import uses unprefixed keys, S3 uses prefixed keys — the GitHub actions/cache/restore step doesn't know about the ref prefix
  • The S3 post-job save will pick up whatever the GitHub import restored and persist it to S3 with the proper prefix
  • The migration completion check is triggered manually via workflow_dispatch (not automatic), so you control when to check and disable the fallback

@julien-carsique-sonarsource julien-carsique-sonarsource marked this pull request as draft March 18, 2026 17:13
@sonarqube-cloud-us

SonarQube reviewer guide

Review in SonarQube

Quality Gate failed

Failed conditions
1 New issue

See analysis details on SonarQube Cloud

@sonarqubecloud

SonarQube reviewer guide

Review in SonarQube

Quality Gate failed

Failed conditions
1 New issue

See analysis details on SonarQube Cloud

steps.github-import.outputs.cache-hit != 'true'
shell: bash
run: |
  echo "::error::Cache miss: no cache found in S3 or GitHub for key '${{ inputs.key }}'"

Check failure

Code scanning / SonarCloud

GitHub Actions should not be vulnerable to script injections High

Change this action to not use user-controlled data directly in a run block. See more on SonarQube Cloud

@sonar-review-alpha sonar-review-alpha bot left a comment

Conclusion: The migration logic is sound and fits the existing patterns well, but there is one security issue that needs fixing before merge.

SonarCloud recommendations: Even though action.yml:191 is not failing the quality gate, I strongly recommend fixing it. ${{ inputs.key }} is interpolated directly into the shell run: block inside double quotes, and the expression is expanded before the shell parses the script. A key value containing a " character can therefore break out of the string and execute arbitrary shell commands, and a $(...) command substitution in the key would run even without breaking out. Fix by passing the key through an environment variable:

env:
  CACHE_KEY: ${{ inputs.key }}
run: |
  echo "::error::Cache miss: no cache found in S3 or GitHub for key '${CACHE_KEY}'"
  exit 1
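
To see why the environment variable is safe, the sketch below stands in for the fixed run block. It is an illustration only: `report_cache_miss` is a hypothetical wrapper, and `CACHE_KEY` is set by hand here where the action would set it through `env:`. The shell expands `${CACHE_KEY}` as data and does not re-parse the result, so quotes and `$(...)` inside the key are printed literally.

```shell
# Hypothetical stand-in for the fixed step body.
# CACHE_KEY is assumed to be provided through the step's env: mapping.
report_cache_miss() {
  # printf with %s prints the key verbatim, including any quotes or $(...)
  printf '%s\n' "::error::Cache miss: no cache found in S3 or GitHub for key '${CACHE_KEY}'"
}

# Example with a hostile key; nothing inside it is executed:
#   CACHE_KEY='k"; echo INJECTED; echo "$(id)'
#   report_cache_miss
```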
