@gurevichdmitry
Collaborator

Summary of your changes

There is no built-in GitHub Actions mechanism to retry a step on failure. Since this is a relatively simple use case, it is preferable to handle retries with a custom script rather than introducing a third-party action that would require additional version management and maintenance.

This PR introduces a retry script and applies it to the Hermit initialization command to make the CI workflows more resilient to transient failures.

@gurevichdmitry gurevichdmitry requested a review from a team as a code owner December 17, 2025 16:31
@mergify

mergify bot commented Dec 17, 2025

This pull request does not have a backport label. Could you fix it @gurevichdmitry? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@mergify

mergify bot commented Dec 23, 2025

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b dg-add-retry-to-hermit upstream/dg-add-retry-to-hermit
git merge upstream/main
git push upstream dg-add-retry-to-hermit

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces retry logic for Hermit initialization to make CI workflows more resilient to transient failures. Instead of using third-party retry actions, it implements a custom retry script with exponential backoff.

Key changes:

  • Adds a new retry wrapper script (.ci/scripts/retry.sh) with configurable retry attempts and exponential backoff
  • Integrates retry logic into the Hermit initialization step in the hermit composite action
  • Consolidates workflow files to use the centralized hermit action instead of inline initialization steps
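
For readers following the thread, a retry wrapper with configurable attempts and exponential backoff could look roughly like the sketch below. This is a hypothetical illustration, not the contents of .ci/scripts/retry.sh: the function name `retry`, the doubling factor, and the choice to send status messages to stderr are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a retry wrapper; NOT the actual .ci/scripts/retry.sh.
# Usage: retry <max_retries> <initial_delay_seconds> <command string>
retry() {
  local max_retries=$1 delay=$2 command=$3
  local attempt=1 exit_code=0

  while [ "$attempt" -le "$max_retries" ]; do
    # Run the command and record its exit status right away.
    eval "$command" && exit_code=0 || exit_code=$?
    if [ "$exit_code" -eq 0 ]; then
      echo "Command succeeded on attempt $attempt" >&2
      return 0
    fi
    if [ "$attempt" -lt "$max_retries" ]; then
      echo "Attempt $attempt failed with exit code $exit_code; retrying in ${delay}s" >&2
      sleep "$delay"
      delay=$((delay * 2))   # assumed backoff policy: double the wait each time
    fi
    attempt=$((attempt + 1))
  done

  echo "Command failed after $max_retries attempts" >&2
  return "$exit_code"
}
```

Under these assumptions, `retry 5 10 './bin/hermit env --raw >> "$GITHUB_ENV"'` would wait 10, 20, 40, and 80 seconds between its five attempts and return the last failing exit code if none succeeds.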

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

File | Description
--- | ---
.ci/scripts/retry.sh | New retry wrapper script with exponential backoff for handling transient command failures
.github/actions/hermit/action.yml | Integrates retry script into Hermit initialization and updates poetry install configuration
.github/workflows/ci-pull_request.yml | Migrates to use centralized hermit action for initialization
.github/workflows/binary-size-monitor.yml | Migrates to use centralized hermit action for initialization
.github/workflows/packaging.yml | Migrates to use centralized hermit action for initialization
.github/workflows/test-environment.yml | Migrates to use centralized hermit action for initialization
.github/workflows/test-opa-coverage.yml | Migrates to use centralized hermit action for initialization
.github/workflows/test-opa-policies.yml | Migrates to use centralized hermit action for initialization
.github/workflows/updatecli.yml | Migrates to use centralized hermit action for initialization
.github/workflows/upgrade-environment.yml | Migrates to use centralized hermit action for initialization

Comment on lines 36 to 41
if eval "$command"; then
echo "Command succeeded on attempt $attempt"
exit 0
fi

exit_code=$?

Copilot AI Dec 24, 2025

The exit code captured at line 41 will always be 0. When the condition of an 'if' without an 'else' branch fails and no branch runs, the 'if' statement itself returns 0, so $? no longer holds the status of the failed command. Capture the exit code immediately after the eval on line 36, or invert the 'if' logic. As written, this bug prevents the actual command exit code from being reported when all retries are exhausted.

Suggested change
- if eval "$command"; then
- echo "Command succeeded on attempt $attempt"
- exit 0
- fi
- exit_code=$?
+ eval "$command"
+ exit_code=$?
+ if [ $exit_code -eq 0 ]; then
+ echo "Command succeeded on attempt $attempt"
+ exit 0
+ fi
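
The behavior described in this comment can be reproduced in isolation. The sketch below uses two hypothetical helper functions and `(exit 3)` as a stand-in for a failing command. It captures the status with `||` rather than a bare command followed by `$?`, on the assumption that the script might run under `set -e`, where a bare failing command would abort before the capture.

```shell
#!/usr/bin/env bash
# Prints the value of $? observed right after an `if` whose condition failed.
status_after_if() {
  if (exit 3); then
    echo "not reached"
  fi
  echo "$?"   # status of the `if` compound command, not of `(exit 3)`
}

# Prints the status captured directly from the failing command.
status_captured_directly() {
  local exit_code=0
  (exit 3) || exit_code=$?
  echo "$exit_code"
}

status_after_if            # prints 0
status_captured_directly   # prints 3
```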

run: |
max_retries=5
retry_delay=10
./.ci/scripts/retry.sh "$max_retries" "$retry_delay" './bin/hermit env --raw >> "$GITHUB_ENV"'

Copilot AI Dec 24, 2025

The command passed to retry.sh contains shell redirection ('>> "$GITHUB_ENV"') that will be evaluated in the retry script's context, not in the calling step's context. This means the redirection is parsed and applied by eval inside retry.sh on every attempt, potentially causing issues. The command should be restructured to ensure the redirection is properly handled within the retry mechanism, or the retry script should be modified to handle output redirection correctly.

Suggested change
- ./.ci/scripts/retry.sh "$max_retries" "$retry_delay" './bin/hermit env --raw >> "$GITHUB_ENV"'
+ ./.ci/scripts/retry.sh "$max_retries" "$retry_delay" './bin/hermit env --raw' >> "$GITHUB_ENV"
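
One thing worth checking before applying this suggestion is where the wrapper's own messages go. The snippet quoted earlier in this review prints its success message with a plain `echo`, which writes to stdout. With the redirection outside the wrapper, such messages would be appended to the target file along with the command's output. The sketch below illustrates the difference with a hypothetical stand-in function (`run_with_status`) and temporary files instead of `$GITHUB_ENV`. It makes no claim about what the real retry.sh prints beyond the lines quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a wrapper that prints a status line to stdout.
run_with_status() {
  eval "$1" && echo "Command succeeded"
}

inside=$(mktemp)
outside=$(mktemp)

# Redirection inside the command string: eval applies it to the command only.
run_with_status 'echo "KEY=value" >> "$inside"' > /dev/null

# Redirection outside: everything the wrapper writes to stdout is captured.
run_with_status 'echo "KEY=value"' >> "$outside"

cat "$inside"    # KEY=value
cat "$outside"   # KEY=value, then "Command succeeded"
```

If the outer redirection is preferred, sending the wrapper's status messages to stderr (`echo ... >&2`) would keep them out of the file.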

@uri-weisman
Collaborator

I understand these changes allow us to retry a single step, but other steps might also be flaky, and we might not want to add a retry block to every one of them.
Should we consider using rerun-flaky-workflows to automatically re-run the GitHub Action if it fails?
