refactor(agents): migrate dependabot AW review to workflow_run trigger by katriendg · Pull Request #612 · microsoft/physical-ai-toolchain

katriendg · 2026-05-05T06:19:30Z

Description

The aw-dependabot-pr-review agentic workflow used to fire on pull_request_target, which meant the resolver step captured a snapshot of PR Validation while it was still pending or in_progress:*, and the advisory review was posted before the orchestrator ever finished. PR #608 was the canonical example: the review correctly applied the Isaac Sim numpy 2.x ABI guard, but its CI banner quoted a stale in_progress:in_progress conclusion.

This PR migrates the workflow to workflow_run keyed on PR Validation completed, reads the orchestrator's terminal conclusion straight from context.payload.workflow_run.conclusion, and pre-resolves failing per-surface check-runs once in the resolver step. The persona rubric is rewritten to consume those env vars and to map every terminal conclusion explicitly; the pending and in_progress:* branches are gone because they are now unreachable.

Related to #579.

Type of Change

🐛 Bug fix (non-breaking change fixing an issue)
✨ New feature (non-breaking change adding functionality)
💥 Breaking change (fix or feature causing existing functionality to change)
📚 Documentation update
🏗️ Infrastructure change (Terraform/IaC)
♻️ Refactoring (no functional changes)

Component(s) Affected

infrastructure/terraform/prerequisites/ - Azure subscription setup
infrastructure/terraform/ - Terraform infrastructure
infrastructure/setup/ - OSMO control plane / Helm
workflows/ - Training and evaluation workflows
training/ - Training pipelines and scripts
docs/ - Documentation

Changes

Workflow trigger and resolver

Switching to workflow_run runs the agent step against the trusted, default-branch copy of the workflow, so the gh-aw compiler can auto-inject fork-PR exclusion and the repository.id guard.

Replaced pull_request_target with workflow_run on workflows: ["PR Validation"], types: [completed], branches: ["dependabot/**"]. The branches: filter on workflow_run matches the triggering run's head_branch (not the base), so dependabot/** is the only value that fires for Dependabot PRs; using main here was the #583 regression fixed in #584. The workflow-level if: filters on workflow_run.event == 'pull_request', workflow_run.actor.login == 'dependabot[bot]', and an allowlist of seven terminal conclusions.
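Taken together, the branch filter and the workflow-level if: amount to one predicate over the triggering run. The TypeScript below is a sketch of that predicate. It assumes the fields are read from context.payload.workflow_run. TriggeringRun, shouldReview, and the prefix check standing in for the dependabot/** glob are illustrative names, not gh-aw or GitHub code.

```typescript
// Hypothetical, simplified model of the trigger gate. Field names follow the
// workflow_run webhook payload, but this is not gh-aw or GitHub source code.
interface TriggeringRun {
  event: string;
  conclusion: string | null;
  head_branch: string | null;
  actor: { login: string };
}

// The seven terminal conclusions that the workflow-level `if:` lets through.
const ALLOWED_CONCLUSIONS: ReadonlySet<string> = new Set([
  "success", "failure", "cancelled", "timed_out", "neutral", "skipped", "action_required",
]);

// Approximates `branches: ["dependabot/**"]`. GitHub evaluates this filter
// against the triggering run's head_branch, not the PR's base branch.
function matchesDependabotBranch(headBranch: string | null): boolean {
  return headBranch !== null && headBranch.startsWith("dependabot/");
}

// True when the branch filter, the event/actor checks, and the conclusion
// allowlist would all let the review run.
function shouldReview(run: TriggeringRun): boolean {
  return (
    matchesDependabotBranch(run.head_branch) &&
    run.event === "pull_request" &&
    run.actor.login === "dependabot[bot]" &&
    run.conclusion !== null &&
    ALLOWED_CONCLUSIONS.has(run.conclusion)
  );
}
```

The branch check is kept separate because it lives in on.workflow_run, while the other conditions live in the if:.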
Kept on.bots: ["dependabot[bot]"] and on.roles: [admin, maintainer, write] at the top level. gh-aw's pre_activation guard checks the triggering actor against on.bots / on.roles independently of the workflow if:, so dropping these would reintroduce the User permission 'none' activation block seen in #585 / #586.
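The model below is a simplified view of why the allowlists must stay. It assumes the guard admits an actor that is either listed in on.bots or holds one of the on.roles permissions. preActivationAllows and ActivationConfig are hypothetical names, and gh-aw's real pre_activation job is not reproduced here.

```typescript
// Hypothetical model of the activation guard; not gh-aw's implementation.
interface ActivationConfig {
  bots: readonly string[];  // mirrors on.bots
  roles: readonly string[]; // mirrors on.roles
}

function preActivationAllows(
  actorLogin: string,
  actorPermission: string, // e.g. "admin", "write", or "none"
  config: ActivationConfig,
): boolean {
  // An allowlisted bot passes even though its repository permission is "none".
  if (config.bots.includes(actorLogin)) return true;
  // Any other actor needs one of the allowed roles.
  return config.roles.includes(actorPermission);
}
```

In this model a passing workflow-level if: never reaches the guard's decision, which is why both lists stay in place.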
Added checks: read to permissions: for server-side check-run enumeration; existing contents, pull-requests, and actions scopes are unchanged.
Rewrote the resolve-pr step. It reads context.payload.workflow_run, prefers workflow_run.pull_requests[0], and falls back to search.issuesAndPullRequests keyed on head_sha for the fork case. Both paths re-hydrate via pulls.get so body and draft are reliable.
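The resolver's lookup order can be sketched as follows. The sketch assumes the search call is injected as a plain function, so the logic can be exercised without network access. resolvePrNumber, RunPayload, and searchByHeadSha are hypothetical names.

```typescript
// Hypothetical sketch of the resolve-pr lookup order. searchByHeadSha stands in
// for the search.issuesAndPullRequests call and is injected so the logic runs offline.
interface RunPayload {
  head_sha: string;
  pull_requests: { number: number }[];
}

function resolvePrNumber(
  run: RunPayload,
  searchByHeadSha: (sha: string) => number[],
): number | undefined {
  // workflow_run.pull_requests is filled in for same-repository PRs...
  const first = run.pull_requests[0];
  if (first !== undefined) return first.number;
  // ...but is empty when the PR comes from a fork, so search by head SHA instead.
  const hits = searchByHeadSha(run.head_sha);
  return hits.length > 0 ? hits[0] : undefined;
}
```

Either path only yields a number. The re-hydration step (in github-script, github.rest.pulls.get({ owner, repo, pull_number })) is what supplies a reliable body and draft flag.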
Dropped the previous listWorkflowRunsForRepo lookup. PR_VALIDATION_CONCLUSION now reads directly from run.conclusion. Under types: [completed] plus the workflow-level if: allowlist, that value is always one of success, failure, cancelled, timed_out, neutral, skipped, or action_required.
Added two new env vars exported by the resolver:
- PR_VALIDATION_FAILING_CHECKS — JSON array of {name, html_url, conclusion} from checks.listForRef(ref=pr.head.sha) filtered to completed non-success/non-neutral/non-skipped runs.
- PR_BODY — PR body hydrated server-side so the agent does not depend on the integrity-filtered MCP read of the PR.
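The PR_VALIDATION_FAILING_CHECKS filter reduces to a few lines. This sketch assumes each check run exposes name, html_url, status, and conclusion as returned by checks.listForRef. CheckRun, FailingCheck, and failingChecks are illustrative names.

```typescript
// Trimmed, hypothetical view of a check run returned by checks.listForRef.
interface CheckRun {
  name: string;
  html_url: string | null;
  status: string;            // "queued", "in_progress", "completed", ...
  conclusion: string | null; // null until the run completes
}

interface FailingCheck {
  name: string;
  html_url: string | null;
  conclusion: string;
}

// Conclusions that do not count as failures for the review.
const NON_FAILING: ReadonlySet<string> = new Set(["success", "neutral", "skipped"]);

function failingChecks(runs: readonly CheckRun[]): FailingCheck[] {
  const out: FailingCheck[] = [];
  for (const run of runs) {
    // Only completed runs carry a conclusion worth reporting.
    if (run.status !== "completed" || run.conclusion === null) continue;
    if (NON_FAILING.has(run.conclusion)) continue;
    out.push({ name: run.name, html_url: run.html_url, conclusion: run.conclusion });
  }
  return out;
}
```

In a github-script step, the array would then be serialized, for example with core.exportVariable('PR_VALIDATION_FAILING_CHECKS', JSON.stringify(failing)). A real resolver would probably also page through results, since listForRef returns at most 100 check runs per page.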
New skip reasons in PR_DEPENDABOT_SKIP_REASON: not-a-pr-run and pr-resolution-failed, alongside the existing not-dependabot / draft.
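Below is one way to derive PR_DEPENDABOT_SKIP_REASON from the run and the hydrated PR, as a sketch. skipReason and HydratedPr are hypothetical. The order in which the four reasons are checked is an assumption, not something this PR specifies. An empty string means the review proceeds.

```typescript
// Hypothetical classification of PR_DEPENDABOT_SKIP_REASON; "" means "review".
interface HydratedPr {
  user: { login: string };
  draft: boolean;
}

function skipReason(runEvent: string, pr: HydratedPr | undefined): string {
  if (runEvent !== "pull_request") return "not-a-pr-run"; // triggering run was not a PR event
  if (pr === undefined) return "pr-resolution-failed";    // neither lookup path found a PR
  if (pr.user.login !== "dependabot[bot]") return "not-dependabot";
  if (pr.draft) return "draft";
  return "";
}
```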
Retargeted safe-outputs:
- submit-pull-request-review.target → ${{ env.PR_NUMBER }}
- add-comment.target → ${{ env.PR_NUMBER }} (was triggering, which is undefined under workflow_run)
- create-pull-request-review-comment.target → "*"
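The add-comment change follows from how a target is resolved. The model below is a hypothetical simplification, not gh-aw's code. It assumes three things:
- triggering only resolves on pull_request-style events;
- "*" leaves the PR number to each output item;
- any other value is the already-expanded ${{ env.PR_NUMBER }}.

```typescript
// Hypothetical model of safe-output target resolution, not gh-aw's code.
// Returns the PR number to act on, "*" when each item carries its own number,
// or undefined when no PR can be determined (a "Not in pull request context" skip).
function resolveTarget(
  target: string,
  eventName: string,
  triggeringPrNumber: number | undefined,
): number | "*" | undefined {
  if (target === "*") return "*";
  if (target === "triggering") {
    // Only pull_request-style events carry a triggering PR; workflow_run does not.
    return eventName.startsWith("pull_request") ? triggeringPrNumber : undefined;
  }
  // Otherwise the target should be an explicit number, e.g. the expanded PR_NUMBER.
  const n = Number.parseInt(target, 10);
  return Number.isNaN(n) ? undefined : n;
}
```

In this model, an unresolved env var expands to an empty string and also yields undefined.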

Persona verdict rubric

The agent now reasons over a final CI signal, so the rubric collapses to a clean terminal-conclusion map.

Rewrote the Validation Signal section in .github/agents/dependabot-pr-reviewer.agent.md. The persona is told the workflow runs after PR Validation reaches a terminal conclusion, and is explicitly forbidden from calling checks.listForRef or commits/{sha}/check-runs — it reads PR_VALIDATION_FAILING_CHECKS from the environment instead.
Reframed the Surface to Check Run Map as an informational lookup for mapping a failing check name back to its dependency surface. The persona no longer walks it via the API.
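Because the persona now reads failing checks from the environment, mapping them back to surfaces is a pure lookup. The sketch assumes PR_VALIDATION_FAILING_CHECKS holds the JSON array of {name, html_url, conclusion} entries described for the resolver. The map entries are placeholders, not the contents of the real Surface to Check Run Map.

```typescript
// Hypothetical placeholder entries; the real Surface to Check Run Map lives in
// dependabot-pr-reviewer.agent.md and is not reproduced here.
const SURFACE_BY_CHECK: ReadonlyMap<string, string> = new Map([
  ["example-python-check", "Python dependencies"],
  ["example-npm-check", "npm dependencies"],
]);

interface FailingCheckEntry {
  name: string;
  html_url: string | null;
  conclusion: string;
}

// Maps each failing check in the env var's JSON to a dependency surface without
// any API call. Unknown check names are reported as "unmapped".
function failingSurfaces(failingChecksJson: string, surfaceByCheck: ReadonlyMap<string, string>): string[] {
  const entries = JSON.parse(failingChecksJson) as FailingCheckEntry[];
  const surfaces = new Set<string>();
  for (const entry of entries) {
    surfaces.add(surfaceByCheck.get(entry.name) ?? "unmapped");
  }
  return Array.from(surfaces);
}
```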
Rewrote the Verdict Adjustment block as an explicit terminal-conclusion map:
- success + no static concern + no sticky high-risk trigger → APPROVE-eligible, citing the orchestrator conclusion plus an empty PR_VALIDATION_FAILING_CHECKS.
- failure | cancelled | timed_out | action_required → COMMENT; body MUST quote every entry from PR_VALIDATION_FAILING_CHECKS (name plus html_url).
- neutral | skipped | unknown or PR_DEPENDABOT_SKIP_REASON == 'pr-resolution-failed' → COMMENT with a > [!CAUTION] banner: Deterministic CI signal unavailable ({conclusion}); review is advisory only.
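The terminal-conclusion map can be written as a small decision function. This is a sketch with hypothetical names, and two details are assumptions:
- pr-resolution-failed takes precedence over the conclusion;
- APPROVE additionally requires the failing-check list to be empty, since the rubric asks the review to cite an empty list.

A success with a static concern or a sticky trigger falls through to COMMENT.

```typescript
// Hypothetical encoding of the terminal-conclusion map; names are illustrative.
interface VerdictInput {
  conclusion: string;                                         // PR_VALIDATION_CONCLUSION
  failingChecks: { name: string; html_url: string | null }[]; // parsed PR_VALIDATION_FAILING_CHECKS
  skipReason: string;                                         // PR_DEPENDABOT_SKIP_REASON
  staticConcern: boolean;                                     // persona's static review flagged something
  stickyHighRisk: boolean;                                    // e.g. the Isaac Sim numpy 2.x guard
}

interface Verdict {
  event: "APPROVE" | "COMMENT";
  lines: string[]; // lines the review body must contain
}

const FAILED: ReadonlySet<string> = new Set(["failure", "cancelled", "timed_out", "action_required"]);

function verdict(input: VerdictInput): Verdict {
  const lines: string[] = [];
  // A sticky high-risk trigger forces its banner regardless of CI.
  if (input.stickyHighRisk) lines.push("⚠️ Maintainer review recommended");

  const signalUnavailable =
    input.skipReason === "pr-resolution-failed" ||
    (input.conclusion !== "success" && !FAILED.has(input.conclusion)); // neutral, skipped, unknown
  if (signalUnavailable) {
    lines.push("> [!CAUTION]");
    lines.push(`> Deterministic CI signal unavailable (${input.conclusion}); review is advisory only.`);
    return { event: "COMMENT", lines };
  }

  lines.push(`PR Validation: ${input.conclusion}`);
  if (FAILED.has(input.conclusion)) {
    // Quote every failing check, name plus link.
    for (const check of input.failingChecks) {
      lines.push(`- ${check.name}: ${check.html_url ?? "(no link)"}`);
    }
    return { event: "COMMENT", lines };
  }

  // conclusion === "success": approve only when nothing else holds it back.
  const approve = input.failingChecks.length === 0 && !input.staticConcern && !input.stickyHighRisk;
  return { event: approve ? "APPROVE" : "COMMENT", lines };
}
```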
Preserved the sticky Isaac Sim ABI guard verbatim — a numpy 2.x bump still keeps the verdict at COMMENT and forces the ⚠️ Maintainer review recommended banner regardless of CI conclusion.
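Detecting the sticky trigger might look like this. The sketch assumes the dependency updates have already been parsed into name/from/to records, for example from PR_BODY. DependencyUpdate and tripsNumpyAbiGuard are hypothetical. Keying on the major component of the target version is an assumption about how "numpy 2.x bump" is read.

```typescript
// Hypothetical parsed record of one dependency bump.
interface DependencyUpdate {
  name: string;
  from: string;
  to: string;
}

// Leading numeric component of a version string, or -1 if it cannot be parsed.
function majorOf(version: string): number {
  const head = version.replace(/^v/, "").split(".")[0] ?? "";
  const n = Number.parseInt(head, 10);
  return Number.isNaN(n) ? -1 : n;
}

// True when any update lands numpy on major version 2 or later.
function tripsNumpyAbiGuard(updates: readonly DependencyUpdate[]): boolean {
  return updates.some((u) => u.name.toLowerCase() === "numpy" && majorOf(u.to) >= 2);
}
```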

Workflow documentation and lock files

Rewrote the Trigger Posture and step-by-step prose in aw-dependabot-pr-review.md to describe the workflow_run execution model, the gh-aw compiler's auto-injected fork-PR exclusion and repository.id guard, and the new env-var contract.
Bumped github/gh-aw-actions/setup v0.68.3 → v0.71.1 in .github/aw/actions-lock.json (SHA ba90f21… → 239aec4…), picked up by recompilation.
Regenerated .github/workflows/aw-dependabot-pr-review.lock.yml via the gh-aw compiler — diff reflects the trigger swap, the new env vars, and the setup-action SHA bump. No hand edits.

Testing Performed

Terraform plan reviewed (no unexpected changes)
Terraform apply tested in dev environment
Training scripts tested locally with Isaac Sim
OSMO workflow submitted successfully
Smoke tests passed (smoke_test_azure.py)

None of the templated test surfaces apply: this PR only touches files under .github/ (agents/, workflows/, and aw/actions-lock.json). Validation evidence: npm run lint:md and npm run lint:yaml pass on the changed files, and the aw-dependabot-pr-review.lock.yml artifact is regenerated rather than hand-edited and matches the gh-aw compiler output for the new source. The behavioural change is observable on the next Dependabot PR: the advisory review will fire after PR Validation completes and quote the orchestrator's terminal conclusion plus any failing per-surface checks.

Documentation Impact

No documentation changes needed
Documentation updated in this PR
Documentation issue filed

Bug Fix Checklist

Not a bug fix — this is a refactor of an agentic-workflow trigger surface.

Linked to issue being fixed
Regression test included, OR
Justification for no regression test:

Checklist

My code follows the project conventions
Commit messages follow conventional commit format
I have performed a self-review
Documentation impact assessed above
No new linting warnings introduced

Related Issues

Related to #579

Notes

The min-integrity: approved setting on tools.github is intentionally preserved. The agent's MCP PR-body read is therefore filtered, which is why the resolver hydrates PR_BODY from the REST API server-side — the persona consumes the env var rather than relying on the filtered MCP payload.

Lowering min-integrity to unapproved was rejected on prompt-injection grounds; the resolver-side hydration is the chosen mitigation.
workflow_run runs in default-branch context, which means changes to the AW workflow itself cannot be exercised by a Dependabot PR — this is the secure-by-design tradeoff documented in the GitHub Security Lab "preventing pwn requests" guide and aligns with the gh-aw workflow_run recommendation.

Follow-up Tasks

Validate behaviour on a grouped Dependabot update that produces multiple PR Validation runs against the same head SHA — confirm that only the latest completed run drives the advisory review.
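For this follow-up, a check like the one below could confirm whether a given run is the latest completed one for its head SHA. It assumes the run list comes from something like actions.listWorkflowRuns filtered by head_sha, and it compares updated_at timestamps. RunSummary and isLatestCompletedRun are hypothetical.

```typescript
// Hypothetical summary of a PR Validation run, e.g. from actions.listWorkflowRuns
// filtered by head_sha.
interface RunSummary {
  id: number;
  head_sha: string;
  status: string;     // "completed" once the run has a terminal conclusion
  updated_at: string; // ISO 8601 timestamp
}

// True when no other completed run for the same head SHA finished later.
// Runs with identical timestamps are all treated as "latest".
function isLatestCompletedRun(current: RunSummary, runs: readonly RunSummary[]): boolean {
  const currentTime = Date.parse(current.updated_at);
  return runs.every(
    (r) =>
      r.id === current.id ||
      r.head_sha !== current.head_sha ||
      r.status !== "completed" ||
      Date.parse(r.updated_at) <= currentTime,
  );
}
```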
After the first live Dependabot PR runs through the new trigger, compare the posted review's CI banner against the orchestrator's final conclusion and the failing-check list. This confirms that the staleness regression observed in #608 (security(deps): bump the training-dependencies group across 1 directory with 76 updates) is gone.
Confirm that safe-outputs.submit-pull-request-review and add-comment post successfully under workflow_run — the target: ${{ env.PR_NUMBER }} overrides are the #588 / #589 mitigation; a Not in pull request context skip in safe_outputs would mean the env var did not resolve.

- Switch trigger from pull_request_target to workflow_run gated on PR Validation completion on main
- Filter on workflow_run.actor.login == 'dependabot[bot]' (replacing pull_request_target bots:/roles: allowlists)
- Hydrate PR_VALIDATION_CONCLUSION from workflow_run payload and PR_VALIDATION_FAILING_CHECKS via checks.listForRef
- Tighten persona verdict rubric so non-success conclusions map to COMMENT with caution banner
- Replace persona check-run API walk with resolver-supplied env vars
- Regenerate aw-dependabot-pr-review.lock.yml

🤖 - Generated by Copilot

Co-authored-by: Copilot <copilot@github.com>

…dabot branches
- change workflow_run branches from main to dependabot/**
- clarify workflow execution context for Dependabot PRs

🔧 - Generated by Copilot

github-actions · 2026-05-05T06:19:44Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA f811335.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

OpenSSF Scorecard

Package	Version	Score
actions/actions/github-script	373c709c69115d41ff229c7e5df9f8788daa9553	🟢 7.7

Scorecard details for actions/actions/github-script:

Check	Score	Reason
Code-Review	🟢 10	all changesets reviewed
Binary-Artifacts	🟢 10	no binaries found in the repo
Maintained	🟢 10	21 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 10
Packaging	⚠️ -1	packaging workflow not detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Token-Permissions	🟢 9	detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies	⚠️ 1	dependency not pinned by hash detected -- score normalized to 1
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Security-Policy	🟢 9	security policy file detected
SAST	🟢 10	SAST tool is run on all commits
Branch-Protection	🟢 5	branch protection is not maximal on development and all release branches

Package	Version	Score
actions/github/gh-aw-actions/setup	239aec45b78c8799417efdd5bc6d8cc036629ec1	Unknown

Scanned Files

.github/workflows/aw-dependabot-pr-review.lock.yml

codecov-commenter · 2026-05-05T06:21:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.27%. Comparing base (10ab980) to head (f811335).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #612      +/-   ##
==========================================
- Coverage   77.27%   77.27%   -0.01%     
==========================================
  Files         272      272              
  Lines       18140    18137       -3     
  Branches     2452     2467      +15     
==========================================
- Hits        14018    14015       -3     
  Misses       3698     3698              
  Partials      424      424

Flag	Coverage Δ		*Carryforward flag
pester	`83.13% <ø> (ø)`		Carriedforward from db20d20
pytest-data-pipeline	`100.00% <ø> (ø)`		Carriedforward from db20d20
pytest-dataviewer	`93.60% <ø> (-0.01%)`	⬇️	Carriedforward from db20d20
pytest-dm-tools	`100.00% <ø> (ø)`		Carriedforward from db20d20
pytest-evaluation	`99.51% <ø> (ø)`
pytest-fuzz	`4.89% <ø> (+<0.01%)`	⬆️	Carriedforward from db20d20
pytest-inference	`100.00% <ø> (ø)`		Carriedforward from db20d20
pytest-training	`93.32% <ø> (ø)`		Carriedforward from db20d20
vitest	`53.02% <ø> (ø)`		Carriedforward from db20d20

*This pull request uses carry forward flags.
see 3 files with indirect coverage changes


katriendg and others added 2 commits May 4, 2026 18:31

chore(workflows): update Dependabot PR review triggers to match dependabot branches (db20d20)

katriendg requested a review from a team as a code owner May 5, 2026 06:19

Merge branch 'main' into chore/aw-stale-ci (f811335)


katriendg wants to merge 3 commits into main from chore/aw-stale-ci
