feat(security): container image signing reference architecture #592
WilliamBerryiii wants to merge 3 commits into main from
- commit-message.instructions.md: register (security) scope for supply-chain artifacts
- copilot-instructions.md: add (security) to Git Workflow scope enumeration
🔒 - Generated by Copilot
…ce architecture
- Add github-oidc, arc-runners, notation-akv, sigstore-mirror Terraform modules with per-module tftest suites
- Wire signing_mode (sigstore|notation|none), should_use_public_rekor, should_deploy_sigstore_mirror, should_enable_premium_acr selectors into root module
- Add root signing-mode matrix tftest covering all three modes plus mirror toggle
- Apply Kyverno-runner egress restriction via Kubernetes NetworkPolicy on arc-runners namespace (DD-01: repo has no azurerm_firewall)
- Validation: lint:tf 0 issues, lint:tf:validate 0 errors, test:tf 192/0/0 across 13 modules
🔒 - Generated by Copilot
* signing_mode sigstore/notation/none with Kyverno admission enforcement (single-cluster toggle, kyverno test 36/36 against rendered fixtures)
* sigstore-mirror, arc-runners, github-oidc, notation-akv Terraform modules with terraform test 192/0/0
* 7 SHA-pinned reusable+trigger workflows: container-build-verify, container-publish, container-publish-notation, container-vulnerability-scan, dataviewer-image-publish, lerobot-eval-image-publish, notation-key-rotate (cert-identity-regexp pinned)
* verify-image.sh, check-admission-readiness.sh, scan-image-vulns.sh + Pester suites; deploy-dataviewer.sh switched to signed-digest path; --accept-public-rekor consent banner in 01-deploy-robotics-charts.sh
* docs/security/{container-signing,rekor-disclosure}.md, ADR, notation key-rotation runbook, OpenVEX seed, configure-container-build prompt
🔒 - Generated by Copilot
Dependency Review
The following issues were found:
Snapshot Warnings
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.
License Issues
.github/workflows/container-vulnerability-scan.yml
.github/workflows/notation-key-rotate.yml
OpenSSF Scorecard
Scanned Files
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #592      +/-   ##
==========================================
+ Coverage   63.91%   66.56%   +2.65%
==========================================
  Files         250      262      +12
  Lines       15409    16639    +1230
  Branches     2163     2301     +138
==========================================
+ Hits         9848    11076    +1228
  Misses       5274     5274
- Partials      287      289       +2
*This pull request uses carry forward flags.
Testing the Container Signing Reference Architecture
End-to-end test guide for PR #592. Tests are layered from fastest (no infra) to slowest (live Azure + cluster). Run layers 1-3 locally before starting any cluster work; run layers 4-8 against a non-production subscription.
Prerequisites
Azure prerequisites: an Azure Key Vault (AKV) with RBAC, an Azure Container Registry (ACR; Premium SKU if testing Notation in-registry signatures), and a federated GitHub OIDC credential bound to this fork's branches.
# One-time auth
az login
az acr login --name <youracr>
gh auth login
Layer 1: Terraform tests (no Azure required)
All
# Root integration + signing-mode matrix
cd infrastructure/terraform
terraform init -backend=false
terraform test
# Per-module conditionals
for m in arc-runners github-oidc notation-akv sigstore-mirror; do
pushd "modules/$m" >/dev/null
terraform init -backend=false
terraform test
popd >/dev/null
done
Expected coverage:
Pass criteria: every
Layer 2: Kyverno policy tests (no cluster required)
kyverno test policies/kyverno/tests/
Validates policies/kyverno/tests/kyverno-test.yaml against rendered policy fixtures and resource samples. Both Sigstore and Notation policies must report
Layer 3: Pester tests for security scripts
pwsh -c "Invoke-Pester scripts/tests/security -Output Detailed"
Covers:
Run shellcheck alongside:
shellcheck scripts/security/*.sh
Layer 4: End-to-end signing (Sigstore mode)
Trigger the publish workflow on a throwaway tag/branch:
gh workflow run container-build-verify.yml --ref <branch>
gh workflow run container-publish.yml --ref <branch> -f image=dataviewer
After completion, resolve the immutable digest and verify:
IMAGE=<youracr>.azurecr.io/dataviewer
DIGEST=$(az acr repository show-manifests --name <youracr> --repository dataviewer \
--orderby time_desc --top 1 --query '[0].digest' -o tsv)
cosign verify "$IMAGE@$DIGEST" \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
--certificate-identity-regexp "^https://github.com/microsoft/physical-ai-toolchain/.*"
for t in spdxjson slsaprovenance cyclonedx openvex; do
cosign verify-attestation --type "$t" "$IMAGE@$DIGEST" \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
--certificate-identity-regexp "^https://github.com/microsoft/physical-ai-toolchain/.*" \
>/dev/null && echo "OK: $t"
done
All four attestations must verify. If
Layer 5: End-to-end signing (Notation mode)
Apply the Notation infrastructure variant:
cd infrastructure/terraform
terraform apply -var='signing_mode=notation' -var='should_enable_premium_acr=true'
Trigger the Notation publisher and verify:
gh workflow run container-publish-notation.yml --ref <branch> -f image=dataviewer
# Trust policy comes from the notation-akv module outputs
notation policy import "$(terraform output -raw notation_trust_policy_path)"
notation verify "$IMAGE@$DIGEST"
Layer 6: Admission enforcement
With Flux applied to a target cluster:
./scripts/security/check-admission-readiness.sh
Then exercise the policy:
# Should ADMIT (signed image)
kubectl run signed --image="$IMAGE@$DIGEST" --restart=Never --rm -it -- /bin/true
# Should REJECT (unsigned upstream)
kubectl run unsigned --image=nginx:latest --restart=Never --rm -it -- /bin/true
# Expect: admission webhook "validate.kyverno.svc-fail" denied the request
Confirm both Kyverno policies are attached:
kubectl get cpol kyverno-sigstore-policy kyverno-notation-policy -o jsonpath='{.items[*].status.ready}'
Layer 7: Vulnerability scan + VEX
./scripts/security/scan-image-vulns.sh "$IMAGE@$DIGEST"
Confirm CVEs listed in security/vex/dataviewer-base.openvex.json appear under the suppressed/
Layer 8: Notation key rotation drill
gh workflow run notation-key-rotate.yml --ref main
Verification checklist:
Refer to docs/runbooks/notation-key-rotation.md for rollback steps.
Cleanup
# Remove test images
az acr repository delete --name <youracr> --repository dataviewer --yes
# Tear down infra (per-environment)
cd infrastructure/terraform
terraform destroy -var='signing_mode=notation'
Troubleshooting
See the broader troubleshooting matrix in docs/security/container-signing.md.
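The verification commands in Layers 4 through 7 all operate on "$IMAGE@$DIGEST", so an empty or tag-like DIGEST would silently defeat the point of verifying an immutable reference. The following is a minimal sketch of a guard that could run before those commands. The function names `is_sha256_digest` and `pinned_ref` are hypothetical and are not part of the scripts added in this PR.

```shell
# Hypothetical helpers, not taken from scripts/security/.

# Succeeds only when $1 is "sha256:" followed by exactly 64 lowercase hex characters.
is_sha256_digest() {
  case $1 in
    sha256:*) _hex=${1#sha256:} ;;
    *) return 1 ;;
  esac
  [ "${#_hex}" -eq 64 ] || return 1
  case $_hex in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}

# Prints "<repository>@<digest>", or fails with a message on stderr.
pinned_ref() {
  _repo=$1
  _digest=$2
  if ! is_sha256_digest "$_digest"; then
    echo "refusing to verify: '$_digest' is not a sha256 digest" >&2
    return 1
  fi
  printf '%s@%s\n' "$_repo" "$_digest"
}
```

One way to use it would be `REF=$(pinned_ref "$IMAGE" "$DIGEST") || exit 1` ahead of the `cosign verify` or `notation verify` call, so that a failed digest lookup stops the run instead of verifying a mutable tag.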
Pull Request
Description
Introduces a complete container image signing reference architecture for the Physical AI Toolchain. Adds dual-mode signing (Sigstore keyless and Notation+AKV HSM), Kyverno admission policies for edge clusters, OpenVEX vulnerability suppression, supporting Terraform modules, and a verified-digest deployment flow.
Major additions:
- arc-runners, github-oidc, notation-akv, sigstore-mirror with full conditional wiring driven by a new signing_mode variable (sigstore|notation|none)
- container-build-verify, container-publish (cosign keyless), container-publish-notation (AKV HSM), notation-key-rotate, lerobot-eval-image-publish; dataviewer-image-publish extended for dual-mode
- dev/staging → Sigstore, production → Notation), TUF trusted-root distribution + hourly refresh CronJob
- scripts/security/verify-image.sh, scan-image-vulns.sh, check-admission-readiness.sh, probe-admission.sh with full Pester coverage
- infrastructure/setup/02-deploy-dataviewer.sh and data-management/setup/deploy-dataviewer.sh no longer build images. Operators must supply pre-signed digests via --backend-digest / --frontend-digest and select --verify-mode. Verification is mandatory and aborts on failure.
- ADR (container-signing-public-rekor.md), runbook (notation-key-rotation.md), security guides (container-signing.md, rekor-disclosure.md)
- OpenVEX document (security/vex/dataviewer-base.openvex.json) suppressing CVE-2023-45853 in dataviewer base images
- configure-container-build.prompt.md for guided operator onboarding
- (security) registered in copilot-instructions and commit-message instructions
Components beyond the template list: this PR also touches policies/kyverno/, scripts/security/, security/vex/, fleet-deployment/gitops/, data-management/, and .github/workflows/. See pr-reference-log.md for the full file-by-file synthesis.
Closes #
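The description says the deploy scripts now require pre-signed digests via --backend-digest and --frontend-digest plus a --verify-mode selection, and abort when verification cannot proceed. A minimal sketch of how such argument handling might look follows. The function name `parse_deploy_args` is hypothetical, and the accepted --verify-mode values (sigstore, notation) are an assumption based on the signing modes named in this PR; the actual deploy-dataviewer.sh may differ.

```shell
# Sketch only: not the argument parser from deploy-dataviewer.sh.
# Assumption: --verify-mode accepts "sigstore" or "notation".
parse_deploy_args() {
  backend_digest=""
  frontend_digest=""
  verify_mode=""
  while [ "$#" -gt 0 ]; do
    case $1 in
      --backend-digest|--frontend-digest|--verify-mode)
        # Every recognised flag needs a value after it.
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          return 2
        fi
        case $1 in
          --backend-digest)  backend_digest=$2 ;;
          --frontend-digest) frontend_digest=$2 ;;
          --verify-mode)     verify_mode=$2 ;;
        esac
        shift 2
        ;;
      *)
        echo "unknown argument: $1" >&2
        return 2
        ;;
    esac
  done
  # Both digests are mandatory because the scripts no longer build images.
  if [ -z "$backend_digest" ] || [ -z "$frontend_digest" ]; then
    echo "both --backend-digest and --frontend-digest are required" >&2
    return 2
  fi
  # Verification is mandatory, so there is no opt-out value here.
  case $verify_mode in
    sigstore|notation) ;;
    *) echo "--verify-mode must be sigstore or notation" >&2; return 2 ;;
  esac
}
```

Returning a non-zero status before any deployment step runs mirrors the abort-on-failure behaviour described above; a caller would typically follow it with `|| exit`.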
Type of Change
Component(s) Affected
- infrastructure/terraform/prerequisites/ - Azure subscription setup
- infrastructure/terraform/ - Terraform infrastructure
- infrastructure/setup/ - OSMO control plane / Helm
- workflows/ - Training and evaluation workflows
- training/ - Training pipelines and scripts
- docs/ - Documentation
Additional surfaces (not enumerated in the template):
- .github/workflows/: new reusable signing workflows + actionlint config
- fleet-deployment/gitops/: Kyverno admission, sources, per-cluster overlays
- policies/kyverno/tests/: Kyverno CLI policy tests
- scripts/security/ + scripts/tests/security/: verification tooling and Pester coverage
- security/vex/: first OpenVEX document
- data-management/setup/ and data-management/viewer/ deploy script + README
Testing Performed
- plan reviewed (no unexpected changes)
- apply tested in dev environment
- smoke_test_azure.py)
Additional verification:
- terraform test matrix in infrastructure/terraform/tests/signing-mode.tftest.hcl covers sigstore+mirror, sigstore+public, notation, none
- terraform test suites for arc-runners, github-oidc, notation-akv, sigstore-mirror (mock-provider plan-only)
- policies/kyverno/tests/ (36 assertions: signed / unsigned / wrong-identity / missing-attestation / third-party)
- verify-image, scan-image-vulns, check-admission-readiness, deploy-dataviewer (in-process stub binaries; tmpdir sandbox)
- npm run lint:md and npm run spell-check passing on this branch
Out of scope for this PR (deferred to follow-up):
- evaluation/sil/Dockerfile lands - DD-04)
Documentation Impact
New/updated docs:
- docs/adrs/container-signing-public-rekor.md
- docs/runbooks/notation-key-rotation.md
- docs/security/container-signing.md
- docs/security/rekor-disclosure.md
- data-management/README.md: new "Build and Deploy Paths" section
- README.md: new "Verifying Container Images" section
- CONTRIBUTING.md: new "Container Image Signing" section
- security/vex/README.md: OpenVEX authoring workflow
- .github/prompts/configure-container-build.prompt.md: interactive onboarding prompt
Bug Fix Checklist
Not applicable — this is a feature + infrastructure PR.
Checklist
Reviewer attention:
- signing_mode = "sigstore" writes signatures + workflow refs + image digests to a permanent public log. ADR documents rationale; consent gates exist on every operator surface, but please confirm acceptable for production posture.
- data-management/README.md.
- arc-runners: requires CNI L7 (Cilium/Calico) for enforcement. Defaults to enabled.
- infrastructure/terraform/main.tf: block "deploy when none" but do not enforce that ≥1 signing path is active. Intentional to allow staged rollout.
🔒 - Generated by Copilot
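The first reviewer note turns on the consent gate for the public Rekor log, which the commit message says is exposed as an --accept-public-rekor banner in 01-deploy-robotics-charts.sh. The sketch below shows one way such a gate could be expressed. The function name `require_rekor_consent` and its three-argument interface are hypothetical, and the banner text is illustrative, not copied from the script.

```shell
# Sketch only: not the consent gate from 01-deploy-robotics-charts.sh.
# $1 = signing mode (sigstore|notation|none)
# $2 = "true" when the public Rekor log is in use
# $3 = "true" when the operator passed --accept-public-rekor
require_rekor_consent() {
  _mode=$1
  _public=$2
  _accepted=$3
  # Consent is only needed when keyless signing writes to the public log.
  if [ "$_mode" = "sigstore" ] && [ "$_public" = "true" ] && [ "$_accepted" != "true" ]; then
    echo "Public Rekor records signatures, workflow refs, and image digests in a permanent public log." >&2
    echo "Re-run with --accept-public-rekor to continue." >&2
    return 1
  fi
  return 0
}
```

Under this sketch, Notation mode and a private Sigstore mirror pass through without a prompt, so the gate only interrupts the one configuration that discloses metadata publicly.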