diff --git a/.gitignore b/.gitignore index 84f2e0bf66b..fb2ac85231e 100644 --- a/.gitignore +++ b/.gitignore @@ -64,3 +64,6 @@ newRelease.json # Ignore golang's vendored files /vendor/ /tools/vendor/ + +# AI tooling instructions +AGENTS.md diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md new file mode 100644 index 00000000000..d266a94cb0f --- /dev/null +++ b/docs/ROADMAP.md @@ -0,0 +1,150 @@ +# OpenSSF Scorecard Roadmap + +## 2026 + +### Theme: Scorecard v6 — Open Source Security Evidence Engine + +**Mission:** Scorecard produces trusted, structured security evidence for the +open source ecosystem. + +Scorecard v6 evolves Scorecard from a scoring tool to an evidence engine. The +primary initiative for 2026 is adding +[OSPS Baseline](https://baseline.openssf.org/) conformance evaluation as the +first use case that proves this architecture. Scorecard accepts diverse inputs +about a project's security practices, normalizes them through probe-based +analysis, and packages the resulting evidence in interoperable formats for +downstream tools to act on. + +Check scores (0-10) and conformance labels (PASS/FAIL/UNKNOWN) are parallel +evaluation layers over the same probe evidence, produced in a single run. +Existing checks, probes, and scores are unchanged — v6 is additive. The +conformance layer is a new product surface aligned with the +[ORBIT WG](https://github.com/ossf/wg-orbit) ecosystem. (While Scorecard is not +part of ORBIT WG, ecosystem interoperability with ORBIT tools is an overarching +OpenSSF goal, and Scorecard interoperates through published output formats.) + +**Target Baseline version:** [v2026.02.19](https://baseline.openssf.org/versions/2026-02-19) + +**Current coverage:** See [docs/osps-baseline-coverage.md](osps-baseline-coverage.md) +for a control-by-control analysis. + +### Phased delivery + +Phases are ordered by outcome. Maintainer bandwidth dictates delivery timing. 
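At a glance, the phase sequence is (a Mermaid sketch restating the phase names below):

```mermaid
flowchart LR
    P1["Phase 1<br/>Conformance foundation,<br/>Level 1 coverage"] --> P2["Phase 2<br/>Release integrity,<br/>Level 2 core"]
    P2 --> P3["Phase 3<br/>Enforcement detection,<br/>Level 3, multi-repo"]
```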
#### Phase 1: Conformance foundation and Level 1 coverage

**Outcome:** Scorecard produces a useful OSPS Baseline Level 1 conformance
report for any public GitHub repository, available across CLI, Action, and
API surfaces.

Deliverables:

- Evidence model and output formats:
  - Enriched JSON (Scorecard-native)
  - In-toto evidence predicate (`scorecard.dev/evidence/v0.1`)
  - Gemara output (via [security-baseline](https://github.com/ossf/security-baseline)
    dependency)
  - OSCAL Assessment Results (via
    [go-oscal](https://github.com/defenseunicorns/go-oscal))
  - OpenSSF Best Practices Badge [Automation Proposals](https://github.com/coreinfrastructure/best-practices-badge/blob/main/docs/automation-proposals.md)
- Unified framework abstraction for OSPS Baseline v2026.02.19:
  - Checks and OSPS Baseline both use the same internal probe composition interface
  - Probe-to-control mappings maintained in Scorecard
- Applicability engine detecting preconditions (e.g., "has made a release")
- Map existing probes to OSPS controls where coverage exists today
- New probes for Level 1 gaps:
  - Governance and documentation presence
  - Dependency manifest presence
  - Security policy deepening
  - Secrets detection — consuming platform signals where available
- Metadata ingestion layer — Security Insights as first supported source;
  architecture supports additional metadata sources
- Scorecard control catalog extraction plan

#### Phase 2: Release integrity and Level 2 core

**Outcome:** Scorecard evaluates release-related OSPS controls, covering the
core of Level 2 and becoming useful for downstream due diligence workflows.

Deliverables:

- Release asset inspection layer
- Signed manifest support
- Release notes and changelog detection
- Evidence bundle output (conformance results + in-toto statement)
- Additional metadata sources for the ingestion layer

**Note:** Phases 1 and 2 focus on automatically verifiable controls.
Design of attestation mechanisms (for non-automatable controls) is deferred to
Phase 3 or beyond.

#### Phase 3: Enforcement detection, Level 3, and multi-repo

**Outcome:** Scorecard covers Level 3 controls including enforcement detection
and project-level aggregation.

Deliverables:

- SCA and SAST policy and enforcement detection (Scorecard detects enforcement
  mechanisms without being an enforcement tool itself)
- Multi-repo project-level conformance aggregation
- Attestation mechanism for non-automatable controls (deferred from Phase 2)
- Attestation integration GA

### Ecosystem alignment

Scorecard operates within the ORBIT WG ecosystem as an evidence engine. All
downstream tools consume Scorecard evidence on equal terms through published
output formats.

[Allstar](https://github.com/ossf/allstar), a Scorecard sub-project,
continuously monitors GitHub organizations and enforces Scorecard check
results as policies. OSPS conformance output could enable Allstar to enforce
Baseline conformance at the organization level.
+ +Scorecard SHOULD NOT (per [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119)) +duplicate evaluation that downstream tools handle: + +- **[Privateer](https://github.com/ossf/pvtr-github-repo-scanner)** — Baseline evaluation powered by Gemara and Security Insights +- **[Minder](https://github.com/mindersec/minder)** — Policy enforcement and remediation platform (OpenSSF Sandbox, ORBIT WG) +- **[AMPEL](https://github.com/carabiner-dev/ampel)** — Attestation-based policy enforcement; already consumes Scorecard probe results via [policy library](https://github.com/carabiner-dev/policies/tree/main/scorecard) +- **[Darnit](https://github.com/kusari-oss/darnit)** — Compliance audit and remediation + +Scorecard's role is to produce deep, probe-based security evidence that these +tools and downstream consumers can use through interoperable output formats +(JSON, in-toto, Gemara, SARIF, OSCAL). + +### Design principles + +1. **Evidence is the product.** Scorecard's core output is structured, + normalized probe findings. Check scores and conformance labels are parallel + evaluation layers over the same evidence. +2. **Probes normalize diversity.** Each probe understands multiple ways a + control outcome can be satisfied. +3. **UNKNOWN-first honesty.** If Scorecard cannot observe a control, the + status is UNKNOWN with an explanation — never a false PASS or FAIL. +4. **All consumers are equal.** Downstream tools consume Scorecard evidence + through published output formats. +5. **No metadata monopolies.** Probes may evaluate multiple sources for the + same data. No single metadata file is required for meaningful results, + though they may enrich results. +6. **Formats are presentation.** Output formats (JSON, in-toto, Gemara, + SARIF, OSCAL) are views over the evidence model. No single format is + privileged. 
+ +### Open questions + +The following design questions are under active discussion among maintainers: + +- **Attestation identity model** — How non-automatable controls are attested + (repo-local metadata vs. signed attestations via Sigstore/OIDC). Decomposed + into identity (who signs) and tooling (what generates, when) sub-questions. +- **Enforcement detection scope** — How Scorecard detects enforcement + mechanisms without being an enforcement tool itself + +### How to contribute + +See the [proposal](../openspec/changes/osps-baseline-conformance/proposal.md) +for detailed requirements and open questions. Discussion and feedback are +welcome via GitHub issues and the Scorecard community meetings. diff --git a/docs/osps-baseline-coverage.md b/docs/osps-baseline-coverage.md new file mode 100644 index 00000000000..33a79c13e39 --- /dev/null +++ b/docs/osps-baseline-coverage.md @@ -0,0 +1,221 @@ +# OSPS Baseline Coverage Analysis + +Analysis of Scorecard's current probe and check coverage against the +[OSPS Baseline v2026.02.19](https://baseline.openssf.org/versions/2026-02-19). + +This is a living document. As probes are added or enhanced, update the +coverage status and evidence columns accordingly. 
+ +## Coverage legend + +| Symbol | Meaning | +|--------|---------| +| COVERED | Scorecard has probes that fully satisfy this control | +| PARTIAL | Scorecard has probes that provide evidence but do not fully satisfy the control | +| GAP | No existing probe provides meaningful evidence for this control | +| NOT_OBSERVABLE | Control requires data Scorecard cannot access (e.g., org-level admin permissions) | + +## Summary + +| Level | Total controls | COVERED | PARTIAL | GAP | NOT_OBSERVABLE | +|-------|---------------|---------|---------|-----|----------------| +| 1 | 25 | 6 | 8 | 9 | 2 | +| 2 | 17 | 2 | 5 | 9 | 1 | +| 3 | 17 | 0 | 4 | 13 | 0 | +| **Total** | **59** | **8** | **17** | **31** | **3** | + +**Automated coverage rate (COVERED + PARTIAL): 42% (25 of 59)** + +**Full coverage rate (COVERED only): 14% (8 of 59)** + +## Level 1 controls (25) + +| OSPS ID | Control (short) | Status | Scorecard probes/checks providing evidence | Gap / notes | +|---------|----------------|--------|---------------------------------------------|-------------| +| OSPS-AC-01.01 | MFA for sensitive resources | NOT_OBSERVABLE | None | Requires org admin API access; Scorecard tokens typically lack this. Must be UNKNOWN unless org-admin token is provided. | +| OSPS-AC-02.01 | Least-privilege defaults for new collaborators | NOT_OBSERVABLE | None | Requires org-level permission visibility. Must be UNKNOWN. | +| OSPS-AC-03.01 | Prevent direct commits to primary branch | COVERED | `requiresPRsToChangeCode`, `branchesAreProtected` | Branch-Protection check. Maps directly when PR-only merges are enforced. | +| OSPS-AC-03.02 | Prevent primary branch deletion | COVERED | `blocksDeleteOnBranches` | Branch-Protection check. Direct mapping. | +| OSPS-BR-01.01 | Sanitize untrusted CI/CD input | PARTIAL | `hasDangerousWorkflowScriptInjection` | Dangerous-Workflow check detects script injection patterns. Does not cover all sanitization cases (e.g., non-shell contexts). 
| +| OSPS-BR-01.03 | Untrusted code snapshots cannot access privileged credentials | PARTIAL | `hasDangerousWorkflowUntrustedCheckout` | Dangerous-Workflow check detects untrusted checkouts with access to secrets. Does not cover all credential isolation scenarios. | +| OSPS-BR-03.01 | Official channel URIs use encrypted transport | GAP | None | Requires a source-of-truth for official URIs (Security Insights). No probe exists. | +| OSPS-BR-03.02 | Distribution URIs use authenticated channels | GAP | None | Same as BR-03.01. Requires declared distribution channels. | +| OSPS-BR-07.01 | Prevent unintentional storage of secrets in VCS | GAP | None | No secrets detection probe exists today. Could consume platform signals (e.g., GitHub secret scanning API). | +| OSPS-DO-01.01 | User guides for released software | GAP | None | No documentation presence probe. Would need file/path heuristics. | +| OSPS-DO-02.01 | Defect reporting guide | GAP | None | No issue template / bug report documentation probe. | +| OSPS-GV-02.01 | Public discussion mechanism | GAP | None | Could check whether issues/discussions are enabled, but no probe exists. | +| OSPS-GV-03.01 | Documented contribution process | GAP | None | No CONTRIBUTING file presence/content probe. | +| OSPS-LE-02.01 | OSI/FSF license for source code | COVERED | `hasFSFOrOSIApprovedLicense` | License check. Direct mapping. | +| OSPS-LE-02.02 | OSI/FSF license for released assets | PARTIAL | `hasFSFOrOSIApprovedLicense` | License check verifies repo license, but does not verify license is shipped with release artifacts. | +| OSPS-LE-03.01 | License file in repository | COVERED | `hasLicenseFile` | License check. Direct mapping. | +| OSPS-LE-03.02 | License included with released assets | PARTIAL | `hasLicenseFile` | Detects license in repo, not in release artifacts. Needs release asset inspection. | +| OSPS-QA-01.01 | Repo publicly readable at static URL | COVERED | (implicit) | Scorecard can only scan public repos. 
If Scorecard runs, this is satisfied. | +| OSPS-QA-01.02 | Public commit history with authorship and timestamps | COVERED | (implicit) | VCS provides this by nature. Scorecard relies on commit history for multiple probes. | +| OSPS-QA-02.01 | Direct dependency list present | PARTIAL | `pinsDependencies` | Pinned-Dependencies check detects dependency manifests but focuses on pinning, not mere presence. | +| OSPS-QA-04.01 | Docs list subprojects | GAP | None | Requires Security Insights or similar metadata. No probe exists. | +| OSPS-QA-05.01 | No generated executable artifacts in VCS | PARTIAL | `hasBinaryArtifacts`, `hasUnverifiedBinaryArtifacts` | Binary-Artifacts check. Detects binary files but may not distinguish "generated executables" from other binaries. | +| OSPS-QA-05.02 | No unreviewable binary artifacts in VCS | PARTIAL | `hasUnverifiedBinaryArtifacts` | Detects unverified binaries. "Unreviewable" vs "reviewable" classification is not yet granular enough. | +| OSPS-VM-02.01 | Security contacts documented | PARTIAL | `securityPolicyPresent`, `securityPolicyContainsLinks` | Security-Policy check detects SECURITY.md presence and links, but does not verify actual contact methods (email, form, etc.). | +| OSPS-BR-01.04 | (Note: This is Level 3, not Level 1. Listed under Level 3 below.) | | | | + +## Level 2 controls (17) + +| OSPS ID | Control (short) | Status | Scorecard probes/checks providing evidence | Gap / notes | +|---------|----------------|--------|---------------------------------------------|-------------| +| OSPS-AC-04.01 | Default lowest CI/CD permissions | PARTIAL | `topLevelPermissions`, `jobLevelPermissions`, `hasNoGitHubWorkflowPermissionUnknown` | Token-Permissions check evaluates workflow permissions. "Defaults to lowest" semantics need verification. | +| OSPS-BR-02.01 | Releases have unique version identifier | GAP | None | No release versioning probe. Needs release API inspection. 
| +| OSPS-BR-04.01 | Releases have descriptive changelog | GAP | None | No changelog/release notes detection probe. | +| OSPS-BR-05.01 | Standardized tooling for dependency ingestion | GAP | None | No probe detects whether standard package managers are used in CI. | +| OSPS-BR-06.01 | Releases signed or accounted for in signed manifest | PARTIAL | `releasesAreSigned`, `releasesHaveProvenance`, `releasesHaveVerifiedProvenance` | Signed-Releases check covers signatures and provenance. Does not yet check for "signed manifest including hashes" as an alternative. | +| OSPS-DO-06.01 | Docs describe dependency selection/tracking | GAP | None | Documentation control. No probe exists. | +| OSPS-DO-07.01 | Build instructions in documentation | GAP | None | Documentation control. No probe exists. | +| OSPS-GV-01.01 | Docs list members with sensitive access | NOT_OBSERVABLE | None | Requires org-level data or attestation. Not automatable via Scorecard. | +| OSPS-GV-01.02 | Docs list roles and responsibilities | GAP | None | Documentation control. May require attestation. | +| OSPS-GV-03.02 | Contributor guide with acceptability requirements | PARTIAL | (related: CONTRIBUTING presence could be inferred) | No probe today; could extend a contributing-file probe to check for content structure. | +| OSPS-LE-01.01 | Legal authorization per commit (DCO/CLA) | GAP | None | No DCO/CLA detection probe. Would check for Signed-off-by trailers or CLA bot enforcement. | +| OSPS-QA-03.01 | Status checks pass or bypassed before merge | COVERED | `runsStatusChecksBeforeMerging` | Branch-Protection check. Direct mapping. | +| OSPS-QA-06.01 | Automated tests run prior to acceptance | COVERED | `testsRunInCI` | CI-Tests check. Maps directly. | +| OSPS-SA-01.01 | Design docs with actions/actors | GAP | None | Documentation/assessment control. Requires attestation. | +| OSPS-SA-02.01 | Docs describe external interfaces | GAP | None | Documentation control. Requires attestation. 
| +| OSPS-SA-03.01 | Security assessment performed | GAP | None | Process control. Requires attestation with evidence link. | +| OSPS-VM-01.01 | CVD policy with response timeframe | PARTIAL | `securityPolicyContainsVulnerabilityDisclosure`, `securityPolicyContainsText` | Security-Policy check detects disclosure language. Does not verify explicit timeframe commitment. | +| OSPS-VM-03.01 | Private vulnerability reporting method | PARTIAL | `securityPolicyContainsLinks` | Detects links in SECURITY.md. Does not verify private reporting is actually enabled (e.g., GitHub PSIRT feature). | +| OSPS-VM-04.01 | Publicly publish vulnerability data | GAP | None | No probe checks for GitHub Security Advisories, OSV entries, or CVE publication. | + +## Level 3 controls (17) + +| OSPS ID | Control (short) | Status | Scorecard probes/checks providing evidence | Gap / notes | +|---------|----------------|--------|---------------------------------------------|-------------| +| OSPS-AC-04.02 | Job-level least privilege in CI/CD | PARTIAL | `jobLevelPermissions` | Token-Permissions check evaluates job-level permissions. "Minimum necessary" is hard to assess without understanding job intent. | +| OSPS-BR-01.04 | Sanitize trusted collaborator CI/CD input | PARTIAL | `hasDangerousWorkflowScriptInjection` | Dangerous-Workflow check partially covers this, but focuses on untrusted input, not trusted collaborator input. | +| OSPS-BR-02.02 | Release assets tied to release identifier | GAP | None | No release asset naming/association probe. | +| OSPS-BR-07.02 | Secrets management policy | GAP | None | Documentation/policy control. Requires attestation. | +| OSPS-DO-03.01 | Instructions to verify release integrity/authenticity | GAP | None | Documentation control. Could partially automate by checking for verification docs alongside signed releases. | +| OSPS-DO-03.02 | Instructions to verify release author identity | GAP | None | Documentation control. 
| +| OSPS-DO-04.01 | Support scope/duration per release | GAP | None | Documentation control. | +| OSPS-DO-05.01 | EOL security update statement | GAP | None | Documentation control. | +| OSPS-GV-04.01 | Policy to review collaborators before escalated perms | GAP | None | Governance policy. Requires attestation. | +| OSPS-QA-02.02 | SBOM shipped with compiled release assets | PARTIAL | `hasReleaseSBOM`, `hasSBOM` | SBOM probes exist but may not specifically verify compiled-asset association. | +| OSPS-QA-04.02 | Subprojects enforce >= primary requirements | GAP | None | Requires multi-repo scanning and cross-repo comparison. | +| OSPS-QA-06.02 | Docs describe when/how tests run | GAP | None | Documentation control. | +| OSPS-QA-06.03 | Policy requiring tests for major changes | GAP | None | Documentation/policy control. | +| OSPS-QA-07.01 | Non-author approval before merging | PARTIAL | `codeApproved`, `codeReviewOneReviewers`, `requiresApproversForPullRequests` | Code-Review and Branch-Protection probes cover this. "Non-author" semantics need verification. | +| OSPS-VM-04.02 | VEX for non-affecting vulnerabilities | GAP | None | No VEX detection probe. | +| OSPS-VM-05.01 | SCA remediation threshold policy | GAP | None | Policy control. | +| OSPS-VM-05.02 | SCA violations addressed pre-release | GAP | None | Policy + enforcement control. | +| OSPS-VM-05.03 | Automated SCA eval + block violations | PARTIAL | `hasOSVVulnerabilities` | Vulnerabilities check detects known vulns. Does not verify gating/blocking enforcement. | +| OSPS-VM-06.01 | SAST remediation threshold policy | GAP | None | Policy control. | +| OSPS-VM-06.02 | Automated SAST eval + block violations | PARTIAL | `sastToolConfigured`, `sastToolRunsOnAllCommits` | SAST check detects tool presence and execution. Does not verify gating/blocking enforcement. | + +## Phase 1 priorities (Level 1 gap closure) + +The following Level 1 gaps should be addressed first, ordered by implementation feasibility: + +1. 
**OSPS-GV-03.01** (contribution process): Add probe for CONTRIBUTING file presence +2. **OSPS-GV-02.01** (public discussion): Add probe for issues/discussions enabled +3. **OSPS-DO-02.01** (defect reporting): Add probe for issue templates or bug report docs +4. **OSPS-DO-01.01** (user guides): Add probe for documentation presence heuristics +5. **OSPS-BR-07.01** (secrets in VCS): Consume GitHub secret scanning API or add detection heuristics +6. **OSPS-BR-03.01 / BR-03.02** (encrypted transport): Requires Security Insights ingestion for declared URIs +7. **OSPS-QA-04.01** (subproject listing): Requires Security Insights or equivalent metadata + +## Probes not mapped to any OSPS control + +The following probes exist in Scorecard but do not directly map to any OSPS Baseline control: + +| Probe | Check | Notes | +|-------|-------|-------| +| `archived` | Maintained | Project archival status — relates to "while active" preconditions | +| `hasRecentCommits` | Maintained | Activity signal — relates to "while active" preconditions | +| `issueActivityByProjectMember` | Maintained | Activity signal — relates to "while active" preconditions | +| `createdRecently` | Maintained | Age signal | +| `contributorsFromOrgOrCompany` | Contributors | Diversity signal | +| `dependencyUpdateToolConfigured` | Dependency-Update-Tool | Best practice, not a Baseline control | +| `fuzzed` | Fuzzing | Testing best practice, not a Baseline control | +| `hasOpenSSFBadge` | CII-Best-Practices | Meta-badge, not a Baseline control | +| `packagedWithAutomatedWorkflow` | Packaging | Distribution best practice | +| `webhooksUseSecrets` | Webhook | Security practice, not a Baseline control | +| `hasPermissiveLicense` | (uncategorized) | License type classification | +| `unsafeblock` | (independent) | Language-specific safety | +| `dismissesStaleReviews` | Branch-Protection | Review hygiene beyond Baseline scope | +| `requiresCodeOwnersReview` | Branch-Protection | CODEOWNERS enforcement beyond 
Baseline scope | +| `requiresLastPushApproval` | Branch-Protection | Push approval beyond Baseline scope | +| `requiresUpToDateBranches` | Branch-Protection | Branch freshness beyond Baseline scope | +| `branchProtectionAppliesToAdmins` | Branch-Protection | Admin override prevention beyond Baseline scope | +| `blocksForcePushOnBranches` | Branch-Protection | Force-push protection; related to AC-03 but not explicitly required | + +These probes remain valuable for Scorecard's existing scoring model and may become relevant for future Baseline versions. + +## Existing issues and PRs relevant to gap closure + +The following open issues and PRs in the Scorecard repository are directly +relevant to closing OSPS Baseline coverage gaps. These should be prioritized +and linked to the conformance work. + +### Security Insights ingestion +- [#2305](https://github.com/ossf/scorecard/issues/2305) — Support for SECURITY INSIGHTS +- [#2479](https://github.com/ossf/scorecard/issues/2479) — SECURITY-INSIGHTS.yml implementation + +These are critical for OSPS-BR-03.01, BR-03.02, QA-04.01, and other +controls that depend on declared project metadata. + +### Secrets detection (OSPS-BR-07.01) +- [#30](https://github.com/ossf/scorecard/issues/30) — New check: code is scanning for secrets + +Open since the project's earliest days. Phase 1 priority. + +### SBOM (OSPS-QA-02.02) +- [#1476](https://github.com/ossf/scorecard/issues/1476) — Feature: Detect if SBOMs generated +- [#2605](https://github.com/ossf/scorecard/issues/2605) — Add support for SBOM analyzing at Binary-Artifacts stage + +The SBOM check and probes (`hasSBOM`, `hasReleaseSBOM`) already exist but +may need enhancement for compiled release asset association. + +### Changelog / release notes (OSPS-BR-04.01) +- [#4824](https://github.com/ossf/scorecard/issues/4824) — Feature: New Check: Check if the project has and maintains a CHANGELOG + +Direct match for Phase 2 deliverable. 
+ +### Private vulnerability reporting (OSPS-VM-03.01) +- [#2465](https://github.com/ossf/scorecard/issues/2465) — Factor whether or not private vulnerability reporting is enabled into the scorecard + +Direct match. GitHub's private vulnerability reporting API could provide +platform-level evidence. + +### Vulnerability disclosure improvements (OSPS-VM-01.01, VM-04.01) +- [#4192](https://github.com/ossf/scorecard/issues/4192) — Test for security policy in other places than SECURITY.md +- [#4789](https://github.com/ossf/scorecard/issues/4789) — Rethinking vulnerability check scoring logic +- [#1371](https://github.com/ossf/scorecard/issues/1371) — Feature: add check for vulnerability alerts + +### Signed releases and provenance (OSPS-BR-06.01) +- [#4823](https://github.com/ossf/scorecard/issues/4823) — Feature: pass Signed-Releases with GitHub immutable release process +- [#4080](https://github.com/ossf/scorecard/issues/4080) — Use GitHub attestations to check for signed releases +- [#2684](https://github.com/ossf/scorecard/issues/2684) — Rework: Signed-Releases: Separate score calculation of provenance and signatures +- [#1417](https://github.com/ossf/scorecard/issues/1417) — Feature: add support for keyless signed release + +### Threat model / security assessment (OSPS-SA-01.01, SA-03.01) +- [#2142](https://github.com/ossf/scorecard/issues/2142) — Feature: Assess presence and maintenance of a threat model + +### Release scoring (OSPS-BR-02.01, BR-02.02) +- [#1985](https://github.com/ossf/scorecard/issues/1985) — Feature: Scoring for individual releases + +### Minder integration +- [#4723](https://github.com/ossf/scorecard/pull/4723) — Initial draft of using Minder rules in Scorecard (CLOSED) + +Draft PR that attempted to run Minder Rego rules within Scorecard, +including OSPS-QA-05.01 and QA-03.01. Closed due to inactivity but +demonstrates interest in deeper Minder/Scorecard integration. + +## Notes + +- The OSPS Baseline v2026.02.19 contains 59 controls. 
Previous coverage + estimates against older Baseline versions should be treated as out-of-date. + This analysis supersedes any prior mapping. +- Controls marked NOT_OBSERVABLE cannot produce PASS or FAIL without elevated + permissions. The conformance engine must return UNKNOWN with an explanation. +- Many Level 2 and Level 3 controls are documentation or policy controls that + require attestation rather than automated detection. The attestation mechanism + (OQ-1 in the proposal) is critical for these. +- The "while active" precondition on many controls maps to Scorecard's Maintained + check probes (`archived`, `hasRecentCommits`, `issueActivityByProjectMember`). + These could serve as applicability detectors. diff --git a/openspec/changes/osps-baseline-conformance/decisions.md b/openspec/changes/osps-baseline-conformance/decisions.md new file mode 100644 index 00000000000..a37bfb2905a --- /dev/null +++ b/openspec/changes/osps-baseline-conformance/decisions.md @@ -0,0 +1,1017 @@ +# OSPS Baseline Conformance — Feedback and Decisions + +Companion document to [`proposal.md`](proposal.md). This document tracks +reviewer feedback, open questions, maintainer responses, and the decision +priority analysis. + +For the proposal itself (motivation, scope, phased delivery, ecosystem +positioning), see [`proposal.md`](proposal.md). + +For the control-by-control coverage analysis, see +[`docs/osps-baseline-coverage.md`](../../docs/osps-baseline-coverage.md). + +--- + +## Open questions from maintainer review + +The following questions were raised by Spencer (Steering Committee member) +during review of the roadmap and need to be resolved before or during +implementation. + +### OQ-1: Attestation mechanism identity + +> "The attestation/provenance layer. What is doing the attestation? Is this some OIDC? A personal token? A workflow (won't have the right tokens)?" 
+> — Spencer, on Section 5.1 + +**Stakeholders:** Spencer (raised this, flagged as blocking), Stephen, Steering Committee + +This is a fundamental design question. Options include: +- **Repo-local metadata files** (e.g., Security Insights, `.osps-attestations.yml`): simplest, no cryptographic identity, maintainer self-declares by committing the file. +- **Signed attestations via Sigstore/OIDC**: strongest guarantees, but requires workflow identity and the right tokens — which Spencer correctly notes may not be available in all contexts. +- **Platform-native signals**: e.g., GitHub's private vulnerability reporting enabled status, which the platform attests implicitly. + +**Recommendation to discuss**: Start with repo-local metadata files (unsigned) for the v1 attestation mechanism, with a defined extension point for signed attestations in a future iteration. This avoids blocking on the identity question while still making non-automatable controls reportable. + +### OQ-2: Scorecard's role in enforcement detection vs. enforcement + +> "I thought the other doc said Scorecard wasn't an enforcement tool?" +> — Spencer, on Q4 deliverables (enforcement detection) + +**Stakeholders:** Spencer (raised this), Stephen, Steering Committee + +This is a critical framing question. The roadmap proposes *detecting* whether enforcement exists (e.g., "are SAST results required to pass before merge?"), not *performing* enforcement. But the line between "detecting enforcement" and "being an enforcement tool" needs to be drawn clearly. + +**Recommendation to discuss**: Scorecard detects and reports whether enforcement mechanisms are in place. It does not itself enforce. This distinction should be documented explicitly. 
### OQ-3: `scan_scope` field in output schema

> "Not sure I see the importance [of `scan_scope`]"
> — Spencer, on Section 9 (output schema)

**Stakeholders:** Stephen (can resolve alone)

The `scan_scope` field (repo|org|repos) in the proposed OSPS output schema may not carry meaningful information. If the output always describes a single repository's conformance, the scope is implicit.

**Recommendation to discuss**: Drop `scan_scope` from the schema unless multi-repo aggregation (OSPS-QA-04.02) produces a fundamentally different output shape. Revisit when project-level aggregation is implemented.

### OQ-4: Evidence model — probes only, not checks

> "[Evidence] should be probe-based only, not check"
> — Spencer, on Section 9 (output schema)

**Stakeholders:** Spencer (raised this), Stephen — effectively resolved (adopted)

Spencer's position is that OSPS evidence references should point to probe findings, not check-level results. This aligns with the architectural direction of Scorecard v5 (probes as the measurement unit, checks as scoring aggregations).

**Recommendation**: Adopt this. The `evidence` array in the OSPS output schema should reference probes and their findings only. Checks may be listed in a `derived_from` field for human context but are not evidence.

---

## Maintainer review

### Stephen's notes

**Overall assessment:**

**Key concerns or risks:**

**Things I agree with:**

**Things I disagree with or want to change:**

- "PVTR" is shorthand for "Privateer". Throughout this proposal, the shorthand makes it appear as if https://github.com/ossf/pvtr-github-repo-scanner is separate from Privateer, when it is really THE Privateer plugin for GitHub repositories. Any references to PVTR should be corrected.
- This proposal does not give even-handed consideration to the capabilities of [Darnit](https://github.com/kusari-oss/darnit) and [AMPEL](https://github.com/carabiner-dev/ampel).
We should do that comparison to get a better idea of what should be in or out of scope for Scorecard.
- The timeline in this proposal is not accurate, as we're already about to enter Q2 2026. We should focus on phases and outcomes, and let maintainer bandwidth dictate delivery timing.
- Scorecard has an existing set of checks and probes, which is essentially a control catalog. We should make a plan to extract the Scorecard control catalog so that it can be used by other tools that can handle evaluation tasks.
- Use Mermaid when creating diagrams.
- We need to understand what level of coverage Scorecard currently has for OSPS Baseline, and that analysis should be created in a separate file (in `docs/`). Assume that any existing findings are out-of-date.
- `docs/roadmap-ideas.md` will not be committed to the repo, as it is a rough draft which needs to be refined for public consumption. We should create `docs/ROADMAP.md` with a 2026 second-level heading which contains the publicly-consumable roadmap.

**Priority ordering — what matters most to ship first:**

### Clarifying questions

The following questions need input before this proposal can move to design.
Questions with Stephen's responses are answered; the rest are open.

#### CQ-1: Scorecard as a conformance tool — product identity

The proposal frames this as a "product-level shift" where Scorecard gains a second mode: conformance evaluation alongside its existing scoring. Does this framing match your vision, or do you see conformance as eventually *replacing* the scoring model? Should we be thinking about deprecating 0-10 scores long-term, or do both modes coexist indefinitely?

**Stephen's response:**

I believe the scoring model will continue to be useful to consumers and it should be maintained. For now, both modes should coexist. There is no need to make a decision about this for the current iteration of the proposal.
+ +**Update:** Check scores and conformance labels are *parallel evaluation layers* over the same probe evidence, not two competing "modes." Both can appear in the same output. The three-tier architecture model (evidence layer → evaluation layers → output formats) replaces the original "two modes" framing. OSPS conformance is *one goal*, not *the* goal — Scorecard's broader identity is as an open source security evidence engine. + +#### CQ-2: OSPS Baseline version targeting + +The roadmap previously targeted OSPS Baseline v2025-10-10. The Privateer GitHub plugin targets v2025-02-25. The Baseline is a living spec with periodic releases. How should Scorecard handle version drift? Options: +- Support only the latest version at any given time +- Support multiple versions concurrently via the versioned mapping file +- Pin to a version and update on a defined cadence (e.g., quarterly) + +**Stephen's response:** + +The current version of the OSPS Baseline is [v2026.02.19](https://baseline.openssf.org/versions/2026-02-19). + +We should align with the latest version at first and have a process for aligning with new versions on a defined cadence. We should understand the [OSPS Baseline maintenance process](https://baseline.openssf.org/maintenance.html) and align with it. + +The OSPS Baseline [FAQ](https://baseline.openssf.org/faq.html) and [Implementation Guidance for Maintainers](https://baseline.openssf.org/maintainers.html) may have guidance we should consider incorporating. + +#### CQ-3: Security Insights as a hard dependency + +Many OSPS controls depend on Security Insights data (official channels, distribution points, subproject inventory, core team). The Privateer GitHub plugin treats the Security Insights file as nearly required — most of its evaluation steps begin with `HasSecurityInsightsFile`. + +Should Scorecard: +- Treat Security Insights the same way (controls that need it go UNKNOWN without it)? +- Provide a degraded but still useful evaluation without it? 
+- Accept alternative metadata sources (e.g., `.project`, custom config)?
+
+This also raises a broader adoption question: most projects today don't have a `security-insights.yml`. How do we avoid making the OSPS output useless for the majority of repositories?
+
+**Stephen's response:**
+
+We should provide a degraded but still useful evaluation without a Security Insights file, especially since our probes today can already cover a lot of ground without it. It would be good for us to eventually support alternative metadata sources, but this should not be an immediate goal.
+
+**Update:** Reframed as a "metadata ingestion layer" that supports Security Insights as one source among several. Security Insights is not privileged. The architecture invites contributions for alternative sources (SBOMs, VEX, platform APIs). No single metadata file is required for meaningful results, though metadata files may enrich results.
+
+#### CQ-4: Privateer plugin relationship — complement vs. converge
+
+The proposal positions Scorecard as complementary to the Privateer plugin. But there's a deeper question: should this stay as two separate tools indefinitely, or is the long-term goal convergence (e.g., the Privateer plugin consuming Scorecard as a library, or Scorecard becoming a Privateer plugin itself)? Your position on this affects how tightly we couple the output formats and whether we invest in Gemara SDK integration.
+
+**Stephen's response:**
+
+Multiple tools should be able to consume Scorecard, so yes, we should invest in Gemara SDK integration.
+
+#### CQ-5: Scope of new probes in 2026
+
+The roadmap calls for significant new probe development (secrets detection, governance/docs presence, dependency manifests, release asset inspection, enforcement detection). That's a lot of new surface area. Should we:
+- Build all of these within Scorecard?
+- Prioritize a subset and defer the rest?
+- Look for ways to consume signals from external tools (e.g., GitHub's secret scanning API, SBOM tools) rather than building detection from scratch? + +If prioritizing, which new probes matter most to you? + +**Stephen's response:** + +We should prioritize OSPS Baseline Level 1 conformance work. +We should consider any signals that can be consumed from external sources. + +#### CQ-6: Community and governance process + +This is a major initiative touching Scorecard's product direction. What's the governance process for getting this approved? +- Does this need a formal proposal to the Scorecard maintainer group? +- Should this be presented at an ORBIT WG meeting? +- Do we need sign-off from the OpenSSF TAC? +- Who else beyond you and Spencer needs to weigh in? + +**Stephen's response:** + +We should have Stephen and Spencer sign off on this proposal as Steering Committee members. In addition, we should have reviews from: +- [blocking] At least 1 non-Steering Scorecard maintainer +- [non-blocking] Maintainers of tools in the WG ORBIT ecosystem + +This does not require review from the TAC, but we should inform WG ORBIT members. + +#### CQ-7: The "minimum viable conformance report" + +If we had to ship the smallest useful thing in Q1, what would it be? The roadmap proposes the full OSPS output format + mapping file + applicability engine. But a simpler starting point might be: +- Just the mapping file (documentation-only, no runtime) +- A `--format=osps` that only reports on controls Scorecard already covers (no new probes, lots of UNKNOWN) +- Something else? + +What would make Q2 a success in your eyes? + +**Stephen's response:** + +As previously mentioned, the quarterly targets are not currently accurate. One of our Q2 outcomes should be OSPS Baseline Level 1 conformance. + +#### CQ-8: Existing Scorecard Action and API impact + +Scorecard runs at scale via the Scorecard Action (GitHub Action) and the public API (api.scorecard.dev). 
Should OSPS conformance be available through these surfaces from day one, or should it start as a CLI-only feature? The API and Action have their own release and stability considerations. + +**Stephen's response:** + +We need to land these capabilities for as much surface area as possible. + +#### CQ-9: Coverage analysis and Phase 1 scope validation + +**Stakeholders:** Stephen (can answer alone) + +The coverage analysis (`docs/osps-baseline-coverage.md`) identifies 25 Level 1 controls. Of those, 6 are COVERED, 8 are PARTIAL, 9 are GAP, and 2 are NOT_OBSERVABLE. The Phase 1 plan targets closing the 9 GAP controls. Given that 2 controls (AC-01.01, AC-02.01) are NOT_OBSERVABLE without org-admin tokens, should Phase 1 explicitly include work on improving observability (e.g., documenting what tokens are needed, or providing guidance for org admins), or should those controls remain UNKNOWN until a later phase? + +**Stephen's response:** + + +#### CQ-10: Mapping file ownership and contribution model + +**Stakeholders:** Stephen, Eddie Knight, Baseline maintainers — partially superseded by CQ-17 + +The versioned mapping file (e.g., `pkg/osps/mappings/v2026-02-19.yaml`) is a critical artifact that defines which probes satisfy which OSPS controls. Who should own this file? Options: +- Scorecard maintainers only (changes require maintainer review) +- Community-contributed with maintainer approval (like checks/probes today) +- Co-maintained with ORBIT WG members who understand the Baseline controls + +This also affects how we handle disagreements about whether a probe truly satisfies a control. + +**Stephen's response:** + + +#### CQ-11: Backwards compatibility of OSPS output format + +**Stakeholders:** Stephen, Spencer, Eddie Knight — depends on CQ-18 (output format decision) + +The spec requires `--format=osps` as a new output format. Since this is a new surface, we have freedom to iterate on the schema. However, once shipped, consumers will depend on it. 
What stability guarantees should we offer?
+- No guarantees during Phase 1 (alpha schema, may break between releases)
+- Semver-like schema versioning from day one (breaking changes increment major version)
+- Follow the Gemara L4 schema if one exists, inheriting its stability model
+
+**Stephen's response:**
+
+
+#### CQ-12: Probe gap prioritization for Phase 1
+
+**Stakeholders:** Stephen (can answer alone)
+
+The coverage analysis identifies the Level 1 GAP controls that need new probes. They are ranked here by implementation feasibility, with the items that depend on Security Insights ranked last:
+
+1. OSPS-GV-03.01 — CONTRIBUTING file presence
+2. OSPS-GV-02.01 — Issues/discussions enabled
+3. OSPS-DO-02.01 — Issue templates or bug report docs
+4. OSPS-DO-01.01 — Documentation presence heuristics
+5. OSPS-BR-07.01 — Secrets detection (platform signal consumption)
+6. OSPS-BR-03.01 / BR-03.02 — Encrypted transport (requires Security Insights)
+7. OSPS-QA-04.01 — Subproject listing (requires Security Insights)
+
+Do you agree with this priority ordering? Are there any controls you would move up or down, or any you would defer to Phase 2?
+
+**Stephen's response:**
+
+
+#### CQ-13: Minder and AMPEL integration surfaces
+
+**Stakeholders:** Stephen, Minder maintainers, Adolfo García Veytia (AMPEL), Steering Committee
+
+Two tools already consume Scorecard data for policy enforcement:
+
+**[Minder](https://github.com/mindersec/minder)** (OpenSSF Sandbox, ORBIT WG) consumes Scorecard findings to enforce security policies and auto-remediate across repositories. Uses Rego-based rules and can enforce OSPS Baseline controls via policy profiles. A draft Scorecard PR (#4723, now stale) attempted deeper integration.
+
+**[AMPEL](https://github.com/carabiner-dev/ampel)** (production v1.0.0) validates Scorecard attestations against policies in CI/CD pipelines.
Already maintains [5 Scorecard-consuming policies](https://github.com/carabiner-dev/policies/tree/main/scorecard) and [36 OSPS Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline). Uses CEL expressions and in-toto attestations. + +Questions: +- Should the OSPS conformance output be designed with Minder and AMPEL as explicit consumers (e.g., ensuring the output works as Minder policy input and as AMPEL attestation input)? +- Should we coordinate with both Minder maintainers and Adolfo during Phase 1 to validate the integration surface? +- Is there a risk of duplicating Baseline evaluation work that Minder or AMPEL already do via their own rules, and if so, how should we delineate? + +**Stephen's response:** + +All downstream tools — Privateer, AMPEL, Minder, Darnit, and others — are equal consumers of Scorecard's output formats. The output formats should serve different tool types equally (policy, remediation, dashboarding). + +Scorecard SHOULD NOT (per [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119)) duplicate evaluation that downstream tools handle, but this is not a MUST NOT. There could be scenarios where overlapping evaluation makes sense (e.g., Scorecard brings deeper analysis or different evidence sources). + +Coordinate with downstream tool maintainers during Phase 1 to validate that output formats are consumable. + + +#### CQ-14: Darnit vs. Minder delineation + +**Stakeholders:** Stephen (can answer alone) + +The proposal lists both [Darnit](https://github.com/kusari-oss/darnit) and [Minder](https://github.com/mindersec/minder) as tools that handle remediation and enforcement. Their capabilities overlap in some areas (both can enforce Baseline controls, both can remediate). For Scorecard's purposes, the distinction matters primarily for the "What Scorecard must not do" boundary. + +Is the current framing correct — that Scorecard is the measurement layer and both Minder and Darnit are downstream consumers? 
Or should we position Scorecard differently relative to one versus the other, given that Minder is an OpenSSF project in the same working group while Darnit is not? + +**Stephen's response:** + + +#### CQ-15: Existing issues as Phase 1 work items + +**Stakeholders:** Stephen (can answer alone) + +The coverage analysis (`docs/osps-baseline-coverage.md`) now includes a section mapping existing Scorecard issues to OSPS Baseline gaps. Several long-standing issues align directly with Phase 1 priorities: + +- [#30](https://github.com/ossf/scorecard/issues/30) — Secrets scanning (OSPS-BR-07.01), open since the project's earliest days +- [#2305](https://github.com/ossf/scorecard/issues/2305) / [#2479](https://github.com/ossf/scorecard/issues/2479) — Security Insights ingestion +- [#2465](https://github.com/ossf/scorecard/issues/2465) — Private vulnerability reporting (OSPS-VM-03.01) +- [#4824](https://github.com/ossf/scorecard/issues/4824) — Changelog check (OSPS-BR-04.01) +- [#4723](https://github.com/ossf/scorecard/pull/4723) — Minder/Rego integration draft (closed) + +Should we adopt these existing issues as the starting work items for Phase 1, or create new issues that reference them? Some of these issues have significant discussion history that may contain design decisions worth preserving. + +**Stephen's response:** + + +--- + +## Scorecard Maintainer Feedback: Spencer Schrock + +The following feedback was provided by Spencer Schrock (Scorecard Steering Committee member and maintainer) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +### SS-1: Conformance layer definition (ROADMAP.md:19) + +**Comment**: "to be clear, in this situation 'evaluation' or 'conformance' layer, just means output format?" + +**Response**: The conformance layer includes **both** evaluation logic (probe→control mapping, status determination, applicability detection) and output formatting (enriched JSON, in-toto, Gemara, OSCAL). 
It composes probe findings into control verdicts, just as checks compose them into 0-10 scores—same probes, different evaluation surfaces, single run. + +**Action**: Added clarification to proposal.md where conformance layer is first introduced. + +**Status**: RESOLVED + +--- + +### SS-2: Mapping layers clarification (ROADMAP.md:51) + +**Comment**: "What's the value in upstreaming check-level relations? I think mapping probes to baseline controls is fine." + +**Response**: Checks today are effectively a framework in their own right (probe compositions). The unified framework abstraction means both checks and OSPS Baseline use the same internal representation — probe compositions mapped to framework controls. No "two layers" or "upstreaming" needed; it's all internal to Scorecard. + +**Action**: Updated proposal.md and ROADMAP.md to describe unified framework abstraction instead of two-layer mapping model. + +**Status**: RESOLVED + +--- + +### SS-3: Catalog extraction target (ROADMAP.md:61) + +**Comment**: "extracting to where?" + +**Response**: The catalog extraction means extracting Scorecard checks into an in-project control framework representation that uses the same unified framework abstraction as OSPS Baseline. This enables checks and OSPS controls to be treated uniformly within the evaluation layer. Not publishing external artifacts. + +**Action**: Clarified catalog extraction description in Phase 1 deliverables. + +**Status**: RESOLVED + +--- + +### SS-4: In-toto predicate compatibility (ROADMAP.md:74) + +**Comment**: "will this change the intoto format we offer now. which is wrapped around our check-based JSON output?" + +**Response**: No. The existing in-toto statement with predicate type `scorecard.dev/result/v0.1` (wrapping check-based JSON output) is **preserved unchanged**. The new evidence predicate (`scorecard.dev/evidence/v0.1`) is **additive**, not a replacement. Both predicates coexist; users choose via CLI flags. 
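+
+For illustration, the new predicate would ride in a standard in-toto statement alongside the existing one. In the hedged sketch below, the two `predicateType` strings are the ones named above; the subject digest and the predicate body fields are placeholders, not a finalized schema.
+
+```json
+{
+  "_type": "https://in-toto.io/Statement/v1",
+  "subject": [
+    { "name": "github.com/example/repo", "digest": { "gitCommit": "<commit-sha>" } }
+  ],
+  "predicateType": "scorecard.dev/evidence/v0.1",
+  "predicate": {
+    "probes": [
+      { "probe": "exampleProbe", "outcome": "true" }
+    ]
+  }
+}
+```
+
+A second statement with `predicateType: scorecard.dev/result/v0.1` would continue to wrap the check-based JSON output unchanged.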
+ +**Action**: Added explicit note about predicate preservation to output formats section. + +**Status**: RESOLVED + +--- + +### SS-5: Attestation scope and evidence engine (ROADMAP.md:136) + +**Comment**: "As an evidence engine, do we even need to attest to this data? Or is this for data produced by the cron or the action?" + +**Response**: Phase 1 focuses on automatically verifiable controls only. Discussion and design of attestation mechanisms (both inbound for non-automatable controls and outbound for signing Scorecard's own output) is deferred beyond Phase 1. When attestation is designed, it would apply to cron/action output. + +**Action**: Moved attestation deliverable from Phase 2 to Phase 3/TBD. Added note to Phase 1 scope. + +**Status**: DEFERRED + +--- + +### SS-6: Enforcement drift concern (osps-baseline-coverage.md:207) + +**Comment**: "Is this going to drift into policy/enforcement? Does this conflict with our goal of evidence only?" + +**Response**: No drift into enforcement. Scorecard detects signals of enforcement (e.g., "SCA tool is configured") but does not enforce policies. The boundary is: Scorecard observes and reports; downstream tools (Minder, AMPEL, Allstar) enforce. + +**Action**: Reinforced enforcement boundary in Phase 3 description and design principles. + +**Status**: RESOLVED + +--- + +### SS-7: Elevated access observability (osps-baseline-coverage.md:36) + +**Comment**: "I'd say this could be observable if it just needs the right token. If this is run in the context of an OSPO self-observation I think it's fine." + +**Response**: Controls requiring elevated access (org admin tokens, GitHub Apps) are marked as **observable** with access requirements noted. When elevated access is unavailable, these controls return status `UNKNOWN` with reason "Requires elevated repository access." This supports OSPO self-assessment scenarios. + +**Action**: Updated coverage analysis guidance to mark elevated-access controls as observable with notation. 
+ +**Status**: RESOLVED + +--- + +### SS-8: Cron deployment costs (decisions.md:203) + +**Comment**: "I would say the cron has additional barriers, cost of writing/serving more data. I have no concerns with the action" + +**Response**: Phase 1 conformance evaluation includes CLI and GitHub Action, but defers cron to Phase 2+ due to BigQuery storage/serving costs for conformance data across 1M+ repos. Action users manage their own storage (no cost to Scorecard infrastructure). + +**Action**: Added explicit cron deferral note to Phase 1 scope. + +**Status**: RESOLVED + +--- + +### SS-9: Metadata signing (decisions.md:30) + +**Comment**: "the metadata could still be signed by a maintainer, just involves some manual effort on their part" + +**Response**: Noted as part of the broader attestation design discussion, deferred to post-Phase 1. + +**Action**: No immediate action; attestation design deferred. + +**Status**: NOTED + +--- + +### SS-10: Mapping file repository location (decisions.md:218) + +**Comment**: "this mapping file lives in which repo?" + +**Response**: Probe-to-control mappings for OSPS Baseline will live in the Scorecard repository, as part of the unified framework abstraction. Checks and OSPS Baseline both use internal probe composition definitions. + +**Action**: Clarified in unified framework abstraction description. + +**Status**: RESOLVED + +--- + +### SS-11: Design principles endorsement (proposal.md:205) + +**Comment**: "I agree with these principles" + +**Response**: Noted. No action needed. + +**Status**: ACKNOWLEDGED + +--- + +### SS-12: AGENTS.md relevance (AGENTS.md:3) + +**Comment**: "This seems unrelated to this change. Other than I assume you using it to generate some of these docs?" + +**Response**: AGENTS.md provides AI collaboration guidelines for the proposal development process. Now removed from git history and .gitignored per Steering Committee discussion. + +**Action**: AGENTS.md removed from git history in rebase, .gitignored. 
+ +**Status**: RESOLVED + +--- + +## Scorecard Maintainer Feedback: Adam Korczynski + +The following feedback was provided by Adam Korczynski (Scorecard maintainer) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +### AK-1: MVVSR acronym expansion (proposal.md:6) + +**Comment**: "Suggestion: Expand MVVSR here." + +**Response**: MVVSR = Mission, Vision, Values, Strategy, and Roadmap. Expanded on first use for clarity. Reference OpenSSF's MVVSR at https://openssf.org/about/ for potential alignment. + +**Action**: Changed to "Mission, Vision, Values, Strategy, and Roadmap (MVVSR) to be developed as a follow-up deliverable..." + +**Status**: RESOLVED + +--- + +### AK-2: Rationale for embedded conformance (proposal.md:16) + +**Comment**: "A bit of a general question: What are the reasons of adding framework conformance to Scorecard itself instead of having a standalone tool to which we can feed Scorecard findings and where the standalone tool then gives a verdict about framework conformance?" + +**Response**: Scorecard already performs the core evaluation work needed for framework conformance: probe execution, evidence collection, and probe composition. The conformance layer builds on Scorecard's existing architecture rather than duplicating capabilities in a separate tool. Scorecard did a lot of what we needed for evaluation, so there's no need for a new tool. + +**Action**: Added "Why framework conformance in Scorecard?" section to proposal explaining architectural rationale. + +**Status**: RESOLVED + +--- + +### AK-3: Downstream tools definition (proposal.md:106) + +**Comment**: "Would be good with a definition of downstream tools here." + +**Response**: Added definition: "Downstream tools are tools that consume Scorecard's output to make policy decisions, enforce requirements, or aggregate security posture." With examples: AMPEL, Minder, Privateer, Darnit, LFX Insights, Allstar. + +**Action**: Added definition where "downstream tools" first appears. 
+
+**Status**: RESOLVED
+
+---
+
+### AK-4: Processing model vs Three-tier model (proposal.md:149)
+
+**Comment**: "Is this ('Processing model') the current dataflow and the following section 'Three-tier evaluation model' the intended?"
+
+**Response**: Both are complementary views of the same architecture. Processing model (Ingest → Analyze → Evaluate → Deliver) is the temporal data flow view. Three-tier model (Evidence → Evaluation → Presentation) is the structural layers view. Neither is "current" vs "intended"—they describe different aspects of the v6 architecture.
+
+**Action**: Added connecting sentence explaining relationship between the two models.
+
+**Status**: RESOLVED
+
+---
+
+### AK-5: Architectural constraints framing (proposal.md:189)
+
+**Comment**: "Not sure if this is entirely correct. Currently, I wouldn't say that Scorecard can produce conformance results, but perhaps I am understanding the context of 'constraints' incorrectly here; Are these current constraints or are they constraints that should exist with the conformance layer in Scorecard?"
+
+**Response**: These describe the architectural target state we're building toward, not constraints on the current state. The section heading "Architectural constraints" was misleading.
+
+**Action**: Renamed section to "Architectural target state" or "Architectural principles" to clarify these describe the v6 design, not current limitations.
+
+**Status**: RESOLVED
+
+---
+
+### AK-6: Option A reference (proposal.md:201)
+
+**Comment**: "Where is Option A?"
+
+**Response**: "Option A" exists only in decisions.md (architecture options discussion). Referencing it in proposal.md creates confusion.
+
+**Action**: Removed "Option A" mention from proposal.md, inlined the actual description of what we're accomplishing (unified framework abstraction).
+
+**Status**: RESOLVED
+
+---
+
+### AK-7: ORBIT WG context (ROADMAP.md:91)
+
+**Comment**: "Why does Scorecard need to operate within the ORBIT WG ecosystem?
Perhaps add a bit about what the ORBIT WG ecosystem is - that may clear it up." + +**Response**: Scorecard is **not** part of ORBIT WG. Ecosystem interoperability with ORBIT tools is an overarching OpenSSF goal, and Scorecard interoperates through published output formats. Added one-sentence clarification. + +**Action**: Added explanation after first ORBIT mention in both proposal.md and ROADMAP.md. + +**Status**: RESOLVED + +--- + +### AK-8: Success criteria ambiguity (proposal.md:393) + +**Comment**: "Would be nice to make this more explicit: What is the success criteria for here? The proposal or the implementation?" + +**Response**: These are **proposal acceptance criteria**. The proposal is accepted when Phase 1 implementation successfully delivers the described outcomes: Level 1 conformance reports, validated output formats, downstream consumer validation, open questions resolved, and no breaking changes to existing functionality. + +**Action**: Add clarifying note to success criteria section: "The following criteria define proposal acceptance (successful Phase 1 implementation):" + +**Status**: RESOLVED + +--- + +## ORBIT WG feedback + +### Eddie Knight's feedback (ORBIT WG TSC Chair) + +The following feedback was provided by Eddie Knight (ORBIT WG Technical Steering Committee Chair, maintainer of Gemara, Privateer, and OSPS Baseline) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +#### EK-1: Mapping file location + +> "Regarding mappings between Baseline catalog<->Scorecard checks, it is possible to easily put that into a new file with Scorecard maintainers as codeowners, pending approval from OSPS Baseline maintainers for the change." + +Eddie is offering to host the Baseline-to-Scorecard mapping in the OSPS Baseline repository (or a shared location) with Scorecard maintainers as CODEOWNERS. The current proposal places the mapping in the Scorecard repo (`pkg/osps/mappings/v2026-02-19.yaml`). 
+ +Mappings currently exist within the Baseline Catalog and are proposed for addition to the Scorecard repository as well. The mappings could be maintained in one or both of the projects. This affects ownership, versioning cadence, and who can update the mapping when controls or probes change. + +The trade-offs: + +- **In Scorecard repo**: Scorecard maintainers fully own the mapping. Mapping updates are coupled to Scorecard releases. Other tools cannot easily consume the mapping. +- **In Baseline repo (or shared)**: Mapping is co-owned. Versioned alongside the Baseline spec. End users and other tools (Privateer, Darnit, Minder) can consume the same mapping. Scorecard maintainers retain CODEOWNERS authority. + +#### EK-2: Output format — no "OSPS output format" + +> "There is not an 'OSPS output format,' and even the relevant Gemara schemas (which are quite opinionated) are still designed to support output in multiple output formats within the SDK, such as SARIF. I would expect that you'd keep your current output logic, and then _maybe_ add basic Gemara json/yaml as another option." + +The current proposal defines `--format=osps` as a new output format. Eddie clarifies that the ORBIT ecosystem does not define a special "OSPS output format" — instead, the Gemara SDK supports multiple output formats (including SARIF). The suggestion is to keep Scorecard's existing output logic and optionally add Gemara JSON/YAML as another format option. + +This is a significant clarification that affects the output requirements, the Phase 1 deliverables, and how we frame the conformance layer. + +#### EK-3: Technical relationship with Privateer plugin + +> "There is a stated goal of not duplicating the code from the plugin ossf/pvtr-github-repo-scanner, but the implementation plan as it's currently written does require duplication. In the current proposal, there would not be a technical relationship between the two codebases." 
+ +Eddie identifies a contradiction: the proposal says "do not duplicate Privateer" but proposes building a parallel conformance engine with no code-level relationship to the Privateer plugin. The current plan would result in two separate codebases evaluating the same OSPS controls independently. + +#### EK-4: Catalog extraction needs an implementation plan + +> "There is cursory mention of a scorecard _catalog extraction_, which I'm hugely in favor of, but I don't see an implementation plan for that." + +The proposal mentions "Scorecard control catalog extraction plan" as a Phase 1 deliverable but does not specify what this means concretely or how it would be achieved. + +#### EK-5: Alternative architecture — shared plugin model + +> "An alternative plan would be for us to spend a week consolidating checks/probes into the pvtr plugin (with relevant CODEOWNERS), then update Scorecard to selectively execute the plugin under the covers." + +Eddie proposes a fundamentally different architecture: + +1. Consolidate Scorecard checks/probes into the [Privateer plugin](https://github.com/ossf/pvtr-github-repo-scanner) as shared evaluation logic +2. Scorecard executes the plugin under the covers for Baseline evaluation and then Scorecard handles follow-up logic such as scoring and storing the results +3. Privateer and LFX Insights can optionally run Scorecard checks via the same plugin + +**Claimed benefits:** +- Extract the Scorecard control catalog for independent versioning and cross-catalog mapping to Baseline +- Instantly integrate Gemara into Scorecard +- Allow bidirectional check execution (Scorecard runs Privateer checks; Privateer runs Scorecard checks) +- Simplify contribution overhead for individual checks +- Improve both codebases through shared logic + +**This is the central architectural decision for the proposal.** The Steering Committee needs to evaluate this against the current plan (Scorecard builds its own conformance engine). 
+ +### Adolfo García Veytia's feedback (AMPEL maintainer) + +The following feedback was provided by Adolfo García Veytia (@puerco, maintainer of [AMPEL](https://github.com/carabiner-dev/ampel)) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +#### AP-1: Mapping file registry — single source preferred + +> "It's great that you also see the need for machine-readable data. This would help projects like AMPEL write policies that enforce the baseline controls based on the results from Scorecard and other analysis tools." +> +> "Initially, we were trying to build the mappings into baseline itself. I still think it's the way to go as it would be better to have a single registry and data format of those mappings (in this case baseline's). Unfortunately, the way baseline considers its mappings [was demoted](https://github.com/ossf/security-baseline/pull/476) so we don't have that registry anymore." + +Adolfo strongly supports machine-readable mapping data and prefers a single registry in the Baseline itself, though the Baseline's own mapping support was recently demoted (PR #476 in security-baseline). This aligns with Eddie's offer (EK-1) to host mappings in the Baseline repo, but adds the context that there is no longer an official registry for tool-to-control mappings. + +AMPEL already maintains its own [Scorecard-to-Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline) (36 OSPS control policies, 5 of which directly consume Scorecard probe results). An official upstream mapping from Scorecard would benefit the entire ecosystem. 
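+
+As a sketch of what such a machine-readable mapping entry might look like, regardless of which repository ultimately hosts it: the path follows the proposal's `pkg/osps/mappings/v2026-02-19.yaml`, but the schema, field names, and probe identifiers below are illustrative only, not an agreed format.
+
+```yaml
+# pkg/osps/mappings/v2026-02-19.yaml — illustrative schema, hypothetical probe names
+baseline_version: "v2026.02.19"
+controls:
+  - id: OSPS-GV-03.01                  # CONTRIBUTING file presence
+    probes: [contributingDocPresent]   # hypothetical probe name
+    satisfied_when: all_pass
+  - id: OSPS-BR-07.01                  # secrets detection
+    probes: [secretScanningEnabled]    # hypothetical probe name
+    satisfied_when: any_pass
+```
+
+A single-registry version of this file would differ mainly in ownership metadata (CODEOWNERS), not in shape.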
+ +#### AP-2: Output format — use in-toto predicates, not a custom format + +> "As others have mentioned, there is no _OSPS output format_ but there are two formal/in process of formalizing in-toto predicate types that are useful for this: +> +> **[Simple Verification Results](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md)** — a simple predicate that communicates just the verified control labels along with the tool that performed the evaluation. It is a generalization of the VSA for non-SLSA controls. +> +> **[The "Baseline" Predicate](https://github.com/in-toto/attestation/pull/502)** — Still not merged, this predicate type was proposed by some of the baseline maintainers to capture an interoperability format more in line with the requirements in this spec, including manual assessments (what is named in this PR as 'ATTESTED')." + +Adolfo identifies two concrete in-toto predicate types that Scorecard should consider for output instead of inventing a custom format: + +1. **Simple Verification Results (SVR)**: Already merged in the in-toto attestation spec. Communicates verified control labels and the evaluating tool. Generalizes SLSA VSA to non-SLSA controls. +2. **Baseline Predicate**: Proposed by Baseline maintainers (PR #502, not yet merged). Designed for interoperability and includes support for manual assessments (ATTESTED status). + +This is the most concrete guidance on output format so far and directly informs CQ-18. + +#### AP-3: Attestation question conflates identity and tooling + +> "The question here is conflating two domains. One question is _who_ signs the attestation, and how can those identities be trusted (identity). The other is _what_ (tool) generates the attestations, and more importantly, from scorecard's perspective, when. This hints at a policy implementation and the answers will most likely differ for projects and controls. Happy to chat about this one day." 
+ +Adolfo clarifies that OQ-1 (attestation mechanism identity) is actually two separate questions: +1. **Identity**: Who signs the attestation, and how are those identities trusted? (Sigstore/OIDC, personal keys, etc.) +2. **Tooling**: What tool generates the attestation, and when? (Scorecard during scan, CI pipeline, manual process) + +The answers will differ per project and per control. This decomposition should inform how OQ-1 is resolved. + +#### AP-4: AMPEL already consumes Scorecard data for Baseline enforcement + +> "I agree with this role statement. Just as minder, ampel also can enforce Scorecard's data ([see an example here](https://github.com/carabiner-dev/policies/blob/ab1eb42ef179c7a0016d6b7ed72991774a48f151/scorecard/sast.json#L4)) and we also [maintain a mapping of some of scorecard's probes vs baseline controls](https://github.com/carabiner-dev/policies/blob/ab1eb42ef179c7a0016d6b7ed72991774a48f151/groups/osps-baseline/osps-vm-06.hjson#L5) that would greatly benefit from an official/upstream map. +> +> The probes can enrich the baseline ecosystem substantially and having the data accessible from other tools encourages other projects in the ecosystem to help maintain and improve them." + +AMPEL is an active consumer of Scorecard data today: +- 5 production policies directly evaluate Scorecard probe results (SAST, binary artifacts, code review, dangerous workflows, token permissions) +- 36 OSPS Baseline policy mappings, several of which reference Scorecard checks +- An official upstream Scorecard-to-Baseline mapping would directly benefit AMPEL's policy library + +This validates the proposal's direction of making Scorecard's probe results and control mappings available to the broader ecosystem. + +#### AP-5: Probe composition supports framework-agnostic evaluation + +> "This point is key. Baseline looks for outcomes. Compliance can be supported by Scorecard probe data. 
+> +> The baseline control can be a 1:1 map to a probe's data, other times it will be a composite set of probes. If you add new probes to look for something new that's useful to test a baseline control, we just need to add another composition definition to say _OSPS-XX-XXX can be [probe X] or [probe set 1] or [probe set 2]_. +> +> This is akin to the way checks work now, but by generalizing it, the probe data can inform other framework testing tools, beyond baseline." +> — on proposal.md, "What Scorecard SHOULD NOT do" section + +Adolfo validates the probe composition model and identifies the key generalization: the same pattern used by existing checks (`probes/entries.go`) can be extended to OSPS Baseline and other frameworks. A control maps to one or more probe compositions, and new detection paths can be added without changing the composition structure. + +**Stephen's response:** Agreed. "Probe sets" or "compositions" is the right vocabulary, without introducing additional layers of complexity. The existing check composition pattern in `probes/entries.go` is the model. + +#### AP-6: Conformance engine should be framework-agnostic + +> "I'm assuming _conformance_ here means 'framework compliance'. +> +> This is cool, but also ensure that Scorecard's view of the world can be used at the check and probe level to enable projects and organizations to evaluate adherence to other frameworks. Especially useful for internal/unpublished variants (profiles) of frameworks that organizations define." +> — on proposal.md, architectural constraints section + +Adolfo requests that the conformance engine not be hard-wired to OSPS Baseline. Organizations may want to evaluate against internal or unpublished framework variants (profiles). + +**Stephen's response:** Agreed — the conformance engine should support arbitrary frameworks and organizational profiles. The probe findings are framework-agnostic by design; OSPS Baseline is the first (non-"checks") evaluation layer over them. 
Made explicit in the proposal. + +#### AP-7: Predicate types for check and probe evaluations + +> "The current predicate type is the full scorecard run evaluation. For completeness' sake, it would be nice to have one type for a list of check evaluations and one for probe evaluations. +> +> These are only useful, though, if they have more data than what an SVR has to offer, so I would wait until there is an actual need for them." +> — on proposal.md, output formats section + +Adolfo suggests dedicated in-toto predicate types for check-level and probe-level results, but self-qualifies that they should wait for concrete need beyond SVR. + +**Stephen's response:** Agreed. Probe-level findings are available via `--format=probe` but have no in-toto wrapper today. Worth adding when there's concrete need. This may suggest a `--framework` or `--evaluation` CLI option to select evaluation layers and determine output shape. Added as a future design concept. + +#### AP-8: Scorecard as consumer of control catalogs + +> "From reading the proposal, wouldn't Scorecard rather become a _consumer_ of control catalogs?" +> — on proposal.md, Scorecard control catalog extraction plan + +Adolfo challenges the "catalog extraction" framing, suggesting Scorecard should position itself as a consumer of external control catalogs rather than a publisher of its own. + +**Stephen's response:** Both directions — Scorecard *consumes* the OSPS Baseline catalog (via security-baseline) for conformance evaluation, and Scorecard's own probe definitions (`probes/*/def.yml`) are already machine-readable YAML with structured metadata. The "extraction plan" is about packaging those existing definitions for consumption so that external tools like AMPEL can discover what Scorecard evaluates and compose mappings against it. Clarified in the proposal. 
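
The probe-composition generalization from AP-5 — a control passes if any one of its alternative probe sets is fully satisfied — can be sketched as follows. The types, control ID, and probe names are placeholders; this is not the `probes/entries.go` interface, and a real evaluator would also carry UNKNOWN/NOT_APPLICABLE outcomes.

```go
package main

import "fmt"

// Composition is a hypothetical control-to-probe mapping: each inner
// slice is one alternative probe set, per Adolfo's
// "[probe X] or [probe set 1] or [probe set 2]" framing.
type Composition struct {
	Control      string
	Alternatives [][]string
}

// evaluate returns "PASS" if any alternative set is fully satisfied
// (logical OR of ANDs), "FAIL" otherwise.
func evaluate(c Composition, outcomes map[string]bool) string {
	for _, set := range c.Alternatives {
		ok := true
		for _, probe := range set {
			if !outcomes[probe] {
				ok = false
				break
			}
		}
		if ok {
			return "PASS"
		}
	}
	return "FAIL"
}

func main() {
	c := Composition{
		Control:      "OSPS-XX-XXX", // placeholder ID from the quoted example
		Alternatives: [][]string{{"probeX"}, {"probeA", "probeB"}},
	}
	fmt.Println(evaluate(c, map[string]bool{"probeA": true, "probeB": true})) // PASS
	fmt.Println(evaluate(c, map[string]bool{"probeA": true}))                 // FAIL
}
```

Adding a new detection path means appending one more alternative set — the composition structure itself does not change, which is what makes the probe data reusable by other framework-testing tools.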
+ 

### Mike Lieberman's feedback

The following feedback was provided by Mike Lieberman (@mlieberman85) on [PR #4952](https://github.com/ossf/scorecard/pull/4952).

#### ML-1: No "OSPS output format" exists

> "What is OSPS output format?"
> — on ROADMAP.md, Phase 1 deliverable

Mike echoes Eddie's (EK-2) and Adolfo's (AP-2) point: there is no defined "OSPS output format." Mike is the third reviewer to flag this, confirming the deliverable needs to be reframed. The output format question (CQ-18) now has concrete alternatives: Gemara SDK formats (Eddie), in-toto SVR/Baseline predicates (Adolfo), or extending existing Scorecard formats.

---

## Clarifying questions from ORBIT WG feedback

The following clarifying questions require Steering Committee decisions
informed by Eddie's, Adolfo's, and Mike's feedback.

#### CQ-16: Allstar's role in OSPS conformance enforcement

**Stakeholders:** Stephen (can answer alone)

[Allstar](https://github.com/ossf/allstar) is a Scorecard sub-project that continuously monitors GitHub organizations and enforces Scorecard check results as policies (branch protection, binary artifacts, security policy, dangerous workflows). It already enforces a subset of controls aligned with OSPS Baseline.

With OSPS conformance output, Allstar could potentially enforce Baseline conformance at the organization level — e.g., opening issues or auto-remediating when a repository falls below Level 1 conformance. Should the proposal explicitly include Allstar as a Phase 1 consumer of OSPS output, or should that be deferred? And more broadly, should Allstar be considered part of the "enforcement" boundary that Scorecard itself does not cross, even though it is a Scorecard sub-project?

**Stephen's response:**


#### CQ-17: Mapping file location — Scorecard repo or shared? 
+ 

**Stakeholders:** Stephen, Eddie Knight, OSPS Baseline maintainers

Eddie offers to host the Baseline-to-Scorecard mapping in the Baseline repository with Scorecard maintainers as CODEOWNERS (EK-1). The current proposal places it in the Scorecard repo.

Options:
1. **Scorecard repo** (`pkg/osps/mappings/`): Scorecard owns the mapping entirely. Mapping is coupled to Scorecard releases and probe changes.
2. **Baseline repo** (or shared location): Co-owned with ORBIT WG. Other tools can consume the same mapping. Scorecard maintainers retain CODEOWNERS authority over their portion.
3. **Both**: Scorecard maintains a local mapping for runtime use; a shared mapping in the Baseline repo serves as the cross-tool reference. Keep them in sync.

Which approach do you prefer?

_Note that this question is moot if check logic is consolidated within `pvtr-github-repo-scanner`, because the mappings would then be managed within the control catalog in Gemara format._

**Stephen's response:**

**Decision: Option 3 (both) — two-layer mapping model.**

- *Upstream* ([security-baseline](https://github.com/ossf/security-baseline) repo): Check-level relations — "OSPS-AC-03 relates to Scorecard's Branch-Protection check." Scorecard maintainers contribute via PR. The Baseline repo already has `guideline-mappings` referencing Scorecard in 9 controls (mapping to 7 checks). Scorecard can PR the missing ones.
- *Internal* (Scorecard repo): Probe-level mappings — "OSPS-AC-03.01 is evaluated by probes X + Y with logic Z." These depend on probe implementation details and must live in Scorecard.

**Language nuance** (per [security-baseline PR #476](https://github.com/ossf/security-baseline/pull/476)): Mappings were renamed to "relations" to guard against legal issues. Use "informs" / "provides evidence toward" rather than "satisfies" / "demonstrates compliance with." 
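
A minimal sketch of the two-layer model follows. The types, field names, and probe IDs are illustrative only — the real artifacts would likely be YAML files, and no format is committed here.

```go
package main

import "fmt"

// CheckRelation is the upstream layer: a coarse "informs" relation
// contributed to the security-baseline repo.
type CheckRelation struct {
	Control string // e.g. "OSPS-AC-03"
	Check   string // e.g. "Branch-Protection"
}

// ProbeMapping is the internal layer: which probes evaluate a requirement.
// It stays in Scorecard because it tracks probe implementation details.
type ProbeMapping struct {
	Requirement string
	Probes      []string // in this sketch, all must hold for a PASS
}

// status applies UNKNOWN-first semantics: a missing probe outcome yields
// UNKNOWN rather than FAIL.
func status(m ProbeMapping, outcomes map[string]bool) string {
	for _, p := range m.Probes {
		ok, known := outcomes[p]
		if !known {
			return "UNKNOWN"
		}
		if !ok {
			return "FAIL"
		}
	}
	return "PASS"
}

func main() {
	rel := CheckRelation{Control: "OSPS-AC-03", Check: "Branch-Protection"}
	m := ProbeMapping{
		Requirement: "OSPS-AC-03.01",
		Probes:      []string{"probeX", "probeY"}, // placeholder probe IDs
	}
	fmt.Println(rel.Control, "informs via", rel.Check)
	fmt.Println(status(m, map[string]bool{"probeX": true, "probeY": true})) // PASS
	fmt.Println(status(m, map[string]bool{"probeX": true}))                 // UNKNOWN
}
```

The upstream relation deliberately carries no evaluation logic — per the language nuance above, it only says a check "informs" a control; PASS/FAIL determination lives entirely in the internal layer.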
+ 

Taking a dependency on `github.com/ossf/security-baseline` is acceptable — it is a shared OpenSSF project with useful connectors.

**Go module concern:** In `security-baseline`, the `go.mod` lives under `cmd/` while the module path is the repo root, so importing packages from `cmd/pkg/` is unusual. Called out as a potential concern, not a blocker.

#### CQ-18: Output format — `--format=osps` vs. ecosystem formats

**Stakeholders:** Stephen, Spencer (OQ-4 constrains this), Eddie Knight, Adolfo García Veytia

Three reviewers (Eddie, Adolfo, Mike) independently flagged that no "OSPS output format" exists. Eddie suggests Gemara SDK formats (EK-2). Adolfo identifies two concrete in-toto predicate types (AP-2): the [Simple Verification Results (SVR)](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md) predicate (merged) and the [Baseline Predicate](https://github.com/in-toto/attestation/pull/502) (proposed, not yet merged).

Options:
1. **Keep `--format=osps`**: Define a Scorecard-specific conformance output format. Risk: inventing a format that three reviewers have said doesn't belong.
2. **Use `--format=gemara`** (or similar): Integrate the Gemara SDK and output Gemara assessment results in JSON/YAML. Aligns with ORBIT ecosystem, creates a Gemara SDK dependency.
3. **Use in-toto predicates**: Output conformance results as in-toto attestations using SVR or the Baseline predicate. Aligns with in-toto ecosystem and Adolfo's guidance. The Baseline predicate is not yet merged.
4. **Extend existing formats**: Add conformance data to `--format=json` and `--format=sarif` outputs. No new format flag needed.
5. **Combination**: Use Gemara SDK for structured output + in-toto predicates for attestation output. These are not mutually exclusive.

Which approach do you prefer? 
+ +**Stephen's response:** + +**Decision: Option 5 (combination) — the evidence model is the core deliverable; output formats are presentation layers.** + +Phase 1 ships: +- **Enriched JSON** (Scorecard-native, no external dependency) +- **In-toto predicates** — SVR first; track [Baseline Predicate PR #502](https://github.com/in-toto/attestation/pull/502). Multiple predicate types supported simultaneously. Existing Scorecard predicate type (`scorecard.dev/result/v0.1`) preserved for backwards compatibility. +- **Gemara output** — dependency already transitive via `github.com/ossf/security-baseline` (gemara v0.7.0). The existing formatter pattern (`As()` methods) makes adding this straightforward. +- **OSCAL Assessment Results** — using [go-oscal](https://github.com/defenseunicorns/go-oscal). The security-baseline repo already exports OSCAL Catalog format (control definitions) via go-oscal v0.6.3. Scorecard would produce OSCAL Assessment Results (findings per control for a given repo) — a complementary OSCAL model. AMPEL has native OSCAL support. + +There is no "OSPS output format" (confirming Eddie's, Adolfo's, and Mike's feedback). The `--format=osps` flag is replaced by the specific format flags above. + +#### CQ-19: Architectural direction — build vs. integrate + +**Stakeholders:** Stephen, Spencer, Eddie Knight, Steering Committee, at least 1 non-Steering maintainer — this is the gating decision; most other open questions depend on its outcome + +This is the central decision. Eddie proposes consolidating Scorecard checks/probes into the Privateer plugin and having Scorecard execute the plugin (EK-5). The current proposal has Scorecard building its own conformance engine. 
+ 

**Option A: Scorecard builds its own conformance engine** (current proposal)
- Scorecard adds a mapping file, conformance evaluation logic, and output format
- No code-level dependency on Privateer
- Scorecard controls its own release cadence and architecture
- Risk: duplicates evaluation logic, no technical relationship with Privateer (EK-3)

**Option B: Shared plugin model** (Eddie's alternative)
- Scorecard checks/probes are consolidated into the Privateer plugin
- Scorecard executes the plugin under the covers
- Bidirectional: Privateer users can also run Scorecard checks (e.g., LFX Insights)
- Gemara integration comes for free via the plugin
- Risk: Scorecard releases are coupled to the plugin's release cadence; CODEOWNERS in the second repo must be meticulously managed to avoid surprises; multi-platform support (GitLab, Azure DevOps, local) requires maintaining independent plugins with isolated data collection for each platform

**Option C: Hybrid**
- Scorecard maintains its own probe execution (its core competency)
- Scorecard exports its probe results in a format the Privateer plugin can consume (Gemara L5)
- The Privateer plugin consumes Scorecard output as supplementary evidence
- Control catalog is extracted and shared, but evaluation logic stays separate
- Users choose between the Privateer plugin and Scorecard for Baseline evaluations
- No code-level coupling, but interoperable output

Which option do you prefer? What are your concerns about taking a dependency on the Privateer plugin codebase?

**Stephen's response:**

**Decision: Option C (hybrid), designed so that scaling back to Option A remains straightforward if needed.**

The architecture must ensure:
1. Scorecard owns all probe execution (non-negotiable core competency)
2. Scorecard owns its own conformance evaluation logic (mapping, PASS/FAIL, applicability engine all live in Scorecard)
3. 
Interoperability is purely at the output layer — Gemara, in-toto, SARIF, OSCAL are presentation formats, not architectural dependencies +4. Evaluation logic is self-contained — Scorecard can produce conformance results using its own probes and mappings, independent of external evaluation engines + +**Dependency guidance:** Only adopt reasonably stable dependencies when needed. `github.com/ossf/security-baseline` is an acceptable data dependency for control definitions. + +**Flexibility:** Under this structure, scaling back to a fully independent model (Option A) remains straightforward — deprioritize or drop specific output formatters without affecting the evaluation layer. + + +#### CQ-20: Catalog extraction — what does it mean concretely? + +**Stakeholders:** Stephen, Eddie Knight, Steering Committee + +Eddie is "hugely in favor" of extracting the Scorecard control catalog (EK-4) but the proposal lacks an implementation plan. Concretely, this could mean: + +1. **Machine-readable probe definitions**: Export `probes/*/def.yml` as a versioned catalog (already exists in the repo, but not packaged for external consumption) +2. **Gemara L2 control definitions**: Map Scorecard probes to Gemara Layer 2 schema entries, making them available in the Gemara catalog +3. **Shared evaluation steps**: Extract Scorecard's probe logic into a reusable Go library or Privateer plugin steps that other tools can execute +4. **API-level catalog**: Expose probe definitions via the Scorecard API so tools can discover what Scorecard can evaluate + +What level of extraction do you envision? Is option 2 (Gemara L2 integration) the right target, or should we start simpler? + +**Stephen's response:** + + +#### CQ-21: Privateer code duplication — is it acceptable? + +**Stakeholders:** Stephen, Spencer, Eddie Knight, Steering Committee — flows from CQ-19 + +Eddie points out that the current proposal would result in two codebases evaluating the same OSPS controls independently (EK-3). 
Even if the proposal says "don't duplicate Privateer," building a separate conformance engine effectively does that. + +Is some duplication acceptable if it means Scorecard retains architectural independence? Or is avoiding duplication a hard constraint that should drive us toward the shared plugin model (CQ-19 Option B)? + +**Stephen's response:** + +Resolved by CQ-19 decision. Option C (hybrid) accepts that some evaluation overlap may occur. Scorecard SHOULD NOT duplicate evaluation that downstream tools handle (RFC 2119 SHOULD NOT, not MUST NOT). Scorecard retains architectural independence — interoperability is at the output layer, not the evaluation layer. + + +#### CQ-22: Attestation decomposition — identity vs. tooling + +**Stakeholders:** Stephen, Spencer, Adolfo García Veytia, Eddie Knight + +Adolfo points out that OQ-1 (attestation mechanism identity) conflates two questions (AP-3): + +1. **Identity**: Who signs the attestation, and how are those identities trusted? (Sigstore/OIDC, personal keys, platform-native) +2. **Tooling**: What tool generates the attestation, and when? (Scorecard during scan, CI pipeline post-scan, manual maintainer process) + +The answers will differ per project and per control. Should OQ-1 be decomposed into these two sub-questions, and should the design allow different identity/tooling combinations per control? + +Adolfo has offered to discuss this in depth. + +**Stephen's response:** + +Acknowledged. OQ-1 should be decomposed into identity and tooling sub-questions as Adolfo suggests. The design should allow different identity/tooling combinations per control. Detailed resolution deferred to discussion with Adolfo and Spencer. + + +#### CQ-23: Mapping registry — where should the canonical mapping live? 
+ +**Stakeholders:** Stephen, Eddie Knight, Adolfo García Veytia, Baseline maintainers + +Three perspectives have emerged on where Scorecard-to-Baseline mappings should live: + +- **Eddie (EK-1)**: Host in the Baseline repo with Scorecard maintainers as CODEOWNERS +- **Adolfo (AP-1)**: Prefers a single registry in the Baseline itself, but notes the Baseline's mapping support was [demoted](https://github.com/ossf/security-baseline/pull/476) +- **Current proposal**: Host in Scorecard repo (`pkg/osps/mappings/`) + +Additionally, AMPEL already maintains [independent Scorecard-to-Baseline mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline) in its policy library. An official upstream mapping would benefit both AMPEL and the wider ecosystem. + +This extends CQ-17 with Adolfo's context about the demoted Baseline registry. Should the Scorecard mapping effort also advocate for restoring a shared registry in the Baseline spec? + +**Stephen's response:** + +Resolved by the two-layer mapping model (see CQ-17). Check-level relations are contributed upstream to `ossf/security-baseline` via PR, using the existing `guideline-mappings` structure. Probe-level mappings live in Scorecard. This approach works with the current state of the security-baseline repo without requiring restoration of the demoted mapping registry. + + +--- + +## Scorecard user feedback + +### Felix Lange's feedback (Scorecard community meeting) + +The following feedback was provided by Felix Lange during the Scorecard +community meeting on 2026-03-05. + +#### FL-1: Confidence scoring instead of binary UNKNOWN + +Felix suggested generalizing the UNKNOWN-first model into a confidence score +that captures partial certainty, referencing [SAP Fosstars](https://sap.github.io/fosstars-rating-core/confidence.html). +In the Fosstars model, a confidence score (0-10) accompanies every rating; if +confidence falls below a threshold, the label becomes UNCLEAR regardless of the +numeric score. 
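
The Fosstars-style rule can be sketched as follows; the threshold value and the PASS/FAIL labels (mapped here onto the conformance vocabulary rather than Fosstars' own rating labels) are illustrative assumptions.

```go
package main

import "fmt"

// minConfidence is an illustrative threshold — Fosstars' actual cutoff
// and scale are defined by its rating model, not reproduced here.
const minConfidence = 8.0

// label collapses to "UNCLEAR" whenever confidence (0-10) is below the
// threshold, regardless of the numeric score — the rule described above.
func label(score, confidence float64) string {
	if confidence < minConfidence {
		return "UNCLEAR"
	}
	if score >= 5.0 {
		return "PASS"
	}
	return "FAIL"
}

func main() {
	fmt.Println(label(9.0, 9.5)) // PASS
	fmt.Println(label(9.0, 3.0)) // UNCLEAR: high score, low confidence
	fmt.Println(label(2.0, 9.0)) // FAIL
}
```

In a probe-based derivation, confidence could be computed from the fraction of relevant probes that produced a definite outcome, which is the raw data the evidence model already exposes.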
+ +**Stephen's response:** Interesting direction. The probe evidence model already +provides the raw data for confidence derivation (each probe's outcome is +independently observable). Added as a future design concept — formal confidence +scoring may be added when consumer demand warrants it. + +#### FL-2: Single run for all output + +Output should allow consumers to obtain OSPS conformance evaluations and check +details (like Maintained) without having to run Scorecard twice. The API +(api.scorecard.dev) should also avoid requiring multiple requests. + +**Stephen's response:** Agreed. Added as architectural constraint #5 — a single +Scorecard run produces both check scores and conformance results. This applies +to CLI, Action, and API surfaces. + +#### FL-3: Existing checks should remain prominent + +Checks like Maintained help users identify abandoned projects and are valuable +for risk assessment even when they don't map directly to OSPS controls. These +should be preserved in a prominent manner. + +**Stephen's response:** Existing checks are fully preserved — check scores and +conformance labels are parallel evaluation layers. All checks continue to +produce scores as they do today, regardless of whether their probes map to OSPS +controls. No check is elevated or deprioritized relative to others based on its +OSPS Baseline coverage. + +#### FL-4: Simple for consumers to bring alternative frameworks + +It should be straightforward for consumers to evaluate against frameworks other +than OSPS Baseline, including internal or unpublished variants. + +**Stephen's response:** Aligns with Adolfo's feedback (AP-6). The conformance +engine is framework-agnostic by design — mapping definitions are the only +framework-specific component. A `--framework` CLI option is noted as a future +design concept. + +--- + +## Decision priority analysis + +The open questions have dependencies between them. Answering them in the +wrong order will result in rework. 
The recommended sequence follows. + +### Tier 1 — Gating decisions + +| Question | Status | Resolution | +|----------|--------|------------| +| **CQ-19** | **RESOLVED** | Option C (hybrid), designed so that scaling back to Option A remains straightforward. Scorecard owns probe execution and evaluation; interoperability at output layer only. | +| **OQ-1** | **OPEN** | Attestation identity model. Spencer flagged as blocking. CQ-22 decomposes into identity vs. tooling. | + +### Tier 2 — Downstream of CQ-19 + +| Question | Status | Resolution | +|----------|--------|------------| +| **CQ-18** | **RESOLVED** | Enriched JSON + in-toto predicates + Gemara + OSCAL Assessment Results. No "OSPS output format." | +| **CQ-17/CQ-23** | **RESOLVED** | Two-layer mapping model: check-level relations in security-baseline, probe-level mappings in Scorecard. | +| **CQ-22** | **PARTIALLY RESOLVED** | OQ-1 decomposed into identity vs. tooling sub-questions (per Adolfo). Detailed resolution deferred to discussion with Adolfo and Spencer. | +| **OQ-2** | **OPEN** | Enforcement detection scope. Affects Phase 3 scope. Needs Spencer + Stephen + Steering Committee. | + +### Tier 3 — Important but non-blocking for Phase 1 start + +| Question | Status | Notes | +|----------|--------|-------| +| **CQ-20** | **OPEN** | Catalog extraction scope. Flows from CQ-19 (now resolved). | +| **CQ-21** | **RESOLVED** | Some duplication acceptable. RFC 2119 SHOULD NOT, not MUST NOT. | +| **CQ-13** | **RESOLVED** | All consumers equal. RFC 2119 SHOULD NOT duplicate evaluation. | +| **CQ-11** | **OPEN** | Output stability guarantees. CQ-18 now resolved; this can proceed. | + +### Tier 4 — Stephen can answer alone (any time) + +| Question | Notes | +|----------|-------| +| **CQ-9** | NOT_OBSERVABLE controls — implementation detail, UNKNOWN-first principle already agreed. | +| **CQ-12** | Probe gap priority ordering — coverage doc already proposes an order. | +| **CQ-14** | Darnit vs. 
Minder delineation — ecosystem positioning Stephen can articulate. | +| **CQ-15** | Existing issues as Phase 1 work items — backlog triage. | +| **CQ-16** | Allstar's role — Scorecard sub-project under same Steering Committee. | + +### Effectively resolved + +| Question | Resolution | +|----------|-----------| +| **OQ-3** | Drop `scan_scope` from the schema (Spencer's feedback). | +| **OQ-4** | Evidence is probe-based only, not check-based (adopted). | +| **CQ-10** | Superseded by CQ-17 (two-layer mapping model). | + +### Recommended next steps + +1. **Resolve OQ-1/CQ-22** with Spencer, Adolfo, and the Steering Committee. Spencer flagged OQ-1 as blocking. Adolfo's decomposition (identity vs. tooling) clarifies what needs to be decided. Adolfo has offered to discuss. +2. **Resolve CQ-20** (catalog extraction scope) — now unblocked by CQ-19 resolution. +3. **Resolve CQ-11** (output stability guarantees) — now unblocked by CQ-18 resolution. +4. **Answer the Tier 4 questions** at any time — they are independent and don't block others. +5. **Begin Phase 1 implementation** — the gating architectural decisions (CQ-19, CQ-18, CQ-17) are resolved. diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md new file mode 100644 index 00000000000..8e54a86425d --- /dev/null +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -0,0 +1,446 @@ +# Proposal: OSPS Baseline Conformance for OpenSSF Scorecard + +## Summary + +**Mission:** Scorecard produces trusted, structured security evidence for the +open source ecosystem. _(Full Mission, Vision, Values, Strategy, and Roadmap +(MVVSR) to be developed as a follow-up deliverable for Steering Committee +review.)_ + +Scorecard is an **open source security evidence engine**. 
It accepts diverse +inputs about a project's security practices, normalizes them through probe-based +analysis, and packages the resulting evidence in interoperable formats for +downstream tools to act on. (**Downstream tools** are tools that consume +Scorecard's output to make policy decisions, enforce requirements, or aggregate +security posture—examples: AMPEL, Minder, Privateer, Darnit, LFX Insights, +Allstar.) OSPS Baseline conformance is the first use case that proves this +architecture, and the central initiative for Scorecard's 2026 roadmap. + +This is fundamentally a **product-level shift** — the defining change for +**Scorecard v6**. Scorecard today answers "how well does this repo follow best +practices?" (graded 0-10 heuristics). OSPS conformance requires answering "does +this project meet these MUST statements at this maturity level?" +(PASS/FAIL/UNKNOWN/NOT_APPLICABLE per control, with evidence). Check scores and +conformance labels are parallel evaluation layers over the same probe evidence — +existing checks and scores are unchanged. + +The conformance layer includes **both** evaluation logic (probe→control mapping, +status determination, applicability detection) and output formatting (enriched +JSON, in-toto, Gemara, OSCAL). It composes probe findings into control verdicts, +just as checks compose them into 0-10 scores—same probes, different evaluation +surfaces, single run. + +### Why v6 + +Scorecard v6 represents a major evolution: from a scoring tool to an evidence +engine. The key changes that warrant a major version: + +1. **New evaluation layer** — conformance labels (PASS/FAIL/UNKNOWN) alongside + existing check scores (0-10), produced in a single run +2. **Framework-agnostic architecture** — probe evidence can be composed against + OSPS Baseline or other frameworks via pluggable mapping definitions +3. **Interoperable output formats** — in-toto, Gemara, OSCAL Assessment Results + alongside existing JSON and SARIF +4. 
**Probe catalog as public interface** — probe definitions become a consumable + artifact for external tools + +Existing checks, probes, scores, and output formats are preserved. v6 is +additive — no breaking changes to existing surfaces. + +## Motivation + +### Why now + +1. **OSPS Baseline is the emerging standard.** The OSPS Baseline (v2026.02.19) defines controls across 3 maturity levels. It is maintained within the ORBIT Working Group and is becoming the reference framework for open source project security posture. (While Scorecard is not part of ORBIT WG, ecosystem interoperability with ORBIT tools is an overarching OpenSSF goal, and Scorecard interoperates through published output formats.) See the [OSPS Baseline maintenance process](https://baseline.openssf.org/maintenance.html) for the versioning cadence. + +2. **The ecosystem is moving.** The [Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner) already evaluates 39 of 52 control requirements and powers LFX Insights security results. The OSPS Baseline GitHub Action can upload SARIF. Best Practices Badge is staging Baseline-phase work. Scorecard's large install base is an advantage, but only if it ships a conformance surface. + +3. **ORBIT WG alignment.** Scorecard sits within the OpenSSF alongside the ORBIT WG. The ORBIT charter's mission is "to develop and maintain interoperable resources related to the identification and presentation of security-relevant data." Scorecard producing interoperable conformance results is a natural fit. + +4. **Regulatory pressure.** The EU Cyber Resilience Act (CRA) and similar regulatory frameworks increasingly expect evidence-based security posture documentation. Scorecard produces structured evidence that downstream tools and processes may use when evaluating regulatory readiness. Scorecard does not itself guarantee CRA compliance or any other regulatory compliance. + +### Why framework conformance in Scorecard? 
+ 

Scorecard already performs the core evaluation work needed for framework conformance: probe execution, evidence collection, and probe composition. Rather than create a separate tool that duplicates this capability, the conformance layer builds on Scorecard's existing architecture:

- **Probe execution**: Scorecard's probes already collect the evidence needed for control evaluation
- **Composition model**: Checks demonstrate probe composition (e.g., the Binary-Artifacts check composes multiple binary detection probes); OSPS controls use the same composition pattern
- **Evidence normalization**: Probes already normalize diverse signals (GitHub API, file analysis, etc.) into structured findings
- **Applicability detection**: Scorecard already evaluates preconditions (e.g., "has made a release") for checks; controls need the same capability

The conformance layer is a natural extension of Scorecard's probe-based architecture, not a bolted-on feature. Scorecard already provides most of the machinery conformance evaluation requires, so a separate tool is unnecessary.

### What Scorecard brings that others don't

- **Deep automated analysis.** 50+ probes with structured results provide granular evidence that the Privateer GitHub plugin's shallower checks cannot match (e.g., per-workflow token permission analysis, detailed branch protection rule inspection, CI/CD injection pattern detection).
- **Multi-platform support.** GitHub, GitLab, Azure DevOps, and local directory scanning.
- **Massive install base.** Scorecard Action, public API, and cron-based scanning infrastructure.
- **Existing policy machinery.** The `policy/` package and structured results were designed for exactly this kind of downstream consumption.

### Ecosystem tooling comparison

Several tools operate in adjacent spaces. Understanding their capabilities clarifies what is and isn't Scorecard's job. 
+ +| Dimension | **Scorecard** | **[Allstar](https://github.com/ossf/allstar)** | **[Minder](https://github.com/mindersec/minder)** | **[Darnit](https://github.com/kusari-oss/darnit)** | **[AMPEL](https://github.com/carabiner-dev/ampel)** | **[Privateer GitHub Plugin](https://github.com/ossf/pvtr-github-repo-scanner)** | +|-----------|--------------|---------|---------|-----------|-----------|-------------| +| **Purpose** | Security health measurement | GitHub policy enforcement | Policy enforcement + remediation platform | Compliance audit + remediation | Attestation-based policy enforcement | Baseline conformance evaluation | +| **Action** | Analyzes repositories (read-only) | Monitors orgs, opens issues, auto-fixes settings | Enforces policies, auto-remediates | Audits + fixes repositories | Verifies attestations against policies | Evaluates repos against OSPS controls | +| **Data source** | Collects from APIs/code | Collects from GitHub API + runs Scorecard checks | Collects from APIs + consumes findings from other tools | Analyzes repo state | Consumes attestations only | Collects from GitHub API + Security Insights | +| **Output** | Scores (0-10) + probe findings | GitHub issues + auto-remediated settings | Policy evaluation results + remediation PRs | PASS/FAIL + attestations + fixes | PASS/FAIL + results attestation | Gemara L4 assessment results | +| **OSPS Baseline** | Partial (via probes) | Indirect (enforces subset via Scorecard checks) | Via Rego policy rules | Full (62 controls) | 36 policies mapping to controls (5 consume Scorecard probes) | 39 of 52 controls | +| **In-toto** | Produces attestations | N/A | Consumes attestations | Produces attestations | Consumes + verifies | N/A | +| **OSCAL** | No | No | No | No | Native support | N/A | +| **Sigstore** | No | No | Verifies signatures | Signs attestations | Verifies signatures | N/A | +| **Gemara** | Not yet (planned) | No | No | No | No | L2 + L4 native | +| **Maturity** | Production (v5.3.0) | 
Production (v4.5, Scorecard sub-project) | Sandbox (OpenSSF, donated Oct 2024) | Alpha (v0.1.0, Jan 2026) | Production (v1.0.0) | Production, powers LFX Insights | +| **Language** | Go | Go | Go | Python | Go | Go | + +**Integration model:** + +```mermaid +flowchart LR + Scorecard["Scorecard
(Evidence Engine)"] -->|checks| Allstar["Allstar
(Enforce on GitHub)"] + Scorecard -->|evidence| Privateer["Privateer
(Baseline evaluation)"] + Scorecard -->|evidence| Minder["Minder
(Enforce + Remediate)"] + Scorecard -->|evidence| AMPEL["AMPEL
(Attestation-based
policy enforcement)"] + Scorecard -->|evidence| Darnit["Darnit
(Audit + Remediate)"] + Darnit -->|attestation| AMPEL +``` + +Scorecard is the **evidence engine** (produces structured security evidence). +All downstream tools consume Scorecard evidence on equal terms through published +output formats. [Allstar](https://github.com/ossf/allstar) is a Scorecard +sub-project that enforces Scorecard check results as policies. +[Minder](https://github.com/mindersec/minder) enforces security policies across +repositories. [AMPEL](https://github.com/carabiner-dev/ampel) validates +attestations against policies in CI/CD pipelines — it already maintains +[policies consuming Scorecard probe results](https://github.com/carabiner-dev/policies/tree/main/scorecard) +and [OSPS Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline). +[Darnit](https://github.com/kusari-oss/darnit) audits compliance and +remediates. [Privateer](https://github.com/ossf/pvtr-github-repo-scanner) +evaluates Baseline conformance. They are complementary, not competing. + +### What Scorecard SHOULD NOT do + +Scorecard SHOULD NOT (per [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119)) +duplicate evaluation that downstream tools handle. There may be scenarios where +overlapping evaluation makes sense (e.g., Scorecard brings deeper analysis or +different evidence sources), but the default posture is complementarity. + +- **Duplicate policy enforcement or remediation.** Downstream tools — [Privateer](https://github.com/ossf/pvtr-github-repo-scanner), [Minder](https://github.com/mindersec/minder), [AMPEL](https://github.com/carabiner-dev/ampel), [Darnit](https://github.com/kusari-oss/darnit), and others — consume Scorecard evidence through published output formats. Scorecard *produces* findings and attestations; downstream tools enforce, remediate, and audit. +- **Privilege any downstream consumer.** All tools consume Scorecard output on equal terms. No tool has a special integration relationship. 
+
+- **Turn OSPS controls into Scorecard checks.** OSPS conformance is a layer that consumes existing Scorecard signals, not 59 new checks.
+
+## Current state
+
+### Coverage snapshot
+
+A fresh analysis of Scorecard's current coverage against OSPS Baseline v2026.02.19 is tracked in `docs/osps-baseline-coverage.md`. Previous coverage estimates against older Baseline versions should be treated as out-of-date.
+
+### Existing Scorecard surfaces that matter
+
+- **Checks** produce 0-10 scores — useful as signal but not conformance results
+- **Probes** produce structured boolean findings — the right granularity for control mapping
+- **Output formats** (JSON, SARIF, probe, in-toto) — conformance evidence is delivered through these and new formats (Gemara, OSCAL)
+- **[Allstar](https://github.com/ossf/allstar)** (Scorecard sub-project) — continuously monitors GitHub organizations and enforces Scorecard checks as policies with auto-remediation. Allstar already enforces several controls aligned with OSPS Baseline (branch protection, security policy, binary artifacts, dangerous workflows). OSPS conformance output could enable Allstar to enforce Baseline conformance at the organization level.
+- **Multi-repo scanning** (`--repos`, `--org`) — needed for OSPS-QA-04.02 (subproject conformance)
+- **Serve mode** — HTTP surface for pipeline integration
+
+## Open questions and design decisions
+
+The following design questions have been addressed through maintainer review and Steering Committee discussion:
+
+**Attestation mechanism for non-automatable controls** — Phase 1 focuses on automatically verifiable controls only. Discussion and design of attestation mechanisms (both inbound for non-automatable controls and outbound for signing Scorecard's own output) are deferred to Phase 2 or beyond. This avoids blocking on identity model questions (OIDC vs. repo-local metadata vs. platform-native signals) while making progress on controls Scorecard can definitively evaluate.
+
+**Enforcement detection boundary** — Scorecard detects signals of enforcement (e.g., "SCA tool is configured," "SAST results required before merge") but does not itself enforce policies. The boundary is: Scorecard observes and reports; downstream tools (Minder, AMPEL, Allstar) enforce. Phase 3 includes enforcement detection for SCA and SAST policies, but Scorecard remains an evidence engine, not an enforcement tool.
+
+**Predicate strategy** — The Steering Committee rejected in-toto SVR (too minimal, no probe-level evidence) and decided to create a new Scorecard-owned, framework-agnostic predicate type (`scorecard.dev/evidence/v0.1`). This supports OSPS Baseline, SLSA, and custom frameworks with probe-level evidence. The existing `scorecard.dev/result/v0.1` predicate (check-based) is preserved unchanged.
+
+**Architecture** — Scorecard owns all probe execution and conformance evaluation logic, with interoperability purely at the output layer (in-toto, Gemara, OSCAL). Because evaluation is self-contained and interoperability lives only in presentation formats, this hybrid approach lets Scorecard scale back to a fully independent model if needed.
+
+**Unified framework abstraction** — Checks and OSPS Baseline both use the same internal probe composition interface. No separate "mapping layers" or upstream contributions needed; probe-to-control mappings are maintained in Scorecard.
+
+**Evidence model** — Probe-based only, not check-based. Conformance results reference probe findings as evidence, not check scores.
+
+For the full review history including feedback from Spencer Schrock, Adam Korczynski, Eddie Knight (ORBIT WG), Adolfo García Veytia (AMPEL), Mike Lieberman, and Felix Lange, see [`decisions.md`](decisions.md).
+
+## Architecture
+
+### Processing model
+
+Scorecard's processing model has four steps:
+
+1. **Ingest** — Accept diverse signals about a project (repository APIs,
+   metadata files, platform signals, external services)
+2. **Analyze** — Normalize signals through probes that understand multiple
+   ways to satisfy the same outcome
+3. 
**Evaluate** — Produce parallel assessments: check scores (0-10) and + conformance labels (PASS/FAIL/UNKNOWN) +4. **Deliver** — Package evidence in interoperable formats (JSON, in-toto, + Gemara, SARIF, OSCAL) for downstream consumption + +### Three-tier evaluation model + +```mermaid +flowchart TD + Probes["Probe findings
(atomic boolean measurements)"] + Probes --> Checks["Check scoring
(0-10, existing)"] + Probes --> Conformance["Conformance evaluation
(PASS/FAIL/UNKNOWN, new)"] + Checks --> Formats["Output formats
(JSON, in-toto, Gemara,
SARIF, OSCAL, probe, default)"] + Conformance --> Formats +``` + +Check scores and conformance labels are *parallel interpretations* of the same +probe evidence, not competing modes. Both can appear in the same output. + +The conformance evaluation layer is framework-agnostic by design. OSPS Baseline +is the first use case, but the same probe evidence can be composed differently +for other frameworks and organizational profiles. Probe findings carry no +framework-specific semantics — only the mapping definitions (which probes +compose into which control outcomes) are framework-specific. + +The three-tier model describes the structural layers of the architecture. The +Processing model (described earlier in this section) provides the temporal data +flow view, showing how data moves through these layers (Ingest → Analyze → +Evaluate → Deliver). These are complementary perspectives on the same +architecture. + +### Architectural target state + +1. Scorecard owns all probe execution (non-negotiable core competency) +2. Scorecard owns its own conformance evaluation logic (mapping, PASS/FAIL, + applicability engine all live in Scorecard) +3. Interoperability is purely at the output layer — Gemara, in-toto, SARIF, + OSCAL are presentation formats, not architectural dependencies +4. Evaluation logic is self-contained — Scorecard can produce conformance + results using its own probes and mappings, independent of external + evaluation engines +5. A single Scorecard run produces both check scores and conformance results — + users MUST NOT need to run Scorecard twice or make separate API requests + to obtain both evaluation layers + +**Dependency guidance:** Only adopt reasonably stable dependencies when needed. +The [security-baseline](https://github.com/ossf/security-baseline) repo is an +acceptable data dependency for control definitions (see Scope). + +### Design principles + +1. 
**Evidence is the product.** Scorecard's core output is structured, + normalized probe findings. Check scores and conformance labels are parallel + evaluation layers over the same evidence. +2. **Probes normalize diversity.** Each probe understands multiple ways a + control outcome can be satisfied. A source type taxonomy (file-based, + API-based, metadata-based, external-service, convention-based) guides probe + design. +3. **UNKNOWN-first honesty.** If Scorecard cannot observe a control, the + status is UNKNOWN with an explanation — never a false PASS or FAIL. +4. **All consumers are equal.** Downstream tools — Privateer, AMPEL, Minder, + Darnit, and others — consume Scorecard evidence through published output + formats. +5. **No metadata monopolies.** Probes may evaluate multiple sources for the + same data. No single metadata file is required for meaningful results, + though they may enrich results. +6. **Formats are presentation.** Output formats (JSON, in-toto, Gemara, SARIF, + OSCAL) are views over the evidence model, optimized for different consumer + types. No single format is privileged. + +The following are implementation constraints (not top-level principles): +**Additive, not breaking** — existing checks, probes, scores, and output formats +do not change behavior. **Data-driven mapping** — the mapping between OSPS +controls and Scorecard probes is a versioned data file, not hard-coded logic. + +## Scope + +### In scope + +1. **OSPS conformance engine** — new package that maps controls to Scorecard probes, evaluates per-control status, handles applicability +2. **Evidence model and output formats** — the evidence model is the core deliverable; output formats are presentation layers over it: + - Enriched JSON (Scorecard-native, no external dependency) + - In-toto evidence predicate (`scorecard.dev/evidence/v0.1`) — Scorecard-owned, framework-agnostic predicate supporting OSPS Baseline, SLSA, and custom frameworks with probe-level evidence. 
Rationale: own predicate destiny, don't depend on unmerged external PRs, support BYO frameworks. + - Gemara output (transitive dependency via security-baseline) + - OSCAL Assessment Results (using [go-oscal](https://github.com/defenseunicorns/go-oscal)) + - **Note:** Existing `scorecard.dev/result/v0.1` predicate (check-based JSON) preserved unchanged. The new evidence predicate is additive, not a replacement. Both coexist; users choose via CLI flags. +3. **Unified framework abstraction for OSPS Baseline v2026.02.19** — Checks and OSPS Baseline both use the same internal interface/representation (probe compositions): + - Probe-to-control mappings maintained in Scorecard for OSPS Baseline controls + - Framework evaluation layer produces conformance results (PASS/FAIL/UNKNOWN/NOT_APPLICABLE) + - A control may map to a single probe (1:1) or a composition of probes with evaluation logic (many-to-1) + - This follows the same composition pattern used by [existing checks](https://github.com/ossf/scorecard/blob/main/probes/entries.go) + - Checks themselves are effectively a framework (the "Scorecard framework"); OSPS Baseline is another framework over the same probe evidence +4. **security-baseline dependency** — `github.com/ossf/security-baseline` as a data dependency for control definitions, Gemara types, and OSCAL catalog models +5. **Applicability engine** — detects preconditions (e.g., "has made a release") and outputs NOT_APPLICABLE +6. **Metadata ingestion layer** — supports Security Insights as one source among several for metadata-dependent controls (OSPS-BR-03.01, BR-03.02, QA-04.01). Architecture invites contributions for alternative sources (SBOMs, VEX, platform APIs). No single metadata file is required for meaningful results. +7. **Attestation mechanism (v1)** — accepts repo-local metadata for non-automatable controls +8. 
**Scorecard probe catalog** — Scorecard *consumes* external control catalogs (OSPS Baseline via security-baseline) for conformance evaluation. In the other direction, the catalog extraction plan packages Scorecard's own probe definitions (`probes/*/def.yml`) as a consumable artifact so external tools (e.g., AMPEL) can discover what Scorecard evaluates and compose their own mappings against it.
+9. **New probes and probe enhancements** for gap controls:
+  - Secrets detection (OSPS-BR-07.01)
+  - Governance/docs presence (OSPS-GV-02.01, GV-03.01, DO-01.01, DO-02.01)
+  - Dependency manifest presence (OSPS-QA-02.01)
+  - Security policy deepening (OSPS-VM-02.01, VM-03.01, VM-01.01)
+  - Release asset inspection (multiple L2/L3 controls)
+  - Signed manifest support (OSPS-BR-06.01)
+  - Enforcement detection (OSPS-VM-05.*, VM-06.*)
+10. **Multi-repo project-level conformance** (OSPS-QA-04.02)
+
+### Future design concepts
+
+The following concepts are stated as design direction but deferred for detailed
+design:
+
+- **Source type taxonomy** — Probes could be designed with a source type
+  taxonomy (file-based, API-based, metadata-based, external-service,
+  convention-based) that guides probe design and helps contributors understand
+  where to add new detection paths. The probe interface should be designed to
+  accept multiple sources from the start, with the option to add sources later.
+- **Framework selection CLI option** — A `--framework` or `--evaluation`
+  option could let users select which evaluation layer(s) to run (checks,
+  OSPS Baseline, or a custom framework profile) and determine the output
+  shape (e.g., check-based vs. probe-based predicate type).
+- **Probe-level in-toto predicate type** — The existing
+  `scorecard.dev/result/v0.1` predicate wraps check-level results. A dedicated
+  probe-level predicate type could wrap flat probe findings for framework
+  evaluation tools. The in-scope `scorecard.dev/evidence/v0.1` predicate fills
+  this role; variants beyond it are worth adding only on concrete consumer need.
+- **Confidence scoring** — The current model produces binary conformance + labels (PASS/FAIL/UNKNOWN). A confidence score (inspired by + [Fosstars](https://sap.github.io/fosstars-rating-core/confidence.html)) + could express partial certainty — e.g., "PASS with confidence 7/10" when + 3 of 4 mapped probes returned findings. The probe evidence model already + provides the raw data for confidence derivation (each probe's outcome is + independently observable). A formal confidence score may be added when + consumer demand warrants it. + +### Out of scope + +- Policy enforcement and remediation (Minder's, AMPEL's, and Darnit's domain) +- Replacing the Privateer plugin for GitHub repositories +- Changing existing check scores or behavior +- OSPS Baseline specification changes (ORBIT WG's domain) + +## Phased delivery + +Phases are ordered by outcome, not calendar quarter. Maintainer bandwidth dictates delivery timing. + +### Phase 1: Conformance foundation + Level 1 coverage + +**Outcome:** Scorecard produces a useful OSPS Baseline Level 1 conformance report for any public GitHub repository, available across CLI and GitHub Action. + +**Deployment surfaces for Phase 1:** +- ✅ CLI (local execution) +- ✅ GitHub Action (repository-specific results, no storage cost to Scorecard project) +- ❌ Cron service deferred to Phase 2+ (storage/serving cost for conformance data across 1M+ repos needs evaluation) + +Phase 1 still delivers value: organizations can self-assess via Action or CLI without waiting for cron coverage. 
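+
+Per the data-driven mapping constraint, the probe-to-control mapping is a versioned data file, not hard-coded logic. As a purely illustrative sketch of what one entry might look like (the layout, field names, and probe names below are hypothetical, not a committed schema):
+
+```yaml
+# Hypothetical probe-to-control mapping entry. Layout, field names, and
+# probe names are illustrative only, not a committed schema.
+framework: osps-baseline
+version: v2026.02.19
+controls:
+  - id: OSPS-QA-02.01              # dependency manifest presence
+    probes:
+      - dependencyManifestPresent  # hypothetical probe name
+    evaluation: all_pass           # PASS only if every mapped probe passes
+  - id: OSPS-BR-04.01              # release notes presence
+    applicability:
+      - has_made_a_release         # hypothetical precondition; if unmet, NOT_APPLICABLE
+    probes:
+      - releaseNotesPresent        # hypothetical probe name
+      - changelogPresent           # hypothetical probe name
+    evaluation: any_pass           # multiple ways to satisfy the same outcome
+```
+
+When no mapped probe can observe a control, the evaluation layer reports UNKNOWN with an explanation rather than guessing, in line with the UNKNOWN-first principle.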
+ +- Evidence model and output formats: + - Enriched JSON (Scorecard-native) + - In-toto evidence predicate (`scorecard.dev/evidence/v0.1`) + - Gemara output (transitive via [security-baseline](https://github.com/ossf/security-baseline) dependency) + - OSCAL Assessment Results (via [go-oscal](https://github.com/defenseunicorns/go-oscal)) +- Unified framework abstraction for OSPS Baseline v2026.02.19: + - Probe-to-control mappings maintained in Scorecard + - Checks and OSPS Baseline use same internal probe composition interface +- Applicability engine (detect "has made a release" and other preconditions) +- Map existing probes to OSPS controls where coverage exists today +- New probes for Level 1 gaps (prioritized by coverage impact): + - Governance/docs presence (GV-02.01, GV-03.01, DO-01.01, DO-02.01) + - Dependency manifest presence (QA-02.01) + - Security policy deepening (VM-02.01, VM-03.01, VM-01.01) + - Secrets detection (BR-07.01) — consume platform signals (e.g., GitHub secret scanning API) where possible +- Metadata ingestion layer v1 — Security Insights as first supported source (BR-03.01, BR-03.02, QA-04.01); architecture supports additional metadata sources +- Scorecard control catalog extraction — Extract Scorecard checks into an in-project control framework representation that uses the same unified framework abstraction as OSPS Baseline. This enables checks and OSPS Baseline controls to be treated uniformly within the evaluation layer. + +### Phase 2: Release integrity + Level 2 core + +**Outcome:** Scorecard evaluates release-related OSPS controls, covering the core of Level 2 and becoming useful for downstream due diligence workflows. 
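+
+To illustrate the evidence formats these phases deliver, the sketch below shows what an in-toto statement carrying the `scorecard.dev/evidence/v0.1` predicate might look like. Only the outer in-toto Statement fields (`_type`, `subject`, `predicateType`) are standard; the predicate body, including its probe and field names, is a hypothetical shape, since the actual schema is still to be designed.
+
+```json
+{
+  "_type": "https://in-toto.io/Statement/v1",
+  "subject": [
+    {
+      "name": "github.com/example/project",
+      "digest": { "gitCommit": "0123abcd..." }
+    }
+  ],
+  "predicateType": "https://scorecard.dev/evidence/v0.1",
+  "predicate": {
+    "framework": { "name": "osps-baseline", "version": "v2026.02.19" },
+    "results": [
+      {
+        "control": "OSPS-BR-04.01",
+        "status": "PASS",
+        "evidence": [
+          { "probe": "releaseNotesPresent", "outcome": "True" }
+        ]
+      },
+      {
+        "control": "OSPS-BR-06.01",
+        "status": "UNKNOWN",
+        "reason": "signed release manifest not observable via platform APIs"
+      }
+    ]
+  }
+}
+```
+
+Conformance results reference probe findings as evidence rather than check scores, matching the probe-based evidence model decision.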
+ +- Release asset inspection layer (detect compiled assets, SBOMs, licenses with releases) +- Signed manifest support (BR-06.01) +- Release notes/changelog detection (BR-04.01) +- Attestation mechanism v1 for non-automatable controls +- Evidence bundle output v1 (conformance results + in-toto statement + SARIF for failures) +- Additional metadata sources for the ingestion layer + +### Phase 3: Enforcement detection + Level 3 + multi-repo + +**Outcome:** Scorecard covers Level 3 controls including enforcement detection and project-level aggregation. + +- SCA policy + enforcement detection (VM-05.*) +- SAST policy + enforcement detection (VM-06.*) +- Multi-repo project-level conformance aggregation (QA-04.02) +- Attestation integration GA + +## Relationship to ORBIT ecosystem + +```mermaid +flowchart TD + subgraph ORBIT["ORBIT WG Ecosystem"] + Baseline["OSPS Baseline
(controls)"] + Gemara["Gemara
(schemas: L2/L4)"] + SI["Security Insights
(metadata)"] + + subgraph Evaluation["Evaluation"] + Privateer["Privateer GitHub Plugin
(LFX Insights driver)"] + subgraph ScorecardEcosystem["Scorecard Ecosystem"] + Scorecard["OpenSSF Scorecard
(evidence engine:
deep analysis, multi-platform)"] + Allstar["Allstar
(GitHub policy enforcement,
Scorecard sub-project)"] + end + end + + subgraph Enforcement["Enforcement & Audit"] + Minder["Minder
(enforce + remediate)"] + AMPEL["AMPEL
(attestation-based
policy enforcement)"] + Darnit["Darnit
(audit + remediate)"] + end + end + + Baseline -->|defines controls| Privateer + Baseline -->|defines controls| Scorecard + Baseline -->|defines controls| Minder + Baseline -->|defines controls| Darnit + Baseline -->|informs policies| AMPEL + Gemara -->|provides schemas| Privateer + Gemara -->|provides schemas| Scorecard + SI -->|provides metadata| Privateer + SI -->|provides metadata| Scorecard + Scorecard -->|checks| Allstar + Scorecard -->|evidence| Privateer + Scorecard -->|evidence| Minder + Scorecard -->|evidence| AMPEL + Scorecard -->|evidence| Darnit + Darnit -->|attestation| AMPEL +``` + +**Scorecard's role**: Produce deep, probe-based security evidence that +downstream tools can consume through published output formats. Scorecard ingests +diverse signals, normalizes them through probes, and delivers evidence in +interoperable formats (JSON, in-toto, Gemara, SARIF, OSCAL). + +**All consumers are equal.** Privateer, AMPEL, Minder, Darnit, and future tools +consume Scorecard evidence on the same terms through published output formats. + +**What Scorecard does NOT do**: Enforce policies or remediate (Minder's and +AMPEL's role), perform compliance auditing and remediation (Darnit's role), or +guarantee compliance with any regulatory framework. + +## Success criteria + +The following criteria define proposal acceptance (successful Phase 1 implementation): + +1. Scorecard produces a valid OSPS Baseline Level 1 conformance report for any public GitHub repository across CLI, Action, and API surfaces +2. Evidence model supports multiple output formats (enriched JSON, in-toto, Gemara, OSCAL) — each validated with at least one downstream consumer +3. Conformance evidence is consumable by any downstream tool through published output formats (validated with ORBIT WG) +4. All open questions (OQ-1 through OQ-4) are resolved with documented decisions +5. No changes to existing check scores or behavior +6. 
Additive, not breaking: existing checks, probes, scores, and output formats do not change behavior
+
+## Approval process
+
+- **[blocking]** Sign-off from Stephen Augustus and Spencer Schrock (Steering Committee)
+- **[blocking]** Review from at least 1 non-Steering Scorecard maintainer
+- **[non-blocking]** Reviews from maintainers of tools in the ORBIT WG ecosystem
+- **[informational]** Notify ORBIT WG members (TAC sign-off not required)
+
+## Feedback, decisions, and next steps
+
+All reviewer feedback, maintainer clarifying questions, and the decision
+priority analysis are tracked in [`decisions.md`](decisions.md).