From 4851dd43f5b22ef5e30dc6ee75d3769a221521e7 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Sat, 21 Mar 2026 13:59:37 +0100 Subject: [PATCH 01/28] :seedling: Add OpenSpec scaffolding for PVTR integration - Bootstrap openspec/ directory structure with initial specs: - openspec/specs/platform-clients/spec.md: platform client abstraction - openspec/changes/pvtr-integration/specs/pvtr-baseline/spec.md: OSPS Baseline integration requirements and scenarios - Add AGENTS.md to .gitignore (AI collaboration guidelines tracked separately) Signed-off-by: Stephen Augustus Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .gitignore | 3 + .../specs/pvtr-baseline/spec.md | 56 +++++++++++++++++++ openspec/specs/platform-clients/spec.md | 26 +++++++++ 3 files changed, 85 insertions(+) create mode 100644 openspec/changes/pvtr-integration/specs/pvtr-baseline/spec.md create mode 100644 openspec/specs/platform-clients/spec.md diff --git a/.gitignore b/.gitignore index 84f2e0bf66b..fb2ac85231e 100644 --- a/.gitignore +++ b/.gitignore @@ -64,3 +64,6 @@ newRelease.json # Ignore golang's vendored files /vendor/ /tools/vendor/ + +# AI tooling instructions +AGENTS.md diff --git a/openspec/changes/pvtr-integration/specs/pvtr-baseline/spec.md b/openspec/changes/pvtr-integration/specs/pvtr-baseline/spec.md new file mode 100644 index 00000000000..883ed5a2c4e --- /dev/null +++ b/openspec/changes/pvtr-integration/specs/pvtr-baseline/spec.md @@ -0,0 +1,56 @@ +# OSPS Baseline Integration + +## Purpose + +Enable Scorecard probes to be mapped to OSPS Baseline controls, allowing Scorecard to report repository compliance against the Open Source Project Security Baseline specification. + +## Requirements + +### Requirement: Probe Annotation +Probe definition files (`def.yml`) SHALL support an optional `osps_baseline` field that maps the probe to one or more OSPS Baseline control IDs. 
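As a sketch only, the annotation might look like this in a probe's `def.yml` (the probe id and the exact `osps_baseline` sub-key names are illustrative, not a committed schema):

```yaml
# Hypothetical def.yml excerpt; probe id and sub-key names are illustrative.
id: releasesAreSigned
osps_baseline:
  - control: OSPS-BR-06.01
    mapping: direct
    level: 2
```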
+
+### Requirement: Mapping Types
+Each probe-to-control mapping SHALL specify a mapping type: `direct`, `partial`, or `informational`.
+
+### Requirement: Maturity Level Tracking
+Each mapping SHALL include the OSPS Baseline maturity level (1, 2, or 3) of the associated control.
+
+### Requirement: Baseline Output Format
+Scorecard SHALL support an `osps-baseline` output format that groups probe results by OSPS Baseline control and reports per-control compliance status.
+
+### Requirement: Maturity Level Calculation
+The baseline output SHALL calculate the highest maturity level achieved (i.e., the highest level N such that all controls at level N and at every lower level pass).
+
+### Requirement: Backward Compatibility
+The `osps_baseline` field SHALL be optional. Probes without this field SHALL continue to function as before with no behavior change.
+
+### Requirement: Generated Mapping Documentation
+The build system SHALL generate a mapping document (`docs/osps-baseline-mapping.yaml`) from probe annotations.
+ +## Scenarios + +### Scenario: Probe maps to a baseline control +- GIVEN a probe with `osps_baseline` annotation mapping to control OSPS-BR-01 +- WHEN Scorecard runs with `--format osps-baseline` +- THEN the output includes OSPS-BR-01 with the probe's finding outcome + +### Scenario: Control has no mapped probes +- GIVEN an OSPS Baseline control with no corresponding Scorecard probes +- WHEN Scorecard runs with `--format osps-baseline` +- THEN the control is listed with status `not_assessed` + +### Scenario: Multiple probes map to one control +- GIVEN two probes both annotated with control OSPS-AC-01, one `direct` and one `partial` +- WHEN both probes return `OutcomeTrue` +- THEN the control status is `pass` + +### Scenario: Partial coverage +- GIVEN two probes mapped to a control, one returning `OutcomeTrue` and one `OutcomeFalse` +- WHEN the failing probe has mapping type `direct` +- THEN the control status is `fail` + +### Scenario: Probe without baseline annotation +- GIVEN a probe with no `osps_baseline` field in its `def.yml` +- WHEN Scorecard runs with `--format osps-baseline` +- THEN the probe's results are excluded from the baseline output +- AND the probe continues to function normally in other output formats diff --git a/openspec/specs/platform-clients/spec.md b/openspec/specs/platform-clients/spec.md new file mode 100644 index 00000000000..9d273d1ff97 --- /dev/null +++ b/openspec/specs/platform-clients/spec.md @@ -0,0 +1,26 @@ +# Platform Clients + +## Purpose + +Platform clients provide an abstraction layer over repository hosting platforms (GitHub, GitLab, Azure DevOps, local directories), allowing checks and probes to operate platform-agnostically through the `clients.RepoClient` interface. + +## Requirements + +### Requirement: RepoClient Interface +All platform clients SHALL implement the `clients.RepoClient` interface. 
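As a minimal sketch of the consumption pattern: a probe depends only on the interface, so any platform implementation can satisfy it. The real `clients.RepoClient` is much larger; the narrowed one-method interface, the probe logic, and the fake client below are all illustrative.

```go
package main

import "fmt"

// fileLister is a hypothetical narrow view of clients.RepoClient;
// the real interface exposes many more methods.
type fileLister interface {
	ListFiles(predicate func(string) (bool, error)) ([]string, error)
}

// hasSecurityPolicy is platform-agnostic: it only talks to the interface.
func hasSecurityPolicy(c fileLister) (bool, error) {
	files, err := c.ListFiles(func(p string) (bool, error) {
		return p == "SECURITY.md" || p == ".github/SECURITY.md", nil
	})
	if err != nil {
		return false, err
	}
	return len(files) > 0, nil
}

// fakeClient stands in for any platform implementation (GitHub, GitLab, ...).
type fakeClient struct{ files []string }

func (f fakeClient) ListFiles(pred func(string) (bool, error)) ([]string, error) {
	var out []string
	for _, p := range f.files {
		ok, err := pred(p)
		if err != nil {
			return nil, err
		}
		if ok {
			out = append(out, p)
		}
	}
	return out, nil
}

func main() {
	ok, _ := hasSecurityPolicy(fakeClient{files: []string{"README.md", "SECURITY.md"}})
	fmt.Println(ok) // prints true
}
```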
+ +### Requirement: Platform Agnosticism +Checks and probes SHALL NOT contain platform-specific logic; all platform differences SHALL be handled within client implementations. + +### Requirement: Authentication +Clients SHALL support authentication via environment variables (`GITHUB_AUTH_TOKEN`, `GITLAB_AUTH_TOKEN`, `AZURE_DEVOPS_AUTH_TOKEN`). + +### Requirement: Rate Limiting +The GitHub client SHALL support round-robin token rotation for rate limit management. + +## Supported Platforms + +- **GitHub** (`clients/githubrepo/`) - Stable. REST and GraphQL APIs. +- **GitLab** (`clients/gitlabrepo/`) - Stable. +- **Azure DevOps** (`clients/azuredevopsrepo/`) - Experimental. +- **Local Directory** (`clients/localdir/`) - Stable. File-system-only checks. From c0c51857b1642e73970ce4fe1ffafd1745bfaf2e Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Thu, 26 Feb 2026 16:58:03 -0800 Subject: [PATCH 02/28] :seedling: Rename proposal from pvtr-integration to osps-baseline-conformance MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The scope of this work is OSPS Baseline conformance within the ORBIT ecosystem — Privateer/PVTR interoperability is one aspect, not the whole story. 
Signed-off-by: Stephen Augustus Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .../specs/pvtr-baseline/spec.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename openspec/changes/{pvtr-integration => osps-baseline-conformance}/specs/pvtr-baseline/spec.md (100%) diff --git a/openspec/changes/pvtr-integration/specs/pvtr-baseline/spec.md b/openspec/changes/osps-baseline-conformance/specs/pvtr-baseline/spec.md similarity index 100% rename from openspec/changes/pvtr-integration/specs/pvtr-baseline/spec.md rename to openspec/changes/osps-baseline-conformance/specs/pvtr-baseline/spec.md From 2aacfb10d7f629241cf6a3ffafc0ff51f3ee7d86 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Thu, 26 Feb 2026 17:03:02 -0800 Subject: [PATCH 03/28] :seedling: Rewrite OSPS Baseline conformance proposal to address full roadmap Complete rewrite of the proposal and spec to cover the full scope of the 2026 roadmap, not just Privateer/PVTR interoperability: - Conformance engine producing PASS/FAIL/UNKNOWN/NOT_APPLICABLE/ATTESTED - OSPS output format (--format=osps) - Versioned control-to-probe mapping files - Applicability engine for precondition detection - Security Insights ingestion for ORBIT ecosystem interop - Attestation mechanism for non-automatable controls - Gemara Layer 4 compatibility - CI gating support - Phased delivery aligned with quarterly milestones - ORBIT ecosystem positioning (complement PVTR, don't duplicate) Highlights Spencer's review notes as numbered open questions (OQ-1 through OQ-4): - OQ-1: Attestation identity model (OIDC? tokens? workflows?) - OQ-2: Enforcement detection vs. being an enforcement tool - OQ-3: scan_scope field usefulness in output schema - OQ-4: Evidence should be probe-based only, not check-based Renames spec subdirectory from pvtr-baseline to osps-conformance. 
Signed-off-by: Stephen Augustus Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .../osps-baseline-conformance/proposal.md | 204 ++++++++++++++++++ .../specs/osps-conformance/spec.md | 129 +++++++++++ .../specs/pvtr-baseline/spec.md | 56 ----- 3 files changed, 333 insertions(+), 56 deletions(-) create mode 100644 openspec/changes/osps-baseline-conformance/proposal.md create mode 100644 openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md delete mode 100644 openspec/changes/osps-baseline-conformance/specs/pvtr-baseline/spec.md diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md new file mode 100644 index 00000000000..125db78741c --- /dev/null +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -0,0 +1,204 @@ +# Proposal: OSPS Baseline Conformance for OpenSSF Scorecard + +## Summary + +Add OSPS Baseline conformance evaluation to Scorecard, making it a credible tool for determining whether open source projects meet the security requirements defined by the Open Source Project Security (OSPS) Baseline specification. This is the central initiative for Scorecard's 2026 roadmap. + +This is fundamentally a **product-level shift**: Scorecard today answers "how well does this repo follow best practices?" (graded 0-10 heuristics). OSPS conformance requires answering "does this project meet these MUST statements at this maturity level?" (PASS/FAIL/UNKNOWN/NOT_APPLICABLE per control, with evidence). The two models coexist — existing checks and scores are unchanged — but the conformance layer is a new product surface. + +## Motivation + +### Why now + +1. **OSPS Baseline is the emerging standard.** The OSPS Baseline (v2025-10-10) defines 59 controls across 3 maturity levels. It is maintained within the ORBIT Working Group and is becoming the reference framework for open source project security posture. + +2. 
**The ecosystem is moving.** The PVTR GitHub Repo Scanner already evaluates 39 of 52 control requirements and powers LFX Insights security results. The OSPS Baseline GitHub Action can upload SARIF. Best Practices Badge is staging Baseline-phase work. Scorecard's large install base is an advantage, but only if it ships a conformance surface. + +3. **ORBIT WG alignment.** Scorecard sits within the OpenSSF alongside the ORBIT WG. The ORBIT charter's mission is "to develop and maintain interoperable resources related to the identification and presentation of security-relevant data." Scorecard producing interoperable conformance results is a natural fit. + +4. **Regulatory pressure.** The EU Cyber Resilience Act (CRA) and similar regulatory frameworks increasingly expect evidence-based security posture documentation. OSPS Baseline conformance output positions Scorecard as a tool that produces CRA-relevant evidence artifacts. + +### What Scorecard brings that others don't + +- **Deep automated analysis.** 50+ probes with structured results provide granular evidence that PVTR's shallower checks cannot match (e.g., per-workflow token permission analysis, detailed branch protection rule inspection, CI/CD injection pattern detection). +- **Multi-platform support.** GitHub, GitLab, Azure DevOps, and local directory scanning. +- **Massive install base.** Scorecard Action, public API, and cron-based scanning infrastructure. +- **Existing policy machinery.** The `policy/` package and structured results were designed for exactly this kind of downstream consumption. + +### What Scorecard must not do + +- **Duplicate PVTR's role as a Privateer plugin.** PVTR is the Baseline evaluator in the ORBIT ecosystem diagram. Scorecard should complement it with deeper analysis and interoperable output, not fork the evaluation model. +- **Duplicate remediation engines.** Tools like Darn handle remediation. Scorecard exports stable, machine-readable findings for remediation tools to consume. 
+- **Turn OSPS controls into Scorecard checks.** OSPS conformance is a layer that consumes existing Scorecard signals, not 59 new checks. + +## Current state + +### Coverage snapshot (Scorecard signals vs. OSPS v2025-10-10) + +| Level | Controls | Covered (✅) | Partial (⚠️) | Not covered (❌) | +|-------|----------|-------------|-------------|-----------------| +| 1 | 24 | 6 | 9 | 9 | +| 2 | 18 | 1 | 7 | 10 | +| 3 | 17 | 0 | 5 | 12 | + +The full control-by-control mapping is in Appendix A of `docs/roadmap-ideas.md`. + +### Existing Scorecard surfaces that matter + +- **Checks** produce 0-10 scores — useful as signal but not conformance results +- **Probes** produce structured boolean findings — the right granularity for control mapping +- **Output formats** (JSON, SARIF, probe, in-toto) — OSPS output is a new format alongside these +- **Multi-repo scanning** (`--repos`, `--org`) — needed for OSPS-QA-04.02 (subproject conformance) +- **Serve mode** — HTTP surface for pipeline integration + +## Open questions from maintainer review + +The following questions were raised by Spencer (Steering Committee member) during review of the roadmap and need to be resolved before or during implementation: + +### OQ-1: Attestation mechanism identity + +> "The attestation/provenance layer. What is doing the attestation? Is this some OIDC? A personal token? A workflow (won't have the right tokens)?" +> — Spencer, on Section 5.1 + +This is a fundamental design question. Options include: +- **Repo-local metadata files** (e.g., Security Insights, `.osps-attestations.yml`): simplest, no cryptographic identity, maintainer self-declares by committing the file. +- **Signed attestations via Sigstore/OIDC**: strongest guarantees, but requires workflow identity and the right tokens — which Spencer correctly notes may not be available in all contexts. +- **Platform-native signals**: e.g., GitHub's private vulnerability reporting enabled status, which the platform attests implicitly. 
+ +**Recommendation to discuss**: Start with repo-local metadata files (unsigned) for the v1 attestation mechanism, with a defined extension point for signed attestations in a future iteration. This avoids blocking on the identity question while still making non-automatable controls reportable. + +### OQ-2: Scorecard's role in enforcement detection vs. enforcement + +> "I thought the other doc said Scorecard wasn't an enforcement tool?" +> — Spencer, on Q4 deliverables (enforcement detection) + +This is a critical framing question. The roadmap proposes *detecting* whether enforcement exists (e.g., "are SAST results required to pass before merge?"), not *performing* enforcement. But the line between "detecting enforcement" and "being an enforcement tool" needs to be drawn clearly. + +**Recommendation to discuss**: Scorecard detects and reports whether enforcement mechanisms are in place. It does not itself enforce. The `--fail-on=fail` CI gating is a reporting exit code, not an enforcement action — the CI system is the enforcer. This distinction should be documented explicitly. + +### OQ-3: `scan_scope` field in output schema + +> "Not sure I see the importance [of `scan_scope`]" +> — Spencer, on Section 9 (output schema) + +The `scan_scope` field (repo|org|repos) in the proposed OSPS output schema may not carry meaningful information. If the output always describes a single repository's conformance, the scope is implicit. + +**Recommendation to discuss**: Drop `scan_scope` from the schema unless multi-repo aggregation (OSPS-QA-04.02) produces a fundamentally different output shape. Revisit in Q4 when project-level aggregation is implemented. + +### OQ-4: Evidence model — probes only, not checks + +> "[Evidence] should be probe-based only, not check" +> — Spencer, on Section 9 (output schema) + +Spencer's position is that OSPS evidence references should point to probe findings, not check-level results. 
This aligns with the architectural direction of Scorecard v5 (probes as the measurement unit, checks as scoring aggregations). + +**Recommendation**: Adopt this. The `evidence` array in the OSPS output schema should reference probes and their findings only. Checks may be listed in a `derived_from` field for human context but are not evidence. + +## Scope + +### In scope + +1. **OSPS conformance engine** — new package that maps controls to Scorecard probes, evaluates per-control status, handles applicability +2. **OSPS output format** — `--format=osps` producing a JSON conformance report +3. **Versioned mapping file** — data-driven YAML mapping OSPS control IDs to Scorecard probes, applicability rules, and evaluation logic +4. **Applicability engine** — detects preconditions (e.g., "has made a release") and outputs NOT_APPLICABLE +5. **Security Insights ingestion** — reads `security-insights.yml` to satisfy metadata-dependent controls, aligning with the ORBIT ecosystem data plane +6. **Attestation mechanism (v1)** — accepts repo-local metadata for non-automatable controls (pending OQ-1 resolution) +7. **New probes and probe enhancements** for gap controls: + - Secrets detection (OSPS-BR-07.01) + - Governance/docs presence (OSPS-GV-02.01, GV-03.01, DO-01.01, DO-02.01) + - Dependency manifest presence (OSPS-QA-02.01) + - Security policy deepening (OSPS-VM-02.01, VM-03.01, VM-01.01) + - Release asset inspection (multiple L2/L3 controls) + - Signed manifest support (OSPS-BR-06.01) + - Enforcement detection (OSPS-VM-05.*, VM-06.* — pending OQ-2 resolution) +8. **CI gating** — `--fail-on=fail` exit code for pipeline integration +9. **Multi-repo project-level conformance** (OSPS-QA-04.02) +10. 
**Gemara Layer 4 compatibility** — output structurally compatible with ORBIT assessment result schemas + +### Out of scope + +- Remediation automation (Darn's domain) +- Replacing PVTR as a Privateer plugin +- Changing existing check scores or behavior +- Platform enforcement (Minder's domain) +- OSPS Baseline specification changes (ORBIT WG's domain) + +## Phased delivery + +### Q1 2026: OSPS conformance alpha + ecosystem handshake + +- OSPS output format (alpha) with `--format=osps` +- Versioned mapping file for v2025-10-10 (alpha) +- Applicability engine v1 +- Design + document ORBIT interop commitments (Security Insights, Gemara compatibility, PVTR complementarity) +- Map existing probes to OSPS controls where coverage already exists + +### Q2 2026: Level 1 coverage spike + declared metadata v1 + +- Secrets detection probe (OSPS-BR-07.01) +- Governance + docs presence probes (GV-02.01, GV-03.01, DO-01.01, DO-02.01) +- Dependency manifest presence probe (QA-02.01) +- Security policy deepening (VM-02.01, VM-03.01, VM-01.01) +- Security Insights ingestion v1 (BR-03.01, BR-03.02, QA-04.01) +- Attestation mechanism v1 for non-automatable controls +- CI gating: `--fail-on=fail` + coverage summary + +### Q3 2026: Release integrity + Level 2 core + +- Release asset inspection layer (detect compiled assets, SBOMs, licenses with releases) +- Signed manifest support (BR-06.01) +- Release notes/changelog detection (BR-04.01) +- Evidence bundle output v1 (OSPS result JSON + in-toto statement + SARIF for failures) + +### Q4 2026: Enforcement detection + Level 3 + multi-repo + +- SCA policy + enforcement detection (VM-05.*) +- SAST policy + enforcement detection (VM-06.*) +- Multi-repo project-level conformance aggregation (QA-04.02) +- Attestation integration GA + +## Relationship to ORBIT ecosystem + +``` +┌─────────────────────────────────────────────────────┐ +│ ORBIT WG Ecosystem │ +│ │ +│ ┌──────────┐ ┌───────────┐ ┌──────────────────┐ │ +│ │ OSPS │ │ Gemara │ │ 
Security │ │ +│ │ Baseline │ │ Schemas │ │ Insights │ │ +│ │ (controls│ │ (L2/L4) │ │ (metadata) │ │ +│ └────┬─────┘ └─────┬─────┘ └────────┬─────────┘ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌─────────────────────────────────────────────┐ │ +│ │ PVTR GitHub Repo Scanner │ │ +│ │ (Privateer plugin, LFX Insights driver) │ │ +│ └─────────────────────┬───────────────────────┘ │ +│ │ │ +│ consumes ▲ │ +│ │ │ +│ ┌─────────────────────┴───────────────────────┐ │ +│ │ OpenSSF Scorecard │ │ +│ │ (deep analysis, conformance output, │ │ +│ │ multi-platform, large install base) │ │ +│ └─────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────┐ ┌───────────┐ │ +│ │ Minder │ │ Darn │ │ +│ │ (enforce)│ │ (remediate│ │ +│ └──────────┘ └───────────┘ │ +└─────────────────────────────────────────────────────┘ +``` + +**Scorecard's role**: Produce deep, probe-based conformance evidence that PVTR, Minder, and downstream consumers can use. Scorecard reads Security Insights (shared data plane), outputs Gemara L4-compatible results (shared schema), and fills analysis gaps where PVTR has `NotImplemented` steps. + +**What Scorecard does NOT do**: Replace PVTR as the Privateer plugin, enforce policies (Minder's role), or perform remediation (Darn's role). + +## Success criteria + +1. `scorecard --format=osps --osps-level=1` produces a valid conformance report for any public GitHub repository +2. Level 1 auto-check coverage reaches ≥80% of controls (currently ~25%) +3. OSPS output is consumable by PVTR as supplementary evidence (validated with ORBIT WG) +4. All four open questions (OQ-1 through OQ-4) are resolved with documented decisions +5. 
No changes to existing check scores or behavior diff --git a/openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md b/openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md new file mode 100644 index 00000000000..074174da97b --- /dev/null +++ b/openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md @@ -0,0 +1,129 @@ +# OSPS Baseline Conformance + +## Purpose + +Enable Scorecard to evaluate repositories against the OSPS Baseline specification, producing per-control conformance results (PASS/FAIL/UNKNOWN/NOT_APPLICABLE/ATTESTED) with probe-based evidence, interoperable with the ORBIT WG ecosystem. + +## Requirements + +### Conformance Engine + +#### Requirement: Conformance evaluation +Scorecard SHALL include a conformance engine that evaluates OSPS Baseline controls against Scorecard probe findings and outputs a per-control status. + +#### Requirement: Status values +Each control evaluation SHALL produce one of: `PASS`, `FAIL`, `UNKNOWN`, `NOT_APPLICABLE`, or `ATTESTED`. + +#### Requirement: UNKNOWN-first honesty +When Scorecard cannot observe a control due to insufficient permissions, missing platform support, or lack of data, the status SHALL be `UNKNOWN` with an explanation. The engine SHALL NOT produce `PASS` or `FAIL` for unobservable controls. + +#### Requirement: Applicability detection +The conformance engine SHALL detect applicability preconditions (e.g., "project has made a release") and produce `NOT_APPLICABLE` when preconditions are not met. + +#### Requirement: Backward compatibility +The conformance engine SHALL be additive. Existing checks, probes, scores, and output formats SHALL NOT change behavior. + +### Mapping + +#### Requirement: Versioned mapping file +The mapping between OSPS Baseline controls and Scorecard probes SHALL be maintained as a data-driven, versioned YAML file (e.g., `pkg/osps/mappings/v2025-10-10.yaml`), not hard-coded. 
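An illustrative entry shape for such a mapping file (field names and the probe name are a sketch, not a committed schema):

```yaml
# Illustrative shape for pkg/osps/mappings/v2025-10-10.yaml;
# field and probe names are hypothetical.
baseline_version: v2025-10-10
controls:
  - id: OSPS-DO-01.01
    level: 1
    applicability:
      requires: [has_release]   # NOT_APPLICABLE when the precondition is false
    probes:
      - name: hasUserGuides     # hypothetical probe
    evaluation: all_probes_true # PASS iff every listed probe is OutcomeTrue
```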
+ +#### Requirement: Mapping contents +Each mapping entry SHALL specify: +- the OSPS control ID +- the maturity level (1, 2, or 3) +- the Scorecard probes that provide evidence +- applicability conditions (if any) +- the evaluation logic (how probe outcomes map to control status) + +#### Requirement: Unmapped controls +Controls without mapped probes SHALL appear in output with status `UNKNOWN` and a note indicating no automated evaluation is available. + +### Output + +#### Requirement: OSPS output format +Scorecard SHALL support `--format=osps` producing a JSON conformance report containing: +- OSPS Baseline version +- target maturity level +- per-control status, evidence, limitations, and remediation +- summary counts +- tool metadata (Scorecard version, timestamp) + +#### Requirement: Probe-based evidence +Evidence references in OSPS output SHALL reference probes and their findings. Check-level results SHALL NOT be used as evidence. Checks MAY be listed in a `derived_from` field for human context. + +> **Open Question (OQ-4)**: Spencer's position — evidence should be probe-based only, not check-based. This spec adopts that position. Need to confirm this is the consensus view. + +#### Requirement: Gemara Layer 4 compatibility +The OSPS output schema SHALL be structurally compatible with Gemara Layer 4 assessment results, enabling consumption by ORBIT ecosystem tools without transformation. + +#### Requirement: CI gating +Scorecard SHALL support a `--fail-on=fail` flag (or equivalent) when using OSPS output. `UNKNOWN` statuses SHALL NOT cause failure by default; this SHALL be configurable. + +### Metadata and Attestation + +#### Requirement: Security Insights ingestion +Scorecard SHALL read Security Insights files (`security-insights.yml` or `.github/security-insights.yml`) to satisfy controls that depend on declared project metadata. 
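The file this requirement reads is defined by the Security Insights specification, not by Scorecard; as rough orientation only, a minimal fragment might look like the following (key names are abbreviated assumptions and should be checked against that specification):

```yaml
# Abbreviated, unverified sketch of a security-insights.yml fragment;
# the Security Insights specification is authoritative.
header:
  schema-version: 1.0.0
project-lifecycle:
  status: active
vulnerability-reporting:
  accepts-vulnerability-reports: true
```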
+
+#### Requirement: Attestation for non-automatable controls
+The conformance engine SHALL accept attestation evidence from a repo-local metadata file for controls that cannot be automated, producing status `ATTESTED` with evidence links.
+
+> **Open Question (OQ-1)**: The identity and trust model for attestations is unresolved. What performs the attestation? OIDC? A personal token? A workflow (which won't have the right tokens)? See proposal.md for options. Spencer flagged this as a blocking design question.
+
+### Ecosystem Interoperability
+
+#### Requirement: Complementarity with PVTR
+Scorecard SHALL NOT duplicate the PVTR GitHub Repo Scanner's role as a Privateer plugin. Scorecard provides deep probe-based analysis; PVTR can consume Scorecard's OSPS output as supplementary evidence.
+
+#### Requirement: No enforcement
+Scorecard evaluates and reports conformance. It SHALL NOT enforce policies. The `--fail-on=fail` exit code is a reporting mechanism; the CI system is the enforcer.
+
+> **Open Question (OQ-2)**: Spencer asked whether "enforcement detection" (Q4 roadmap: detecting whether SCA/SAST gating exists) conflicts with Scorecard's stated non-enforcement role. Proposed distinction: Scorecard *detects* enforcement mechanisms, it does not *perform* enforcement. Needs maintainer consensus.
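Taken together, the output requirements above suggest a report shaped roughly like the following. Field names and nesting are illustrative, not a committed schema; the control statuses shown mirror the scenarios below.

```json
{
  "baseline_version": "v2025-10-10",
  "target_level": 1,
  "tool": { "name": "scorecard", "version": "<scorecard version>", "timestamp": "<RFC 3339>" },
  "controls": [
    {
      "id": "OSPS-VM-02.01",
      "status": "PASS",
      "evidence": [
        { "probe": "securityPolicyPresent", "outcome": "OutcomeTrue" }
      ],
      "derived_from": ["Security-Policy"]
    },
    {
      "id": "OSPS-AC-01.01",
      "status": "UNKNOWN",
      "limitations": "requires org admin visibility",
      "evidence": []
    },
    {
      "id": "OSPS-DO-01.01",
      "status": "NOT_APPLICABLE",
      "applicability": { "has_release": false },
      "evidence": []
    }
  ],
  "summary": { "pass": 1, "fail": 0, "unknown": 1, "not_applicable": 1, "attested": 0 }
}
```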
+ +## Scenarios + +### Scenario: Full Level 1 conformance report +- GIVEN a public GitHub repository with a Security Insights file +- WHEN `scorecard --repo=github.com/org/repo --format=osps --osps-level=1` is run +- THEN the output contains all Level 1 controls with status PASS, FAIL, UNKNOWN, or NOT_APPLICABLE +- AND each result includes probe-based evidence references + +### Scenario: Permission-limited scan produces UNKNOWN +- GIVEN a scan token without admin access +- WHEN evaluating OSPS-AC-01.01 (MFA enforcement) +- THEN the status is `UNKNOWN` +- AND the limitations field explains "requires org admin visibility" + +### Scenario: Release applicability triggers NOT_APPLICABLE +- GIVEN a repository that has never made a release +- WHEN evaluating OSPS-DO-01.01 (user guides for released software) +- THEN the status is `NOT_APPLICABLE` +- AND the applicability facts record `has_release=false` + +### Scenario: Attestation for non-automatable control +- GIVEN a repo-local metadata file attesting a security assessment was performed +- WHEN evaluating OSPS-SA-03.01 +- THEN the status is `ATTESTED` +- AND the evidence includes the attestation source and link + +### Scenario: Missing Security Insights file +- GIVEN a repository without a Security Insights file +- WHEN evaluating controls dependent on Security Insights data +- THEN those controls evaluate to `UNKNOWN` with limitation "requires security-insights.yml" +- AND controls not dependent on Security Insights evaluate normally + +### Scenario: Unmapped control +- GIVEN an OSPS control with no corresponding Scorecard probes in the mapping file +- WHEN the conformance engine evaluates it +- THEN the status is `UNKNOWN` with note "no automated evaluation available" + +### Scenario: CI gating on conformance +- GIVEN `scorecard --format=osps --osps-level=1 --fail-on=fail` +- WHEN any Level 1 control evaluates to `FAIL` +- THEN the process exits with non-zero exit code +- AND `UNKNOWN` controls do not cause failure by 
default + +### Scenario: Existing Scorecard behavior unchanged +- GIVEN any repository +- WHEN `scorecard --repo=github.com/org/repo --format=json` is run (without `--format=osps`) +- THEN the output is identical to Scorecard without OSPS conformance changes diff --git a/openspec/changes/osps-baseline-conformance/specs/pvtr-baseline/spec.md b/openspec/changes/osps-baseline-conformance/specs/pvtr-baseline/spec.md deleted file mode 100644 index 883ed5a2c4e..00000000000 --- a/openspec/changes/osps-baseline-conformance/specs/pvtr-baseline/spec.md +++ /dev/null @@ -1,56 +0,0 @@ -# OSPS Baseline Integration - -## Purpose - -Enable Scorecard probes to be mapped to OSPS Baseline controls, allowing Scorecard to report repository compliance against the Open Source Project Security Baseline specification. - -## Requirements - -### Requirement: Probe Annotation -Probe definition files (`def.yml`) SHALL support an optional `osps_baseline` field that maps the probe to one or more OSPS Baseline control IDs. - -### Requirement: Mapping Types -Each probe-to-control mapping SHALL specify a mapping type: `direct`, `partial`, or `informational`. - -### Requirement: Maturity Level Tracking -Each mapping SHALL include the OSPS Baseline maturity level (1, 2, or 3) of the associated control. - -### Requirement: Baseline Output Format -Scorecard SHALL support an `osps-baseline` output format that groups probe results by OSPS Baseline control and reports per-control compliance status. - -### Requirement: Maturity Level Calculation -The baseline output SHALL calculate the highest maturity level achieved (i.e., the highest level where all controls at that level pass). - -### Requirement: Backward Compatibility -The `osps_baseline` field SHALL be optional. Probes without this field SHALL continue to function as before with no behavior change. 
- -### Requirement: Generated Mapping Documentation -The build system SHALL generate a mapping document (`docs/osps-baseline-mapping.yaml`) from probe annotations. - -## Scenarios - -### Scenario: Probe maps to a baseline control -- GIVEN a probe with `osps_baseline` annotation mapping to control OSPS-BR-01 -- WHEN Scorecard runs with `--format osps-baseline` -- THEN the output includes OSPS-BR-01 with the probe's finding outcome - -### Scenario: Control has no mapped probes -- GIVEN an OSPS Baseline control with no corresponding Scorecard probes -- WHEN Scorecard runs with `--format osps-baseline` -- THEN the control is listed with status `not_assessed` - -### Scenario: Multiple probes map to one control -- GIVEN two probes both annotated with control OSPS-AC-01, one `direct` and one `partial` -- WHEN both probes return `OutcomeTrue` -- THEN the control status is `pass` - -### Scenario: Partial coverage -- GIVEN two probes mapped to a control, one returning `OutcomeTrue` and one `OutcomeFalse` -- WHEN the failing probe has mapping type `direct` -- THEN the control status is `fail` - -### Scenario: Probe without baseline annotation -- GIVEN a probe with no `osps_baseline` field in its `def.yml` -- WHEN Scorecard runs with `--format osps-baseline` -- THEN the probe's results are excluded from the baseline output -- AND the probe continues to function normally in other output formats From 21c1319ff02618017b6ee3902988a3308ec95d45 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Thu, 26 Feb 2026 17:05:23 -0800 Subject: [PATCH 04/28] :seedling: Add remaining openspec scaffolding and .gitignore update - Add openspec/specs/core-checks/spec.md and openspec/specs/probes/spec.md documenting existing Scorecard architecture for spec-driven development - Update .gitignore to exclude roadmap drafting notes Signed-off-by: Stephen Augustus Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .gitignore | 3 +++ openspec/specs/core-checks/spec.md | 29 
+++++++++++++++++++++ openspec/specs/probes/spec.md | 42 ++++++++++++++++++++++++++++++ 3 files changed, 74 insertions(+) create mode 100644 openspec/specs/core-checks/spec.md create mode 100644 openspec/specs/probes/spec.md diff --git a/.gitignore b/.gitignore index fb2ac85231e..01e1083606d 100644 --- a/.gitignore +++ b/.gitignore @@ -67,3 +67,6 @@ newRelease.json # AI tooling instructions AGENTS.md + +# Roadmap drafting ideas +docs/roadmap-ideas.md diff --git a/openspec/specs/core-checks/spec.md b/openspec/specs/core-checks/spec.md new file mode 100644 index 00000000000..3f4401d065e --- /dev/null +++ b/openspec/specs/core-checks/spec.md @@ -0,0 +1,29 @@ +# Core Checks System + +## Purpose + +The checks system provides high-level security assessments of open source repositories. Each check produces a score from 0-10 representing how well a project follows a particular security practice. + +## Requirements + +### Requirement: Check Registration +The system SHALL allow checks to self-register via `init()` functions using `registerCheck()`. + +### Requirement: Parallel Execution +The system SHALL execute all enabled checks concurrently using goroutines. + +### Requirement: Three-Tier Architecture +Each check SHALL follow the raw data collection -> probe execution -> evaluation pipeline. + +### Requirement: Score Range +All checks SHALL produce a score between 0 (MinResultScore) and 10 (MaxResultScore), or an inconclusive/error result. + +### Requirement: Automated Assessment +All checks SHALL be fully automatable and require no interaction from repository maintainers. + +### Requirement: Actionable Results +All check results SHALL include actionable remediation guidance. 
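The registration requirement above can be sketched as a minimal, self-contained Go program. The registry type and the `registerCheck` signature here are simplified stand-ins for Scorecard's internals, not its actual API:

```go
package main

import "fmt"

// checkFn returns a score in the 0 (MinResultScore) to 10 (MaxResultScore) range.
type checkFn func() int

// allChecks is the registry populated at package-init time.
var allChecks = map[string]checkFn{}

// registerCheck records a check under its name; duplicate names are an error.
func registerCheck(name string, fn checkFn) error {
	if _, dup := allChecks[name]; dup {
		return fmt.Errorf("check %q already registered", name)
	}
	allChecks[name] = fn
	return nil
}

// Each check file self-registers from init(), so importing the checks
// package is enough to make every check visible to the runner.
func init() {
	if err := registerCheck("Binary-Artifacts", func() int { return 10 }); err != nil {
		panic(err)
	}
}

func main() {
	for name, run := range allChecks {
		fmt.Printf("%s: %d/10\n", name, run()) // Binary-Artifacts: 10/10
	}
}
```

In the real codebase the runner then executes the registered checks concurrently, per the Parallel Execution requirement.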
+ +## Current Checks + +Binary-Artifacts, Branch-Protection, CI-Tests, CII-Best-Practices, Code-Review, Contributors, Dangerous-Workflow, Dependency-Update-Tool, Fuzzing, License, Maintained, Packaging, Pinned-Dependencies, SAST, Security-Policy, Signed-Releases, Token-Permissions, Vulnerabilities, Webhooks (experimental), SBOM (experimental). diff --git a/openspec/specs/probes/spec.md b/openspec/specs/probes/spec.md new file mode 100644 index 00000000000..bb32479d203 --- /dev/null +++ b/openspec/specs/probes/spec.md @@ -0,0 +1,42 @@ +# Probes System + +## Purpose + +Probes are granular, individual heuristics that assess a specific behavior a project may or may not exhibit. They are the atomic units of analysis within Scorecard, composed into higher-level checks. + +## Requirements + +### Requirement: Boolean Naming +Probe names SHALL use camelCase and be phrased as boolean statements (e.g., `hasUnverifiedBinaryArtifacts`). + +### Requirement: Three-File Structure +Each probe SHALL consist of exactly three files: `def.yml` (documentation), `impl.go` (implementation), `impl_test.go` (tests). + +### Requirement: Finding Outcomes +Probes SHALL return one or more `finding.Finding` values with outcomes from: `OutcomeTrue`, `OutcomeFalse`, `OutcomeNotApplicable`, `OutcomeNotAvailable`. + +### Requirement: Lifecycle Management +Each probe SHALL declare a lifecycle state in `def.yml`: `Experimental`, `Stable`, or `Deprecated`. + +### Requirement: Registration +Probes SHALL register via `probes.MustRegister()` (check-associated) or `probes.MustRegisterIndependent()` (standalone), and be cataloged in `probes/entries.go`. + +### Requirement: Remediation +Probes SHALL provide remediation guidance in their `def.yml` definition. 
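As a rough illustration of the naming and outcome requirements above, a toy probe might look like this. The `Finding` and `Outcome` types are simplified stand-ins for Scorecard's `finding` package, and the probe logic is invented for the example:

```go
package main

import "fmt"

// Outcome mirrors the outcome names required by the spec.
type Outcome string

const (
	OutcomeTrue         Outcome = "True"
	OutcomeFalse        Outcome = "False"
	OutcomeNotAvailable Outcome = "NotAvailable"
)

// Finding is a simplified stand-in for finding.Finding.
type Finding struct {
	Probe   string
	Outcome Outcome
	Message string
}

// hasUnverifiedBinaryArtifacts follows the camelCase boolean naming rule:
// nil raw data yields OutcomeNotAvailable, an empty list yields OutcomeFalse,
// and otherwise there is one OutcomeTrue finding per artifact.
func hasUnverifiedBinaryArtifacts(artifacts []string) []Finding {
	const probe = "hasUnverifiedBinaryArtifacts"
	switch {
	case artifacts == nil:
		return []Finding{{probe, OutcomeNotAvailable, "raw data unavailable"}}
	case len(artifacts) == 0:
		return []Finding{{probe, OutcomeFalse, "no unverified binary artifacts"}}
	}
	out := make([]Finding, 0, len(artifacts))
	for _, path := range artifacts {
		out = append(out, Finding{probe, OutcomeTrue, path})
	}
	return out
}

func main() {
	fmt.Println(hasUnverifiedBinaryArtifacts([]string{"bin/helper.exe"})[0].Outcome) // True
	fmt.Println(hasUnverifiedBinaryArtifacts(nil)[0].Outcome)                        // NotAvailable
}
```

In a real probe this function would live in `impl.go`, with the table of input/outcome cases above exercised in `impl_test.go`.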
+ +## Scenarios + +### Scenario: Probe returns true finding +- GIVEN a repository that exhibits the behavior described by the probe +- WHEN the probe executes +- THEN it returns at least one finding with `OutcomeTrue` + +### Scenario: Probe returns false finding +- GIVEN a repository that does not exhibit the behavior +- WHEN the probe executes +- THEN it returns at least one finding with `OutcomeFalse` + +### Scenario: Probe lacks data +- GIVEN a repository where the relevant data is unavailable +- WHEN the probe executes +- THEN it returns a finding with `OutcomeNotAvailable` From 7a5c7fdc59078471c92f84c40b6b28b6672f53db Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Fri, 27 Feb 2026 05:19:57 -0500 Subject: [PATCH 05/28] :seedling: Add maintainer review feedback to OSPS conformance proposal Stephen's responses to clarifying questions (CQ-1 through CQ-8) and feedback on the proposal draft: - Both scoring and conformance modes coexist; no deprecation needed now - Target OSPS Baseline v2026.02.19 (latest), align with maintenance cadence - Provide degraded-but-useful evaluation without Security Insights - Invest in Gemara SDK integration for multi-tool consumption - Prioritize Level 1 conformance; consume external signals where possible - Approval requires Stephen + Spencer + 1 non-Steering maintainer - Q2 outcome should be OSPS Baseline Level 1 conformance - Land capabilities across all surfaces (CLI, Action, API) Key changes requested: - Correct PVTR references (it's the Privateer plugin, not a separate tool) - Add Darnit and AMPEL comparison - Replace quarterly timelines with phase-based outcomes - Plan to extract Scorecard's control catalog for other tools - Use Mermaid for diagrams - Create separate OSPS Baseline coverage analysis in docs/ - Create docs/ROADMAP.md for public consumption Signed-off-by: Stephen Augustus Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .../osps-baseline-conformance/proposal.md | 132 ++++++++++++++++++ 1 file 
changed, 132 insertions(+) diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 125db78741c..778f0d9ff8f 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -202,3 +202,135 @@ Spencer's position is that OSPS evidence references should point to probe findin 3. OSPS output is consumable by PVTR as supplementary evidence (validated with ORBIT WG) 4. All four open questions (OQ-1 through OQ-4) are resolved with documented decisions 5. No changes to existing check scores or behavior + +--- + +## Maintainer review + +### Stephen's notes + + + +**Overall assessment:** + + +**Key concerns or risks:** + + +**Things I agree with:** + + +**Things I disagree with or want to change:** + +- "PVTR" is shorthand for "Privateer". Throughout this proposal it makes it appear as if https://github.com/ossf/pvtr-github-repo-scanner is separate from Privateer, when it is really THE Privateer plugin for GitHub repositories. Any references to PVTR should be corrected. +- This proposal does not contain an even consideration of the capabilities of [Darnit](https://github.com/kusari-oss/darnit) and [AMPEL](https://github.com/carabiner-dev/ampel). We should do that comparison to get a better idea of what should be in or out of scope for Scorecard. +- The timeline that is in this proposal is not accurate, as we're already about to enter Q2 2026. We should focus on phases and outcomes, and let maintainer bandwidth dictate delivery timing. +- Scorecard has an existing set of checks and probes, which is essentially a control catalog. We should make a plan to extract the Scorecard control catalog so that it can be used by other tools that can handle evaluation tasks. +- Use Mermaid when creating diagrams. 
+- We need to understand what level of coverage Scorecard currently has for OSPS Baseline and that analysis should be created in a separate file (in `docs/`). Assume that any existing findings are out-of-date.
+- `docs/roadmap-ideas.md` will not be committed to the repo, as it is a rough draft which needs to be refined for public consumption. We should create `docs/ROADMAP.md` with a 2026 second-level heading which contains the publicly-consumable roadmap.
+
+**Priority ordering — what matters most to ship first:**
+
+
+### Clarifying questions
+
+The following questions need your input before this proposal can move to design. Please fill in your response under each question.
+
+#### CQ-1: Scorecard as a conformance tool — product identity
+
+The proposal frames this as a "product-level shift" where Scorecard gains a second mode: conformance evaluation alongside its existing scoring. Does this framing match your vision, or do you see conformance as eventually *replacing* the scoring model? Should we be thinking about deprecating 0-10 scores long-term, or do both modes coexist indefinitely?
+
+**Stephen's response:**
+
+I believe the scoring model will continue to be useful to consumers and it should be maintained. For now, both modes should coexist. There is no need to make a decision about this for the current iteration of the proposal.
+
+#### CQ-2: OSPS Baseline version targeting
+
+The roadmap targets OSPS Baseline v2025-10-10. The PVTR scanner targets v2025-02-25. The Baseline is a living spec with periodic releases. How should Scorecard handle version drift? Options:
+- Support only the latest version at any given time
+- Support multiple versions concurrently via the versioned mapping file
+- Pin to a version and update on a defined cadence (e.g., quarterly)
+
+**Stephen's response:**
+
+The current version of the OSPS Baseline is [v2026.02.19](https://baseline.openssf.org/versions/2026-02-19).
+ +We should align with the latest version at first and have a process for aligning with new versions on a defined cadence. We should understand the [OSPS Baseline maintenance process](https://baseline.openssf.org/maintenance.html) and align with it. + +The OSPS Baseline [FAQ](https://baseline.openssf.org/faq.html) and [Implementation Guidance for Maintainers](https://baseline.openssf.org/maintainers.html) may have guidance we should consider incorporating. + +#### CQ-3: Security Insights as a hard dependency + +Many OSPS controls depend on Security Insights data (official channels, distribution points, subproject inventory, core team). PVTR treats the Security Insights file as nearly required — most of its evaluation steps begin with `HasSecurityInsightsFile`. + +Should Scorecard: +- Treat Security Insights the same way (controls that need it go UNKNOWN without it)? +- Provide a degraded but still useful evaluation without it? +- Accept alternative metadata sources (e.g., `.project`, custom config)? + +This also raises a broader adoption question: most projects today don't have a `security-insights.yml`. How do we avoid making the OSPS output useless for the majority of repositories? + +**Stephen's response:** + +We should provide a degraded, but still-useful evaluation without a Security Insights file, especially since our probes today can already cover a lot of ground without it. It would be good for us to eventually support alternative metadata sources, but this should not be an immediate goal. + +#### CQ-4: PVTR relationship — complement vs. converge + +The proposal positions Scorecard as complementary to PVTR. But there's a deeper question: should this stay as two separate tools indefinitely, or is the long-term goal convergence (e.g., PVTR consuming Scorecard as a library, or Scorecard becoming a Privateer plugin itself)? Your position on this affects how tightly we couple the output formats and whether we invest in Gemara SDK integration. 
+ +**Stephen's response:** + +Multiple tools should be able to consume Scorecard, so yes, we should invest in Gemara SDK integration. + +#### CQ-5: Scope of new probes in 2026 + +The roadmap calls for significant new probe development (secrets detection, governance/docs presence, dependency manifests, release asset inspection, enforcement detection). That's a lot of new surface area. Should we: +- Build all of these within Scorecard? +- Prioritize a subset and defer the rest? +- Look for ways to consume signals from external tools (e.g., GitHub's secret scanning API, SBOM tools) rather than building detection from scratch? + +If prioritizing, which new probes matter most to you? + +**Stephen's response:** + +We should prioritize OSPS Baseline Level 1 conformance work. +We should consider any signals that can be consumed from external sources. + +#### CQ-6: Community and governance process + +This is a major initiative touching Scorecard's product direction. What's the governance process for getting this approved? +- Does this need a formal proposal to the Scorecard maintainer group? +- Should this be presented at an ORBIT WG meeting? +- Do we need sign-off from the OpenSSF TAC? +- Who else beyond you and Spencer needs to weigh in? + +**Stephen's response:** + +We should have Stephen and Spencer sign off on this proposal as Steering Committee members. In addition, we should have reviews from: +- [blocking] At least 1 non-Steering Scorecard maintainer +- [non-blocking] Maintainers of tools in the WG ORBIT ecosystem + +This does not require review from the TAC, but we should inform WG ORBIT members. + +#### CQ-7: The "minimum viable conformance report" + +If we had to ship the smallest useful thing in Q1, what would it be? The roadmap proposes the full OSPS output format + mapping file + applicability engine. 
But a simpler starting point might be: +- Just the mapping file (documentation-only, no runtime) +- A `--format=osps` that only reports on controls Scorecard already covers (no new probes, lots of UNKNOWN) +- Something else? + +What would make Q1 a success in your eyes? + +**Stephen's response:** + +As previously mentioned, the quarterly targets are not currently accurate. One of our Q2 outcomes should be OSPS Baseline Level 1 conformance. + +#### CQ-8: Existing Scorecard Action and API impact + +Scorecard runs at scale via the Scorecard Action (GitHub Action) and the public API (api.scorecard.dev). Should OSPS conformance be available through these surfaces from day one, or should it start as a CLI-only feature? The API and Action have their own release and stability considerations. + +**Stephen's response:** + +We need to land these capabilities for as much surface area as possible. From 007cd747f06b46a9ceae39cc0770bee40913fd89 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Fri, 27 Feb 2026 05:28:47 -0500 Subject: [PATCH 06/28] :seedling: Address maintainer feedback on OSPS conformance proposal MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Changes based on Stephen's review: - Replace all "PVTR" references with "Privateer plugin for GitHub repositories" — it's the Privateer plugin, not a separate tool - Add ecosystem tooling comparison section covering Darnit (compliance audit + remediation), AMPEL (attestation-based policy enforcement), Privateer plugin (Baseline evaluation), and Scorecard (measurement) - Replace quarterly timeline (Q1-Q4) with phase-based delivery (Phase 1-3) focused on outcomes, not calendar dates - Update OSPS Baseline version from v2025-10-10 to v2026.02.19 - Convert ASCII ecosystem diagram to Mermaid - Add Scorecard control catalog extraction to scope - Add Gemara SDK integration to scope - Update coverage snapshot to reference docs/osps-baseline-coverage.md (to be created with fresh 
analysis) - Add approval process section based on governance answers - Update Security Insights requirement to degraded-but-useful mode - Add integration pipeline diagram (Scorecard -> Darnit -> AMPEL) Signed-off-by: Stephen Augustus Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .../osps-baseline-conformance/proposal.md | 187 +++++++++++------- .../specs/osps-conformance/spec.md | 8 +- 2 files changed, 117 insertions(+), 78 deletions(-) diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 778f0d9ff8f..2a908ebba67 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -10,9 +10,9 @@ This is fundamentally a **product-level shift**: Scorecard today answers "how we ### Why now -1. **OSPS Baseline is the emerging standard.** The OSPS Baseline (v2025-10-10) defines 59 controls across 3 maturity levels. It is maintained within the ORBIT Working Group and is becoming the reference framework for open source project security posture. +1. **OSPS Baseline is the emerging standard.** The OSPS Baseline (v2026.02.19) defines controls across 3 maturity levels. It is maintained within the ORBIT Working Group and is becoming the reference framework for open source project security posture. See the [OSPS Baseline maintenance process](https://baseline.openssf.org/maintenance.html) for the versioning cadence. -2. **The ecosystem is moving.** The PVTR GitHub Repo Scanner already evaluates 39 of 52 control requirements and powers LFX Insights security results. The OSPS Baseline GitHub Action can upload SARIF. Best Practices Badge is staging Baseline-phase work. Scorecard's large install base is an advantage, but only if it ships a conformance surface. +2. 
**The ecosystem is moving.** The [Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner) already evaluates 39 of 52 control requirements and powers LFX Insights security results. The OSPS Baseline GitHub Action can upload SARIF. Best Practices Badge is staging Baseline-phase work. Scorecard's large install base is an advantage, but only if it ships a conformance surface. 3. **ORBIT WG alignment.** Scorecard sits within the OpenSSF alongside the ORBIT WG. The ORBIT charter's mission is "to develop and maintain interoperable resources related to the identification and presentation of security-relevant data." Scorecard producing interoperable conformance results is a natural fit. @@ -20,28 +20,53 @@ This is fundamentally a **product-level shift**: Scorecard today answers "how we ### What Scorecard brings that others don't -- **Deep automated analysis.** 50+ probes with structured results provide granular evidence that PVTR's shallower checks cannot match (e.g., per-workflow token permission analysis, detailed branch protection rule inspection, CI/CD injection pattern detection). +- **Deep automated analysis.** 50+ probes with structured results provide granular evidence that the Privateer GitHub plugin's shallower checks cannot match (e.g., per-workflow token permission analysis, detailed branch protection rule inspection, CI/CD injection pattern detection). - **Multi-platform support.** GitHub, GitLab, Azure DevOps, and local directory scanning. - **Massive install base.** Scorecard Action, public API, and cron-based scanning infrastructure. - **Existing policy machinery.** The `policy/` package and structured results were designed for exactly this kind of downstream consumption. +### Ecosystem tooling comparison + +Several tools operate in adjacent spaces. Understanding their capabilities clarifies what is and isn't Scorecard's job. 
+
+| Dimension | **Scorecard** | **[Darnit](https://github.com/kusari-oss/darnit)** | **[AMPEL](https://github.com/carabiner-dev/ampel)** | **[Privateer GitHub Plugin](https://github.com/ossf/pvtr-github-repo-scanner)** |
+|-----------|--------------|-----------|-----------|-------------|
+| **Purpose** | Security health measurement | Compliance audit + remediation | Attestation-based policy enforcement | Baseline conformance evaluation |
+| **Action** | Analyzes repositories (read-only) | Audits + fixes repositories | Verifies attestations against policies | Evaluates repos against OSPS controls |
+| **Data source** | Collects from APIs/code | Analyzes repo state | Consumes attestations only | Collects from GitHub API + Security Insights |
+| **Output** | Scores (0-10) + probe findings | PASS/FAIL + attestations + fixes | PASS/FAIL + results attestation | Gemara L4 assessment results |
+| **OSPS Baseline** | Partial (via probes) | Full (62 controls) | Via policy rules | 39 of 52 controls |
+| **In-toto** | Produces attestations | Produces attestations | Consumes + verifies | N/A |
+| **OSCAL** | No | No | Native support | N/A |
+| **Sigstore** | No | Signs attestations | Verifies signatures | N/A |
+| **Gemara** | Not yet (planned) | No | No | L2 + L4 native |
+| **Maturity** | Production (v5.3.0) | Alpha (v0.1.0, Jan 2026) | Production (v1.0.0) | Production, powers LFX Insights |
+| **Language** | Go | Python | Go | Go |
+
+**Integration model — three-layer pipeline:**
+
+```mermaid
+flowchart LR
+    Scorecard["Scorecard\n(Measure)"] -->|findings + attestations| Darnit["Darnit\n(Audit + Remediate)"]
+    Scorecard -->|findings + attestations| AMPEL["AMPEL\n(Enforce)"]
+    Darnit -->|compliance attestation| AMPEL
+    Scorecard -->|conformance evidence| Privateer["Privateer Plugin\n(Baseline evaluation)"]
+```
+
+Scorecard is the **data source** (measures repository security). Darnit audits compliance and remediates. AMPEL enforces policies on attestations.
The Privateer plugin evaluates Baseline conformance. They are complementary, not competing. + ### What Scorecard must not do -- **Duplicate PVTR's role as a Privateer plugin.** PVTR is the Baseline evaluator in the ORBIT ecosystem diagram. Scorecard should complement it with deeper analysis and interoperable output, not fork the evaluation model. -- **Duplicate remediation engines.** Tools like Darn handle remediation. Scorecard exports stable, machine-readable findings for remediation tools to consume. +- **Duplicate the Privateer plugin's role.** The [Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner) is the Baseline evaluator in the ORBIT ecosystem. Scorecard should complement it with deeper analysis and interoperable output, not fork the evaluation model. +- **Duplicate remediation.** Darnit handles compliance auditing and automated remediation (PR creation, file generation, AI-assisted fixes). Scorecard is read-only. +- **Duplicate attestation policy enforcement.** AMPEL verifies attestations against policies and gates CI/CD pipelines. Scorecard *produces* attestations for AMPEL to consume. - **Turn OSPS controls into Scorecard checks.** OSPS conformance is a layer that consumes existing Scorecard signals, not 59 new checks. ## Current state -### Coverage snapshot (Scorecard signals vs. OSPS v2025-10-10) +### Coverage snapshot -| Level | Controls | Covered (✅) | Partial (⚠️) | Not covered (❌) | -|-------|----------|-------------|-------------|-----------------| -| 1 | 24 | 6 | 9 | 9 | -| 2 | 18 | 1 | 7 | 10 | -| 3 | 17 | 0 | 5 | 12 | - -The full control-by-control mapping is in Appendix A of `docs/roadmap-ideas.md`. +A fresh analysis of Scorecard's current coverage against OSPS Baseline v2026.02.19 is tracked in `docs/osps-baseline-coverage.md`. Previous coverage estimates against older Baseline versions should be treated as out-of-date. 
### Existing Scorecard surfaces that matter @@ -102,9 +127,10 @@ Spencer's position is that OSPS evidence references should point to probe findin 2. **OSPS output format** — `--format=osps` producing a JSON conformance report 3. **Versioned mapping file** — data-driven YAML mapping OSPS control IDs to Scorecard probes, applicability rules, and evaluation logic 4. **Applicability engine** — detects preconditions (e.g., "has made a release") and outputs NOT_APPLICABLE -5. **Security Insights ingestion** — reads `security-insights.yml` to satisfy metadata-dependent controls, aligning with the ORBIT ecosystem data plane +5. **Security Insights ingestion** — reads `security-insights.yml` to satisfy metadata-dependent controls, aligning with the ORBIT ecosystem data plane; provides degraded-but-useful evaluation when absent 6. **Attestation mechanism (v1)** — accepts repo-local metadata for non-automatable controls (pending OQ-1 resolution) -7. **New probes and probe enhancements** for gap controls: +7. **Scorecard control catalog extraction** — plan and mechanism to make Scorecard's control definitions consumable by other tools +8. **New probes and probe enhancements** for gap controls: - Secrets detection (OSPS-BR-07.01) - Governance/docs presence (OSPS-GV-02.01, GV-03.01, DO-01.01, DO-02.01) - Dependency manifest presence (OSPS-QA-02.01) @@ -112,96 +138,109 @@ Spencer's position is that OSPS evidence references should point to probe findin - Release asset inspection (multiple L2/L3 controls) - Signed manifest support (OSPS-BR-06.01) - Enforcement detection (OSPS-VM-05.*, VM-06.* — pending OQ-2 resolution) -8. **CI gating** — `--fail-on=fail` exit code for pipeline integration -9. **Multi-repo project-level conformance** (OSPS-QA-04.02) -10. **Gemara Layer 4 compatibility** — output structurally compatible with ORBIT assessment result schemas +9. **CI gating** — `--fail-on=fail` exit code for pipeline integration +10. 
**Multi-repo project-level conformance** (OSPS-QA-04.02) +11. **Gemara SDK integration** — output structurally compatible with ORBIT assessment result schemas; invest in Gemara SDK for multi-tool consumption ### Out of scope - Remediation automation (Darn's domain) -- Replacing PVTR as a Privateer plugin +- Replacing the Privateer plugin for GitHub repositories - Changing existing check scores or behavior - Platform enforcement (Minder's domain) - OSPS Baseline specification changes (ORBIT WG's domain) ## Phased delivery -### Q1 2026: OSPS conformance alpha + ecosystem handshake +Phases are ordered by outcome, not calendar quarter. Maintainer bandwidth dictates delivery timing. -- OSPS output format (alpha) with `--format=osps` -- Versioned mapping file for v2025-10-10 (alpha) -- Applicability engine v1 -- Design + document ORBIT interop commitments (Security Insights, Gemara compatibility, PVTR complementarity) -- Map existing probes to OSPS controls where coverage already exists +### Phase 1: Conformance foundation + Level 1 coverage -### Q2 2026: Level 1 coverage spike + declared metadata v1 +**Outcome:** Scorecard produces a useful OSPS Baseline Level 1 conformance report for any public GitHub repository, available across CLI, Action, and API surfaces. 
-- Secrets detection probe (OSPS-BR-07.01) -- Governance + docs presence probes (GV-02.01, GV-03.01, DO-01.01, DO-02.01) -- Dependency manifest presence probe (QA-02.01) -- Security policy deepening (VM-02.01, VM-03.01, VM-01.01) -- Security Insights ingestion v1 (BR-03.01, BR-03.02, QA-04.01) -- Attestation mechanism v1 for non-automatable controls +- OSPS output format with `--format=osps` +- Versioned mapping file for OSPS Baseline v2026.02.19 +- Applicability engine (detect "has made a release" and other preconditions) +- Map existing probes to OSPS controls where coverage exists today +- New probes for Level 1 gaps (prioritized by coverage impact): + - Governance/docs presence (GV-02.01, GV-03.01, DO-01.01, DO-02.01) + - Dependency manifest presence (QA-02.01) + - Security policy deepening (VM-02.01, VM-03.01, VM-01.01) + - Secrets detection (BR-07.01) — consume platform signals (e.g., GitHub secret scanning API) where possible +- Security Insights ingestion v1 (BR-03.01, BR-03.02, QA-04.01) with degraded-but-useful evaluation when absent - CI gating: `--fail-on=fail` + coverage summary +- Design + document ORBIT interop commitments (Security Insights, Gemara compatibility, Privateer complementarity) +- Scorecard control catalog extraction plan (enabling other tools to consume Scorecard's control definitions) + +### Phase 2: Release integrity + Level 2 core -### Q3 2026: Release integrity + Level 2 core +**Outcome:** Scorecard evaluates release-related OSPS controls, covering the core of Level 2 and becoming useful for downstream due diligence workflows. 
- Release asset inspection layer (detect compiled assets, SBOMs, licenses with releases) - Signed manifest support (BR-06.01) - Release notes/changelog detection (BR-04.01) +- Attestation mechanism v1 for non-automatable controls (pending OQ-1 resolution) - Evidence bundle output v1 (OSPS result JSON + in-toto statement + SARIF for failures) +- Gemara SDK integration for interoperable output -### Q4 2026: Enforcement detection + Level 3 + multi-repo +### Phase 3: Enforcement detection + Level 3 + multi-repo -- SCA policy + enforcement detection (VM-05.*) -- SAST policy + enforcement detection (VM-06.*) +**Outcome:** Scorecard covers Level 3 controls including enforcement detection and project-level aggregation. + +- SCA policy + enforcement detection (VM-05.* — pending OQ-2 resolution) +- SAST policy + enforcement detection (VM-06.* — pending OQ-2 resolution) - Multi-repo project-level conformance aggregation (QA-04.02) - Attestation integration GA ## Relationship to ORBIT ecosystem -``` -┌─────────────────────────────────────────────────────┐ -│ ORBIT WG Ecosystem │ -│ │ -│ ┌──────────┐ ┌───────────┐ ┌──────────────────┐ │ -│ │ OSPS │ │ Gemara │ │ Security │ │ -│ │ Baseline │ │ Schemas │ │ Insights │ │ -│ │ (controls│ │ (L2/L4) │ │ (metadata) │ │ -│ └────┬─────┘ └─────┬─────┘ └────────┬─────────┘ │ -│ │ │ │ │ -│ ▼ ▼ ▼ │ -│ ┌─────────────────────────────────────────────┐ │ -│ │ PVTR GitHub Repo Scanner │ │ -│ │ (Privateer plugin, LFX Insights driver) │ │ -│ └─────────────────────┬───────────────────────┘ │ -│ │ │ -│ consumes ▲ │ -│ │ │ -│ ┌─────────────────────┴───────────────────────┐ │ -│ │ OpenSSF Scorecard │ │ -│ │ (deep analysis, conformance output, │ │ -│ │ multi-platform, large install base) │ │ -│ └─────────────────────────────────────────────┘ │ -│ │ -│ ┌──────────┐ ┌───────────┐ │ -│ │ Minder │ │ Darn │ │ -│ │ (enforce)│ │ (remediate│ │ -│ └──────────┘ └───────────┘ │ -└─────────────────────────────────────────────────────┘ +```mermaid +flowchart TD + 
subgraph ORBIT["ORBIT WG Ecosystem"] + Baseline["OSPS Baseline\n(controls)"] + Gemara["Gemara\n(schemas: L2/L4)"] + SI["Security Insights\n(metadata)"] + + subgraph Evaluation["Evaluation"] + Privateer["Privateer GitHub Plugin\n(LFX Insights driver)"] + Scorecard["OpenSSF Scorecard\n(deep analysis, conformance output,\nmulti-platform, large install base)"] + end + + Minder["Minder\n(enforce)"] + Darn["Darn\n(remediate)"] + end + + Baseline -->|defines controls| Privateer + Baseline -->|defines controls| Scorecard + Gemara -->|provides schemas| Privateer + Gemara -->|provides schemas| Scorecard + SI -->|provides metadata| Privateer + SI -->|provides metadata| Scorecard + SI -->|provides metadata| Minder + Scorecard -->|conformance evidence| Privateer + Scorecard -->|findings| Minder + Scorecard -->|findings| Darn ``` -**Scorecard's role**: Produce deep, probe-based conformance evidence that PVTR, Minder, and downstream consumers can use. Scorecard reads Security Insights (shared data plane), outputs Gemara L4-compatible results (shared schema), and fills analysis gaps where PVTR has `NotImplemented` steps. +**Scorecard's role**: Produce deep, probe-based conformance evidence that the Privateer plugin, Minder, and downstream consumers can use. Scorecard reads Security Insights (shared data plane), outputs Gemara L4-compatible results (shared schema), and fills analysis gaps where the Privateer plugin has `NotImplemented` steps. -**What Scorecard does NOT do**: Replace PVTR as the Privateer plugin, enforce policies (Minder's role), or perform remediation (Darn's role). +**What Scorecard does NOT do**: Replace the Privateer plugin, enforce policies (Minder's role), or perform remediation (Darn's role). ## Success criteria 1. `scorecard --format=osps --osps-level=1` produces a valid conformance report for any public GitHub repository -2. Level 1 auto-check coverage reaches ≥80% of controls (currently ~25%) -3. 
OSPS output is consumable by PVTR as supplementary evidence (validated with ORBIT WG) -4. All four open questions (OQ-1 through OQ-4) are resolved with documented decisions -5. No changes to existing check scores or behavior +2. OSPS Baseline Level 1 conformance is achieved (Phase 1 outcome) +3. OSPS output is available across CLI, Action, and API surfaces +4. OSPS output is consumable by the Privateer plugin as supplementary evidence (validated with ORBIT WG) +5. All four open questions (OQ-1 through OQ-4) are resolved with documented decisions +6. No changes to existing check scores or behavior + +## Approval process + +- **[blocking]** Sign-off from Stephen Augustus and Spencer (Steering Committee) +- **[blocking]** Review from at least 1 non-Steering Scorecard maintainer +- **[non-blocking]** Reviews from maintainers of tools in the WG ORBIT ecosystem +- **[informational]** Notify WG ORBIT members (TAC sign-off not required) --- @@ -248,7 +287,7 @@ I believe the scoring model will continue to be useful to consumers and it shoul #### CQ-2: OSPS Baseline version targeting -The roadmap targets OSPS Baseline v2025-10-10. The PVTR scanner targets v2025-02-25. The Baseline is a living spec with periodic releases. How should Scorecard handle version drift? Options: +The roadmap previously targeted OSPS Baseline v2025-10-10. The Privateer GitHub plugin targets v2025-02-25. The Baseline is a living spec with periodic releases. How should Scorecard handle version drift? Options: - Support only the latest version at any given time - Support multiple versions concurrently via the versioned mapping file - Pin to a version and update on a defined cadence (e.g., quarterly) @@ -263,7 +302,7 @@ The OSPS Baseline [FAQ](https://baseline.openssf.org/faq.html) and [Implementati #### CQ-3: Security Insights as a hard dependency -Many OSPS controls depend on Security Insights data (official channels, distribution points, subproject inventory, core team). 
PVTR treats the Security Insights file as nearly required — most of its evaluation steps begin with `HasSecurityInsightsFile`. +Many OSPS controls depend on Security Insights data (official channels, distribution points, subproject inventory, core team). The Privateer GitHub plugin treats the Security Insights file as nearly required — most of its evaluation steps begin with `HasSecurityInsightsFile`. Should Scorecard: - Treat Security Insights the same way (controls that need it go UNKNOWN without it)? @@ -278,7 +317,7 @@ We should provide a degraded, but still-useful evaluation without a Security Ins #### CQ-4: PVTR relationship — complement vs. converge -The proposal positions Scorecard as complementary to PVTR. But there's a deeper question: should this stay as two separate tools indefinitely, or is the long-term goal convergence (e.g., PVTR consuming Scorecard as a library, or Scorecard becoming a Privateer plugin itself)? Your position on this affects how tightly we couple the output formats and whether we invest in Gemara SDK integration. +The proposal positions Scorecard as complementary to the Privateer plugin. But there's a deeper question: should this stay as two separate tools indefinitely, or is the long-term goal convergence (e.g., the Privateer plugin consuming Scorecard as a library, or Scorecard becoming a Privateer plugin itself)? Your position on this affects how tightly we couple the output formats and whether we invest in Gemara SDK integration. **Stephen's response:** diff --git a/openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md b/openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md index 074174da97b..781b44114f6 100644 --- a/openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md +++ b/openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md @@ -26,7 +26,7 @@ The conformance engine SHALL be additive. 
Existing checks, probes, scores, and o ### Mapping #### Requirement: Versioned mapping file -The mapping between OSPS Baseline controls and Scorecard probes SHALL be maintained as a data-driven, versioned YAML file (e.g., `pkg/osps/mappings/v2025-10-10.yaml`), not hard-coded. +The mapping between OSPS Baseline controls and Scorecard probes SHALL be maintained as a data-driven, versioned YAML file (e.g., `pkg/osps/mappings/v2026-02-19.yaml`), not hard-coded. #### Requirement: Mapping contents Each mapping entry SHALL specify: @@ -72,13 +72,13 @@ The conformance engine SHALL accept attestation evidence from a repo-local metad ### Ecosystem Interoperability -#### Requirement: Complementarity with PVTR -Scorecard SHALL NOT duplicate the PVTR GitHub Repo Scanner's role as a Privateer plugin. Scorecard provides deep probe-based analysis; PVTR can consume Scorecard's OSPS output as supplementary evidence. +#### Requirement: Complementarity with the Privateer plugin +Scorecard SHALL NOT duplicate the [Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner). Scorecard provides deep probe-based analysis; the Privateer plugin can consume Scorecard's OSPS output as supplementary evidence. #### Requirement: No enforcement Scorecard evaluates and reports conformance. It SHALL NOT enforce policies. The `--fail-on=fail` exit code is a reporting mechanism; the CI system is the enforcer. -> **Open Question (OQ-2)**: Spencer asked whether "enforcement detection" (Q4 roadmap: detecting whether SCA/SAST gating exists) conflicts with Scorecard's stated non-enforcement role. Proposed distinction: Scorecard *detects* enforcement mechanisms, it does not *perform* enforcement. Needs maintainer consensus. +> **Open Question (OQ-2)**: Spencer asked whether "enforcement detection" (Phase 3: detecting whether SCA/SAST gating exists) conflicts with Scorecard's stated non-enforcement role. 
Proposed distinction: Scorecard *detects* enforcement mechanisms, it does not *perform* enforcement. Needs maintainer consensus. ## Scenarios From d67011be6a1578ffd00b6ea3a011172cc4ec0c3a Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Fri, 27 Feb 2026 05:35:24 -0500 Subject: [PATCH 07/28] :seedling: Add OSPS Baseline coverage analysis and public roadmap Create docs/osps-baseline-coverage.md with a control-by-control analysis of Scorecard's current probe coverage against OSPS Baseline v2026.02.19. Coverage summary: 8 COVERED, 17 PARTIAL, 31 GAP, 3 NOT_OBSERVABLE across 59 controls. Create docs/ROADMAP.md with a publicly-consumable 2026 roadmap organized into three phases: conformance foundation + Level 1, release integrity + Level 2, and enforcement detection + Level 3. Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- docs/ROADMAP.md | 116 ++++++++++++++++++++++++ docs/osps-baseline-coverage.md | 161 +++++++++++++++++++++++++++++++++ 2 files changed, 277 insertions(+) create mode 100644 docs/ROADMAP.md create mode 100644 docs/osps-baseline-coverage.md diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md new file mode 100644 index 00000000000..3ce3c4745e5 --- /dev/null +++ b/docs/ROADMAP.md @@ -0,0 +1,116 @@ +# OpenSSF Scorecard Roadmap + +## 2026 + +### Theme: OSPS Baseline Conformance + +Scorecard's primary initiative for 2026 is adding +[OSPS Baseline](https://baseline.openssf.org/) conformance evaluation, +enabling Scorecard to answer the question: _does this project meet the +security requirements defined by the OSPS Baseline at a given maturity level?_ + +This is a new product surface alongside Scorecard's existing 0-10 scoring +model. Existing checks, probes, and scores are unchanged. The conformance +layer consumes existing Scorecard signals and adds a per-control +PASS/FAIL/UNKNOWN/NOT_APPLICABLE/ATTESTED output aligned with the +[ORBIT WG](https://github.com/ossf/wg-orbit) ecosystem. 
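The exact output schema is still an open question (see CQ-11), but as a rough, non-normative sketch, a single per-control entry in the `--format=osps` output could look like the following. Field names are illustrative only, not a committed schema; the control ID and probe names are taken from the coverage analysis in this patch series:

```json
{
  "baselineVersion": "2026-02-19",
  "control": "OSPS-AC-03.01",
  "maturityLevel": 1,
  "status": "PASS",
  "evidence": [
    { "probe": "requiresPRsToChangeCode", "outcome": "True" },
    { "probe": "branchesAreProtected", "outcome": "True" }
  ]
}
```

A control Scorecard cannot observe would carry `"status": "UNKNOWN"` plus an explanation, per the UNKNOWN-first design principle.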
+ +**Target Baseline version:** [v2026.02.19](https://baseline.openssf.org/versions/2026-02-19) + +**Current coverage:** See [docs/osps-baseline-coverage.md](osps-baseline-coverage.md) +for a control-by-control analysis. + +### Phased delivery + +Phases are ordered by outcome. Maintainer bandwidth dictates delivery timing. + +#### Phase 1: Conformance foundation and Level 1 coverage + +**Outcome:** Scorecard produces a useful OSPS Baseline Level 1 conformance +report for any public GitHub repository, available across CLI, Action, and +API surfaces. + +Deliverables: + +- OSPS output format (`--format=osps`) +- Versioned mapping file (YAML) mapping OSPS controls to Scorecard probes +- Applicability engine detecting preconditions (e.g., "has made a release") +- New probes for Level 1 gaps: + - Governance and documentation presence (OSPS-GV-02.01, GV-03.01, + DO-01.01, DO-02.01) + - Dependency manifest presence (OSPS-QA-02.01) + - Security policy deepening (OSPS-VM-02.01, VM-03.01) + - Secrets detection (OSPS-BR-07.01) — consuming platform signals where + available +- Security Insights ingestion (OSPS-BR-03.01, BR-03.02, QA-04.01) +- CI gating via `--fail-on=fail` +- Scorecard control catalog extraction plan + +#### Phase 2: Release integrity and Level 2 core + +**Outcome:** Scorecard evaluates release-related OSPS controls, covering the +core of Level 2 and becoming useful for downstream due diligence workflows. + +Deliverables: + +- Release asset inspection layer +- Signed manifest support (OSPS-BR-06.01) +- Release notes and changelog detection (OSPS-BR-04.01) +- Attestation mechanism for non-automatable controls +- Evidence bundle output (OSPS result JSON + in-toto statement) +- Gemara SDK integration for interoperable output + +#### Phase 3: Enforcement detection, Level 3, and multi-repo + +**Outcome:** Scorecard covers Level 3 controls including enforcement detection +and project-level aggregation. 
+ +Deliverables: + +- SCA policy and enforcement detection (OSPS-VM-05.*) +- SAST policy and enforcement detection (OSPS-VM-06.*) +- Multi-repo project-level conformance aggregation (OSPS-QA-04.02) +- Attestation integration GA + +### Ecosystem alignment + +Scorecard operates within the ORBIT WG ecosystem as a measurement and +evidence tool. It does not duplicate: + +- **[Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner)** — Baseline evaluation powered by Gemara and Security Insights +- **[Darnit](https://github.com/kusari-oss/darnit)** — Compliance audit and remediation +- **[AMPEL](https://github.com/carabiner-dev/ampel)** — Attestation-based policy enforcement + +Scorecard's role is to produce deep, probe-based conformance evidence that +these tools and downstream consumers can use. + +### Design principles + +1. **UNKNOWN-first honesty.** If Scorecard cannot observe a control, the + status is UNKNOWN with an explanation — never a false PASS or FAIL. +2. **Probes are the evidence unit.** OSPS evidence references probes and + their findings, not check-level scores. +3. **Additive, not breaking.** Existing checks, probes, scores, and output + formats do not change behavior. +4. **Data-driven mapping.** The mapping between OSPS controls and Scorecard + probes is a versioned YAML file, not hard-coded logic. +5. **Degraded-but-useful without Security Insights.** Projects without a + `security-insights.yml` still get a meaningful (if incomplete) report. + +### Open questions + +The following design questions are under active discussion among maintainers: + +- **Attestation identity model** — How non-automatable controls are attested + (repo-local metadata vs. 
signed attestations via Sigstore/OIDC) +- **Enforcement detection scope** — How Scorecard detects enforcement + mechanisms without being an enforcement tool itself +- **Evidence format** — Ensuring output compatibility with Gemara Layer 4 + assessment schemas + +### How to contribute + +See the [proposal](../openspec/changes/osps-baseline-conformance/proposal.md) +and [spec](../openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md) +for detailed requirements. Discussion and feedback are welcome via GitHub +issues and the Scorecard community meetings. diff --git a/docs/osps-baseline-coverage.md b/docs/osps-baseline-coverage.md new file mode 100644 index 00000000000..9c80bffa8db --- /dev/null +++ b/docs/osps-baseline-coverage.md @@ -0,0 +1,161 @@ +# OSPS Baseline Coverage Analysis + +Analysis of Scorecard's current probe and check coverage against the +[OSPS Baseline v2026.02.19](https://baseline.openssf.org/versions/2026-02-19). + +This is a living document. As probes are added or enhanced, update the +coverage status and evidence columns accordingly. 
+ +## Coverage legend + +| Symbol | Meaning | +|--------|---------| +| COVERED | Scorecard has probes that fully satisfy this control | +| PARTIAL | Scorecard has probes that provide evidence but do not fully satisfy the control | +| GAP | No existing probe provides meaningful evidence for this control | +| NOT_OBSERVABLE | Control requires data Scorecard cannot access (e.g., org-level admin permissions) | + +## Summary + +| Level | Total controls | COVERED | PARTIAL | GAP | NOT_OBSERVABLE | +|-------|---------------|---------|---------|-----|----------------| +| 1 | 25 | 6 | 8 | 9 | 2 | +| 2 | 17 | 2 | 5 | 9 | 1 | +| 3 | 17 | 0 | 4 | 13 | 0 | +| **Total** | **59** | **8** | **17** | **31** | **3** | + +**Automated coverage rate (COVERED + PARTIAL): 42% (25 of 59)** + +**Full coverage rate (COVERED only): 14% (8 of 59)** + +## Level 1 controls (25) + +| OSPS ID | Control (short) | Status | Scorecard probes/checks providing evidence | Gap / notes | +|---------|----------------|--------|---------------------------------------------|-------------| +| OSPS-AC-01.01 | MFA for sensitive resources | NOT_OBSERVABLE | None | Requires org admin API access; Scorecard tokens typically lack this. Must be UNKNOWN unless org-admin token is provided. | +| OSPS-AC-02.01 | Least-privilege defaults for new collaborators | NOT_OBSERVABLE | None | Requires org-level permission visibility. Must be UNKNOWN. | +| OSPS-AC-03.01 | Prevent direct commits to primary branch | COVERED | `requiresPRsToChangeCode`, `branchesAreProtected` | Branch-Protection check. Maps directly when PR-only merges are enforced. | +| OSPS-AC-03.02 | Prevent primary branch deletion | COVERED | `blocksDeleteOnBranches` | Branch-Protection check. Direct mapping. | +| OSPS-BR-01.01 | Sanitize untrusted CI/CD input | PARTIAL | `hasDangerousWorkflowScriptInjection` | Dangerous-Workflow check detects script injection patterns. Does not cover all sanitization cases (e.g., non-shell contexts). 
| +| OSPS-BR-01.03 | Untrusted code snapshots cannot access privileged credentials | PARTIAL | `hasDangerousWorkflowUntrustedCheckout` | Dangerous-Workflow check detects untrusted checkouts with access to secrets. Does not cover all credential isolation scenarios. | +| OSPS-BR-03.01 | Official channel URIs use encrypted transport | GAP | None | Requires a source-of-truth for official URIs (Security Insights). No probe exists. | +| OSPS-BR-03.02 | Distribution URIs use authenticated channels | GAP | None | Same as BR-03.01. Requires declared distribution channels. | +| OSPS-BR-07.01 | Prevent unintentional storage of secrets in VCS | GAP | None | No secrets detection probe exists today. Could consume platform signals (e.g., GitHub secret scanning API). | +| OSPS-DO-01.01 | User guides for released software | GAP | None | No documentation presence probe. Would need file/path heuristics. | +| OSPS-DO-02.01 | Defect reporting guide | GAP | None | No issue template / bug report documentation probe. | +| OSPS-GV-02.01 | Public discussion mechanism | GAP | None | Could check whether issues/discussions are enabled, but no probe exists. | +| OSPS-GV-03.01 | Documented contribution process | GAP | None | No CONTRIBUTING file presence/content probe. | +| OSPS-LE-02.01 | OSI/FSF license for source code | COVERED | `hasFSFOrOSIApprovedLicense` | License check. Direct mapping. | +| OSPS-LE-02.02 | OSI/FSF license for released assets | PARTIAL | `hasFSFOrOSIApprovedLicense` | License check verifies repo license, but does not verify license is shipped with release artifacts. | +| OSPS-LE-03.01 | License file in repository | COVERED | `hasLicenseFile` | License check. Direct mapping. | +| OSPS-LE-03.02 | License included with released assets | PARTIAL | `hasLicenseFile` | Detects license in repo, not in release artifacts. Needs release asset inspection. | +| OSPS-QA-01.01 | Repo publicly readable at static URL | COVERED | (implicit) | Scorecard can only scan public repos. 
If Scorecard runs, this is satisfied. | +| OSPS-QA-01.02 | Public commit history with authorship and timestamps | COVERED | (implicit) | VCS provides this by nature. Scorecard relies on commit history for multiple probes. | +| OSPS-QA-02.01 | Direct dependency list present | PARTIAL | `pinsDependencies` | Pinned-Dependencies check detects dependency manifests but focuses on pinning, not mere presence. | +| OSPS-QA-04.01 | Docs list subprojects | GAP | None | Requires Security Insights or similar metadata. No probe exists. | +| OSPS-QA-05.01 | No generated executable artifacts in VCS | PARTIAL | `hasBinaryArtifacts`, `hasUnverifiedBinaryArtifacts` | Binary-Artifacts check. Detects binary files but may not distinguish "generated executables" from other binaries. | +| OSPS-QA-05.02 | No unreviewable binary artifacts in VCS | PARTIAL | `hasUnverifiedBinaryArtifacts` | Detects unverified binaries. "Unreviewable" vs "reviewable" classification is not yet granular enough. | +| OSPS-VM-02.01 | Security contacts documented | PARTIAL | `securityPolicyPresent`, `securityPolicyContainsLinks` | Security-Policy check detects SECURITY.md presence and links, but does not verify actual contact methods (email, form, etc.). | +| OSPS-BR-01.04 | (Note: This is Level 3, not Level 1. Listed under Level 3 below.) | | | | + +## Level 2 controls (17) + +| OSPS ID | Control (short) | Status | Scorecard probes/checks providing evidence | Gap / notes | +|---------|----------------|--------|---------------------------------------------|-------------| +| OSPS-AC-04.01 | Default lowest CI/CD permissions | PARTIAL | `topLevelPermissions`, `jobLevelPermissions`, `hasNoGitHubWorkflowPermissionUnknown` | Token-Permissions check evaluates workflow permissions. "Defaults to lowest" semantics need verification. | +| OSPS-BR-02.01 | Releases have unique version identifier | GAP | None | No release versioning probe. Needs release API inspection. 
| +| OSPS-BR-04.01 | Releases have descriptive changelog | GAP | None | No changelog/release notes detection probe. | +| OSPS-BR-05.01 | Standardized tooling for dependency ingestion | GAP | None | No probe detects whether standard package managers are used in CI. | +| OSPS-BR-06.01 | Releases signed or accounted for in signed manifest | PARTIAL | `releasesAreSigned`, `releasesHaveProvenance`, `releasesHaveVerifiedProvenance` | Signed-Releases check covers signatures and provenance. Does not yet check for "signed manifest including hashes" as an alternative. | +| OSPS-DO-06.01 | Docs describe dependency selection/tracking | GAP | None | Documentation control. No probe exists. | +| OSPS-DO-07.01 | Build instructions in documentation | GAP | None | Documentation control. No probe exists. | +| OSPS-GV-01.01 | Docs list members with sensitive access | NOT_OBSERVABLE | None | Requires org-level data or attestation. Not automatable via Scorecard. | +| OSPS-GV-01.02 | Docs list roles and responsibilities | GAP | None | Documentation control. May require attestation. | +| OSPS-GV-03.02 | Contributor guide with acceptability requirements | PARTIAL | (related: CONTRIBUTING presence could be inferred) | No probe today; could extend a contributing-file probe to check for content structure. | +| OSPS-LE-01.01 | Legal authorization per commit (DCO/CLA) | GAP | None | No DCO/CLA detection probe. Would check for Signed-off-by trailers or CLA bot enforcement. | +| OSPS-QA-03.01 | Status checks pass or bypassed before merge | COVERED | `runsStatusChecksBeforeMerging` | Branch-Protection check. Direct mapping. | +| OSPS-QA-06.01 | Automated tests run prior to acceptance | COVERED | `testsRunInCI` | CI-Tests check. Maps directly. | +| OSPS-SA-01.01 | Design docs with actions/actors | GAP | None | Documentation/assessment control. Requires attestation. | +| OSPS-SA-02.01 | Docs describe external interfaces | GAP | None | Documentation control. Requires attestation. 
| +| OSPS-SA-03.01 | Security assessment performed | GAP | None | Process control. Requires attestation with evidence link. | +| OSPS-VM-01.01 | CVD policy with response timeframe | PARTIAL | `securityPolicyContainsVulnerabilityDisclosure`, `securityPolicyContainsText` | Security-Policy check detects disclosure language. Does not verify explicit timeframe commitment. | +| OSPS-VM-03.01 | Private vulnerability reporting method | PARTIAL | `securityPolicyContainsLinks` | Detects links in SECURITY.md. Does not verify private reporting is actually enabled (e.g., GitHub's private vulnerability reporting feature). | +| OSPS-VM-04.01 | Publicly publish vulnerability data | GAP | None | No probe checks for GitHub Security Advisories, OSV entries, or CVE publication. | + +## Level 3 controls (17) + +| OSPS ID | Control (short) | Status | Scorecard probes/checks providing evidence | Gap / notes | +|---------|----------------|--------|---------------------------------------------|-------------| +| OSPS-AC-04.02 | Job-level least privilege in CI/CD | PARTIAL | `jobLevelPermissions` | Token-Permissions check evaluates job-level permissions. "Minimum necessary" is hard to assess without understanding job intent. | +| OSPS-BR-01.04 | Sanitize trusted collaborator CI/CD input | PARTIAL | `hasDangerousWorkflowScriptInjection` | Dangerous-Workflow check partially covers this, but focuses on untrusted input, not trusted collaborator input. | +| OSPS-BR-02.02 | Release assets tied to release identifier | GAP | None | No release asset naming/association probe. | +| OSPS-BR-07.02 | Secrets management policy | GAP | None | Documentation/policy control. Requires attestation. | +| OSPS-DO-03.01 | Instructions to verify release integrity/authenticity | GAP | None | Documentation control. Could partially automate by checking for verification docs alongside signed releases. | +| OSPS-DO-03.02 | Instructions to verify release author identity | GAP | None | Documentation control.
| +| OSPS-DO-04.01 | Support scope/duration per release | GAP | None | Documentation control. | +| OSPS-DO-05.01 | EOL security update statement | GAP | None | Documentation control. | +| OSPS-GV-04.01 | Policy to review collaborators before escalated perms | GAP | None | Governance policy. Requires attestation. | +| OSPS-QA-02.02 | SBOM shipped with compiled release assets | PARTIAL | `hasReleaseSBOM`, `hasSBOM` | SBOM probes exist but may not specifically verify compiled-asset association. | +| OSPS-QA-04.02 | Subprojects enforce >= primary requirements | GAP | None | Requires multi-repo scanning and cross-repo comparison. | +| OSPS-QA-06.02 | Docs describe when/how tests run | GAP | None | Documentation control. | +| OSPS-QA-06.03 | Policy requiring tests for major changes | GAP | None | Documentation/policy control. | +| OSPS-QA-07.01 | Non-author approval before merging | PARTIAL | `codeApproved`, `codeReviewOneReviewers`, `requiresApproversForPullRequests` | Code-Review and Branch-Protection probes cover this. "Non-author" semantics need verification. | +| OSPS-VM-04.02 | VEX for non-affecting vulnerabilities | GAP | None | No VEX detection probe. | +| OSPS-VM-05.01 | SCA remediation threshold policy | GAP | None | Policy control. | +| OSPS-VM-05.02 | SCA violations addressed pre-release | GAP | None | Policy + enforcement control. | +| OSPS-VM-05.03 | Automated SCA eval + block violations | PARTIAL | `hasOSVVulnerabilities` | Vulnerabilities check detects known vulns. Does not verify gating/blocking enforcement. | +| OSPS-VM-06.01 | SAST remediation threshold policy | GAP | None | Policy control. | +| OSPS-VM-06.02 | Automated SAST eval + block violations | PARTIAL | `sastToolConfigured`, `sastToolRunsOnAllCommits` | SAST check detects tool presence and execution. Does not verify gating/blocking enforcement. | + +## Phase 1 priorities (Level 1 gap closure) + +The following Level 1 gaps should be addressed first, ordered by implementation feasibility: + +1. 
**OSPS-GV-03.01** (contribution process): Add probe for CONTRIBUTING file presence +2. **OSPS-GV-02.01** (public discussion): Add probe for issues/discussions enabled +3. **OSPS-DO-02.01** (defect reporting): Add probe for issue templates or bug report docs +4. **OSPS-DO-01.01** (user guides): Add probe for documentation presence heuristics +5. **OSPS-BR-07.01** (secrets in VCS): Consume GitHub secret scanning API or add detection heuristics +6. **OSPS-BR-03.01 / BR-03.02** (encrypted transport): Requires Security Insights ingestion for declared URIs +7. **OSPS-QA-04.01** (subproject listing): Requires Security Insights or equivalent metadata + +## Probes not mapped to any OSPS control + +The following probes exist in Scorecard but do not directly map to any OSPS Baseline control: + +| Probe | Check | Notes | +|-------|-------|-------| +| `archived` | Maintained | Project archival status — relates to "while active" preconditions | +| `hasRecentCommits` | Maintained | Activity signal — relates to "while active" preconditions | +| `issueActivityByProjectMember` | Maintained | Activity signal — relates to "while active" preconditions | +| `createdRecently` | Maintained | Age signal | +| `contributorsFromOrgOrCompany` | Contributors | Diversity signal | +| `dependencyUpdateToolConfigured` | Dependency-Update-Tool | Best practice, not a Baseline control | +| `fuzzed` | Fuzzing | Testing best practice, not a Baseline control | +| `hasOpenSSFBadge` | CII-Best-Practices | Meta-badge, not a Baseline control | +| `packagedWithAutomatedWorkflow` | Packaging | Distribution best practice | +| `webhooksUseSecrets` | Webhook | Security practice, not a Baseline control | +| `hasPermissiveLicense` | (uncategorized) | License type classification | +| `unsafeblock` | (independent) | Language-specific safety | +| `dismissesStaleReviews` | Branch-Protection | Review hygiene beyond Baseline scope | +| `requiresCodeOwnersReview` | Branch-Protection | CODEOWNERS enforcement beyond 
Baseline scope | +| `requiresLastPushApproval` | Branch-Protection | Push approval beyond Baseline scope | +| `requiresUpToDateBranches` | Branch-Protection | Branch freshness beyond Baseline scope | +| `branchProtectionAppliesToAdmins` | Branch-Protection | Admin override prevention beyond Baseline scope | +| `blocksForcePushOnBranches` | Branch-Protection | Force-push protection; related to AC-03 but not explicitly required | + +These probes remain valuable for Scorecard's existing scoring model and may become relevant for future Baseline versions. + +## Notes + +- The OSPS Baseline v2026.02.19 contains 59 controls. The older roadmap-ideas.md + mapped against v2025-10-10 which had a slightly different control set. This + analysis supersedes any previous coverage estimates. +- Controls marked NOT_OBSERVABLE cannot produce PASS or FAIL without elevated + permissions. The conformance engine must return UNKNOWN with an explanation. +- Many Level 2 and Level 3 controls are documentation or policy controls that + require attestation rather than automated detection. The attestation mechanism + (OQ-1 in the proposal) is critical for these. +- The "while active" precondition on many controls maps to Scorecard's Maintained + check probes (`archived`, `hasRecentCommits`, `issueActivityByProjectMember`). + These could serve as applicability detectors. From 4b6f19e789798bef2bfc4897ae075202fa33dd1e Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Fri, 27 Feb 2026 05:45:43 -0500 Subject: [PATCH 08/28] :seedling: Remove roadmap-ideas.md reference from public docs; add CQ-9 through CQ-12 Remove reference to docs/roadmap-ideas.md from the coverage analysis document since it is not committed to the repo. Add four new clarifying questions to the proposal: NOT_OBSERVABLE controls in Phase 1 (CQ-9), mapping file ownership (CQ-10), OSPS output schema stability guarantees (CQ-11), and Phase 1 probe gap prioritization (CQ-12). 
Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- docs/osps-baseline-coverage.md | 6 +-- .../osps-baseline-conformance/proposal.md | 45 +++++++++++++++++++ 2 files changed, 48 insertions(+), 3 deletions(-) diff --git a/docs/osps-baseline-coverage.md b/docs/osps-baseline-coverage.md index 9c80bffa8db..a0b85be7ac5 100644 --- a/docs/osps-baseline-coverage.md +++ b/docs/osps-baseline-coverage.md @@ -148,9 +148,9 @@ These probes remain valuable for Scorecard's existing scoring model and may beco ## Notes -- The OSPS Baseline v2026.02.19 contains 59 controls. The older roadmap-ideas.md - mapped against v2025-10-10 which had a slightly different control set. This - analysis supersedes any previous coverage estimates. +- The OSPS Baseline v2026.02.19 contains 59 controls. Previous coverage + estimates against older Baseline versions should be treated as out-of-date. + This analysis supersedes any prior mapping. - Controls marked NOT_OBSERVABLE cannot produce PASS or FAIL without elevated permissions. The conformance engine must return UNKNOWN with an explanation. - Many Level 2 and Level 3 controls are documentation or policy controls that diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 2a908ebba67..d3a1a86d245 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -373,3 +373,48 @@ Scorecard runs at scale via the Scorecard Action (GitHub Action) and the public **Stephen's response:** We need to land these capabilities for as much surface area as possible. + +#### CQ-9: Coverage analysis and Phase 1 scope validation + +The coverage analysis (`docs/osps-baseline-coverage.md`) identifies 25 Level 1 controls. Of those, 6 are COVERED, 8 are PARTIAL, 9 are GAP, and 2 are NOT_OBSERVABLE. The Phase 1 plan targets closing the 9 GAP controls. 
Given that 2 controls (AC-01.01, AC-02.01) are NOT_OBSERVABLE without org-admin tokens, should Phase 1 explicitly include work on improving observability (e.g., documenting what tokens are needed, or providing guidance for org admins), or should those controls remain UNKNOWN until a later phase? + +**Stephen's response:** + + +#### CQ-10: Mapping file ownership and contribution model + +The versioned mapping file (e.g., `pkg/osps/mappings/v2026-02-19.yaml`) is a critical artifact that defines which probes satisfy which OSPS controls. Who should own this file? Options: +- Scorecard maintainers only (changes require maintainer review) +- Community-contributed with maintainer approval (like checks/probes today) +- Co-maintained with ORBIT WG members who understand the Baseline controls + +This also affects how we handle disagreements about whether a probe truly satisfies a control. + +**Stephen's response:** + + +#### CQ-11: Backwards compatibility of OSPS output format + +The spec requires `--format=osps` as a new output format. Since this is a new surface, we have freedom to iterate on the schema. However, once shipped, consumers will depend on it. What stability guarantees should we offer? +- No guarantees during Phase 1 (alpha schema, may break between releases) +- Semver-like schema versioning from day one (breaking changes increment major version) +- Follow the Gemara L4 schema if one exists, inheriting its stability model + +**Stephen's response:** + + +#### CQ-12: Probe gap prioritization for Phase 1 + +The coverage analysis identifies 7 Level 1 GAP controls that need new probes (excluding the 2 that depend on Security Insights). Ranked by implementation feasibility: + +1. OSPS-GV-03.01 — CONTRIBUTING file presence +2. OSPS-GV-02.01 — Issues/discussions enabled +3. OSPS-DO-02.01 — Issue templates or bug report docs +4. OSPS-DO-01.01 — Documentation presence heuristics +5. OSPS-BR-07.01 — Secrets detection (platform signal consumption) +6. 
OSPS-BR-03.01 / BR-03.02 — Encrypted transport (requires Security Insights) +7. OSPS-QA-04.01 — Subproject listing (requires Security Insights) + +Do you agree with this priority ordering? Are there any controls you would move up or down, or any you would defer to Phase 2? + +**Stephen's response:** From ea0e0673d073d309c6958f76b3b7b16c0576ad1d Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Fri, 27 Feb 2026 06:03:11 -0500 Subject: [PATCH 09/28] :seedling: Fix Mermaid diagram newlines in proposal Replace \n with
<br> in Mermaid node labels so line breaks render correctly in GitHub's Mermaid renderer. Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .../osps-baseline-conformance/proposal.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index d3a1a86d245..3754612b6f4 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -47,10 +47,10 @@ Several tools operate in adjacent spaces. Understanding their capabilities clari ```mermaid flowchart LR - Scorecard["Scorecard\n(Measure)"] -->|findings + attestations| Darnit["Darnit\n(Audit + Remediate)"] - Scorecard -->|findings + attestations| AMPEL["AMPEL\n(Enforce)"] + Scorecard["Scorecard<br>
(Measure)"] -->|findings + attestations| Darnit["Darnit<br>
(Audit + Remediate)"] + Scorecard -->|findings + attestations| AMPEL["AMPEL<br>
(Enforce)"] Darnit -->|compliance attestation| AMPEL - Scorecard -->|conformance evidence| Privateer["Privateer Plugin\n(Baseline evaluation)"] + Scorecard -->|conformance evidence| Privateer["Privateer Plugin<br>
(Baseline evaluation)"] ``` Scorecard is the **data source** (measures repository security). Darnit audits compliance and remediates. AMPEL enforces policies on attestations. The Privateer plugin evaluates Baseline conformance. They are complementary, not competing. @@ -197,17 +197,17 @@ Phases are ordered by outcome, not calendar quarter. Maintainer bandwidth dictat ```mermaid flowchart TD subgraph ORBIT["ORBIT WG Ecosystem"] - Baseline["OSPS Baseline\n(controls)"] - Gemara["Gemara\n(schemas: L2/L4)"] - SI["Security Insights\n(metadata)"] + Baseline["OSPS Baseline
(controls)"] + Gemara["Gemara
(schemas: L2/L4)"] + SI["Security Insights
(metadata)"] subgraph Evaluation["Evaluation"] - Privateer["Privateer GitHub Plugin\n(LFX Insights driver)"] - Scorecard["OpenSSF Scorecard\n(deep analysis, conformance output,\nmulti-platform, large install base)"] + Privateer["Privateer GitHub Plugin
(LFX Insights driver)"] + Scorecard["OpenSSF Scorecard
(deep analysis, conformance output,
multi-platform, large install base)"] end - Minder["Minder\n(enforce)"] - Darn["Darn\n(remediate)"] + Minder["Minder
(enforce)"] + Darn["Darn
(remediate)"] end Baseline -->|defines controls| Privateer From 4723c013697300936abfc368d3fa74cb01d381e4 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Fri, 27 Feb 2026 06:33:43 -0500 Subject: [PATCH 10/28] =?UTF-8?q?:seedling:=20Fix=20Darn=E2=86=92Darnit=20?= =?UTF-8?q?references;=20add=20Minder=20to=20ecosystem=20comparison?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace remaining "Darn" references with "Darnit" throughout the proposal. Add Minder to the ecosystem comparison table, integration diagram, and "What Scorecard must not do" section. Minder is an OpenSSF Sandbox project in the ORBIT WG that consumes Scorecard findings for policy enforcement and auto-remediation. Add CQ-13 (Minder integration surface) and CQ-14 (Darnit vs. Minder delineation) as new clarifying questions. Update docs/ROADMAP.md ecosystem alignment to include Minder. Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- docs/ROADMAP.md | 4 +- .../osps-baseline-conformance/proposal.md | 71 ++++++++++++------- 2 files changed, 50 insertions(+), 25 deletions(-) diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 3ce3c4745e5..9ece5eef60c 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -77,12 +77,14 @@ Deliverables: Scorecard operates within the ORBIT WG ecosystem as a measurement and evidence tool. It does not duplicate: +- **[Minder](https://github.com/mindersec/minder)** — Policy enforcement and remediation platform (OpenSSF Sandbox, ORBIT WG) - **[Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner)** — Baseline evaluation powered by Gemara and Security Insights - **[Darnit](https://github.com/kusari-oss/darnit)** — Compliance audit and remediation - **[AMPEL](https://github.com/carabiner-dev/ampel)** — Attestation-based policy enforcement Scorecard's role is to produce deep, probe-based conformance evidence that -these tools and downstream consumers can use. 
+these tools and downstream consumers can use. Minder already consumes +Scorecard findings to enforce security policies across repositories. ### Design principles diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 3754612b6f4..78405a257fd 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -29,36 +29,38 @@ This is fundamentally a **product-level shift**: Scorecard today answers "how we Several tools operate in adjacent spaces. Understanding their capabilities clarifies what is and isn't Scorecard's job. -| Dimension | **Scorecard** | **[Darnit](https://github.com/kusari-oss/darnit)** | **[AMPEL](https://github.com/carabiner-dev/ampel)** | **[Privateer GitHub Plugin](https://github.com/ossf/pvtr-github-repo-scanner)** | -|-----------|--------------|-----------|-----------|-------------| -| **Purpose** | Security health measurement | Compliance audit + remediation | Attestation-based policy enforcement | Baseline conformance evaluation | -| **Action** | Analyzes repositories (read-only) | Audits + fixes repositories | Verifies attestations against policies | Evaluates repos against OSPS controls | -| **Data source** | Collects from APIs/code | Analyzes repo state | Consumes attestations only | Collects from GitHub API + Security Insights | -| **Output** | Scores (0-10) + probe findings | PASS/FAIL + attestations + fixes | PASS/FAIL + results attestation | Gemara L4 assessment results | -| **OSPS Baseline** | Partial (via probes) | Full (62 controls) | Via policy rules | 39 of 52 controls | -| **In-toto** | Produces attestations | Produces attestations | Consumes + verifies | N/A | -| **OSCAL** | No | No | Native support | N/A | -| **Sigstore** | No | Signs attestations | Verifies signatures | N/A | -| **Gemara** | Not yet (planned) | No | No | L2 + L4 native | -| **Maturity** | Production (v5.3.0) | Alpha (v0.1.0, Jan 2026) | Production (v1.0.0) | Production, powers LFX Insights | -| **Language** | Go | Python | Go | Go | - -**Integration model — three-layer pipeline:** +| Dimension | **Scorecard** | **[Minder](https://github.com/mindersec/minder)** | **[Darnit](https://github.com/kusari-oss/darnit)** | **[AMPEL](https://github.com/carabiner-dev/ampel)** | **[Privateer GitHub Plugin](https://github.com/ossf/pvtr-github-repo-scanner)** | +|-----------|--------------|---------|-----------|-----------|-------------| +| **Purpose** | Security health measurement | Policy enforcement + remediation platform | Compliance audit + remediation | Attestation-based policy enforcement | Baseline conformance evaluation | +| **Action** | Analyzes repositories (read-only) | Enforces policies, auto-remediates | Audits + fixes repositories | Verifies attestations against policies | Evaluates repos against OSPS controls | +| **Data source** | Collects from APIs/code | Collects from APIs + consumes findings from other tools | Analyzes repo state | Consumes attestations only | Collects from GitHub API + Security Insights | +| **Output** | Scores (0-10) + probe findings | Policy evaluation results + remediation PRs | PASS/FAIL + attestations + fixes | PASS/FAIL + results attestation | Gemara L4 assessment results | +| **OSPS Baseline** | Partial (via probes) | Via Rego policy rules | Full (62 controls) | Via policy rules | 39 of 52 controls | +| **In-toto** | Produces attestations | Consumes attestations | Produces attestations | Consumes + verifies | N/A | +| **OSCAL** | No | No | No | Native support | N/A | +| **Sigstore** | No | Verifies signatures | Signs attestations | Verifies signatures | N/A | +| **Gemara** | Not yet (planned) | No | No | No | L2 + L4 native | +| **Maturity** | Production (v5.3.0) | Sandbox (OpenSSF, donated Oct 2024) | Alpha (v0.1.0, Jan 2026) | Production (v1.0.0) | Production, powers LFX Insights | +| **Language** | Go | Go | Python | Go | Go | + +**Integration model:** ```mermaid flowchart LR - Scorecard["Scorecard<br/>(Measure)"] -->|findings + attestations| Darnit["Darnit<br/>(Audit + Remediate)"] + Scorecard["Scorecard<br/>(Measure)"] -->|findings + attestations| Minder["Minder<br/>(Enforce + Remediate)"] + Scorecard -->|findings + attestations| Darnit["Darnit<br/>(Audit + Remediate)"] Scorecard -->|findings + attestations| AMPEL["AMPEL<br/>(Enforce)"] Darnit -->|compliance attestation| AMPEL Scorecard -->|conformance evidence| Privateer["Privateer Plugin<br/>(Baseline evaluation)"] ``` -Scorecard is the **data source** (measures repository security). Darnit audits compliance and remediates. AMPEL enforces policies on attestations. The Privateer plugin evaluates Baseline conformance. They are complementary, not competing. +Scorecard is the **data source** (measures repository security). Minder consumes Scorecard findings to enforce policies and auto-remediate across repositories. Darnit audits compliance and remediates. AMPEL enforces policies on attestations. The Privateer plugin evaluates Baseline conformance. They are complementary, not competing. ### What Scorecard must not do - **Duplicate the Privateer plugin's role.** The [Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner) is the Baseline evaluator in the ORBIT ecosystem. Scorecard should complement it with deeper analysis and interoperable output, not fork the evaluation model. -- **Duplicate remediation.** Darnit handles compliance auditing and automated remediation (PR creation, file generation, AI-assisted fixes). Scorecard is read-only. +- **Duplicate policy enforcement or remediation.** [Minder](https://github.com/mindersec/minder) (OpenSSF Sandbox project, ORBIT WG) consumes Scorecard findings and enforces security policies across repositories with auto-remediation. Scorecard produces findings for Minder to act on. +- **Duplicate compliance auditing.** Darnit handles compliance auditing and automated remediation (PR creation, file generation, AI-assisted fixes). Scorecard is read-only. - **Duplicate attestation policy enforcement.** AMPEL verifies attestations against policies and gates CI/CD pipelines. Scorecard *produces* attestations for AMPEL to consume. - **Turn OSPS controls into Scorecard checks.** OSPS conformance is a layer that consumes existing Scorecard signals, not 59 new checks.
@@ -144,10 +146,9 @@ Spencer's position is that OSPS evidence references should point to probe findin ### Out of scope -- Remediation automation (Darn's domain) +- Policy enforcement and remediation (Minder's and Darnit's domain) - Replacing the Privateer plugin for GitHub repositories - Changing existing check scores or behavior -- Platform enforcement (Minder's domain) - OSPS Baseline specification changes (ORBIT WG's domain) ## Phased delivery @@ -206,12 +207,13 @@ flowchart TD Scorecard["OpenSSF Scorecard<br/>(deep analysis, conformance output,<br/>multi-platform, large install base)"] end - Minder["Minder<br/>(enforce)"] - Darn["Darn<br/>(remediate)"] + Minder["Minder<br/>(enforce + remediate)"] + Darnit["Darnit<br/>(audit + remediate)"] end Baseline -->|defines controls| Privateer Baseline -->|defines controls| Scorecard + Baseline -->|defines controls| Minder Gemara -->|provides schemas| Privateer Gemara -->|provides schemas| Scorecard SI -->|provides metadata| Privateer @@ -219,12 +221,12 @@ flowchart TD SI -->|provides metadata| Minder Scorecard -->|conformance evidence| Privateer Scorecard -->|findings| Minder - Scorecard -->|findings| Darn + Scorecard -->|findings| Darnit ``` **Scorecard's role**: Produce deep, probe-based conformance evidence that the Privateer plugin, Minder, and downstream consumers can use. Scorecard reads Security Insights (shared data plane), outputs Gemara L4-compatible results (shared schema), and fills analysis gaps where the Privateer plugin has `NotImplemented` steps. -**What Scorecard does NOT do**: Replace the Privateer plugin, enforce policies (Minder's role), or perform remediation (Darn's role). +**What Scorecard does NOT do**: Replace the Privateer plugin, enforce policies or remediate (Minder's role), or perform compliance auditing and remediation (Darnit's role). ## Success criteria @@ -418,3 +420,24 @@ The coverage analysis identifies 7 Level 1 GAP controls that need new probes (ex Do you agree with this priority ordering? Are there any controls you would move up or down, or any you would defer to Phase 2? **Stephen's response:** + + +#### CQ-13: Minder integration surface + +[Minder](https://github.com/mindersec/minder) is an OpenSSF Sandbox project within the ORBIT WG that already consumes Scorecard findings to enforce security policies and auto-remediate across repositories. Minder uses Rego-based rules and can enforce OSPS Baseline controls via policy profiles. A draft Scorecard PR (#4723, now stale) attempted deeper integration by enabling Scorecard checks to be written using Minder's Rego rule engine.
+ +Given Minder's position in the ORBIT ecosystem: +- Should the OSPS conformance output be designed with Minder as an explicit consumer (e.g., ensuring the output schema works well as Minder policy input)? +- Should we coordinate with Minder maintainers during Phase 1 to validate the integration surface? +- Is there a risk of duplicating Baseline evaluation work that Minder already does via its own rules, and if so, how should we delineate? + +**Stephen's response:** + + +#### CQ-14: Darnit vs. Minder delineation + +The proposal lists both [Darnit](https://github.com/kusari-oss/darnit) and [Minder](https://github.com/mindersec/minder) as tools that handle remediation and enforcement. Their capabilities overlap in some areas (both can enforce Baseline controls, both can remediate). For Scorecard's purposes, the distinction matters primarily for the "What Scorecard must not do" boundary. + +Is the current framing correct — that Scorecard is the measurement layer and both Minder and Darnit are downstream consumers? Or should we position Scorecard differently relative to one versus the other, given that Minder is an OpenSSF project in the same working group while Darnit is not? 
+ +**Stephen's response:** From c28a67401f42eef7020ee50e91cea7e3bfa7cde4 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Fri, 27 Feb 2026 06:39:09 -0500 Subject: [PATCH 11/28] :seedling: Map existing issues/PRs to OSPS Baseline gaps; add CQ-15 Add a new section to docs/osps-baseline-coverage.md listing existing Scorecard issues and PRs that are directly relevant to closing OSPS Baseline coverage gaps, including: - #2305 / #2479 (Security Insights) - #30 (secrets scanning) - #1476 / #2605 (SBOM) - #4824 (changelog) - #2465 (private vulnerability reporting) - #4080 / #4823 / #2684 / #1417 (signed releases) - #2142 (threat model) - #4723 (Minder/Rego integration, closed) Add CQ-15 asking whether existing issues should be adopted as Phase 1 work items or whether new issues should reference them. Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- docs/osps-baseline-coverage.md | 60 +++++++++++++++++++ .../osps-baseline-conformance/proposal.md | 15 +++++ 2 files changed, 75 insertions(+) diff --git a/docs/osps-baseline-coverage.md b/docs/osps-baseline-coverage.md index a0b85be7ac5..33a79c13e39 100644 --- a/docs/osps-baseline-coverage.md +++ b/docs/osps-baseline-coverage.md @@ -146,6 +146,66 @@ The following probes exist in Scorecard but do not directly map to any OSPS Base These probes remain valuable for Scorecard's existing scoring model and may become relevant for future Baseline versions. +## Existing issues and PRs relevant to gap closure + +The following open issues and PRs in the Scorecard repository are directly +relevant to closing OSPS Baseline coverage gaps. These should be prioritized +and linked to the conformance work. 
+ +### Security Insights ingestion +- [#2305](https://github.com/ossf/scorecard/issues/2305) — Support for SECURITY INSIGHTS +- [#2479](https://github.com/ossf/scorecard/issues/2479) — SECURITY-INSIGHTS.yml implementation + +These are critical for OSPS-BR-03.01, BR-03.02, QA-04.01, and other +controls that depend on declared project metadata. + +### Secrets detection (OSPS-BR-07.01) +- [#30](https://github.com/ossf/scorecard/issues/30) — New check: code is scanning for secrets + +Open since the project's earliest days. Phase 1 priority. + +### SBOM (OSPS-QA-02.02) +- [#1476](https://github.com/ossf/scorecard/issues/1476) — Feature: Detect if SBOMs generated +- [#2605](https://github.com/ossf/scorecard/issues/2605) — Add support for SBOM analyzing at Binary-Artifacts stage + +The SBOM check and probes (`hasSBOM`, `hasReleaseSBOM`) already exist but +may need enhancement for compiled release asset association. + +### Changelog / release notes (OSPS-BR-04.01) +- [#4824](https://github.com/ossf/scorecard/issues/4824) — Feature: New Check: Check if the project has and maintains a CHANGELOG + +Direct match for Phase 2 deliverable. + +### Private vulnerability reporting (OSPS-VM-03.01) +- [#2465](https://github.com/ossf/scorecard/issues/2465) — Factor whether or not private vulnerability reporting is enabled into the scorecard + +Direct match. GitHub's private vulnerability reporting API could provide +platform-level evidence. 
+ +### Vulnerability disclosure improvements (OSPS-VM-01.01, VM-04.01) +- [#4192](https://github.com/ossf/scorecard/issues/4192) — Test for security policy in other places than SECURITY.md +- [#4789](https://github.com/ossf/scorecard/issues/4789) — Rethinking vulnerability check scoring logic +- [#1371](https://github.com/ossf/scorecard/issues/1371) — Feature: add check for vulnerability alerts + +### Signed releases and provenance (OSPS-BR-06.01) +- [#4823](https://github.com/ossf/scorecard/issues/4823) — Feature: pass Signed-Releases with GitHub immutable release process +- [#4080](https://github.com/ossf/scorecard/issues/4080) — Use GitHub attestations to check for signed releases +- [#2684](https://github.com/ossf/scorecard/issues/2684) — Rework: Signed-Releases: Separate score calculation of provenance and signatures +- [#1417](https://github.com/ossf/scorecard/issues/1417) — Feature: add support for keyless signed release + +### Threat model / security assessment (OSPS-SA-01.01, SA-03.01) +- [#2142](https://github.com/ossf/scorecard/issues/2142) — Feature: Assess presence and maintenance of a threat model + +### Release scoring (OSPS-BR-02.01, BR-02.02) +- [#1985](https://github.com/ossf/scorecard/issues/1985) — Feature: Scoring for individual releases + +### Minder integration +- [#4723](https://github.com/ossf/scorecard/pull/4723) — Initial draft of using Minder rules in Scorecard (CLOSED) + +Draft PR that attempted to run Minder Rego rules within Scorecard, +including OSPS-QA-05.01 and QA-03.01. Closed due to inactivity but +demonstrates interest in deeper Minder/Scorecard integration. + ## Notes - The OSPS Baseline v2026.02.19 contains 59 controls. 
Previous coverage diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 78405a257fd..cedc28e5abf 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -441,3 +441,18 @@ The proposal lists both [Darnit](https://github.com/kusari-oss/darnit) and [Mind Is the current framing correct — that Scorecard is the measurement layer and both Minder and Darnit are downstream consumers? Or should we position Scorecard differently relative to one versus the other, given that Minder is an OpenSSF project in the same working group while Darnit is not? **Stephen's response:** + + +#### CQ-15: Existing issues as Phase 1 work items + +The coverage analysis (`docs/osps-baseline-coverage.md`) now includes a section mapping existing Scorecard issues to OSPS Baseline gaps. Several long-standing issues align directly with Phase 1 priorities: + +- [#30](https://github.com/ossf/scorecard/issues/30) — Secrets scanning (OSPS-BR-07.01), open since the project's earliest days +- [#2305](https://github.com/ossf/scorecard/issues/2305) / [#2479](https://github.com/ossf/scorecard/issues/2479) — Security Insights ingestion +- [#2465](https://github.com/ossf/scorecard/issues/2465) — Private vulnerability reporting (OSPS-VM-03.01) +- [#4824](https://github.com/ossf/scorecard/issues/4824) — Changelog check (OSPS-BR-04.01) +- [#4723](https://github.com/ossf/scorecard/pull/4723) — Minder/Rego integration draft (closed) + +Should we adopt these existing issues as the starting work items for Phase 1, or create new issues that reference them? Some of these issues have significant discussion history that may contain design decisions worth preserving. 
+ +**Stephen's response:** From ef670de2e7b0f603142977eeae54c969911abd78 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Fri, 27 Feb 2026 06:55:24 -0500 Subject: [PATCH 12/28] :seedling: Remove premature openspec scaffolding and roadmap-ideas gitignore Remove openspec system specs (core-checks, platform-clients, probes) that were scaffolding for documenting existing Scorecard architecture. These are not part of the OSPS conformance proposal and can be recreated if needed. Remove docs/roadmap-ideas.md from .gitignore. Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .gitignore | 3 -- openspec/specs/core-checks/spec.md | 29 ----------------- openspec/specs/platform-clients/spec.md | 26 --------------- openspec/specs/probes/spec.md | 42 ------------------------- 4 files changed, 100 deletions(-) delete mode 100644 openspec/specs/core-checks/spec.md delete mode 100644 openspec/specs/platform-clients/spec.md delete mode 100644 openspec/specs/probes/spec.md diff --git a/.gitignore b/.gitignore index 01e1083606d..fb2ac85231e 100644 --- a/.gitignore +++ b/.gitignore @@ -67,6 +67,3 @@ newRelease.json # AI tooling instructions AGENTS.md - -# Roadmap drafting ideas -docs/roadmap-ideas.md diff --git a/openspec/specs/core-checks/spec.md b/openspec/specs/core-checks/spec.md deleted file mode 100644 index 3f4401d065e..00000000000 --- a/openspec/specs/core-checks/spec.md +++ /dev/null @@ -1,29 +0,0 @@ -# Core Checks System - -## Purpose - -The checks system provides high-level security assessments of open source repositories. Each check produces a score from 0-10 representing how well a project follows a particular security practice. - -## Requirements - -### Requirement: Check Registration -The system SHALL allow checks to self-register via `init()` functions using `registerCheck()`. - -### Requirement: Parallel Execution -The system SHALL execute all enabled checks concurrently using goroutines. 
- -### Requirement: Three-Tier Architecture -Each check SHALL follow the raw data collection -> probe execution -> evaluation pipeline. - -### Requirement: Score Range -All checks SHALL produce a score between 0 (MinResultScore) and 10 (MaxResultScore), or an inconclusive/error result. - -### Requirement: Automated Assessment -All checks SHALL be fully automatable and require no interaction from repository maintainers. - -### Requirement: Actionable Results -All check results SHALL include actionable remediation guidance. - -## Current Checks - -Binary-Artifacts, Branch-Protection, CI-Tests, CII-Best-Practices, Code-Review, Contributors, Dangerous-Workflow, Dependency-Update-Tool, Fuzzing, License, Maintained, Packaging, Pinned-Dependencies, SAST, Security-Policy, Signed-Releases, Token-Permissions, Vulnerabilities, Webhooks (experimental), SBOM (experimental). diff --git a/openspec/specs/platform-clients/spec.md b/openspec/specs/platform-clients/spec.md deleted file mode 100644 index 9d273d1ff97..00000000000 --- a/openspec/specs/platform-clients/spec.md +++ /dev/null @@ -1,26 +0,0 @@ -# Platform Clients - -## Purpose - -Platform clients provide an abstraction layer over repository hosting platforms (GitHub, GitLab, Azure DevOps, local directories), allowing checks and probes to operate platform-agnostically through the `clients.RepoClient` interface. - -## Requirements - -### Requirement: RepoClient Interface -All platform clients SHALL implement the `clients.RepoClient` interface. - -### Requirement: Platform Agnosticism -Checks and probes SHALL NOT contain platform-specific logic; all platform differences SHALL be handled within client implementations. - -### Requirement: Authentication -Clients SHALL support authentication via environment variables (`GITHUB_AUTH_TOKEN`, `GITLAB_AUTH_TOKEN`, `AZURE_DEVOPS_AUTH_TOKEN`). - -### Requirement: Rate Limiting -The GitHub client SHALL support round-robin token rotation for rate limit management. 
- -## Supported Platforms - -- **GitHub** (`clients/githubrepo/`) - Stable. REST and GraphQL APIs. -- **GitLab** (`clients/gitlabrepo/`) - Stable. -- **Azure DevOps** (`clients/azuredevopsrepo/`) - Experimental. -- **Local Directory** (`clients/localdir/`) - Stable. File-system-only checks. diff --git a/openspec/specs/probes/spec.md b/openspec/specs/probes/spec.md deleted file mode 100644 index bb32479d203..00000000000 --- a/openspec/specs/probes/spec.md +++ /dev/null @@ -1,42 +0,0 @@ -# Probes System - -## Purpose - -Probes are granular, individual heuristics that assess a specific behavior a project may or may not exhibit. They are the atomic units of analysis within Scorecard, composed into higher-level checks. - -## Requirements - -### Requirement: Boolean Naming -Probe names SHALL use camelCase and be phrased as boolean statements (e.g., `hasUnverifiedBinaryArtifacts`). - -### Requirement: Three-File Structure -Each probe SHALL consist of exactly three files: `def.yml` (documentation), `impl.go` (implementation), `impl_test.go` (tests). - -### Requirement: Finding Outcomes -Probes SHALL return one or more `finding.Finding` values with outcomes from: `OutcomeTrue`, `OutcomeFalse`, `OutcomeNotApplicable`, `OutcomeNotAvailable`. - -### Requirement: Lifecycle Management -Each probe SHALL declare a lifecycle state in `def.yml`: `Experimental`, `Stable`, or `Deprecated`. - -### Requirement: Registration -Probes SHALL register via `probes.MustRegister()` (check-associated) or `probes.MustRegisterIndependent()` (standalone), and be cataloged in `probes/entries.go`. - -### Requirement: Remediation -Probes SHALL provide remediation guidance in their `def.yml` definition. 
- -## Scenarios - -### Scenario: Probe returns true finding -- GIVEN a repository that exhibits the behavior described by the probe -- WHEN the probe executes -- THEN it returns at least one finding with `OutcomeTrue` - -### Scenario: Probe returns false finding -- GIVEN a repository that does not exhibit the behavior -- WHEN the probe executes -- THEN it returns at least one finding with `OutcomeFalse` - -### Scenario: Probe lacks data -- GIVEN a repository where the relevant data is unavailable -- WHEN the probe executes -- THEN it returns a finding with `OutcomeNotAvailable` From 307448f49a0f94a1e8b696448c9bf6490457d1d8 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Fri, 27 Feb 2026 07:19:43 -0500 Subject: [PATCH 13/28] :seedling: Add Allstar to ecosystem comparison and ORBIT diagram; add CQ-16 Add Allstar (Scorecard sub-project) to the ecosystem comparison table, integration flow diagram, and ORBIT ecosystem diagram. Allstar continuously monitors GitHub orgs and enforces Scorecard checks as policies with auto-remediation, and already enforces controls aligned with OSPS Baseline (branch protection, security policy, binary artifacts, dangerous workflows). Add Allstar to "Existing Scorecard surfaces that matter" section and to docs/ROADMAP.md ecosystem alignment. Add CQ-16 asking whether Allstar should be an explicit Phase 1 consumer of OSPS conformance output, and whether it is considered part of the enforcement boundary Scorecard does not cross. Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- docs/ROADMAP.md | 7 ++- .../osps-baseline-conformance/proposal.md | 47 ++++++++++++------- 2 files changed, 37 insertions(+), 17 deletions(-) diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 9ece5eef60c..0b7fe5b4d21 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -75,7 +75,12 @@ Deliverables: ### Ecosystem alignment Scorecard operates within the ORBIT WG ecosystem as a measurement and -evidence tool. It does not duplicate: +evidence tool. 
It does not duplicate: +evidence tool. [Allstar](https://github.com/ossf/allstar), a Scorecard +sub-project, continuously monitors GitHub organizations and enforces +Scorecard check results as policies. OSPS conformance output could enable +Allstar to enforce Baseline conformance at the organization level. + +Scorecard does not duplicate: - **[Minder](https://github.com/mindersec/minder)** — Policy enforcement and remediation platform (OpenSSF Sandbox, ORBIT WG) - **[Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner)** — Baseline evaluation powered by Gemara and Security Insights diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index cedc28e5abf..e0eabe171c7 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -29,32 +29,33 @@ Several tools operate in adjacent spaces. Understanding their capabilities clarifies what is and isn't Scorecard's job.
-| Dimension | **Scorecard** | **[Minder](https://github.com/mindersec/minder)** | **[Darnit](https://github.com/kusari-oss/darnit)** | **[AMPEL](https://github.com/carabiner-dev/ampel)** | **[Privateer GitHub Plugin](https://github.com/ossf/pvtr-github-repo-scanner)** | -|-----------|--------------|---------|-----------|-----------|-------------| -| **Purpose** | Security health measurement | Policy enforcement + remediation platform | Compliance audit + remediation | Attestation-based policy enforcement | Baseline conformance evaluation | -| **Action** | Analyzes repositories (read-only) | Enforces policies, auto-remediates | Audits + fixes repositories | Verifies attestations against policies | Evaluates repos against OSPS controls | -| **Data source** | Collects from APIs/code | Collects from APIs + consumes findings from other tools | Analyzes repo state | Consumes attestations only | Collects from GitHub API + Security Insights | -| **Output** | Scores (0-10) + probe findings | Policy evaluation results + remediation PRs | PASS/FAIL + attestations + fixes | PASS/FAIL + results attestation | Gemara L4 assessment results | -| **OSPS Baseline** | Partial (via probes) | Via Rego policy rules | Full (62 controls) | Via policy rules | 39 of 52 controls | -| **In-toto** | Produces attestations | Consumes attestations | Produces attestations | Consumes + verifies | N/A | -| **OSCAL** | No | No | No | Native support | N/A | -| **Sigstore** | No | Verifies signatures | Signs attestations | Verifies signatures | N/A | -| **Gemara** | Not yet (planned) | No | No | No | L2 + L4 native | -| **Maturity** | Production (v5.3.0) | Sandbox (OpenSSF, donated Oct 2024) | Alpha (v0.1.0, Jan 2026) | Production (v1.0.0) | Production, powers LFX Insights | -| **Language** | Go | Go | Python | Go | Go | +| Dimension | **Scorecard** | **[Allstar](https://github.com/ossf/allstar)** | **[Minder](https://github.com/mindersec/minder)** | **[Darnit](https://github.com/kusari-oss/darnit)** | **[AMPEL](https://github.com/carabiner-dev/ampel)** | **[Privateer GitHub Plugin](https://github.com/ossf/pvtr-github-repo-scanner)** | +|-----------|--------------|---------|---------|-----------|-----------|-------------| +| **Purpose** | Security health measurement | GitHub policy enforcement | Policy enforcement + remediation platform | Compliance audit + remediation | Attestation-based policy enforcement | Baseline conformance evaluation | +| **Action** | Analyzes repositories (read-only) | Monitors orgs, opens issues, auto-fixes settings | Enforces policies, auto-remediates | Audits + fixes repositories | Verifies attestations against policies | Evaluates repos against OSPS controls | +| **Data source** | Collects from APIs/code | Collects from GitHub API + runs Scorecard checks | Collects from APIs + consumes findings from other tools | Analyzes repo state | Consumes attestations only | Collects from GitHub API + Security Insights | +| **Output** | Scores (0-10) + probe findings | GitHub issues + auto-remediated settings | Policy evaluation results + remediation PRs | PASS/FAIL + attestations + fixes | PASS/FAIL + results attestation | Gemara L4 assessment results | +| **OSPS Baseline** | Partial (via probes) | Indirect (enforces subset via Scorecard checks) | Via Rego policy rules | Full (62 controls) | Via policy rules | 39 of 52 controls | +| **In-toto** | Produces attestations | N/A | Consumes attestations | Produces attestations | Consumes + verifies | N/A | +| **OSCAL** | No | No | No | No | Native support | N/A | +| **Sigstore** | No | No | Verifies signatures | Signs attestations | Verifies signatures | N/A | +| **Gemara** | Not yet (planned) | No | No | No | No | L2 + L4 native | +| **Maturity** | Production (v5.3.0) | Production (v4.5, Scorecard sub-project) | Sandbox (OpenSSF, donated Oct 2024) | Alpha (v0.1.0, Jan 2026) | Production (v1.0.0) | Production, powers LFX Insights | +| **Language** | Go | Go | Go | Python | Go | Go | **Integration model:** ```mermaid flowchart LR - Scorecard["Scorecard<br/>(Measure)"] -->|findings + attestations| Minder["Minder<br/>(Enforce + Remediate)"] + Scorecard["Scorecard<br/>(Measure)"] -->|checks| Allstar["Allstar<br/>(Enforce on GitHub)"] + Scorecard -->|findings + attestations| Minder["Minder<br/>(Enforce + Remediate)"] Scorecard -->|findings + attestations| Darnit["Darnit<br/>(Audit + Remediate)"] Scorecard -->|findings + attestations| AMPEL["AMPEL<br/>(Enforce)"] Darnit -->|compliance attestation| AMPEL Scorecard -->|conformance evidence| Privateer["Privateer Plugin<br/>(Baseline evaluation)"] ``` -Scorecard is the **data source** (measures repository security). Minder consumes Scorecard findings to enforce policies and auto-remediate across repositories. Darnit audits compliance and remediates. AMPEL enforces policies on attestations. The Privateer plugin evaluates Baseline conformance. They are complementary, not competing. +Scorecard is the **data source** (measures repository security). [Allstar](https://github.com/ossf/allstar) is a Scorecard sub-project that continuously monitors GitHub organizations and enforces Scorecard check results as policies (opening issues or auto-remediating settings). Minder consumes Scorecard findings to enforce policies and auto-remediate across repositories. Darnit audits compliance and remediates. AMPEL enforces policies on attestations. The Privateer plugin evaluates Baseline conformance. They are complementary, not competing. ### What Scorecard must not do @@ -75,6 +76,7 @@ A fresh analysis of Scorecard's current coverage against OSPS Baseline v2026.02. - **Checks** produce 0-10 scores — useful as signal but not conformance results - **Probes** produce structured boolean findings — the right granularity for control mapping - **Output formats** (JSON, SARIF, probe, in-toto) — OSPS output is a new format alongside these +- **[Allstar](https://github.com/ossf/allstar)** (Scorecard sub-project) — continuously monitors GitHub organizations and enforces Scorecard checks as policies with auto-remediation. Allstar already enforces several controls aligned with OSPS Baseline (branch protection, security policy, binary artifacts, dangerous workflows). OSPS conformance output could enable Allstar to enforce Baseline conformance at the organization level.
 - **Multi-repo scanning** (`--repos`, `--org`) — needed for OSPS-QA-04.02 (subproject conformance)
 - **Serve mode** — HTTP surface for pipeline integration
@@ -204,7 +206,10 @@ flowchart TD
     subgraph Evaluation["Evaluation"]
         Privateer["Privateer GitHub Plugin<br/>(LFX Insights driver)"]
-        Scorecard["OpenSSF Scorecard<br/>(deep analysis, conformance output,<br/>multi-platform, large install base)"]
+        subgraph ScorecardEcosystem["Scorecard Ecosystem"]
+            Scorecard["OpenSSF Scorecard<br/>(deep analysis, conformance output,<br/>multi-platform, large install base)"]
+            Allstar["Allstar<br/>(GitHub policy enforcement,<br/>Scorecard sub-project)"]
+        end
     end

     Minder["Minder<br/>(enforce + remediate)"]
@@ -219,6 +224,7 @@ flowchart TD
     SI -->|provides metadata| Privateer
     SI -->|provides metadata| Scorecard
     SI -->|provides metadata| Minder
+    Scorecard -->|checks| Allstar
     Scorecard -->|conformance evidence| Privateer
     Scorecard -->|findings| Minder
     Scorecard -->|findings| Darnit
@@ -456,3 +462,12 @@ The coverage analysis (`docs/osps-baseline-coverage.md`) now includes a section
 Should we adopt these existing issues as the starting work items for Phase 1, or create new issues that reference them? Some of these issues have significant discussion history that may contain design decisions worth preserving.

 **Stephen's response:**
+
+
+#### CQ-16: Allstar's role in OSPS conformance enforcement
+
+[Allstar](https://github.com/ossf/allstar) is a Scorecard sub-project that continuously monitors GitHub organizations and enforces Scorecard check results as policies (branch protection, binary artifacts, security policy, dangerous workflows). It already enforces a subset of controls aligned with OSPS Baseline.
+
+With OSPS conformance output, Allstar could potentially enforce Baseline conformance at the organization level — e.g., opening issues or auto-remediating when a repository falls below Level 1 conformance. Should the proposal explicitly include Allstar as a Phase 1 consumer of OSPS output, or should that be deferred? And more broadly, should Allstar be considered part of the "enforcement" boundary that Scorecard itself does not cross, even though it is a Scorecard sub-project?
+
+**Stephen's response:**

From 101b0b145035cc73f855ae582f31ecf8c6dee123 Mon Sep 17 00:00:00 2001
From: Stephen Augustus
Date: Fri, 27 Feb 2026 16:39:15 -0500
Subject: [PATCH 14/28] :seedling: Integrate Eddie Knight's ORBIT WG feedback into proposal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add "ORBIT WG feedback" section documenting Eddie Knight's feedback from PR #4952.
Eddie is the ORBIT WG TSC Chair and maintainer of Gemara, Privateer, and OSPS Baseline. Five feedback items documented as EK-1 through EK-5:

- EK-1: Mapping file could live in Baseline repo with CODEOWNERS
- EK-2: No "OSPS output format" exists; use Gemara SDK formats
- EK-3: Current proposal duplicates Privateer despite stating otherwise
- EK-4: Catalog extraction needs concrete implementation plan
- EK-5: Alternative architecture — shared plugin model

Add five new clarifying questions (CQ-17 through CQ-21) for Steering Committee decisions:

- CQ-17: Mapping file location (Scorecard repo vs shared)
- CQ-18: Output format (--format=osps vs Gemara SDK)
- CQ-19: Build vs integrate (own engine vs shared plugin)
- CQ-20: Catalog extraction scope
- CQ-21: Privateer code duplication acceptability

Co-Authored-By: Claude
Signed-off-by: Stephen Augustus
---
 .../osps-baseline-conformance/proposal.md | 138 ++++++++++++++++++
 1 file changed, 138 insertions(+)

diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md
index e0eabe171c7..2fe04788a29 100644
--- a/openspec/changes/osps-baseline-conformance/proposal.md
+++ b/openspec/changes/osps-baseline-conformance/proposal.md
@@ -464,6 +464,144 @@ Should we adopt these existing issues as the starting work items for Phase 1, or

 **Stephen's response:**

+### ORBIT WG feedback
+
+The following feedback was provided by Eddie Knight (ORBIT WG Technical Steering Committee Chair, maintainer of Gemara, Privateer, and OSPS Baseline) on [PR #4952](https://github.com/ossf/scorecard/pull/4952).
+
+#### EK-1: Mapping file location
+
+> "Regarding mappings between Baseline catalog<->Scorecard checks, it is possible to easily put that into a new file with Scorecard maintainers as codeowners, pending approval from scorecard maintainers."
+
+Eddie is offering to host the Baseline-to-Scorecard mapping in the OSPS Baseline repository (or a shared location) with Scorecard maintainers as CODEOWNERS. The current proposal places the mapping in the Scorecard repo (`pkg/osps/mappings/v2026-02-19.yaml`).
+
+This affects ownership, versioning cadence, and who can update the mapping when controls or probes change. The trade-offs:
+
+- **In Scorecard repo**: Scorecard maintainers fully own the mapping. Mapping updates are coupled to Scorecard releases. Other tools cannot easily consume the mapping.
+- **In Baseline repo (or shared)**: Mapping is co-owned. Versioned alongside the Baseline spec. Other tools (Privateer, Darnit, Minder) can consume the same mapping. Scorecard maintainers retain CODEOWNERS authority.
+
+#### EK-2: Output format — no "OSPS output format"
+
+> "There is not an 'OSPS output format,' and even the relevant Gemara schemas (which are quite opinionated) are still designed to support output in multiple output formats within the SDK, such as SARIF. I would expect that you'd keep your current output logic, and then _maybe_ add basic Gemara json/yaml as another option."
+
+The current proposal defines `--format=osps` as a new output format. Eddie clarifies that the ORBIT ecosystem does not define a special "OSPS output format" — instead, the Gemara SDK supports multiple output formats (including SARIF). The suggestion is to keep Scorecard's existing output logic and optionally add Gemara JSON/YAML as another format option.
+
+This is a significant clarification that affects the spec's output requirements, the Phase 1 deliverables, and how we frame the conformance layer.
+
+#### EK-3: Technical relationship with Privateer plugin
+
+> "There is a stated goal of not duplicating the code from the plugin ossf/pvtr-github-repo-scanner, but the implementation plan as it's currently written does require duplication. In the current proposal, there would not be a technical relationship between the two codebases."
+
+Eddie identifies a contradiction: the proposal says "do not duplicate Privateer" but proposes building a parallel conformance engine with no code-level relationship to the Privateer plugin. The current plan would result in two separate codebases evaluating the same OSPS controls independently.
+
+#### EK-4: Catalog extraction needs an implementation plan
+
+> "There is cursory mention of a scorecard _catalog extraction_, which I'm hugely in favor of, but I don't see an implementation plan for that."
+
+The proposal mentions "Scorecard control catalog extraction plan" as a Phase 1 deliverable but does not specify what this means concretely. Eddie wants to see how Scorecard's checks/probes would be made consumable by other tools.
+
+#### EK-5: Alternative architecture — shared plugin model
+
+> "An alternative plan would be for us to spend a week consolidating checks/probes into the pvtr plugin (with relevant CODEOWNERS), then update Scorecard to selectively execute the plugin under the covers."
+
+Eddie proposes a fundamentally different architecture:
+1. Consolidate Scorecard checks/probes into the [Privateer plugin](https://github.com/ossf/pvtr-github-repo-scanner) as shared evaluation logic
+2. Scorecard executes the plugin under the covers for Baseline evaluation
+3. Privateer and LFX Insights can optionally run Scorecard checks via the same plugin
+
+**Claimed benefits:**
+- Extract the Scorecard control catalog for independent versioning
+- Instantly integrate Gemara into Scorecard
+- Allow bidirectional check execution (Scorecard runs Privateer checks; Privateer runs Scorecard checks)
+- Simplify contribution overhead for individual checks
+- Improve both codebases through shared logic
+
+**This is the central architectural decision for the proposal.** The Steering Committee needs to evaluate this against the current plan (Scorecard builds its own conformance engine).
+
+---
+
+The following clarifying questions require Steering Committee decisions informed by Eddie's feedback.
+
+#### CQ-17: Mapping file location — Scorecard repo or shared?
+
+Eddie offers to host the Baseline-to-Scorecard mapping in the Baseline repository with Scorecard maintainers as CODEOWNERS (EK-1). The current proposal places it in the Scorecard repo.
+
+Options:
+1. **Scorecard repo** (`pkg/osps/mappings/`): Scorecard owns the mapping entirely. Mapping is coupled to Scorecard releases and probe changes.
+2. **Baseline repo** (or shared location): Co-owned with ORBIT WG. Other tools can consume the same mapping. Scorecard maintainers retain CODEOWNERS authority over their portion.
+3. **Both**: Scorecard maintains a local mapping for runtime use; a shared mapping in the Baseline repo serves as the cross-tool reference. Keep them in sync.
+
+Which approach do you prefer?
+
+**Stephen's response:**
+
+
+#### CQ-18: Output format — `--format=osps` vs. Gemara SDK output
+
+Eddie clarifies that no "OSPS output format" exists in the ORBIT ecosystem (EK-2). The Gemara SDK supports multiple formats (JSON, YAML, SARIF). He suggests Scorecard keep its existing output logic and optionally add Gemara JSON/YAML as another format.
+
+This affects the spec's requirement for `--format=osps`. Options:
+1. **Keep `--format=osps`**: Define a Scorecard-specific conformance output format. Risk: inventing a format that doesn't align with the ecosystem.
+2. **Use `--format=gemara`** (or similar): Integrate the Gemara SDK and output Gemara L4 assessment results in JSON/YAML. Aligns with ecosystem, but creates a Gemara SDK dependency.
+3. **Extend existing formats**: Add conformance data to `--format=json` and `--format=sarif` outputs. No new format flag needed.
+
+Which approach do you prefer? Does the Gemara SDK dependency concern you?
+
+**Stephen's response:**
+
+
+#### CQ-19: Architectural direction — build vs. integrate
+
+This is the central decision. Eddie proposes consolidating Scorecard checks/probes into the Privateer plugin and having Scorecard execute the plugin (EK-5). The current proposal has Scorecard building its own conformance engine.
+
+**Option A: Scorecard builds its own conformance engine** (current proposal)
+- Scorecard adds a mapping file, conformance evaluation logic, and output format
+- No code-level dependency on Privateer
+- Scorecard controls its own release cadence and architecture
+- Risk: duplicates evaluation logic, no technical relationship with Privateer (EK-3)
+
+**Option B: Shared plugin model** (Eddie's alternative)
+- Scorecard checks/probes are consolidated into the Privateer plugin
+- Scorecard executes the plugin under the covers
+- Bidirectional: Privateer can also run Scorecard checks
+- Gemara integration comes for free via the plugin
+- Risk: Scorecard takes a dependency on an external plugin's release cadence and architecture; multi-platform support (GitLab, Azure DevOps, local) may not fit the plugin model; architectural coupling
+
+**Option C: Hybrid**
+- Scorecard maintains its own probe execution (its core competency)
+- Scorecard exports its probe results in a format the Privateer plugin can consume (Gemara L4)
+- The Privateer plugin consumes Scorecard output as supplementary evidence
+- Control catalog is extracted and shared, but evaluation logic stays separate
+- No code-level coupling, but interoperable output
+
+Which option do you prefer? What are your concerns about taking a dependency on the Privateer plugin codebase?
+
+**Stephen's response:**
+
+
+#### CQ-20: Catalog extraction — what does it mean concretely?
+
+Eddie is "hugely in favor" of extracting the Scorecard control catalog (EK-4) but the proposal lacks an implementation plan. Concretely, this could mean:
+
+1. **Machine-readable probe definitions**: Export `probes/*/def.yml` as a versioned catalog (already exists in the repo, but not packaged for external consumption)
+2. **Gemara L2 control definitions**: Map Scorecard probes to Gemara Layer 2 schema entries, making them available in the Gemara catalog
+3. **Shared evaluation steps**: Extract Scorecard's probe logic into a reusable Go library or Privateer plugin steps that other tools can execute
+4. **API-level catalog**: Expose probe definitions via the Scorecard API so tools can discover what Scorecard can evaluate
+
+What level of extraction do you envision? Is option 2 (Gemara L2 integration) the right target, or should we start simpler?
+
+**Stephen's response:**
+
+
+#### CQ-21: Privateer code duplication — is it acceptable?
+
+Eddie points out that the current proposal would result in two codebases evaluating the same OSPS controls independently (EK-3). Even if the proposal says "don't duplicate Privateer," building a separate conformance engine effectively does that.
+
+Is some duplication acceptable if it means Scorecard retains architectural independence? Or is avoiding duplication a hard constraint that should drive us toward the shared plugin model (CQ-19 Option B)?
+
+**Stephen's response:**
+
+
 #### CQ-16: Allstar's role in OSPS conformance enforcement

 [Allstar](https://github.com/ossf/allstar) is a Scorecard sub-project that continuously monitors GitHub organizations and enforces Scorecard check results as policies (branch protection, binary artifacts, security policy, dangerous workflows). It already enforces a subset of controls aligned with OSPS Baseline.

From 5bf8487b45f079b40f0397e3854c9d52674669ce Mon Sep 17 00:00:00 2001
From: Stephen Augustus
Date: Fri, 27 Feb 2026 19:00:11 -0500
Subject: Apply suggestions from Eddie's code review

These are clarifying edits to ensure we've captured the recommendations and open questions correctly.

Co-authored-by: Eddie Knight
Signed-off-by: Stephen Augustus
Signed-off-by: Stephen Augustus
---
 .../osps-baseline-conformance/proposal.md | 23 +++++++++++--------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md
index 2fe04788a29..329b20b6dd6 100644
--- a/openspec/changes/osps-baseline-conformance/proposal.md
+++ b/openspec/changes/osps-baseline-conformance/proposal.md
@@ -470,14 +470,16 @@ The following feedback was provided by Eddie Knight (ORBIT WG Technical Steering
 #### EK-1: Mapping file location

-> "Regarding mappings between Baseline catalog<->Scorecard checks, it is possible to easily put that into a new file with Scorecard maintainers as codeowners, pending approval from scorecard maintainers."
+> "Regarding mappings between Baseline catalog<->Scorecard checks, it is possible to easily put that into a new file with Scorecard maintainers as codeowners, pending approval from OSPS Baseline maintainers for the change."

 Eddie is offering to host the Baseline-to-Scorecard mapping in the OSPS Baseline repository (or a shared location) with Scorecard maintainers as CODEOWNERS. The current proposal places the mapping in the Scorecard repo (`pkg/osps/mappings/v2026-02-19.yaml`).

-This affects ownership, versioning cadence, and who can update the mapping when controls or probes change. The trade-offs:
+Mappings currently exist within the Baseline Catalog and are proposed for addition to the Scorecard repository as well. The mappings could be maintained in one or both of the projects. This affects ownership, versioning cadence, and who can update the mapping when controls or probes change.
+
+The trade-offs:

 - **In Scorecard repo**: Scorecard maintainers fully own the mapping. Mapping updates are coupled to Scorecard releases. Other tools cannot easily consume the mapping.
-- **In Baseline repo (or shared)**: Mapping is co-owned. Versioned alongside the Baseline spec. Other tools (Privateer, Darnit, Minder) can consume the same mapping. Scorecard maintainers retain CODEOWNERS authority.
+- **In Baseline repo (or shared)**: Mapping is co-owned. Versioned alongside the Baseline spec. End users and other tools (Privateer, Darnit, Minder) can consume the same mapping. Scorecard maintainers retain CODEOWNERS authority.

 #### EK-2: Output format — no "OSPS output format"
@@ -497,7 +499,7 @@ Eddie identifies a contradiction: the proposal says "do not duplicate Privateer"
 > "There is cursory mention of a scorecard _catalog extraction_, which I'm hugely in favor of, but I don't see an implementation plan for that."

-The proposal mentions "Scorecard control catalog extraction plan" as a Phase 1 deliverable but does not specify what this means concretely. Eddie wants to see how Scorecard's checks/probes would be made consumable by other tools.
+The proposal mentions "Scorecard control catalog extraction plan" as a Phase 1 deliverable but does not specify what this means concretely or how it would be achieved.
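To make the mapping artifact under discussion concrete, here is a hypothetical sketch of one entry in the proposed mapping file, following the `osps_baseline` annotation shape from the companion pvtr-baseline spec (mapping types `direct`/`partial`/`informational`, maturity levels 1-3). The probe name and exact field names are illustrative only; OSPS-GV-03.01 is used because the proposal already identifies it as a Level 1 gap:

```yaml
# Hypothetical sketch of an entry in the proposed mapping file
# (e.g., pkg/osps/mappings/v2026-02-19.yaml). Field names and the
# probe name are illustrative, not a committed schema.
controls:
  - id: OSPS-GV-03.01            # CONTRIBUTING file presence (Level 1 gap)
    maturity_level: 1            # OSPS Baseline maturity levels are 1, 2, or 3
    probes:
      - name: hasContributingFile  # hypothetical probe name
        mapping: direct            # direct | partial | informational
```

Wherever the file ultimately lives (CQ-17), keeping it machine-readable in a shape like this would let other tools (Privateer, Darnit, Minder) consume the same mapping.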
+
 #### EK-5: Alternative architecture — shared plugin model

@@ -506,11 +508,11 @@ The proposal mentions "Scorecard control catalog extraction plan" as a Phase 1 d
 Eddie proposes a fundamentally different architecture:
 1. Consolidate Scorecard checks/probes into the [Privateer plugin](https://github.com/ossf/pvtr-github-repo-scanner) as shared evaluation logic
-2. Scorecard executes the plugin under the covers for Baseline evaluation
+2. Scorecard executes the plugin under the covers for Baseline evaluation and then Scorecard handles follow-up logic such as scoring and storing the results
 3. Privateer and LFX Insights can optionally run Scorecard checks via the same plugin

 **Claimed benefits:**
-- Extract the Scorecard control catalog for independent versioning
+- Extract the Scorecard control catalog for independent versioning and cross-catalog mapping to Baseline
 - Instantly integrate Gemara into Scorecard
 - Allow bidirectional check execution (Scorecard runs Privateer checks; Privateer runs Scorecard checks)
 - Simplify contribution overhead for individual checks
@@ -533,6 +535,8 @@ Options:
 Which approach do you prefer?

+_Note that this question is negated if consolidating check logic within `pvtr-github-repo-scanner`, because then the mappings are managed within the control catalog in Gemara format._
+
 **Stephen's response:**
@@ -563,15 +567,16 @@ This is the central decision. Eddie proposes consolidating Scorecard checks/prob
 **Option B: Shared plugin model** (Eddie's alternative)
 - Scorecard checks/probes are consolidated into the Privateer plugin
 - Scorecard executes the plugin under the covers
-- Bidirectional: Privateer can also run Scorecard checks
+- Bidirectional: Privateer users can also run Scorecard checks e.g., LFX Insights
 - Gemara integration comes for free via the plugin
-- Risk: Scorecard takes a dependency on an external plugin's release cadence and architecture; multi-platform support (GitLab, Azure DevOps, local) may not fit the plugin model; architectural coupling
+- Risk: Scorecard releases are coupled to plugin's release cadence; CODEOWNERS in the second repo must be meticulously managed to avoid surprises; multi-platform support (GitLab, Azure DevOps, local) will require maintenance of independent plugins with isolated data collection for each platform

 **Option C: Hybrid**
 - Scorecard maintains its own probe execution (its core competency)
-- Scorecard exports its probe results in a format the Privateer plugin can consume (Gemara L4)
+- Scorecard exports its probe results in a format the Privateer plugin can consume (Gemara L5)
 - The Privateer plugin consumes Scorecard output as supplementary evidence
 - Control catalog is extracted and shared, but evaluation logic stays separate
+- Users will choose between the Privateer plugin and Scorecard for Baseline evaluations
 - No code-level coupling, but interoperable output

 Which option do you prefer? What are your concerns about taking a dependency on the Privateer plugin codebase?

From 69424ebfe5b6fdee8d33e6bb239727a10e0ba0d5 Mon Sep 17 00:00:00 2001
From: Stephen Augustus
Date: Mon, 2 Mar 2026 04:52:56 -0500
Subject: :seedling: Add stakeholder annotations and decision priorities

Add stakeholders to each open question and clarifying question in the proposal.
Add a new Decision Priority Analysis section that organizes questions into tiers based on dependencies.

Co-Authored-By: Claude
Signed-off-by: Stephen Augustus
---
 .../osps-baseline-conformance/proposal.md | 89 +++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md
index 329b20b6dd6..43145a1398e 100644
--- a/openspec/changes/osps-baseline-conformance/proposal.md
+++ b/openspec/changes/osps-baseline-conformance/proposal.md
@@ -89,6 +89,8 @@ The following questions were raised by Spencer (Steering Committee member) durin
 > "The attestation/provenance layer. What is doing the attestation? Is this some OIDC? A personal token? A workflow (won't have the right tokens)?"
 > — Spencer, on Section 5.1

+**Stakeholders:** Spencer (raised this, flagged as blocking), Stephen, Steering Committee
+
 This is a fundamental design question. Options include:
 - **Repo-local metadata files** (e.g., Security Insights, `.osps-attestations.yml`): simplest, no cryptographic identity, maintainer self-declares by committing the file.
 - **Signed attestations via Sigstore/OIDC**: strongest guarantees, but requires workflow identity and the right tokens — which Spencer correctly notes may not be available in all contexts.
@@ -101,6 +103,8 @@
 > "I thought the other doc said Scorecard wasn't an enforcement tool?"
 > — Spencer, on Q4 deliverables (enforcement detection)

+**Stakeholders:** Spencer (raised this), Stephen, Steering Committee
+
 This is a critical framing question. The roadmap proposes *detecting* whether enforcement exists (e.g., "are SAST results required to pass before merge?"), not *performing* enforcement. But the line between "detecting enforcement" and "being an enforcement tool" needs to be drawn clearly.

 **Recommendation to discuss**: Scorecard detects and reports whether enforcement mechanisms are in place. It does not itself enforce. The `--fail-on=fail` CI gating is a reporting exit code, not an enforcement action — the CI system is the enforcer. This distinction should be documented explicitly.
@@ -110,6 +114,8 @@
 > "Not sure I see the importance [of `scan_scope`]"
 > — Spencer, on Section 9 (output schema)

+**Stakeholders:** Stephen (can resolve alone)
+
 The `scan_scope` field (repo|org|repos) in the proposed OSPS output schema may not carry meaningful information. If the output always describes a single repository's conformance, the scope is implicit.

 **Recommendation to discuss**: Drop `scan_scope` from the schema unless multi-repo aggregation (OSPS-QA-04.02) produces a fundamentally different output shape. Revisit in Q4 when project-level aggregation is implemented.
@@ -119,6 +125,8 @@
 > "[Evidence] should be probe-based only, not check"
 > — Spencer, on Section 9 (output schema)

+**Stakeholders:** Spencer (raised this), Stephen — effectively resolved (adopted)
+
 Spencer's position is that OSPS evidence references should point to probe findings, not check-level results. This aligns with the architectural direction of Scorecard v5 (probes as the measurement unit, checks as scoring aggregations).

 **Recommendation**: Adopt this. The `evidence` array in the OSPS output schema should reference probes and their findings only. Checks may be listed in a `derived_from` field for human context but are not evidence.
@@ -384,6 +392,8 @@ We need to land these capabilities for as much surface area as possible.

 #### CQ-9: Coverage analysis and Phase 1 scope validation

+**Stakeholders:** Stephen (can answer alone)
+
 The coverage analysis (`docs/osps-baseline-coverage.md`) identifies 25 Level 1 controls. Of those, 6 are COVERED, 8 are PARTIAL, 9 are GAP, and 2 are NOT_OBSERVABLE. The Phase 1 plan targets closing the 9 GAP controls. Given that 2 controls (AC-01.01, AC-02.01) are NOT_OBSERVABLE without org-admin tokens, should Phase 1 explicitly include work on improving observability (e.g., documenting what tokens are needed, or providing guidance for org admins), or should those controls remain UNKNOWN until a later phase?

 **Stephen's response:**
@@ -391,6 +401,8 @@
 #### CQ-10: Mapping file ownership and contribution model

+**Stakeholders:** Stephen, Eddie Knight, Baseline maintainers — partially superseded by CQ-17
+
 The versioned mapping file (e.g., `pkg/osps/mappings/v2026-02-19.yaml`) is a critical artifact that defines which probes satisfy which OSPS controls. Who should own this file? Options:
 - Scorecard maintainers only (changes require maintainer review)
 - Community-contributed with maintainer approval (like checks/probes today)

 This also affects how we handle disagreements about whether a probe truly satisf
@@ -403,6 +415,8 @@
 #### CQ-11: Backwards compatibility of OSPS output format

+**Stakeholders:** Stephen, Spencer, Eddie Knight — depends on CQ-18 (output format decision)
+
 The spec requires `--format=osps` as a new output format. Since this is a new surface, we have freedom to iterate on the schema. However, once shipped, consumers will depend on it. What stability guarantees should we offer?
 - No guarantees during Phase 1 (alpha schema, may break between releases)
 - Semver-like schema versioning from day one (breaking changes increment major version)
@@ -413,6 +427,8 @@
 #### CQ-12: Probe gap prioritization for Phase 1

+**Stakeholders:** Stephen (can answer alone)
+
 The coverage analysis identifies 7 Level 1 GAP controls that need new probes (excluding the 2 that depend on Security Insights). Ranked by implementation feasibility:

 1. OSPS-GV-03.01 — CONTRIBUTING file presence

 Do you agree with this priority ordering? Are there any controls you would move
@@ -430,6 +446,8 @@
 #### CQ-13: Minder integration surface

+**Stakeholders:** Stephen, Minder maintainers, Steering Committee
+
 [Minder](https://github.com/mindersec/minder) is an OpenSSF Sandbox project within the ORBIT WG that already consumes Scorecard findings to enforce security policies and auto-remediate across repositories. Minder uses Rego-based rules and can enforce OSPS Baseline controls via policy profiles. A draft Scorecard PR (#4723, now stale) attempted deeper integration by enabling Scorecard checks to be written using Minder's Rego rule engine.

 Given Minder's position in the ORBIT ecosystem:
@@ -442,6 +460,8 @@
 #### CQ-14: Darnit vs. Minder delineation

+**Stakeholders:** Stephen (can answer alone)
+
 The proposal lists both [Darnit](https://github.com/kusari-oss/darnit) and [Minder](https://github.com/mindersec/minder) as tools that handle remediation and enforcement. Their capabilities overlap in some areas (both can enforce Baseline controls, both can remediate). For Scorecard's purposes, the distinction matters primarily for the "What Scorecard must not do" boundary.

 Is the current framing correct — that Scorecard is the measurement layer and both Minder and Darnit are downstream consumers? Or should we position Scorecard differently relative to one versus the other, given that Minder is an OpenSSF project in the same working group while Darnit is not?
@@ -451,6 +471,8 @@ Is the current framing correct — that Scorecard is the measurement layer and b
 #### CQ-15: Existing issues as Phase 1 work items

+**Stakeholders:** Stephen (can answer alone)
+
 The coverage analysis (`docs/osps-baseline-coverage.md`) now includes a section mapping existing Scorecard issues to OSPS Baseline gaps. Several long-standing issues align directly with Phase 1 priorities:

 - [#30](https://github.com/ossf/scorecard/issues/30) — Secrets scanning (OSPS-BR-07.01), open since the project's earliest days
@@ -526,6 +548,8 @@ The following clarifying questions require Steering Committee decisions informed
 #### CQ-17: Mapping file location — Scorecard repo or shared?

+**Stakeholders:** Stephen, Eddie Knight, OSPS Baseline maintainers
+
 Eddie offers to host the Baseline-to-Scorecard mapping in the Baseline repository with Scorecard maintainers as CODEOWNERS (EK-1). The current proposal places it in the Scorecard repo.

 Options:
@@ -542,6 +566,8 @@ _Note that this question is negated if consolidating check logic within `pvtr-gi
 #### CQ-18: Output format — `--format=osps` vs. Gemara SDK output

+**Stakeholders:** Stephen, Spencer (OQ-4 constrains this), Eddie Knight
+
 Eddie clarifies that no "OSPS output format" exists in the ORBIT ecosystem (EK-2). The Gemara SDK supports multiple formats (JSON, YAML, SARIF). He suggests Scorecard keep its existing output logic and optionally add Gemara JSON/YAML as another format.

 This affects the spec's requirement for `--format=osps`. Options:
@@ -556,6 +582,8 @@ Which approach do you prefer? Does the Gemara SDK dependency concern you?
 #### CQ-19: Architectural direction — build vs. integrate

+**Stakeholders:** Stephen, Spencer, Eddie Knight, Steering Committee, at least 1 non-Steering maintainer — this is the gating decision; most other open questions depend on its outcome
+
 This is the central decision. Eddie proposes consolidating Scorecard checks/probes into the Privateer plugin and having Scorecard execute the plugin (EK-5). The current proposal has Scorecard building its own conformance engine.

 **Option A: Scorecard builds its own conformance engine** (current proposal)
@@ -586,6 +614,8 @@ Which option do you prefer? What are your concerns about taking a dependency on
 #### CQ-20: Catalog extraction — what does it mean concretely?

+**Stakeholders:** Stephen, Eddie Knight, Steering Committee
+
 Eddie is "hugely in favor" of extracting the Scorecard control catalog (EK-4) but the proposal lacks an implementation plan. Concretely, this could mean:

 1. **Machine-readable probe definitions**: Export `probes/*/def.yml` as a versioned catalog (already exists in the repo, but not packaged for external consumption)
@@ -600,6 +630,8 @@ What level of extraction do you envision? Is option 2 (Gemara L2 integration) th
 #### CQ-21: Privateer code duplication — is it acceptable?

+**Stakeholders:** Stephen, Spencer, Eddie Knight, Steering Committee — flows from CQ-19
+
 Eddie points out that the current proposal would result in two codebases evaluating the same OSPS controls independently (EK-3). Even if the proposal says "don't duplicate Privateer," building a separate conformance engine effectively does that.

 Is some duplication acceptable if it means Scorecard retains architectural independence? Or is avoiding duplication a hard constraint that should drive us toward the shared plugin model (CQ-19 Option B)?
@@ -609,8 +641,65 @@ Is some duplication acceptable if it means Scorecard retains architectural indep
 #### CQ-16: Allstar's role in OSPS conformance enforcement

+**Stakeholders:** Stephen (can answer alone)
+
 [Allstar](https://github.com/ossf/allstar) is a Scorecard sub-project that continuously monitors GitHub organizations and enforces Scorecard check results as policies (branch protection, binary artifacts, security policy, dangerous workflows). It already enforces a subset of controls aligned with OSPS Baseline.

 With OSPS conformance output, Allstar could potentially enforce Baseline conformance at the organization level — e.g., opening issues or auto-remediating when a repository falls below Level 1 conformance. Should the proposal explicitly include Allstar as a Phase 1 consumer of OSPS output, or should that be deferred? And more broadly, should Allstar be considered part of the "enforcement" boundary that Scorecard itself does not cross, even though it is a Scorecard sub-project?

 **Stephen's response:**
+
+
+### Decision priority analysis
+
+The open questions have dependencies between them. Answering them in the
+wrong order will result in rework. The recommended sequence follows.
+
+#### Tier 1 — Gating decisions (answer before all others)
+
+| Question | Why it gates | Who decides |
+|----------|-------------|-------------|
+| **CQ-19** | Architectural direction (build vs. integrate vs. hybrid). If answered as Option B (shared plugin), CQ-17, CQ-18, CQ-20, and CQ-21 are either resolved or fundamentally reframed. | Stephen + Spencer + Eddie Knight + Steering Committee |
+| **OQ-1** | Attestation identity model. Spencer flagged as blocking. Determines how non-automatable controls are handled across all phases. | Spencer + Stephen + Steering Committee |
+
+#### Tier 2 — Downstream of CQ-19 (answer once Tier 1 is resolved)
+
+| Question | Dependency | Who decides |
+|----------|-----------|-------------|
+| **CQ-18** | Output format depends on CQ-19's architectural direction. | Stephen + Spencer + Eddie Knight |
+| **CQ-17** | Mapping file location depends on CQ-19 (negated if Option B). | Stephen + Eddie Knight + Baseline maintainers |
+| **OQ-2** | Enforcement detection scope. Affects Phase 3 scope.
| Spencer + Stephen + Steering Committee | + +#### Tier 3 — Important but non-blocking for Phase 1 start + +| Question | Notes | Who decides | +|----------|-------|-------------| +| **CQ-20** | Catalog extraction scope. Flows from CQ-19. | Stephen + Eddie Knight | +| **CQ-21** | Code duplication tolerance. Flows from CQ-19. | Stephen + Spencer + Eddie Knight | +| **CQ-13** | Minder integration surface. Affects ecosystem positioning. | Stephen + Minder maintainers | +| **CQ-11** | Output stability guarantees. Depends on CQ-18. | Stephen + Spencer | + +#### Tier 4 — Stephen can answer alone (any time) + +| Question | Notes | +|----------|-------| +| **CQ-9** | NOT_OBSERVABLE controls — implementation detail, UNKNOWN-first principle already agreed. | +| **CQ-12** | Probe gap priority ordering — coverage doc already proposes an order. | +| **CQ-14** | Darnit vs. Minder delineation — ecosystem positioning Stephen can articulate. | +| **CQ-15** | Existing issues as Phase 1 work items — backlog triage. | +| **CQ-16** | Allstar's role — Scorecard sub-project under same Steering Committee. | + +#### Effectively resolved + +| Question | Resolution | +|----------|-----------| +| **OQ-3** | Drop `scan_scope` from the schema (Spencer's feedback). | +| **OQ-4** | Evidence is probe-based only, not check-based (adopted). | +| **CQ-10** | Partially superseded by CQ-17 (same topic with Eddie's context). | + +#### Recommended next steps + +1. **Schedule a discussion with Eddie Knight** to resolve CQ-19. Bring Spencer. This is the fork in the road — everything downstream depends on it. +2. **Resolve OQ-1** with Spencer and the Steering Committee. Spencer flagged it as blocking, and it affects the spec regardless of CQ-19's outcome. +3. **Answer the Tier 4 questions** at any time — they are independent and don't block others. +4. **Once CQ-19 and OQ-1 are decided**, the Tier 2 and 3 questions can be resolved quickly since they flow from the architectural direction. 
From 5f68422424c030d90187106211ca7d110ba5fa8f Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 2 Mar 2026 04:53:05 -0500 Subject: [PATCH 17/28] :seedling: Integrate PR feedback; elevate AMPEL; remove spec.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Integrate PR review feedback from Adolfo García Veytia (AMPEL maintainer) and Mike Lieberman: - Add AP-1 through AP-4 (Adolfo) and ML-1 (Mike) feedback sections - Create CQ-22 (attestation decomposition) and CQ-23 (mapping registry) from Adolfo's feedback - Elevate AMPEL alongside Minder throughout proposal and roadmap - Remove premature spec.md that was causing reviewer confusion - Update ecosystem tooling comparison table Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- docs/ROADMAP.md | 14 +- .../osps-baseline-conformance/proposal.md | 177 ++++++++++++++---- .../specs/osps-conformance/spec.md | 129 ------------- 3 files changed, 151 insertions(+), 169 deletions(-) delete mode 100644 openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 0b7fe5b4d21..11a2e87d5c6 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -85,11 +85,14 @@ Scorecard does not duplicate: - **[Minder](https://github.com/mindersec/minder)** — Policy enforcement and remediation platform (OpenSSF Sandbox, ORBIT WG) - **[Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner)** — Baseline evaluation powered by Gemara and Security Insights - **[Darnit](https://github.com/kusari-oss/darnit)** — Compliance audit and remediation -- **[AMPEL](https://github.com/carabiner-dev/ampel)** — Attestation-based policy enforcement +- **[AMPEL](https://github.com/carabiner-dev/ampel)** — Attestation-based policy enforcement; already consumes Scorecard probe results via [policy library](https://github.com/carabiner-dev/policies/tree/main/scorecard) Scorecard's role is to produce deep, 
probe-based conformance evidence that -these tools and downstream consumers can use. Minder already consumes -Scorecard findings to enforce security policies across repositories. +these tools and downstream consumers can use. Both Minder and AMPEL already +consume Scorecard findings today — Minder to enforce security policies +across repositories, and AMPEL to validate Scorecard attestations against +[OSPS Baseline policies](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline) +in CI/CD pipelines. ### Design principles @@ -118,6 +121,5 @@ The following design questions are under active discussion among maintainers: ### How to contribute See the [proposal](../openspec/changes/osps-baseline-conformance/proposal.md) -and [spec](../openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md) -for detailed requirements. Discussion and feedback are welcome via GitHub -issues and the Scorecard community meetings. +for detailed requirements and open questions. Discussion and feedback are +welcome via GitHub issues and the Scorecard community meetings. diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 43145a1398e..089902f8739 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -35,7 +35,7 @@ Several tools operate in adjacent spaces. 
Understanding their capabilities clari | **Action** | Analyzes repositories (read-only) | Monitors orgs, opens issues, auto-fixes settings | Enforces policies, auto-remediates | Audits + fixes repositories | Verifies attestations against policies | Evaluates repos against OSPS controls | | **Data source** | Collects from APIs/code | Collects from GitHub API + runs Scorecard checks | Collects from APIs + consumes findings from other tools | Analyzes repo state | Consumes attestations only | Collects from GitHub API + Security Insights | | **Output** | Scores (0-10) + probe findings | GitHub issues + auto-remediated settings | Policy evaluation results + remediation PRs | PASS/FAIL + attestations + fixes | PASS/FAIL + results attestation | Gemara L4 assessment results | -| **OSPS Baseline** | Partial (via probes) | Indirect (enforces subset via Scorecard checks) | Via Rego policy rules | Full (62 controls) | Via policy rules | 39 of 52 controls | +| **OSPS Baseline** | Partial (via probes) | Indirect (enforces subset via Scorecard checks) | Via Rego policy rules | Full (62 controls) | 36 policies mapping to controls (5 consume Scorecard probes) | 39 of 52 controls | | **In-toto** | Produces attestations | N/A | Consumes attestations | Produces attestations | Consumes + verifies | N/A | | **OSCAL** | No | No | No | No | Native support | N/A | | **Sigstore** | No | No | Verifies signatures | Signs attestations | Verifies signatures | N/A | @@ -48,21 +48,20 @@ Several tools operate in adjacent spaces. Understanding their capabilities clari ```mermaid flowchart LR Scorecard["Scorecard<br/>(Measure)"] -->|checks| Allstar["Allstar<br/>(Enforce on GitHub)"] - Scorecard -->|findings + attestations| Minder["Minder<br/>(Enforce + Remediate)"] - Scorecard -->|findings + attestations| Darnit["Darnit<br/>(Audit + Remediate)"] - Scorecard -->|findings + attestations| AMPEL["AMPEL<br/>(Enforce)"] + Scorecard -->|findings| Minder["Minder<br/>(Enforce + Remediate)"] + Scorecard -->|attestations| AMPEL["AMPEL<br/>(Attestation-based<br/>policy enforcement)"] + Scorecard -->|findings| Darnit["Darnit<br/>(Audit + Remediate)"] Darnit -->|compliance attestation| AMPEL Scorecard -->|conformance evidence| Privateer["Privateer Plugin<br/>(Baseline evaluation)"] ``` -Scorecard is the **data source** (measures repository security). [Allstar](https://github.com/ossf/allstar) is a Scorecard sub-project that continuously monitors GitHub organizations and enforces Scorecard check results as policies (opening issues or auto-remediating settings). Minder consumes Scorecard findings to enforce policies and auto-remediate across repositories. Darnit audits compliance and remediates. AMPEL enforces policies on attestations. The Privateer plugin evaluates Baseline conformance. They are complementary, not competing. +Scorecard is the **data source** (measures repository security). [Allstar](https://github.com/ossf/allstar) is a Scorecard sub-project that continuously monitors GitHub organizations and enforces Scorecard check results as policies (opening issues or auto-remediating settings). [Minder](https://github.com/mindersec/minder) consumes Scorecard findings to enforce policies and auto-remediate across repositories. [AMPEL](https://github.com/carabiner-dev/ampel) validates Scorecard attestations against policies and gates CI/CD pipelines — it already maintains [production policies consuming Scorecard probe results](https://github.com/carabiner-dev/policies/tree/main/scorecard) and [OSPS Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline). Darnit audits compliance and remediates. The Privateer plugin evaluates Baseline conformance. They are complementary, not competing. ### What Scorecard must not do - **Duplicate the Privateer plugin's role.** The [Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner) is the Baseline evaluator in the ORBIT ecosystem. Scorecard should complement it with deeper analysis and interoperable output, not fork the evaluation model.
-- **Duplicate policy enforcement or remediation.** [Minder](https://github.com/mindersec/minder) (OpenSSF Sandbox project, ORBIT WG) consumes Scorecard findings and enforces security policies across repositories with auto-remediation. Scorecard produces findings for Minder to act on. +- **Duplicate policy enforcement or remediation.** [Minder](https://github.com/mindersec/minder) (OpenSSF Sandbox project, ORBIT WG) consumes Scorecard findings and enforces security policies across repositories with auto-remediation. [AMPEL](https://github.com/carabiner-dev/ampel) (production v1.0.0) validates Scorecard attestations against policies and gates CI/CD pipelines — it already maintains [Scorecard-consuming policies](https://github.com/carabiner-dev/policies/tree/main/scorecard) and [OSPS Baseline mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline). Scorecard *produces* findings and attestations for Minder and AMPEL to consume. - **Duplicate compliance auditing.** Darnit handles compliance auditing and automated remediation (PR creation, file generation, AI-assisted fixes). Scorecard is read-only. -- **Duplicate attestation policy enforcement.** AMPEL verifies attestations against policies and gates CI/CD pipelines. Scorecard *produces* attestations for AMPEL to consume. - **Turn OSPS controls into Scorecard checks.** OSPS conformance is a layer that consumes existing Scorecard signals, not 59 new checks. ## Current state @@ -156,7 +155,7 @@ Spencer's position is that OSPS evidence references should point to probe findin ### Out of scope -- Policy enforcement and remediation (Minder's and Darnit's domain) +- Policy enforcement and remediation (Minder's, AMPEL's, and Darnit's domain) - Replacing the Privateer plugin for GitHub repositories - Changing existing check scores or behavior - OSPS Baseline specification changes (ORBIT WG's domain) @@ -220,13 +219,18 @@ flowchart TD end end - Minder["Minder<br/>(enforce + remediate)"] + subgraph Enforcement["Policy Enforcement"] + Minder["Minder<br/>(enforce + remediate)"] + AMPEL["AMPEL<br/>(attestation-based<br/>policy enforcement)"] + end + Darnit["Darnit<br/>(audit + remediate)"] end Baseline -->|defines controls| Privateer Baseline -->|defines controls| Scorecard Baseline -->|defines controls| Minder + Baseline -->|defines controls| AMPEL Gemara -->|provides schemas| Privateer Gemara -->|provides schemas| Scorecard SI -->|provides metadata| Privateer @@ -235,19 +239,21 @@ flowchart TD Scorecard -->|checks| Allstar Scorecard -->|conformance evidence| Privateer Scorecard -->|findings| Minder + Scorecard -->|attestations| AMPEL Scorecard -->|findings| Darnit + Darnit -->|compliance attestation| AMPEL ``` -**Scorecard's role**: Produce deep, probe-based conformance evidence that the Privateer plugin, Minder, and downstream consumers can use. Scorecard reads Security Insights (shared data plane), outputs Gemara L4-compatible results (shared schema), and fills analysis gaps where the Privateer plugin has `NotImplemented` steps. +**Scorecard's role**: Produce deep, probe-based conformance evidence that the Privateer plugin, Minder, AMPEL, and downstream consumers can use. Scorecard reads Security Insights (shared data plane), outputs interoperable results (shared schema), and fills analysis gaps where the Privateer plugin has `NotImplemented` steps. -**What Scorecard does NOT do**: Replace the Privateer plugin, enforce policies or remediate (Minder's role), or perform compliance auditing and remediation (Darnit's role). +**What Scorecard does NOT do**: Replace the Privateer plugin, enforce policies or remediate (Minder's and AMPEL's role), or perform compliance auditing and remediation (Darnit's role). ## Success criteria 1. `scorecard --format=osps --osps-level=1` produces a valid conformance report for any public GitHub repository 2. OSPS Baseline Level 1 conformance is achieved (Phase 1 outcome) 3. OSPS output is available across CLI, Action, and API surfaces -4. OSPS output is consumable by the Privateer plugin as supplementary evidence (validated with ORBIT WG) +4.
OSPS output is consumable by the Privateer plugin, AMPEL, and Minder as supplementary evidence (validated with ORBIT WG) 5. All four open questions (OQ-1 through OQ-4) are resolved with documented decisions 6. No changes to existing check scores or behavior @@ -444,16 +450,20 @@ Do you agree with this priority ordering? Are there any controls you would move **Stephen's response:** -#### CQ-13: Minder integration surface +#### CQ-13: Minder and AMPEL integration surfaces + +**Stakeholders:** Stephen, Minder maintainers, Adolfo García Veytia (AMPEL), Steering Committee -**Stakeholders:** Stephen, Minder maintainers, Steering Committee +Two tools already consume Scorecard data for policy enforcement: -[Minder](https://github.com/mindersec/minder) is an OpenSSF Sandbox project within the ORBIT WG that already consumes Scorecard findings to enforce security policies and auto-remediate across repositories. Minder uses Rego-based rules and can enforce OSPS Baseline controls via policy profiles. A draft Scorecard PR (#4723, now stale) attempted deeper integration by enabling Scorecard checks to be written using Minder's Rego rule engine. +**[Minder](https://github.com/mindersec/minder)** (OpenSSF Sandbox, ORBIT WG) consumes Scorecard findings to enforce security policies and auto-remediate across repositories. Uses Rego-based rules and can enforce OSPS Baseline controls via policy profiles. A draft Scorecard PR (#4723, now stale) attempted deeper integration. -Given Minder's position in the ORBIT ecosystem: -- Should the OSPS conformance output be designed with Minder as an explicit consumer (e.g., ensuring the output schema works well as Minder policy input)? -- Should we coordinate with Minder maintainers during Phase 1 to validate the integration surface? -- Is there a risk of duplicating Baseline evaluation work that Minder already does via its own rules, and if so, how should we delineate? 
+**[AMPEL](https://github.com/carabiner-dev/ampel)** (production v1.0.0) validates Scorecard attestations against policies in CI/CD pipelines. Already maintains [5 Scorecard-consuming policies](https://github.com/carabiner-dev/policies/tree/main/scorecard) and [36 OSPS Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline). Uses CEL expressions and in-toto attestations. + +Questions: +- Should the OSPS conformance output be designed with Minder and AMPEL as explicit consumers (e.g., ensuring the output works as Minder policy input and as AMPEL attestation input)? +- Should we coordinate with both Minder maintainers and Adolfo during Phase 1 to validate the integration surface? +- Is there a risk of duplicating Baseline evaluation work that Minder or AMPEL already do via their own rules, and if so, how should we delineate? **Stephen's response:** @@ -542,9 +552,72 @@ Eddie proposes a fundamentally different architecture: **This is the central architectural decision for the proposal.** The Steering Committee needs to evaluate this against the current plan (Scorecard builds its own conformance engine). +### Adolfo García Veytia's feedback (AMPEL maintainer) + +The following feedback was provided by Adolfo García Veytia (@puerco, maintainer of [AMPEL](https://github.com/carabiner-dev/ampel)) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +#### AP-1: Mapping file registry — single source preferred + +> "It's great that you also see the need for machine-readable data. This would help projects like AMPEL write policies that enforce the baseline controls based on the results from Scorecard and other analysis tools." +> +> "Initially, we were trying to build the mappings into baseline itself. I still think it's the way to go as it would be better to have a single registry and data format of those mappings (in this case baseline's). 
Unfortunately, the way baseline considers its mappings [was demoted](https://github.com/ossf/security-baseline/pull/476) so we don't have that registry anymore." + +Adolfo strongly supports machine-readable mapping data and prefers a single registry in the Baseline itself, though the Baseline's own mapping support was recently demoted (PR #476 in security-baseline). This aligns with Eddie's offer (EK-1) to host mappings in the Baseline repo, but adds the context that there is no longer an official registry for tool-to-control mappings. + +AMPEL already maintains its own [Scorecard-to-Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline) (36 OSPS control policies, 5 of which directly consume Scorecard probe results). An official upstream mapping from Scorecard would benefit the entire ecosystem. + +#### AP-2: Output format — use in-toto predicates, not a custom format + +> "As others have mentioned, there is no _OSPS output format_ but there are two formal/in process of formalizing in-toto predicate types that are useful for this: +> +> **[Simple Verification Results](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md)** — a simple predicate that communicates just the verified control labels along with the tool that performed the evaluation. It is a generalization of the VSA for non-SLSA controls. +> +> **[The "Baseline" Predicate](https://github.com/in-toto/attestation/pull/502)** — Still not merged, this predicate type was proposed by some of the baseline maintainers to capture an interoperability format more in line with the requirements in this spec, including manual assessments (what is named in this PR as 'ATTESTED')." + +Adolfo identifies two concrete in-toto predicate types that Scorecard should consider for output instead of inventing a custom format: + +1. **Simple Verification Results (SVR)**: Already merged in the in-toto attestation spec. 
Communicates verified control labels and the evaluating tool. Generalizes SLSA VSA to non-SLSA controls. +2. **Baseline Predicate**: Proposed by Baseline maintainers (PR #502, not yet merged). Designed for interoperability and includes support for manual assessments (ATTESTED status). + +This is the most concrete guidance on output format so far and directly informs CQ-18. + +#### AP-3: Attestation question conflates identity and tooling + +> "The question here is conflating two domains. One question is _who_ signs the attestation, and how can those identities be trusted (identity). The other is _what_ (tool) generates the attestations, and more importantly, from scorecard's perspective, when. This hints at a policy implementation and the answers will most likely differ for projects and controls. Happy to chat about this one day." + +Adolfo clarifies that OQ-1 (attestation mechanism identity) is actually two separate questions: +1. **Identity**: Who signs the attestation, and how are those identities trusted? (Sigstore/OIDC, personal keys, etc.) +2. **Tooling**: What tool generates the attestation, and when? (Scorecard during scan, CI pipeline, manual process) + +The answers will differ per project and per control. This decomposition should inform how OQ-1 is resolved. + +#### AP-4: AMPEL already consumes Scorecard data for Baseline enforcement + +> "I agree with this role statement. Just as minder, ampel also can enforce Scorecard's data ([see an example here](https://github.com/carabiner-dev/policies/blob/ab1eb42ef179c7a0016d6b7ed72991774a48f151/scorecard/sast.json#L4)) and we also [maintain a mapping of some of scorecard's probes vs baseline controls](https://github.com/carabiner-dev/policies/blob/ab1eb42ef179c7a0016d6b7ed72991774a48f151/groups/osps-baseline/osps-vm-06.hjson#L5) that would greatly benefit from an official/upstream map. 
+> +> The probes can enrich the baseline ecosystem substantially and having the data accessible from other tools encourages other projects in the ecosystem to help maintain and improve them." + +AMPEL is an active consumer of Scorecard data today: +- 5 production policies directly evaluate Scorecard probe results (SAST, binary artifacts, code review, dangerous workflows, token permissions) +- 36 OSPS Baseline policy mappings, several of which reference Scorecard checks +- An official upstream Scorecard-to-Baseline mapping would directly benefit AMPEL's policy library + +This validates the proposal's direction of making Scorecard's probe results and control mappings available to the broader ecosystem. + +### Mike Lieberman's feedback + +The following feedback was provided by Mike Lieberman (@mlieberman85) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +#### ML-1: No "OSPS output format" exists + +> "What is OSPS output format?" +> — on ROADMAP.md, Phase 1 deliverable + +Mike echoes Eddie's (EK-2) and Adolfo's (AP-2) point: there is no defined "OSPS output format." This is the third reviewer to flag this, confirming it needs to be reframed. The output format question (CQ-18) now has concrete alternatives: Gemara SDK formats (Eddie), in-toto SVR/Baseline predicates (Adolfo), or extending existing Scorecard formats. + --- -The following clarifying questions require Steering Committee decisions informed by Eddie's feedback. +The following clarifying questions require Steering Committee decisions informed by reviewer feedback. #### CQ-17: Mapping file location — Scorecard repo or shared? @@ -564,18 +637,20 @@ _Note that this question is negated if consolidating check logic within `pvtr-gi **Stephen's response:** -#### CQ-18: Output format — `--format=osps` vs. Gemara SDK output +#### CQ-18: Output format — `--format=osps` vs. 
ecosystem formats -**Stakeholders:** Stephen, Spencer (OQ-4 constrains this), Eddie Knight +**Stakeholders:** Stephen, Spencer (OQ-4 constrains this), Eddie Knight, Adolfo García Veytia -Eddie clarifies that no "OSPS output format" exists in the ORBIT ecosystem (EK-2). The Gemara SDK supports multiple formats (JSON, YAML, SARIF). He suggests Scorecard keep its existing output logic and optionally add Gemara JSON/YAML as another format. +Three reviewers (Eddie, Adolfo, Mike) independently flagged that no "OSPS output format" exists. Eddie suggests Gemara SDK formats (EK-2). Adolfo identifies two concrete in-toto predicate types (AP-2): the [Simple Verification Results (SVR)](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md) predicate (merged) and the [Baseline Predicate](https://github.com/in-toto/attestation/pull/502) (proposed, not yet merged). -This affects the spec's requirement for `--format=osps`. Options: -1. **Keep `--format=osps`**: Define a Scorecard-specific conformance output format. Risk: inventing a format that doesn't align with the ecosystem. -2. **Use `--format=gemara`** (or similar): Integrate the Gemara SDK and output Gemara L4 assessment results in JSON/YAML. Aligns with ecosystem, but creates a Gemara SDK dependency. -3. **Extend existing formats**: Add conformance data to `--format=json` and `--format=sarif` outputs. No new format flag needed. +Options: +1. **Keep `--format=osps`**: Define a Scorecard-specific conformance output format. Risk: inventing a format that three reviewers have said doesn't belong. +2. **Use `--format=gemara`** (or similar): Integrate the Gemara SDK and output Gemara assessment results in JSON/YAML. Aligns with ORBIT ecosystem, creates a Gemara SDK dependency. +3. **Use in-toto predicates**: Output conformance results as in-toto attestations using SVR or the Baseline predicate. Aligns with in-toto ecosystem and Adolfo's guidance. The Baseline predicate is not yet merged. +4. 
**Extend existing formats**: Add conformance data to `--format=json` and `--format=sarif` outputs. No new format flag needed. +5. **Combination**: Use Gemara SDK for structured output + in-toto predicates for attestation output. These are not mutually exclusive. -Which approach do you prefer? Does the Gemara SDK dependency concern you? +Which approach do you prefer? **Stephen's response:** @@ -639,6 +714,39 @@ Is some duplication acceptable if it means Scorecard retains architectural indep **Stephen's response:** +#### CQ-22: Attestation decomposition — identity vs. tooling + +**Stakeholders:** Stephen, Spencer, Adolfo García Veytia, Eddie Knight + +Adolfo points out that OQ-1 (attestation mechanism identity) conflates two questions (AP-3): + +1. **Identity**: Who signs the attestation, and how are those identities trusted? (Sigstore/OIDC, personal keys, platform-native) +2. **Tooling**: What tool generates the attestation, and when? (Scorecard during scan, CI pipeline post-scan, manual maintainer process) + +The answers will differ per project and per control. Should OQ-1 be decomposed into these two sub-questions, and should the design allow different identity/tooling combinations per control? + +Adolfo has offered to discuss this in depth. + +**Stephen's response:** + + +#### CQ-23: Mapping registry — where should the canonical mapping live? 
+ +**Stakeholders:** Stephen, Eddie Knight, Adolfo García Veytia, Baseline maintainers + +Three perspectives have emerged on where Scorecard-to-Baseline mappings should live: + +- **Eddie (EK-1)**: Host in the Baseline repo with Scorecard maintainers as CODEOWNERS +- **Adolfo (AP-1)**: Prefers a single registry in the Baseline itself, but notes the Baseline's mapping support was [demoted](https://github.com/ossf/security-baseline/pull/476) +- **Current proposal**: Host in Scorecard repo (`pkg/osps/mappings/`) + +Additionally, AMPEL already maintains [independent Scorecard-to-Baseline mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline) in its policy library. An official upstream mapping would benefit both AMPEL and the wider ecosystem. + +This extends CQ-17 with Adolfo's context about the demoted Baseline registry. Should the Scorecard mapping effort also advocate for restoring a shared registry in the Baseline spec? + +**Stephen's response:** + + #### CQ-16: Allstar's role in OSPS conformance enforcement **Stakeholders:** Stephen (can answer alone) @@ -666,8 +774,9 @@ wrong order will result in rework. The recommended sequence follows. | Question | Dependency | Who decides | |----------|-----------|-------------| -| **CQ-18** | Output format depends on CQ-19's architectural direction. | Stephen + Spencer + Eddie Knight | -| **CQ-17** | Mapping file location depends on CQ-19 (negated if Option B). | Stephen + Eddie Knight + Baseline maintainers | +| **CQ-18** | Output format depends on CQ-19's architectural direction. Three reviewers flagged "no OSPS output format." In-toto predicates (AP-2) and Gemara SDK are concrete alternatives. | Stephen + Spencer + Eddie Knight + Adolfo | +| **CQ-17/CQ-23** | Mapping file location depends on CQ-19 (negated if Option B). Adolfo adds context: Baseline registry was demoted (AP-1). | Stephen + Eddie Knight + Adolfo + Baseline maintainers | +| **CQ-22** | Attestation decomposition (identity vs. 
tooling). Refines OQ-1. | Stephen + Spencer + Adolfo + Eddie Knight | | **OQ-2** | Enforcement detection scope. Affects Phase 3 scope. | Spencer + Stephen + Steering Committee | #### Tier 3 — Important but non-blocking for Phase 1 start @@ -676,7 +785,7 @@ wrong order will result in rework. The recommended sequence follows. |----------|-------|-------------| | **CQ-20** | Catalog extraction scope. Flows from CQ-19. | Stephen + Eddie Knight | | **CQ-21** | Code duplication tolerance. Flows from CQ-19. | Stephen + Spencer + Eddie Knight | -| **CQ-13** | Minder integration surface. Affects ecosystem positioning. | Stephen + Minder maintainers | +| **CQ-13** | Minder/AMPEL integration surface. Affects ecosystem positioning. | Stephen + Minder maintainers + Adolfo | | **CQ-11** | Output stability guarantees. Depends on CQ-18. | Stephen + Spencer | #### Tier 4 — Stephen can answer alone (any time) @@ -699,7 +808,7 @@ wrong order will result in rework. The recommended sequence follows. #### Recommended next steps -1. **Schedule a discussion with Eddie Knight** to resolve CQ-19. Bring Spencer. This is the fork in the road — everything downstream depends on it. -2. **Resolve OQ-1** with Spencer and the Steering Committee. Spencer flagged it as blocking, and it affects the spec regardless of CQ-19's outcome. +1. **Schedule a discussion with Eddie Knight and Adolfo García Veytia** to resolve CQ-19. Bring Spencer. This is the fork in the road — everything downstream depends on it. +2. **Resolve OQ-1/CQ-22** with Spencer, Adolfo, and the Steering Committee. Spencer flagged OQ-1 as blocking. Adolfo's decomposition (identity vs. tooling) clarifies what needs to be decided. 3. **Answer the Tier 4 questions** at any time — they are independent and don't block others. -4. **Once CQ-19 and OQ-1 are decided**, the Tier 2 and 3 questions can be resolved quickly since they flow from the architectural direction. +4. 
**Once CQ-19 and OQ-1/CQ-22 are decided**, the Tier 2 and 3 questions can be resolved quickly since they flow from the architectural direction. diff --git a/openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md b/openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md deleted file mode 100644 index 781b44114f6..00000000000 --- a/openspec/changes/osps-baseline-conformance/specs/osps-conformance/spec.md +++ /dev/null @@ -1,129 +0,0 @@ -# OSPS Baseline Conformance - -## Purpose - -Enable Scorecard to evaluate repositories against the OSPS Baseline specification, producing per-control conformance results (PASS/FAIL/UNKNOWN/NOT_APPLICABLE/ATTESTED) with probe-based evidence, interoperable with the ORBIT WG ecosystem. - -## Requirements - -### Conformance Engine - -#### Requirement: Conformance evaluation -Scorecard SHALL include a conformance engine that evaluates OSPS Baseline controls against Scorecard probe findings and outputs a per-control status. - -#### Requirement: Status values -Each control evaluation SHALL produce one of: `PASS`, `FAIL`, `UNKNOWN`, `NOT_APPLICABLE`, or `ATTESTED`. - -#### Requirement: UNKNOWN-first honesty -When Scorecard cannot observe a control due to insufficient permissions, missing platform support, or lack of data, the status SHALL be `UNKNOWN` with an explanation. The engine SHALL NOT produce `PASS` or `FAIL` for unobservable controls. - -#### Requirement: Applicability detection -The conformance engine SHALL detect applicability preconditions (e.g., "project has made a release") and produce `NOT_APPLICABLE` when preconditions are not met. - -#### Requirement: Backward compatibility -The conformance engine SHALL be additive. Existing checks, probes, scores, and output formats SHALL NOT change behavior. 
- -### Mapping - -#### Requirement: Versioned mapping file -The mapping between OSPS Baseline controls and Scorecard probes SHALL be maintained as a data-driven, versioned YAML file (e.g., `pkg/osps/mappings/v2026-02-19.yaml`), not hard-coded. - -#### Requirement: Mapping contents -Each mapping entry SHALL specify: -- the OSPS control ID -- the maturity level (1, 2, or 3) -- the Scorecard probes that provide evidence -- applicability conditions (if any) -- the evaluation logic (how probe outcomes map to control status) - -#### Requirement: Unmapped controls -Controls without mapped probes SHALL appear in output with status `UNKNOWN` and a note indicating no automated evaluation is available. - -### Output - -#### Requirement: OSPS output format -Scorecard SHALL support `--format=osps` producing a JSON conformance report containing: -- OSPS Baseline version -- target maturity level -- per-control status, evidence, limitations, and remediation -- summary counts -- tool metadata (Scorecard version, timestamp) - -#### Requirement: Probe-based evidence -Evidence references in OSPS output SHALL reference probes and their findings. Check-level results SHALL NOT be used as evidence. Checks MAY be listed in a `derived_from` field for human context. - -> **Open Question (OQ-4)**: Spencer's position — evidence should be probe-based only, not check-based. This spec adopts that position. Need to confirm this is the consensus view. - -#### Requirement: Gemara Layer 4 compatibility -The OSPS output schema SHALL be structurally compatible with Gemara Layer 4 assessment results, enabling consumption by ORBIT ecosystem tools without transformation. - -#### Requirement: CI gating -Scorecard SHALL support a `--fail-on=fail` flag (or equivalent) when using OSPS output. `UNKNOWN` statuses SHALL NOT cause failure by default; this SHALL be configurable. 
- -### Metadata and Attestation - -#### Requirement: Security Insights ingestion -Scorecard SHALL read Security Insights files (`security-insights.yml` or `.github/security-insights.yml`) to satisfy controls that depend on declared project metadata. - -#### Requirement: Attestation for non-automatable controls -The conformance engine SHALL accept attestation evidence from a repo-local metadata file for controls that cannot be automated, producing status `ATTESTED` with evidence links. - -> **Open Question (OQ-1)**: The identity and trust model for attestations is unresolved. What does the attestation? OIDC? A personal token? A workflow (which won't have the right tokens)? See proposal.md for options. Spencer flagged this as a blocking design question. - -### Ecosystem Interoperability - -#### Requirement: Complementarity with the Privateer plugin -Scorecard SHALL NOT duplicate the [Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner). Scorecard provides deep probe-based analysis; the Privateer plugin can consume Scorecard's OSPS output as supplementary evidence. - -#### Requirement: No enforcement -Scorecard evaluates and reports conformance. It SHALL NOT enforce policies. The `--fail-on=fail` exit code is a reporting mechanism; the CI system is the enforcer. - -> **Open Question (OQ-2)**: Spencer asked whether "enforcement detection" (Phase 3: detecting whether SCA/SAST gating exists) conflicts with Scorecard's stated non-enforcement role. Proposed distinction: Scorecard *detects* enforcement mechanisms, it does not *perform* enforcement. Needs maintainer consensus. 
- -## Scenarios - -### Scenario: Full Level 1 conformance report -- GIVEN a public GitHub repository with a Security Insights file -- WHEN `scorecard --repo=github.com/org/repo --format=osps --osps-level=1` is run -- THEN the output contains all Level 1 controls with status PASS, FAIL, UNKNOWN, or NOT_APPLICABLE -- AND each result includes probe-based evidence references - -### Scenario: Permission-limited scan produces UNKNOWN -- GIVEN a scan token without admin access -- WHEN evaluating OSPS-AC-01.01 (MFA enforcement) -- THEN the status is `UNKNOWN` -- AND the limitations field explains "requires org admin visibility" - -### Scenario: Release applicability triggers NOT_APPLICABLE -- GIVEN a repository that has never made a release -- WHEN evaluating OSPS-DO-01.01 (user guides for released software) -- THEN the status is `NOT_APPLICABLE` -- AND the applicability facts record `has_release=false` - -### Scenario: Attestation for non-automatable control -- GIVEN a repo-local metadata file attesting a security assessment was performed -- WHEN evaluating OSPS-SA-03.01 -- THEN the status is `ATTESTED` -- AND the evidence includes the attestation source and link - -### Scenario: Missing Security Insights file -- GIVEN a repository without a Security Insights file -- WHEN evaluating controls dependent on Security Insights data -- THEN those controls evaluate to `UNKNOWN` with limitation "requires security-insights.yml" -- AND controls not dependent on Security Insights evaluate normally - -### Scenario: Unmapped control -- GIVEN an OSPS control with no corresponding Scorecard probes in the mapping file -- WHEN the conformance engine evaluates it -- THEN the status is `UNKNOWN` with note "no automated evaluation available" - -### Scenario: CI gating on conformance -- GIVEN `scorecard --format=osps --osps-level=1 --fail-on=fail` -- WHEN any Level 1 control evaluates to `FAIL` -- THEN the process exits with non-zero exit code -- AND `UNKNOWN` controls do not cause failure by 
default - -### Scenario: Existing Scorecard behavior unchanged -- GIVEN any repository -- WHEN `scorecard --repo=github.com/org/repo --format=json` is run (without `--format=osps`) -- THEN the output is identical to Scorecard without OSPS conformance changes From 4e09cbc8fbed7b728e73d91cd69b0c42a7b8fd7b Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 2 Mar 2026 04:53:12 -0500 Subject: [PATCH 18/28] :seedling: Split proposal into proposal.md + decisions.md Split the proposal into two documents to improve readability: - proposal.md: core proposal (motivation, scope, phased delivery, ecosystem positioning) - decisions.md: reviewer feedback, open questions, maintainer responses, and decision priority analysis Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .../osps-baseline-conformance/decisions.md | 625 ++++++++++++++++++ .../osps-baseline-conformance/proposal.md | 610 +---------------- 2 files changed, 641 insertions(+), 594 deletions(-) create mode 100644 openspec/changes/osps-baseline-conformance/decisions.md diff --git a/openspec/changes/osps-baseline-conformance/decisions.md b/openspec/changes/osps-baseline-conformance/decisions.md new file mode 100644 index 00000000000..18d54c43872 --- /dev/null +++ b/openspec/changes/osps-baseline-conformance/decisions.md @@ -0,0 +1,625 @@ +# OSPS Baseline Conformance — Feedback and Decisions + +Companion document to [`proposal.md`](proposal.md). This document tracks +reviewer feedback, open questions, maintainer responses, and the decision +priority analysis. + +For the proposal itself (motivation, scope, phased delivery, ecosystem +positioning), see [`proposal.md`](proposal.md). + +For the control-by-control coverage analysis, see +[`docs/osps-baseline-coverage.md`](../../docs/osps-baseline-coverage.md). 
+ +--- + +## Open questions from maintainer review + +The following questions were raised by Spencer (Steering Committee member) +during review of the roadmap and need to be resolved before or during +implementation. + +### OQ-1: Attestation mechanism identity + +> "The attestation/provenance layer. What is doing the attestation? Is this some OIDC? A personal token? A workflow (won't have the right tokens)?" +> — Spencer, on Section 5.1 + +**Stakeholders:** Spencer (raised this, flagged as blocking), Stephen, Steering Committee + +This is a fundamental design question. Options include: +- **Repo-local metadata files** (e.g., Security Insights, `.osps-attestations.yml`): simplest, no cryptographic identity, maintainer self-declares by committing the file. +- **Signed attestations via Sigstore/OIDC**: strongest guarantees, but requires workflow identity and the right tokens — which Spencer correctly notes may not be available in all contexts. +- **Platform-native signals**: e.g., GitHub's private vulnerability reporting enabled status, which the platform attests implicitly. + +**Recommendation to discuss**: Start with repo-local metadata files (unsigned) for the v1 attestation mechanism, with a defined extension point for signed attestations in a future iteration. This avoids blocking on the identity question while still making non-automatable controls reportable. + +### OQ-2: Scorecard's role in enforcement detection vs. enforcement + +> "I thought the other doc said Scorecard wasn't an enforcement tool?" +> — Spencer, on Q4 deliverables (enforcement detection) + +**Stakeholders:** Spencer (raised this), Stephen, Steering Committee + +This is a critical framing question. The roadmap proposes *detecting* whether enforcement exists (e.g., "are SAST results required to pass before merge?"), not *performing* enforcement. But the line between "detecting enforcement" and "being an enforcement tool" needs to be drawn clearly. 
+ +**Recommendation to discuss**: Scorecard detects and reports whether enforcement mechanisms are in place. It does not itself enforce. The `--fail-on=fail` CI gating is a reporting exit code, not an enforcement action — the CI system is the enforcer. This distinction should be documented explicitly. + +### OQ-3: `scan_scope` field in output schema + +> "Not sure I see the importance [of `scan_scope`]" +> — Spencer, on Section 9 (output schema) + +**Stakeholders:** Stephen (can resolve alone) + +The `scan_scope` field (repo|org|repos) in the proposed OSPS output schema may not carry meaningful information. If the output always describes a single repository's conformance, the scope is implicit. + +**Recommendation to discuss**: Drop `scan_scope` from the schema unless multi-repo aggregation (OSPS-QA-04.02) produces a fundamentally different output shape. Revisit when project-level aggregation is implemented. + +### OQ-4: Evidence model — probes only, not checks + +> "[Evidence] should be probe-based only, not check" +> — Spencer, on Section 9 (output schema) + +**Stakeholders:** Spencer (raised this), Stephen — effectively resolved (adopted) + +Spencer's position is that OSPS evidence references should point to probe findings, not check-level results. This aligns with the architectural direction of Scorecard v5 (probes as the measurement unit, checks as scoring aggregations). + +**Recommendation**: Adopt this. The `evidence` array in the OSPS output schema should reference probes and their findings only. Checks may be listed in a `derived_from` field for human context but are not evidence. + +--- + +## Maintainer review + +### Stephen's notes + + + +**Overall assessment:** + + +**Key concerns or risks:** + + +**Things I agree with:** + + +**Things I disagree with or want to change:** + +- "PVTR" is shorthand for "Privateer". 
Throughout this proposal, it appears as if https://github.com/ossf/pvtr-github-repo-scanner is separate from Privateer, when it is really THE Privateer plugin for GitHub repositories. Any references to PVTR should be corrected. +- This proposal does not give even-handed consideration to the capabilities of [Darnit](https://github.com/kusari-oss/darnit) and [AMPEL](https://github.com/carabiner-dev/ampel). We should do that comparison to get a better idea of what should be in or out of scope for Scorecard. +- The timeline in this proposal is not accurate, as we're already about to enter Q2 2026. We should focus on phases and outcomes, and let maintainer bandwidth dictate delivery timing. +- Scorecard has an existing set of checks and probes, which is essentially a control catalog. We should make a plan to extract the Scorecard control catalog so that it can be used by other tools that can handle evaluation tasks. +- Use Mermaid when creating diagrams. +- We need to understand what level of coverage Scorecard currently has for OSPS Baseline and that analysis should be created in a separate file (in `docs/`). Assume that any existing findings are out-of-date. +- `docs/roadmap-ideas.md` will not be committed to the repo, as it is a rough draft which needs to be refined for public consumption. We should create `docs/ROADMAP.md` with a 2026 second-level heading that contains the publicly-consumable roadmap. + +**Priority ordering — what matters most to ship first:** + + ### Clarifying questions + +The following questions need input before this proposal can move to design. +Questions with Stephen's responses are answered; the rest are open. + +#### CQ-1: Scorecard as a conformance tool — product identity + +The proposal frames this as a "product-level shift" where Scorecard gains a second mode: conformance evaluation alongside its existing scoring. Does this framing match your vision, or do you see conformance as eventually *replacing* the scoring model?
Should we be thinking about deprecating 0-10 scores long-term, or do both modes coexist indefinitely? + +**Stephen's response:** + +I believe the scoring model will continue to be useful to consumers and it should be maintained. For now, both modes should coexist. There is no need to make a decision about this for the current iteration of the proposal. + +#### CQ-2: OSPS Baseline version targeting + +The roadmap previously targeted OSPS Baseline v2025-10-10. The Privateer GitHub plugin targets v2025-02-25. The Baseline is a living spec with periodic releases. How should Scorecard handle version drift? Options: +- Support only the latest version at any given time +- Support multiple versions concurrently via the versioned mapping file +- Pin to a version and update on a defined cadence (e.g., quarterly) + +**Stephen's response:** + +The current version of the OSPS Baseline is [v2026.02.19](https://baseline.openssf.org/versions/2026-02-19). + +We should align with the latest version at first and have a process for aligning with new versions on a defined cadence. We should understand the [OSPS Baseline maintenance process](https://baseline.openssf.org/maintenance.html) and align with it. + +The OSPS Baseline [FAQ](https://baseline.openssf.org/faq.html) and [Implementation Guidance for Maintainers](https://baseline.openssf.org/maintainers.html) may have guidance we should consider incorporating. + +#### CQ-3: Security Insights as a hard dependency + +Many OSPS controls depend on Security Insights data (official channels, distribution points, subproject inventory, core team). The Privateer GitHub plugin treats the Security Insights file as nearly required — most of its evaluation steps begin with `HasSecurityInsightsFile`. + +Should Scorecard: +- Treat Security Insights the same way (controls that need it go UNKNOWN without it)? +- Provide a degraded but still useful evaluation without it? +- Accept alternative metadata sources (e.g., `.project`, custom config)? 
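For orientation, a minimal Security Insights file might look roughly like the sketch below. The field names here are illustrative rather than copied from the spec — the Security Insights schema is the authoritative source:

```yaml
# Illustrative sketch only — consult the Security Insights specification
# for the real schema; key names below are assumptions.
header:
  schema-version: "2.0.0"
  last-updated: "2026-02-19"
repository:
  url: "https://github.com/org/repo"
  security:
    contacts:
      - type: email
        value: security@example.com
```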
+ +This also raises a broader adoption question: most projects today don't have a `security-insights.yml`. How do we avoid making the OSPS output useless for the majority of repositories? + +**Stephen's response:** + +We should provide a degraded, but still-useful evaluation without a Security Insights file, especially since our probes today can already cover a lot of ground without it. It would be good for us to eventually support alternative metadata sources, but this should not be an immediate goal. + +#### CQ-4: PVTR relationship — complement vs. converge + +The proposal positions Scorecard as complementary to the Privateer plugin. But there's a deeper question: should this stay as two separate tools indefinitely, or is the long-term goal convergence (e.g., the Privateer plugin consuming Scorecard as a library, or Scorecard becoming a Privateer plugin itself)? Your position on this affects how tightly we couple the output formats and whether we invest in Gemara SDK integration. + +**Stephen's response:** + +Multiple tools should be able to consume Scorecard, so yes, we should invest in Gemara SDK integration. + +#### CQ-5: Scope of new probes in 2026 + +The roadmap calls for significant new probe development (secrets detection, governance/docs presence, dependency manifests, release asset inspection, enforcement detection). That's a lot of new surface area. Should we: +- Build all of these within Scorecard? +- Prioritize a subset and defer the rest? +- Look for ways to consume signals from external tools (e.g., GitHub's secret scanning API, SBOM tools) rather than building detection from scratch? + +If prioritizing, which new probes matter most to you? + +**Stephen's response:** + +We should prioritize OSPS Baseline Level 1 conformance work. +We should consider any signals that can be consumed from external sources. + +#### CQ-6: Community and governance process + +This is a major initiative touching Scorecard's product direction. 
What's the governance process for getting this approved? +- Does this need a formal proposal to the Scorecard maintainer group? +- Should this be presented at an ORBIT WG meeting? +- Do we need sign-off from the OpenSSF TAC? +- Who else beyond you and Spencer needs to weigh in? + +**Stephen's response:** + +We should have Stephen and Spencer sign off on this proposal as Steering Committee members. In addition, we should have reviews from: +- [blocking] At least 1 non-Steering Scorecard maintainer +- [non-blocking] Maintainers of tools in the WG ORBIT ecosystem + +This does not require review from the TAC, but we should inform WG ORBIT members. + +#### CQ-7: The "minimum viable conformance report" + +If we had to ship the smallest useful thing in Q1, what would it be? The roadmap proposes the full OSPS output format + mapping file + applicability engine. But a simpler starting point might be: +- Just the mapping file (documentation-only, no runtime) +- A `--format=osps` that only reports on controls Scorecard already covers (no new probes, lots of UNKNOWN) +- Something else? + +What would make Q2 a success in your eyes? + +**Stephen's response:** + +As previously mentioned, the quarterly targets are not currently accurate. One of our Q2 outcomes should be OSPS Baseline Level 1 conformance. + +#### CQ-8: Existing Scorecard Action and API impact + +Scorecard runs at scale via the Scorecard Action (GitHub Action) and the public API (api.scorecard.dev). Should OSPS conformance be available through these surfaces from day one, or should it start as a CLI-only feature? The API and Action have their own release and stability considerations. + +**Stephen's response:** + +We need to land these capabilities for as much surface area as possible. + +#### CQ-9: Coverage analysis and Phase 1 scope validation + +**Stakeholders:** Stephen (can answer alone) + +The coverage analysis (`docs/osps-baseline-coverage.md`) identifies 25 Level 1 controls. 
Of those, 6 are COVERED, 8 are PARTIAL, 9 are GAP, and 2 are NOT_OBSERVABLE. The Phase 1 plan targets closing the 9 GAP controls. Given that 2 controls (AC-01.01, AC-02.01) are NOT_OBSERVABLE without org-admin tokens, should Phase 1 explicitly include work on improving observability (e.g., documenting what tokens are needed, or providing guidance for org admins), or should those controls remain UNKNOWN until a later phase? + +**Stephen's response:** + + +#### CQ-10: Mapping file ownership and contribution model + +**Stakeholders:** Stephen, Eddie Knight, Baseline maintainers — partially superseded by CQ-17 + +The versioned mapping file (e.g., `pkg/osps/mappings/v2026-02-19.yaml`) is a critical artifact that defines which probes satisfy which OSPS controls. Who should own this file? Options: +- Scorecard maintainers only (changes require maintainer review) +- Community-contributed with maintainer approval (like checks/probes today) +- Co-maintained with ORBIT WG members who understand the Baseline controls + +This also affects how we handle disagreements about whether a probe truly satisfies a control. + +**Stephen's response:** + + +#### CQ-11: Backwards compatibility of OSPS output format + +**Stakeholders:** Stephen, Spencer, Eddie Knight — depends on CQ-18 (output format decision) + +The spec requires `--format=osps` as a new output format. Since this is a new surface, we have freedom to iterate on the schema. However, once shipped, consumers will depend on it. What stability guarantees should we offer? 
+- No guarantees during Phase 1 (alpha schema, may break between releases) +- Semver-like schema versioning from day one (breaking changes increment major version) +- Follow the Gemara L4 schema if one exists, inheriting its stability model + +**Stephen's response:** + + +#### CQ-12: Probe gap prioritization for Phase 1 + +**Stakeholders:** Stephen (can answer alone) + +The coverage analysis identifies 7 Level 1 GAP controls that need new probes (excluding the 2 that depend on Security Insights). Ranked by implementation feasibility: + +1. OSPS-GV-03.01 — CONTRIBUTING file presence +2. OSPS-GV-02.01 — Issues/discussions enabled +3. OSPS-DO-02.01 — Issue templates or bug report docs +4. OSPS-DO-01.01 — Documentation presence heuristics +5. OSPS-BR-07.01 — Secrets detection (platform signal consumption) +6. OSPS-BR-03.01 / BR-03.02 — Encrypted transport (requires Security Insights) +7. OSPS-QA-04.01 — Subproject listing (requires Security Insights) + +Do you agree with this priority ordering? Are there any controls you would move up or down, or any you would defer to Phase 2? + +**Stephen's response:** + + +#### CQ-13: Minder and AMPEL integration surfaces + +**Stakeholders:** Stephen, Minder maintainers, Adolfo García Veytia (AMPEL), Steering Committee + +Two tools already consume Scorecard data for policy enforcement: + +**[Minder](https://github.com/mindersec/minder)** (OpenSSF Sandbox, ORBIT WG) consumes Scorecard findings to enforce security policies and auto-remediate across repositories. Uses Rego-based rules and can enforce OSPS Baseline controls via policy profiles. A draft Scorecard PR (#4723, now stale) attempted deeper integration. + +**[AMPEL](https://github.com/carabiner-dev/ampel)** (production v1.0.0) validates Scorecard attestations against policies in CI/CD pipelines. 
Already maintains [5 Scorecard-consuming policies](https://github.com/carabiner-dev/policies/tree/main/scorecard) and [36 OSPS Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline). Uses CEL expressions and in-toto attestations. + +Questions: +- Should the OSPS conformance output be designed with Minder and AMPEL as explicit consumers (e.g., ensuring the output works as Minder policy input and as AMPEL attestation input)? +- Should we coordinate with both Minder maintainers and Adolfo during Phase 1 to validate the integration surface? +- Is there a risk of duplicating Baseline evaluation work that Minder or AMPEL already do via their own rules, and if so, how should we delineate? + +**Stephen's response:** + + +#### CQ-14: Darnit vs. Minder delineation + +**Stakeholders:** Stephen (can answer alone) + +The proposal lists both [Darnit](https://github.com/kusari-oss/darnit) and [Minder](https://github.com/mindersec/minder) as tools that handle remediation and enforcement. Their capabilities overlap in some areas (both can enforce Baseline controls, both can remediate). For Scorecard's purposes, the distinction matters primarily for the "What Scorecard must not do" boundary. + +Is the current framing correct — that Scorecard is the measurement layer and both Minder and Darnit are downstream consumers? Or should we position Scorecard differently relative to one versus the other, given that Minder is an OpenSSF project in the same working group while Darnit is not? + +**Stephen's response:** + + +#### CQ-15: Existing issues as Phase 1 work items + +**Stakeholders:** Stephen (can answer alone) + +The coverage analysis (`docs/osps-baseline-coverage.md`) now includes a section mapping existing Scorecard issues to OSPS Baseline gaps. 
Several long-standing issues align directly with Phase 1 priorities: + +- [#30](https://github.com/ossf/scorecard/issues/30) — Secrets scanning (OSPS-BR-07.01), open since the project's earliest days +- [#2305](https://github.com/ossf/scorecard/issues/2305) / [#2479](https://github.com/ossf/scorecard/issues/2479) — Security Insights ingestion +- [#2465](https://github.com/ossf/scorecard/issues/2465) — Private vulnerability reporting (OSPS-VM-03.01) +- [#4824](https://github.com/ossf/scorecard/issues/4824) — Changelog check (OSPS-BR-04.01) +- [#4723](https://github.com/ossf/scorecard/pull/4723) — Minder/Rego integration draft (closed) + +Should we adopt these existing issues as the starting work items for Phase 1, or create new issues that reference them? Some of these issues have significant discussion history that may contain design decisions worth preserving. + +**Stephen's response:** + + +--- + +## ORBIT WG feedback + +### Eddie Knight's feedback (ORBIT WG TSC Chair) + +The following feedback was provided by Eddie Knight (ORBIT WG Technical Steering Committee Chair, maintainer of Gemara, Privateer, and OSPS Baseline) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +#### EK-1: Mapping file location + +> "Regarding mappings between Baseline catalog<->Scorecard checks, it is possible to easily put that into a new file with Scorecard maintainers as codeowners, pending approval from OSPS Baseline maintainers for the change." + +Eddie is offering to host the Baseline-to-Scorecard mapping in the OSPS Baseline repository (or a shared location) with Scorecard maintainers as CODEOWNERS. The current proposal places the mapping in the Scorecard repo (`pkg/osps/mappings/v2026-02-19.yaml`). + +Mappings currently exist within the Baseline Catalog and are proposed for addition to the Scorecard repository as well. The mappings could be maintained in one or both of the projects. 
This affects ownership, versioning cadence, and who can update the mapping when controls or probes change. + +The trade-offs: + +- **In Scorecard repo**: Scorecard maintainers fully own the mapping. Mapping updates are coupled to Scorecard releases. Other tools cannot easily consume the mapping. +- **In Baseline repo (or shared)**: Mapping is co-owned. Versioned alongside the Baseline spec. End users and other tools (Privateer, Darnit, Minder) can consume the same mapping. Scorecard maintainers retain CODEOWNERS authority. + +#### EK-2: Output format — no "OSPS output format" + +> "There is not an 'OSPS output format,' and even the relevant Gemara schemas (which are quite opinionated) are still designed to support output in multiple output formats within the SDK, such as SARIF. I would expect that you'd keep your current output logic, and then _maybe_ add basic Gemara json/yaml as another option." + +The current proposal defines `--format=osps` as a new output format. Eddie clarifies that the ORBIT ecosystem does not define a special "OSPS output format" — instead, the Gemara SDK supports multiple output formats (including SARIF). The suggestion is to keep Scorecard's existing output logic and optionally add Gemara JSON/YAML as another format option. + +This is a significant clarification that affects the output requirements, the Phase 1 deliverables, and how we frame the conformance layer. + +#### EK-3: Technical relationship with Privateer plugin + +> "There is a stated goal of not duplicating the code from the plugin ossf/pvtr-github-repo-scanner, but the implementation plan as it's currently written does require duplication. In the current proposal, there would not be a technical relationship between the two codebases." + +Eddie identifies a contradiction: the proposal says "do not duplicate Privateer" but proposes building a parallel conformance engine with no code-level relationship to the Privateer plugin. 
The current plan would result in two separate codebases evaluating the same OSPS controls independently. + +#### EK-4: Catalog extraction needs an implementation plan + +> "There is cursory mention of a scorecard _catalog extraction_, which I'm hugely in favor of, but I don't see an implementation plan for that." + +The proposal mentions "Scorecard control catalog extraction plan" as a Phase 1 deliverable but does not specify what this means concretely or how it would be achieved. + +#### EK-5: Alternative architecture — shared plugin model + +> "An alternative plan would be for us to spend a week consolidating checks/probes into the pvtr plugin (with relevant CODEOWNERS), then update Scorecard to selectively execute the plugin under the covers." + +Eddie proposes a fundamentally different architecture: + +1. Consolidate Scorecard checks/probes into the [Privateer plugin](https://github.com/ossf/pvtr-github-repo-scanner) as shared evaluation logic +2. Scorecard executes the plugin under the covers for Baseline evaluation, then handles follow-up logic such as scoring and storing the results +3. Privateer and LFX Insights can optionally run Scorecard checks via the same plugin + +**Claimed benefits:** +- Extract the Scorecard control catalog for independent versioning and cross-catalog mapping to Baseline +- Instantly integrate Gemara into Scorecard +- Allow bidirectional check execution (Scorecard runs Privateer checks; Privateer runs Scorecard checks) +- Simplify contribution overhead for individual checks +- Improve both codebases through shared logic + +**This is the central architectural decision for the proposal.** The Steering Committee needs to evaluate this against the current plan (Scorecard builds its own conformance engine).
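Stephen's review notes ask for Mermaid when diagramming; as one interpretive sketch of EK-5's shared-plugin model (the arrows and component names are a reading of Eddie's comment, not a committed design):

```mermaid
flowchart LR
    SC["Scorecard"] -->|"executes under the covers"| P["pvtr-github-repo-scanner (shared checks/probes)"]
    PV["Privateer"] --> P
    LFX["LFX Insights"] -.->|"optional"| P
    P -->|"findings"| OUT["Scorecard scoring and storage"]
```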
+ +### Adolfo García Veytia's feedback (AMPEL maintainer) + +The following feedback was provided by Adolfo García Veytia (@puerco, maintainer of [AMPEL](https://github.com/carabiner-dev/ampel)) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +#### AP-1: Mapping file registry — single source preferred + +> "It's great that you also see the need for machine-readable data. This would help projects like AMPEL write policies that enforce the baseline controls based on the results from Scorecard and other analysis tools." +> +> "Initially, we were trying to build the mappings into baseline itself. I still think it's the way to go as it would be better to have a single registry and data format of those mappings (in this case baseline's). Unfortunately, the way baseline considers its mappings [was demoted](https://github.com/ossf/security-baseline/pull/476) so we don't have that registry anymore." + +Adolfo strongly supports machine-readable mapping data and prefers a single registry in the Baseline itself, though the Baseline's own mapping support was recently demoted (PR #476 in security-baseline). This aligns with Eddie's offer (EK-1) to host mappings in the Baseline repo, but adds the context that there is no longer an official registry for tool-to-control mappings. + +AMPEL already maintains its own [Scorecard-to-Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline) (36 OSPS control policies, 5 of which directly consume Scorecard probe results). An official upstream mapping from Scorecard would benefit the entire ecosystem. 
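
Whichever registry ends up hosting it, the mapping data itself can stay compact. A hedged sketch of one entry, reusing the `osps_baseline` fields drafted in the probe-annotation spec (control ID, mapping type, maturity level); the probe name and file layout here are illustrative, not decided:

```yaml
# Illustrative entry only: field names follow the draft probe-annotation
# spec (mapping type, maturity level), but no registry format is agreed.
probe: hasContributingFile   # hypothetical probe for the GV-03.01 gap
osps_baseline:
  - control: OSPS-GV-03.01   # CONTRIBUTING file presence
    mapping_type: direct     # one of: direct, partial, informational
    maturity_level: 1
```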
+ +#### AP-2: Output format — use in-toto predicates, not a custom format + +> "As others have mentioned, there is no _OSPS output format_ but there are two formal/in process of formalizing in-toto predicate types that are useful for this: +> +> **[Simple Verification Results](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md)** — a simple predicate that communicates just the verified control labels along with the tool that performed the evaluation. It is a generalization of the VSA for non-SLSA controls. +> +> **[The "Baseline" Predicate](https://github.com/in-toto/attestation/pull/502)** — Still not merged, this predicate type was proposed by some of the baseline maintainers to capture an interoperability format more in line with the requirements in this spec, including manual assessments (what is named in this PR as 'ATTESTED')." + +Adolfo identifies two concrete in-toto predicate types that Scorecard should consider for output instead of inventing a custom format: + +1. **Simple Verification Results (SVR)**: Already merged in the in-toto attestation spec. Communicates verified control labels and the evaluating tool. Generalizes SLSA VSA to non-SLSA controls. +2. **Baseline Predicate**: Proposed by Baseline maintainers (PR #502, not yet merged). Designed for interoperability and includes support for manual assessments (ATTESTED status). + +This is the most concrete guidance on output format so far and directly informs CQ-18. + +#### AP-3: Attestation question conflates identity and tooling + +> "The question here is conflating two domains. One question is _who_ signs the attestation, and how can those identities be trusted (identity). The other is _what_ (tool) generates the attestations, and more importantly, from scorecard's perspective, when. This hints at a policy implementation and the answers will most likely differ for projects and controls. Happy to chat about this one day." 
+ +Adolfo clarifies that OQ-1 (attestation mechanism identity) is actually two separate questions: +1. **Identity**: Who signs the attestation, and how are those identities trusted? (Sigstore/OIDC, personal keys, etc.) +2. **Tooling**: What tool generates the attestation, and when? (Scorecard during scan, CI pipeline, manual process) + +The answers will differ per project and per control. This decomposition should inform how OQ-1 is resolved. + +#### AP-4: AMPEL already consumes Scorecard data for Baseline enforcement + +> "I agree with this role statement. Just as minder, ampel also can enforce Scorecard's data ([see an example here](https://github.com/carabiner-dev/policies/blob/ab1eb42ef179c7a0016d6b7ed72991774a48f151/scorecard/sast.json#L4)) and we also [maintain a mapping of some of scorecard's probes vs baseline controls](https://github.com/carabiner-dev/policies/blob/ab1eb42ef179c7a0016d6b7ed72991774a48f151/groups/osps-baseline/osps-vm-06.hjson#L5) that would greatly benefit from an official/upstream map. +> +> The probes can enrich the baseline ecosystem substantially and having the data accessible from other tools encourages other projects in the ecosystem to help maintain and improve them." + +AMPEL is an active consumer of Scorecard data today: +- 5 production policies directly evaluate Scorecard probe results (SAST, binary artifacts, code review, dangerous workflows, token permissions) +- 36 OSPS Baseline policy mappings, several of which reference Scorecard checks +- An official upstream Scorecard-to-Baseline mapping would directly benefit AMPEL's policy library + +This validates the proposal's direction of making Scorecard's probe results and control mappings available to the broader ecosystem. + +### Mike Lieberman's feedback + +The following feedback was provided by Mike Lieberman (@mlieberman85) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +#### ML-1: No "OSPS output format" exists + +> "What is OSPS output format?" 
+> — on ROADMAP.md, Phase 1 deliverable + +Mike echoes Eddie's (EK-2) and Adolfo's (AP-2) point: there is no defined "OSPS output format." This is the third reviewer to flag this, confirming it needs to be reframed. The output format question (CQ-18) now has concrete alternatives: Gemara SDK formats (Eddie), in-toto SVR/Baseline predicates (Adolfo), or extending existing Scorecard formats. + +--- + +## Clarifying questions from ORBIT WG feedback + +The following clarifying questions require Steering Committee decisions +informed by Eddie's, Adolfo's, and Mike's feedback. + +#### CQ-16: Allstar's role in OSPS conformance enforcement + +**Stakeholders:** Stephen (can answer alone) + +[Allstar](https://github.com/ossf/allstar) is a Scorecard sub-project that continuously monitors GitHub organizations and enforces Scorecard check results as policies (branch protection, binary artifacts, security policy, dangerous workflows). It already enforces a subset of controls aligned with OSPS Baseline. + +With OSPS conformance output, Allstar could potentially enforce Baseline conformance at the organization level — e.g., opening issues or auto-remediating when a repository falls below Level 1 conformance. Should the proposal explicitly include Allstar as a Phase 1 consumer of OSPS output, or should that be deferred? And more broadly, should Allstar be considered part of the "enforcement" boundary that Scorecard itself does not cross, even though it is a Scorecard sub-project? + +**Stephen's response:** + + +#### CQ-17: Mapping file location — Scorecard repo or shared? + +**Stakeholders:** Stephen, Eddie Knight, OSPS Baseline maintainers + +Eddie offers to host the Baseline-to-Scorecard mapping in the Baseline repository with Scorecard maintainers as CODEOWNERS (EK-1). The current proposal places it in the Scorecard repo. + +Options: +1. **Scorecard repo** (`pkg/osps/mappings/`): Scorecard owns the mapping entirely. Mapping is coupled to Scorecard releases and probe changes. +2. 
**Baseline repo** (or shared location): Co-owned with ORBIT WG. Other tools can consume the same mapping. Scorecard maintainers retain CODEOWNERS authority over their portion.
+3. **Both**: Scorecard maintains a local mapping for runtime use; a shared mapping in the Baseline repo serves as the cross-tool reference. Keep them in sync.
+
+Which approach do you prefer?
+
+_Note that this question becomes moot if check logic is consolidated within `pvtr-github-repo-scanner`, because the mappings would then be managed within the control catalog in Gemara format._
+
+**Stephen's response:**
+
+
+#### CQ-18: Output format — `--format=osps` vs. ecosystem formats
+
+**Stakeholders:** Stephen, Spencer (OQ-4 constrains this), Eddie Knight, Adolfo García Veytia
+
+Three reviewers (Eddie, Adolfo, Mike) independently flagged that no "OSPS output format" exists. Eddie suggests Gemara SDK formats (EK-2). Adolfo identifies two concrete in-toto predicate types (AP-2): the [Simple Verification Results (SVR)](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md) predicate (merged) and the [Baseline Predicate](https://github.com/in-toto/attestation/pull/502) (proposed, not yet merged).
+
+Options:
+1. **Keep `--format=osps`**: Define a Scorecard-specific conformance output format. Risk: inventing a format that three reviewers have said doesn't belong.
+2. **Use `--format=gemara`** (or similar): Integrate the Gemara SDK and output Gemara assessment results in JSON/YAML. Aligns with ORBIT ecosystem, creates a Gemara SDK dependency.
+3. **Use in-toto predicates**: Output conformance results as in-toto attestations using SVR or the Baseline predicate. Aligns with in-toto ecosystem and Adolfo's guidance. The Baseline predicate is not yet merged.
+4. **Extend existing formats**: Add conformance data to `--format=json` and `--format=sarif` outputs. No new format flag needed.
+5. **Combination**: Use Gemara SDK for structured output + in-toto predicates for attestation output.
These are not mutually exclusive.
+
+Which approach do you prefer?
+
+**Stephen's response:**
+
+
+#### CQ-19: Architectural direction — build vs. integrate
+
+**Stakeholders:** Stephen, Spencer, Eddie Knight, Steering Committee, at least 1 non-Steering maintainer — this is the gating decision; most other open questions depend on its outcome
+
+This is the central decision. Eddie proposes consolidating Scorecard checks/probes into the Privateer plugin and having Scorecard execute the plugin (EK-5). The current proposal has Scorecard building its own conformance engine.
+
+**Option A: Scorecard builds its own conformance engine** (current proposal)
+- Scorecard adds a mapping file, conformance evaluation logic, and output format
+- No code-level dependency on Privateer
+- Scorecard controls its own release cadence and architecture
+- Risk: duplicates evaluation logic, no technical relationship with Privateer (EK-3)
+
+**Option B: Shared plugin model** (Eddie's alternative)
+- Scorecard checks/probes are consolidated into the Privateer plugin
+- Scorecard executes the plugin under the covers
+- Bidirectional: Privateer users (e.g., LFX Insights) can also run Scorecard checks
+- Gemara integration comes for free via the plugin
+- Risk: Scorecard releases are coupled to the plugin's release cadence; CODEOWNERS in the second repo must be meticulously managed to avoid surprises; multi-platform support (GitLab, Azure DevOps, local) will require maintenance of independent plugins with isolated data collection for each platform
+
+**Option C: Hybrid**
+- Scorecard maintains its own probe execution (its core competency)
+- Scorecard exports its probe results in a format the Privateer plugin can consume (Gemara L5)
+- The Privateer plugin consumes Scorecard output as supplementary evidence
+- Control catalog is extracted and shared, but evaluation logic stays separate
+- Users would choose between the Privateer plugin and Scorecard for Baseline evaluations
+- No code-level coupling,
but interoperable output + +Which option do you prefer? What are your concerns about taking a dependency on the Privateer plugin codebase? + +**Stephen's response:** + + +#### CQ-20: Catalog extraction — what does it mean concretely? + +**Stakeholders:** Stephen, Eddie Knight, Steering Committee + +Eddie is "hugely in favor" of extracting the Scorecard control catalog (EK-4) but the proposal lacks an implementation plan. Concretely, this could mean: + +1. **Machine-readable probe definitions**: Export `probes/*/def.yml` as a versioned catalog (already exists in the repo, but not packaged for external consumption) +2. **Gemara L2 control definitions**: Map Scorecard probes to Gemara Layer 2 schema entries, making them available in the Gemara catalog +3. **Shared evaluation steps**: Extract Scorecard's probe logic into a reusable Go library or Privateer plugin steps that other tools can execute +4. **API-level catalog**: Expose probe definitions via the Scorecard API so tools can discover what Scorecard can evaluate + +What level of extraction do you envision? Is option 2 (Gemara L2 integration) the right target, or should we start simpler? + +**Stephen's response:** + + +#### CQ-21: Privateer code duplication — is it acceptable? + +**Stakeholders:** Stephen, Spencer, Eddie Knight, Steering Committee — flows from CQ-19 + +Eddie points out that the current proposal would result in two codebases evaluating the same OSPS controls independently (EK-3). Even if the proposal says "don't duplicate Privateer," building a separate conformance engine effectively does that. + +Is some duplication acceptable if it means Scorecard retains architectural independence? Or is avoiding duplication a hard constraint that should drive us toward the shared plugin model (CQ-19 Option B)? + +**Stephen's response:** + + +#### CQ-22: Attestation decomposition — identity vs. 
tooling + +**Stakeholders:** Stephen, Spencer, Adolfo García Veytia, Eddie Knight + +Adolfo points out that OQ-1 (attestation mechanism identity) conflates two questions (AP-3): + +1. **Identity**: Who signs the attestation, and how are those identities trusted? (Sigstore/OIDC, personal keys, platform-native) +2. **Tooling**: What tool generates the attestation, and when? (Scorecard during scan, CI pipeline post-scan, manual maintainer process) + +The answers will differ per project and per control. Should OQ-1 be decomposed into these two sub-questions, and should the design allow different identity/tooling combinations per control? + +Adolfo has offered to discuss this in depth. + +**Stephen's response:** + + +#### CQ-23: Mapping registry — where should the canonical mapping live? + +**Stakeholders:** Stephen, Eddie Knight, Adolfo García Veytia, Baseline maintainers + +Three perspectives have emerged on where Scorecard-to-Baseline mappings should live: + +- **Eddie (EK-1)**: Host in the Baseline repo with Scorecard maintainers as CODEOWNERS +- **Adolfo (AP-1)**: Prefers a single registry in the Baseline itself, but notes the Baseline's mapping support was [demoted](https://github.com/ossf/security-baseline/pull/476) +- **Current proposal**: Host in Scorecard repo (`pkg/osps/mappings/`) + +Additionally, AMPEL already maintains [independent Scorecard-to-Baseline mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline) in its policy library. An official upstream mapping would benefit both AMPEL and the wider ecosystem. + +This extends CQ-17 with Adolfo's context about the demoted Baseline registry. Should the Scorecard mapping effort also advocate for restoring a shared registry in the Baseline spec? + +**Stephen's response:** + + +--- + +## Decision priority analysis + +The open questions have dependencies between them. Answering them in the +wrong order will result in rework. The recommended sequence follows. 
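
The gating relationships described in the tiers can also be drawn as a small dependency graph (Mermaid, matching this proposal's diagram convention); edges here restate the tier dependencies only:

```mermaid
flowchart TD
    CQ19["CQ-19 architecture"] --> CQ17["CQ-17/CQ-23 mapping location"]
    CQ19 --> CQ18["CQ-18 output format"]
    CQ19 --> CQ20["CQ-20 catalog extraction"]
    CQ19 --> CQ21["CQ-21 duplication tolerance"]
    OQ1["OQ-1 attestation identity"] --> CQ22["CQ-22 identity vs. tooling"]
    CQ18 --> CQ11["CQ-11 output stability"]
```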
+ +### Tier 1 — Gating decisions (answer before all others) + +| Question | Why it gates | Who decides | +|----------|-------------|-------------| +| **CQ-19** | Architectural direction (build vs. integrate vs. hybrid). If answered as Option B (shared plugin), CQ-17, CQ-18, CQ-20, and CQ-21 are either resolved or fundamentally reframed. | Stephen + Spencer + Eddie Knight + Steering Committee | +| **OQ-1** | Attestation identity model. Spencer flagged as blocking. Determines how non-automatable controls are handled across all phases. | Spencer + Stephen + Steering Committee | + +### Tier 2 — Downstream of CQ-19 (answer once Tier 1 is resolved) + +| Question | Dependency | Who decides | +|----------|-----------|-------------| +| **CQ-18** | Output format depends on CQ-19's architectural direction. Three reviewers flagged "no OSPS output format." In-toto predicates (AP-2) and Gemara SDK are concrete alternatives. | Stephen + Spencer + Eddie Knight + Adolfo | +| **CQ-17/CQ-23** | Mapping file location depends on CQ-19 (negated if Option B). Adolfo adds context: Baseline registry was demoted (AP-1). | Stephen + Eddie Knight + Adolfo + Baseline maintainers | +| **CQ-22** | Attestation decomposition (identity vs. tooling). Refines OQ-1. | Stephen + Spencer + Adolfo + Eddie Knight | +| **OQ-2** | Enforcement detection scope. Affects Phase 3 scope. | Spencer + Stephen + Steering Committee | + +### Tier 3 — Important but non-blocking for Phase 1 start + +| Question | Notes | Who decides | +|----------|-------|-------------| +| **CQ-20** | Catalog extraction scope. Flows from CQ-19. | Stephen + Eddie Knight | +| **CQ-21** | Code duplication tolerance. Flows from CQ-19. | Stephen + Spencer + Eddie Knight | +| **CQ-13** | Minder/AMPEL integration surface. Affects ecosystem positioning. | Stephen + Minder maintainers + Adolfo | +| **CQ-11** | Output stability guarantees. Depends on CQ-18. 
| Stephen + Spencer | + +### Tier 4 — Stephen can answer alone (any time) + +| Question | Notes | +|----------|-------| +| **CQ-9** | NOT_OBSERVABLE controls — implementation detail, UNKNOWN-first principle already agreed. | +| **CQ-12** | Probe gap priority ordering — coverage doc already proposes an order. | +| **CQ-14** | Darnit vs. Minder delineation — ecosystem positioning Stephen can articulate. | +| **CQ-15** | Existing issues as Phase 1 work items — backlog triage. | +| **CQ-16** | Allstar's role — Scorecard sub-project under same Steering Committee. | + +### Effectively resolved + +| Question | Resolution | +|----------|-----------| +| **OQ-3** | Drop `scan_scope` from the schema (Spencer's feedback). | +| **OQ-4** | Evidence is probe-based only, not check-based (adopted). | +| **CQ-10** | Partially superseded by CQ-17 (same topic with Eddie's context). | + +### Recommended next steps + +1. **Schedule a discussion with Eddie Knight and Adolfo García Veytia** to resolve CQ-19. Bring Spencer. This is the fork in the road — everything downstream depends on it. +2. **Resolve OQ-1/CQ-22** with Spencer, Adolfo, and the Steering Committee. Spencer flagged OQ-1 as blocking. Adolfo's decomposition (identity vs. tooling) clarifies what needs to be decided. +3. **Answer the Tier 4 questions** at any time — they are independent and don't block others. +4. **Once CQ-19 and OQ-1/CQ-22 are decided**, the Tier 2 and 3 questions can be resolved quickly since they flow from the architectural direction. diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 089902f8739..a1e77ec84c0 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -79,56 +79,22 @@ A fresh analysis of Scorecard's current coverage against OSPS Baseline v2026.02. 
- **Multi-repo scanning** (`--repos`, `--org`) — needed for OSPS-QA-04.02 (subproject conformance) - **Serve mode** — HTTP surface for pipeline integration -## Open questions from maintainer review +## Open questions -The following questions were raised by Spencer (Steering Committee member) during review of the roadmap and need to be resolved before or during implementation: +Several design questions are under active discussion. Spencer (Steering +Committee) raised questions about attestation identity (OQ-1), enforcement +detection scope (OQ-2), and the evidence model (OQ-4, resolved — probe-based +only). Eddie Knight, Adolfo García Veytia, and Mike Lieberman provided +ORBIT WG feedback on output formats, mapping file ownership, and architectural +direction. -### OQ-1: Attestation mechanism identity +**The central open question is CQ-19: should Scorecard build its own +conformance engine (current proposal), adopt a shared plugin model with +Privateer, or take a hybrid approach?** This decision gates most other +open questions. -> "The attestation/provenance layer. What is doing the attestation? Is this some OIDC? A personal token? A workflow (won't have the right tokens)?" -> — Spencer, on Section 5.1 - -**Stakeholders:** Spencer (raised this, flagged as blocking), Stephen, Steering Committee - -This is a fundamental design question. Options include: -- **Repo-local metadata files** (e.g., Security Insights, `.osps-attestations.yml`): simplest, no cryptographic identity, maintainer self-declares by committing the file. -- **Signed attestations via Sigstore/OIDC**: strongest guarantees, but requires workflow identity and the right tokens — which Spencer correctly notes may not be available in all contexts. -- **Platform-native signals**: e.g., GitHub's private vulnerability reporting enabled status, which the platform attests implicitly. 
- -**Recommendation to discuss**: Start with repo-local metadata files (unsigned) for the v1 attestation mechanism, with a defined extension point for signed attestations in a future iteration. This avoids blocking on the identity question while still making non-automatable controls reportable. - -### OQ-2: Scorecard's role in enforcement detection vs. enforcement - -> "I thought the other doc said Scorecard wasn't an enforcement tool?" -> — Spencer, on Q4 deliverables (enforcement detection) - -**Stakeholders:** Spencer (raised this), Stephen, Steering Committee - -This is a critical framing question. The roadmap proposes *detecting* whether enforcement exists (e.g., "are SAST results required to pass before merge?"), not *performing* enforcement. But the line between "detecting enforcement" and "being an enforcement tool" needs to be drawn clearly. - -**Recommendation to discuss**: Scorecard detects and reports whether enforcement mechanisms are in place. It does not itself enforce. The `--fail-on=fail` CI gating is a reporting exit code, not an enforcement action — the CI system is the enforcer. This distinction should be documented explicitly. - -### OQ-3: `scan_scope` field in output schema - -> "Not sure I see the importance [of `scan_scope`]" -> — Spencer, on Section 9 (output schema) - -**Stakeholders:** Stephen (can resolve alone) - -The `scan_scope` field (repo|org|repos) in the proposed OSPS output schema may not carry meaningful information. If the output always describes a single repository's conformance, the scope is implicit. - -**Recommendation to discuss**: Drop `scan_scope` from the schema unless multi-repo aggregation (OSPS-QA-04.02) produces a fundamentally different output shape. Revisit in Q4 when project-level aggregation is implemented. 
- -### OQ-4: Evidence model — probes only, not checks - -> "[Evidence] should be probe-based only, not check" -> — Spencer, on Section 9 (output schema) - -**Stakeholders:** Spencer (raised this), Stephen — effectively resolved (adopted) - -Spencer's position is that OSPS evidence references should point to probe findings, not check-level results. This aligns with the architectural direction of Scorecard v5 (probes as the measurement unit, checks as scoring aggregations). - -**Recommendation**: Adopt this. The `evidence` array in the OSPS output schema should reference probes and their findings only. Checks may be listed in a `derived_from` field for human context but are not evidence. +For the full list of questions, reviewer feedback, maintainer responses, and +decision priority analysis, see [`decisions.md`](decisions.md). ## Scope @@ -264,551 +230,7 @@ flowchart TD - **[non-blocking]** Reviews from maintainers of tools in the WG ORBIT ecosystem - **[informational]** Notify WG ORBIT members (TAC sign-off not required) ---- - -## Maintainer review - -### Stephen's notes - - - -**Overall assessment:** - - -**Key concerns or risks:** - - -**Things I agree with:** - - -**Things I disagree with or want to change:** - -- "PVTR" is shorthand for "Privateer". Throughout this proposal it makes it appear as if https://github.com/ossf/pvtr-github-repo-scanner is separate from Privateer, when it is really THE Privateer plugin for GitHub repositories. Any references to PVTR should be corrected. -- This proposal does not contain an even consideration of the capabilities of [Darnit](https://github.com/kusari-oss/darnit) and [AMPEL](https://github.com/carabiner-dev/ampel). We should do that comparison to get a better idea of what should be in or out of scope for Scorecard. -- The timeline that is in this proposal is not accurate, as we're already about to enter Q2 2026. We should focus on phases and outcomes, and let maintainer bandwidth dictate delivery timing. 
-- Scorecard has an existing set of checks and probes, which is essentially a control catalog. We should make a plan to extract the Scorecard control catalog so that it can be used by other tools that can handle evaluation tasks. -- Use Mermaid when creating diagrams. -- We need to understand what level of coverage Scorecard currently has for OSPS Baseline and that analysis should be created in a separate file (in `docs/`). Assume that any existing findings are out-of-date. -- `docs/roadmap-ideas.md` will not be committed to the repo, as it is a rough draft which needs to be refined for public consumption. We should create `docs/ROADMAP.md` with a 2026 second-level heading with contains the publicly-consummable roadmap. - -**Priority ordering — what matters most to ship first:** - - -### Clarifying questions - -The following questions need your input before this proposal can move to design. Please fill in your response under each question. - -#### CQ-1: Scorecard as a conformance tool — product identity - -The proposal frames this as a "product-level shift" where Scorecard gains a second mode: conformance evaluation alongside its existing scoring. Does this framing match your vision, or do you see conformance as eventually *replacing* the scoring model? Should we be thinking about deprecating 0-10 scores long-term, or do both modes coexist indefinitely? - -**Stephen's response:** - -I believe the scoring model will continue to be useful to consumers and it should be maintained. For now, both modes should coexist. There is no need to make a decision about this for the current iteration of the proposal. - -#### CQ-2: OSPS Baseline version targeting - -The roadmap previously targeted OSPS Baseline v2025-10-10. The Privateer GitHub plugin targets v2025-02-25. The Baseline is a living spec with periodic releases. How should Scorecard handle version drift? 
Options: -- Support only the latest version at any given time -- Support multiple versions concurrently via the versioned mapping file -- Pin to a version and update on a defined cadence (e.g., quarterly) - -**Stephen's response:** - -The current version of the OSPS Baseline is [v2026.02.19](https://baseline.openssf.org/versions/2026-02-19). - -We should align with the latest version at first and have a process for aligning with new versions on a defined cadence. We should understand the [OSPS Baseline maintenance process](https://baseline.openssf.org/maintenance.html) and align with it. - -The OSPS Baseline [FAQ](https://baseline.openssf.org/faq.html) and [Implementation Guidance for Maintainers](https://baseline.openssf.org/maintainers.html) may have guidance we should consider incorporating. - -#### CQ-3: Security Insights as a hard dependency - -Many OSPS controls depend on Security Insights data (official channels, distribution points, subproject inventory, core team). The Privateer GitHub plugin treats the Security Insights file as nearly required — most of its evaluation steps begin with `HasSecurityInsightsFile`. - -Should Scorecard: -- Treat Security Insights the same way (controls that need it go UNKNOWN without it)? -- Provide a degraded but still useful evaluation without it? -- Accept alternative metadata sources (e.g., `.project`, custom config)? - -This also raises a broader adoption question: most projects today don't have a `security-insights.yml`. How do we avoid making the OSPS output useless for the majority of repositories? - -**Stephen's response:** - -We should provide a degraded, but still-useful evaluation without a Security Insights file, especially since our probes today can already cover a lot of ground without it. It would be good for us to eventually support alternative metadata sources, but this should not be an immediate goal. - -#### CQ-4: PVTR relationship — complement vs. 
converge - -The proposal positions Scorecard as complementary to the Privateer plugin. But there's a deeper question: should this stay as two separate tools indefinitely, or is the long-term goal convergence (e.g., the Privateer plugin consuming Scorecard as a library, or Scorecard becoming a Privateer plugin itself)? Your position on this affects how tightly we couple the output formats and whether we invest in Gemara SDK integration. - -**Stephen's response:** - -Multiple tools should be able to consume Scorecard, so yes, we should invest in Gemara SDK integration. - -#### CQ-5: Scope of new probes in 2026 - -The roadmap calls for significant new probe development (secrets detection, governance/docs presence, dependency manifests, release asset inspection, enforcement detection). That's a lot of new surface area. Should we: -- Build all of these within Scorecard? -- Prioritize a subset and defer the rest? -- Look for ways to consume signals from external tools (e.g., GitHub's secret scanning API, SBOM tools) rather than building detection from scratch? - -If prioritizing, which new probes matter most to you? - -**Stephen's response:** - -We should prioritize OSPS Baseline Level 1 conformance work. -We should consider any signals that can be consumed from external sources. - -#### CQ-6: Community and governance process - -This is a major initiative touching Scorecard's product direction. What's the governance process for getting this approved? -- Does this need a formal proposal to the Scorecard maintainer group? -- Should this be presented at an ORBIT WG meeting? -- Do we need sign-off from the OpenSSF TAC? -- Who else beyond you and Spencer needs to weigh in? - -**Stephen's response:** - -We should have Stephen and Spencer sign off on this proposal as Steering Committee members. 
In addition, we should have reviews from: -- [blocking] At least 1 non-Steering Scorecard maintainer -- [non-blocking] Maintainers of tools in the WG ORBIT ecosystem - -This does not require review from the TAC, but we should inform WG ORBIT members. - -#### CQ-7: The "minimum viable conformance report" - -If we had to ship the smallest useful thing in Q1, what would it be? The roadmap proposes the full OSPS output format + mapping file + applicability engine. But a simpler starting point might be: -- Just the mapping file (documentation-only, no runtime) -- A `--format=osps` that only reports on controls Scorecard already covers (no new probes, lots of UNKNOWN) -- Something else? - -What would make Q1 a success in your eyes? - -**Stephen's response:** - -As previously mentioned, the quarterly targets are not currently accurate. One of our Q2 outcomes should be OSPS Baseline Level 1 conformance. - -#### CQ-8: Existing Scorecard Action and API impact - -Scorecard runs at scale via the Scorecard Action (GitHub Action) and the public API (api.scorecard.dev). Should OSPS conformance be available through these surfaces from day one, or should it start as a CLI-only feature? The API and Action have their own release and stability considerations. - -**Stephen's response:** - -We need to land these capabilities for as much surface area as possible. - -#### CQ-9: Coverage analysis and Phase 1 scope validation - -**Stakeholders:** Stephen (can answer alone) - -The coverage analysis (`docs/osps-baseline-coverage.md`) identifies 25 Level 1 controls. Of those, 6 are COVERED, 8 are PARTIAL, 9 are GAP, and 2 are NOT_OBSERVABLE. The Phase 1 plan targets closing the 9 GAP controls. Given that 2 controls (AC-01.01, AC-02.01) are NOT_OBSERVABLE without org-admin tokens, should Phase 1 explicitly include work on improving observability (e.g., documenting what tokens are needed, or providing guidance for org admins), or should those controls remain UNKNOWN until a later phase? 
- -**Stephen's response:** - - -#### CQ-10: Mapping file ownership and contribution model - -**Stakeholders:** Stephen, Eddie Knight, Baseline maintainers — partially superseded by CQ-17 - -The versioned mapping file (e.g., `pkg/osps/mappings/v2026-02-19.yaml`) is a critical artifact that defines which probes satisfy which OSPS controls. Who should own this file? Options: -- Scorecard maintainers only (changes require maintainer review) -- Community-contributed with maintainer approval (like checks/probes today) -- Co-maintained with ORBIT WG members who understand the Baseline controls - -This also affects how we handle disagreements about whether a probe truly satisfies a control. - -**Stephen's response:** - - -#### CQ-11: Backwards compatibility of OSPS output format - -**Stakeholders:** Stephen, Spencer, Eddie Knight — depends on CQ-18 (output format decision) - -The spec requires `--format=osps` as a new output format. Since this is a new surface, we have freedom to iterate on the schema. However, once shipped, consumers will depend on it. What stability guarantees should we offer? -- No guarantees during Phase 1 (alpha schema, may break between releases) -- Semver-like schema versioning from day one (breaking changes increment major version) -- Follow the Gemara L4 schema if one exists, inheriting its stability model - -**Stephen's response:** - - -#### CQ-12: Probe gap prioritization for Phase 1 - -**Stakeholders:** Stephen (can answer alone) - -The coverage analysis identifies 7 Level 1 GAP controls that need new probes (excluding the 2 that depend on Security Insights). Ranked by implementation feasibility: - -1. OSPS-GV-03.01 — CONTRIBUTING file presence -2. OSPS-GV-02.01 — Issues/discussions enabled -3. OSPS-DO-02.01 — Issue templates or bug report docs -4. OSPS-DO-01.01 — Documentation presence heuristics -5. OSPS-BR-07.01 — Secrets detection (platform signal consumption) -6. 
OSPS-BR-03.01 / BR-03.02 — Encrypted transport (requires Security Insights) -7. OSPS-QA-04.01 — Subproject listing (requires Security Insights) - -Do you agree with this priority ordering? Are there any controls you would move up or down, or any you would defer to Phase 2? - -**Stephen's response:** - - -#### CQ-13: Minder and AMPEL integration surfaces - -**Stakeholders:** Stephen, Minder maintainers, Adolfo García Veytia (AMPEL), Steering Committee - -Two tools already consume Scorecard data for policy enforcement: - -**[Minder](https://github.com/mindersec/minder)** (OpenSSF Sandbox, ORBIT WG) consumes Scorecard findings to enforce security policies and auto-remediate across repositories. Uses Rego-based rules and can enforce OSPS Baseline controls via policy profiles. A draft Scorecard PR (#4723, now stale) attempted deeper integration. - -**[AMPEL](https://github.com/carabiner-dev/ampel)** (production v1.0.0) validates Scorecard attestations against policies in CI/CD pipelines. Already maintains [5 Scorecard-consuming policies](https://github.com/carabiner-dev/policies/tree/main/scorecard) and [36 OSPS Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline). Uses CEL expressions and in-toto attestations. - -Questions: -- Should the OSPS conformance output be designed with Minder and AMPEL as explicit consumers (e.g., ensuring the output works as Minder policy input and as AMPEL attestation input)? -- Should we coordinate with both Minder maintainers and Adolfo during Phase 1 to validate the integration surface? -- Is there a risk of duplicating Baseline evaluation work that Minder or AMPEL already do via their own rules, and if so, how should we delineate? - -**Stephen's response:** - - -#### CQ-14: Darnit vs. 
Minder delineation - -**Stakeholders:** Stephen (can answer alone) - -The proposal lists both [Darnit](https://github.com/kusari-oss/darnit) and [Minder](https://github.com/mindersec/minder) as tools that handle remediation and enforcement. Their capabilities overlap in some areas (both can enforce Baseline controls, both can remediate). For Scorecard's purposes, the distinction matters primarily for the "What Scorecard must not do" boundary. - -Is the current framing correct — that Scorecard is the measurement layer and both Minder and Darnit are downstream consumers? Or should we position Scorecard differently relative to one versus the other, given that Minder is an OpenSSF project in the same working group while Darnit is not? - -**Stephen's response:** - - -#### CQ-15: Existing issues as Phase 1 work items - -**Stakeholders:** Stephen (can answer alone) - -The coverage analysis (`docs/osps-baseline-coverage.md`) now includes a section mapping existing Scorecard issues to OSPS Baseline gaps. Several long-standing issues align directly with Phase 1 priorities: - -- [#30](https://github.com/ossf/scorecard/issues/30) — Secrets scanning (OSPS-BR-07.01), open since the project's earliest days -- [#2305](https://github.com/ossf/scorecard/issues/2305) / [#2479](https://github.com/ossf/scorecard/issues/2479) — Security Insights ingestion -- [#2465](https://github.com/ossf/scorecard/issues/2465) — Private vulnerability reporting (OSPS-VM-03.01) -- [#4824](https://github.com/ossf/scorecard/issues/4824) — Changelog check (OSPS-BR-04.01) -- [#4723](https://github.com/ossf/scorecard/pull/4723) — Minder/Rego integration draft (closed) - -Should we adopt these existing issues as the starting work items for Phase 1, or create new issues that reference them? Some of these issues have significant discussion history that may contain design decisions worth preserving. 
- -**Stephen's response:** - - -### ORBIT WG feedback - -The following feedback was provided by Eddie Knight (ORBIT WG Technical Steering Committee Chair, maintainer of Gemara, Privateer, and OSPS Baseline) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). - -#### EK-1: Mapping file location - -> "Regarding mappings between Baseline catalog<->Scorecard checks, it is possible to easily put that into a new file with Scorecard maintainers as codeowners, pending approval from OSPS Baseline maintainers for the change." - -Eddie is offering to host the Baseline-to-Scorecard mapping in the OSPS Baseline repository (or a shared location) with Scorecard maintainers as CODEOWNERS. The current proposal places the mapping in the Scorecard repo (`pkg/osps/mappings/v2026-02-19.yaml`). - -Mappings currently exist within the Baseline Catalog and are proposed for addition to the Scorecard repository as well. The mappings could be maintained in one or both of the projects. This affects ownership, versioning cadence, and who can update the mapping when controls or probes change. - -The trade-offs: - -- **In Scorecard repo**: Scorecard maintainers fully own the mapping. Mapping updates are coupled to Scorecard releases. Other tools cannot easily consume the mapping. -- **In Baseline repo (or shared)**: Mapping is co-owned. Versioned alongside the Baseline spec. End users and other tools (Privateer, Darnit, Minder) can consume the same mapping. Scorecard maintainers retain CODEOWNERS authority. - -#### EK-2: Output format — no "OSPS output format" - -> "There is not an 'OSPS output format,' and even the relevant Gemara schemas (which are quite opinionated) are still designed to support output in multiple output formats within the SDK, such as SARIF. I would expect that you'd keep your current output logic, and then _maybe_ add basic Gemara json/yaml as another option." - -The current proposal defines `--format=osps` as a new output format. 
Eddie clarifies that the ORBIT ecosystem does not define a special "OSPS output format" — instead, the Gemara SDK supports multiple output formats (including SARIF). The suggestion is to keep Scorecard's existing output logic and optionally add Gemara JSON/YAML as another format option. - -This is a significant clarification that affects the spec's output requirements, the Phase 1 deliverables, and how we frame the conformance layer. - -#### EK-3: Technical relationship with Privateer plugin - -> "There is a stated goal of not duplicating the code from the plugin ossf/pvtr-github-repo-scanner, but the implementation plan as it's currently written does require duplication. In the current proposal, there would not be a technical relationship between the two codebases." - -Eddie identifies a contradiction: the proposal says "do not duplicate Privateer" but proposes building a parallel conformance engine with no code-level relationship to the Privateer plugin. The current plan would result in two separate codebases evaluating the same OSPS controls independently. - -#### EK-4: Catalog extraction needs an implementation plan - -> "There is cursory mention of a scorecard _catalog extraction_, which I'm hugely in favor of, but I don't see an implementation plan for that." - -The proposal mentions "Scorecard control catalog extraction plan" as a Phase 1 deliverable but does not specify what this means concretely or how it would be achieved. - -#### EK-5: Alternative architecture — shared plugin model - -> "An alternative plan would be for us to spend a week consolidating checks/probes into the pvtr plugin (with relevant CODEOWNERS), then update Scorecard to selectively execute the plugin under the covers." - -Eddie proposes a fundamentally different architecture: - -1. Consolidate Scorecard checks/probes into the [Privateer plugin](https://github.com/ossf/pvtr-github-repo-scanner) as shared evaluation logic -2. 
Scorecard executes the plugin under the covers for Baseline evaluation and then Scorecard handles follow-up logic such as scoring and storing the results -3. Privateer and LFX Insights can optionally run Scorecard checks via the same plugin - -**Claimed benefits:** -- Extract the Scorecard control catalog for independent versioning and cross-catalog mapping to Baseline -- Instantly integrate Gemara into Scorecard -- Allow bidirectional check execution (Scorecard runs Privateer checks; Privateer runs Scorecard checks) -- Simplify contribution overhead for individual checks -- Improve both codebases through shared logic - -**This is the central architectural decision for the proposal.** The Steering Committee needs to evaluate this against the current plan (Scorecard builds its own conformance engine). - -### Adolfo García Veytia's feedback (AMPEL maintainer) - -The following feedback was provided by Adolfo García Veytia (@puerco, maintainer of [AMPEL](https://github.com/carabiner-dev/ampel)) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). - -#### AP-1: Mapping file registry — single source preferred - -> "It's great that you also see the need for machine-readable data. This would help projects like AMPEL write policies that enforce the baseline controls based on the results from Scorecard and other analysis tools." -> -> "Initially, we were trying to build the mappings into baseline itself. I still think it's the way to go as it would be better to have a single registry and data format of those mappings (in this case baseline's). Unfortunately, the way baseline considers its mappings [was demoted](https://github.com/ossf/security-baseline/pull/476) so we don't have that registry anymore." - -Adolfo strongly supports machine-readable mapping data and prefers a single registry in the Baseline itself, though the Baseline's own mapping support was recently demoted (PR #476 in security-baseline). 
This aligns with Eddie's offer (EK-1) to host mappings in the Baseline repo, but adds the context that there is no longer an official registry for tool-to-control mappings. - -AMPEL already maintains its own [Scorecard-to-Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline) (36 OSPS control policies, 5 of which directly consume Scorecard probe results). An official upstream mapping from Scorecard would benefit the entire ecosystem. - -#### AP-2: Output format — use in-toto predicates, not a custom format - -> "As others have mentioned, there is no _OSPS output format_ but there are two formal/in process of formalizing in-toto predicate types that are useful for this: -> -> **[Simple Verification Results](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md)** — a simple predicate that communicates just the verified control labels along with the tool that performed the evaluation. It is a generalization of the VSA for non-SLSA controls. -> -> **[The "Baseline" Predicate](https://github.com/in-toto/attestation/pull/502)** — Still not merged, this predicate type was proposed by some of the baseline maintainers to capture an interoperability format more in line with the requirements in this spec, including manual assessments (what is named in this PR as 'ATTESTED')." - -Adolfo identifies two concrete in-toto predicate types that Scorecard should consider for output instead of inventing a custom format: - -1. **Simple Verification Results (SVR)**: Already merged in the in-toto attestation spec. Communicates verified control labels and the evaluating tool. Generalizes SLSA VSA to non-SLSA controls. -2. **Baseline Predicate**: Proposed by Baseline maintainers (PR #502, not yet merged). Designed for interoperability and includes support for manual assessments (ATTESTED status). - -This is the most concrete guidance on output format so far and directly informs CQ-18. 
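To make the predicate option concrete, here is a minimal sketch of how per-control Baseline results could ride inside a standard in-toto Statement envelope. Only the envelope fields (`_type`, `subject`, `predicateType`, `predicate`) come from the in-toto attestation spec; the `predicateType` URI and the predicate body below are illustrative placeholders, not the actual SVR or Baseline predicate schemas:

```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "github.com/example/project",
      "digest": { "gitCommit": "<full-commit-sha>" }
    }
  ],
  "predicateType": "https://example.invalid/osps-conformance/v0",
  "predicate": {
    "evaluator": "scorecard",
    "baselineVersion": "<osps-baseline-version>",
    "results": [
      { "control": "OSPS-GV-03.01", "status": "PASS" },
      { "control": "OSPS-BR-07.01", "status": "UNKNOWN" }
    ]
  }
}
```

Whichever predicate type is ultimately chosen, the envelope stays the same, so signing and verification tooling built around in-toto attestations would work unchanged across the options.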
- -#### AP-3: Attestation question conflates identity and tooling - -> "The question here is conflating two domains. One question is _who_ signs the attestation, and how can those identities be trusted (identity). The other is _what_ (tool) generates the attestations, and more importantly, from scorecard's perspective, when. This hints at a policy implementation and the answers will most likely differ for projects and controls. Happy to chat about this one day." - -Adolfo clarifies that OQ-1 (attestation mechanism identity) is actually two separate questions: -1. **Identity**: Who signs the attestation, and how are those identities trusted? (Sigstore/OIDC, personal keys, etc.) -2. **Tooling**: What tool generates the attestation, and when? (Scorecard during scan, CI pipeline, manual process) - -The answers will differ per project and per control. This decomposition should inform how OQ-1 is resolved. - -#### AP-4: AMPEL already consumes Scorecard data for Baseline enforcement - -> "I agree with this role statement. Just as minder, ampel also can enforce Scorecard's data ([see an example here](https://github.com/carabiner-dev/policies/blob/ab1eb42ef179c7a0016d6b7ed72991774a48f151/scorecard/sast.json#L4)) and we also [maintain a mapping of some of scorecard's probes vs baseline controls](https://github.com/carabiner-dev/policies/blob/ab1eb42ef179c7a0016d6b7ed72991774a48f151/groups/osps-baseline/osps-vm-06.hjson#L5) that would greatly benefit from an official/upstream map. -> -> The probes can enrich the baseline ecosystem substantially and having the data accessible from other tools encourages other projects in the ecosystem to help maintain and improve them." 
- -AMPEL is an active consumer of Scorecard data today: -- 5 production policies directly evaluate Scorecard probe results (SAST, binary artifacts, code review, dangerous workflows, token permissions) -- 36 OSPS Baseline policy mappings, several of which reference Scorecard checks -- An official upstream Scorecard-to-Baseline mapping would directly benefit AMPEL's policy library - -This validates the proposal's direction of making Scorecard's probe results and control mappings available to the broader ecosystem. - -### Mike Lieberman's feedback - -The following feedback was provided by Mike Lieberman (@mlieberman85) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). - -#### ML-1: No "OSPS output format" exists - -> "What is OSPS output format?" -> — on ROADMAP.md, Phase 1 deliverable - -Mike echoes Eddie's (EK-2) and Adolfo's (AP-2) point: there is no defined "OSPS output format." This is the third reviewer to flag this, confirming it needs to be reframed. The output format question (CQ-18) now has concrete alternatives: Gemara SDK formats (Eddie), in-toto SVR/Baseline predicates (Adolfo), or extending existing Scorecard formats. - ---- - -The following clarifying questions require Steering Committee decisions informed by reviewer feedback. - -#### CQ-17: Mapping file location — Scorecard repo or shared? - -**Stakeholders:** Stephen, Eddie Knight, OSPS Baseline maintainers - -Eddie offers to host the Baseline-to-Scorecard mapping in the Baseline repository with Scorecard maintainers as CODEOWNERS (EK-1). The current proposal places it in the Scorecard repo. - -Options: -1. **Scorecard repo** (`pkg/osps/mappings/`): Scorecard owns the mapping entirely. Mapping is coupled to Scorecard releases and probe changes. -2. **Baseline repo** (or shared location): Co-owned with ORBIT WG. Other tools can consume the same mapping. Scorecard maintainers retain CODEOWNERS authority over their portion. -3. 
**Both**: Scorecard maintains a local mapping for runtime use; a shared mapping in the Baseline repo serves as the cross-tool reference. Keep them in sync. - -Which approach do you prefer? - -_Note that this question is negated if consolidating check logic within `pvtr-github-repo-scanner`, because then the mappings are managed within the control catalog in Gemara format._ - -**Stephen's response:** - - -#### CQ-18: Output format — `--format=osps` vs. ecosystem formats - -**Stakeholders:** Stephen, Spencer (OQ-4 constrains this), Eddie Knight, Adolfo García Veytia - -Three reviewers (Eddie, Adolfo, Mike) independently flagged that no "OSPS output format" exists. Eddie suggests Gemara SDK formats (EK-2). Adolfo identifies two concrete in-toto predicate types (AP-2): the [Simple Verification Results (SVR)](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md) predicate (merged) and the [Baseline Predicate](https://github.com/in-toto/attestation/pull/502) (proposed, not yet merged). - -Options: -1. **Keep `--format=osps`**: Define a Scorecard-specific conformance output format. Risk: inventing a format that three reviewers have said doesn't belong. -2. **Use `--format=gemara`** (or similar): Integrate the Gemara SDK and output Gemara assessment results in JSON/YAML. Aligns with ORBIT ecosystem, creates a Gemara SDK dependency. -3. **Use in-toto predicates**: Output conformance results as in-toto attestations using SVR or the Baseline predicate. Aligns with in-toto ecosystem and Adolfo's guidance. The Baseline predicate is not yet merged. -4. **Extend existing formats**: Add conformance data to `--format=json` and `--format=sarif` outputs. No new format flag needed. -5. **Combination**: Use Gemara SDK for structured output + in-toto predicates for attestation output. These are not mutually exclusive. - -Which approach do you prefer? - -**Stephen's response:** - - -#### CQ-19: Architectural direction — build vs. 
integrate - -**Stakeholders:** Stephen, Spencer, Eddie Knight, Steering Committee, at least 1 non-Steering maintainer — this is the gating decision; most other open questions depend on its outcome - -This is the central decision. Eddie proposes consolidating Scorecard checks/probes into the Privateer plugin and having Scorecard execute the plugin (EK-5). The current proposal has Scorecard building its own conformance engine. - -**Option A: Scorecard builds its own conformance engine** (current proposal) -- Scorecard adds a mapping file, conformance evaluation logic, and output format -- No code-level dependency on Privateer -- Scorecard controls its own release cadence and architecture -- Risk: duplicates evaluation logic, no technical relationship with Privateer (EK-3) - -**Option B: Shared plugin model** (Eddie's alternative) -- Scorecard checks/probes are consolidated into the Privateer plugin -- Scorecard executes the plugin under the covers -- Bidirectional: Privateer users can also run Scorecard checks e.g., LFX Insights -- Gemara integration comes for free via the plugin -- Risk: Scorecard releases are coupled to plugin's release cadence; CODEOWNERS in the second repo must be meticulously managed to avoid surprises; multi-platform support (GitLab, Azure DevOps, local) will require maintenance of independent plugins with isolated data collection for each platform - -**Option C: Hybrid** -- Scorecard maintains its own probe execution (its core competency) -- Scorecard exports its probe results in a format the Privateer plugin can consume (Gemara L5) -- The Privateer plugin consumes Scorecard output as supplementary evidence -- Control catalog is extracted and shared, but evaluation logic stays separate -- Users will choose between the Privateer plugin and Scorecard for Baseline evaluations -- No code-level coupling, but interoperable output - -Which option do you prefer? What are your concerns about taking a dependency on the Privateer plugin codebase? 
- -**Stephen's response:** - - -#### CQ-20: Catalog extraction — what does it mean concretely? - -**Stakeholders:** Stephen, Eddie Knight, Steering Committee - -Eddie is "hugely in favor" of extracting the Scorecard control catalog (EK-4) but the proposal lacks an implementation plan. Concretely, this could mean: - -1. **Machine-readable probe definitions**: Export `probes/*/def.yml` as a versioned catalog (already exists in the repo, but not packaged for external consumption) -2. **Gemara L2 control definitions**: Map Scorecard probes to Gemara Layer 2 schema entries, making them available in the Gemara catalog -3. **Shared evaluation steps**: Extract Scorecard's probe logic into a reusable Go library or Privateer plugin steps that other tools can execute -4. **API-level catalog**: Expose probe definitions via the Scorecard API so tools can discover what Scorecard can evaluate - -What level of extraction do you envision? Is option 2 (Gemara L2 integration) the right target, or should we start simpler? - -**Stephen's response:** - - -#### CQ-21: Privateer code duplication — is it acceptable? - -**Stakeholders:** Stephen, Spencer, Eddie Knight, Steering Committee — flows from CQ-19 - -Eddie points out that the current proposal would result in two codebases evaluating the same OSPS controls independently (EK-3). Even if the proposal says "don't duplicate Privateer," building a separate conformance engine effectively does that. - -Is some duplication acceptable if it means Scorecard retains architectural independence? Or is avoiding duplication a hard constraint that should drive us toward the shared plugin model (CQ-19 Option B)? - -**Stephen's response:** - - -#### CQ-22: Attestation decomposition — identity vs. tooling - -**Stakeholders:** Stephen, Spencer, Adolfo García Veytia, Eddie Knight - -Adolfo points out that OQ-1 (attestation mechanism identity) conflates two questions (AP-3): - -1. 
**Identity**: Who signs the attestation, and how are those identities trusted? (Sigstore/OIDC, personal keys, platform-native) -2. **Tooling**: What tool generates the attestation, and when? (Scorecard during scan, CI pipeline post-scan, manual maintainer process) - -The answers will differ per project and per control. Should OQ-1 be decomposed into these two sub-questions, and should the design allow different identity/tooling combinations per control? - -Adolfo has offered to discuss this in depth. - -**Stephen's response:** - - -#### CQ-23: Mapping registry — where should the canonical mapping live? - -**Stakeholders:** Stephen, Eddie Knight, Adolfo García Veytia, Baseline maintainers - -Three perspectives have emerged on where Scorecard-to-Baseline mappings should live: - -- **Eddie (EK-1)**: Host in the Baseline repo with Scorecard maintainers as CODEOWNERS -- **Adolfo (AP-1)**: Prefers a single registry in the Baseline itself, but notes the Baseline's mapping support was [demoted](https://github.com/ossf/security-baseline/pull/476) -- **Current proposal**: Host in Scorecard repo (`pkg/osps/mappings/`) - -Additionally, AMPEL already maintains [independent Scorecard-to-Baseline mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline) in its policy library. An official upstream mapping would benefit both AMPEL and the wider ecosystem. - -This extends CQ-17 with Adolfo's context about the demoted Baseline registry. Should the Scorecard mapping effort also advocate for restoring a shared registry in the Baseline spec? - -**Stephen's response:** - - -#### CQ-16: Allstar's role in OSPS conformance enforcement - -**Stakeholders:** Stephen (can answer alone) - -[Allstar](https://github.com/ossf/allstar) is a Scorecard sub-project that continuously monitors GitHub organizations and enforces Scorecard check results as policies (branch protection, binary artifacts, security policy, dangerous workflows). 
It already enforces a subset of controls aligned with OSPS Baseline. - -With OSPS conformance output, Allstar could potentially enforce Baseline conformance at the organization level — e.g., opening issues or auto-remediating when a repository falls below Level 1 conformance. Should the proposal explicitly include Allstar as a Phase 1 consumer of OSPS output, or should that be deferred? And more broadly, should Allstar be considered part of the "enforcement" boundary that Scorecard itself does not cross, even though it is a Scorecard sub-project? - -**Stephen's response:** - - -### Decision priority analysis - -The open questions have dependencies between them. Answering them in the -wrong order will result in rework. The recommended sequence follows. - -#### Tier 1 — Gating decisions (answer before all others) - -| Question | Why it gates | Who decides | -|----------|-------------|-------------| -| **CQ-19** | Architectural direction (build vs. integrate vs. hybrid). If answered as Option B (shared plugin), CQ-17, CQ-18, CQ-20, and CQ-21 are either resolved or fundamentally reframed. | Stephen + Spencer + Eddie Knight + Steering Committee | -| **OQ-1** | Attestation identity model. Spencer flagged as blocking. Determines how non-automatable controls are handled across all phases. | Spencer + Stephen + Steering Committee | - -#### Tier 2 — Downstream of CQ-19 (answer once Tier 1 is resolved) - -| Question | Dependency | Who decides | -|----------|-----------|-------------| -| **CQ-18** | Output format depends on CQ-19's architectural direction. Three reviewers flagged "no OSPS output format." In-toto predicates (AP-2) and Gemara SDK are concrete alternatives. | Stephen + Spencer + Eddie Knight + Adolfo | -| **CQ-17/CQ-23** | Mapping file location depends on CQ-19 (negated if Option B). Adolfo adds context: Baseline registry was demoted (AP-1). | Stephen + Eddie Knight + Adolfo + Baseline maintainers | -| **CQ-22** | Attestation decomposition (identity vs. tooling). 
tooling). Refines OQ-1. | Stephen + Spencer + Adolfo + Eddie Knight | -| **OQ-2** | Enforcement detection scope. Affects Phase 3 scope. | Spencer + Stephen + Steering Committee | - -#### Tier 3 — Important but non-blocking for Phase 1 start - -| Question | Notes | Who decides | -|----------|-------|-------------| -| **CQ-20** | Catalog extraction scope. Flows from CQ-19. | Stephen + Eddie Knight | -| **CQ-21** | Code duplication tolerance. Flows from CQ-19. | Stephen + Spencer + Eddie Knight | -| **CQ-13** | Minder/AMPEL integration surface. Affects ecosystem positioning. | Stephen + Minder maintainers + Adolfo | -| **CQ-11** | Output stability guarantees. Depends on CQ-18. | Stephen + Spencer | - -#### Tier 4 — Stephen can answer alone (any time) - -| Question | Notes | -|----------|-------| -| **CQ-9** | NOT_OBSERVABLE controls — implementation detail, UNKNOWN-first principle already agreed. | -| **CQ-12** | Probe gap priority ordering — coverage doc already proposes an order. | -| **CQ-14** | Darnit vs. Minder delineation — ecosystem positioning Stephen can articulate. | -| **CQ-15** | Existing issues as Phase 1 work items — backlog triage. | -| **CQ-16** | Allstar's role — Scorecard sub-project under same Steering Committee. | - -#### Effectively resolved - -| Question | Resolution | -|----------|-----------| -| **OQ-3** | Drop `scan_scope` from the schema (Spencer's feedback). | -| **OQ-4** | Evidence is probe-based only, not check-based (adopted). | -| **CQ-10** | Partially superseded by CQ-17 (same topic with Eddie's context). | - -#### Recommended next steps +## Feedback, decisions, and next steps -1. **Schedule a discussion with Eddie Knight and Adolfo García Veytia** to resolve CQ-19. Bring Spencer. This is the fork in the road — everything downstream depends on it. -2. **Resolve OQ-1/CQ-22** with Spencer, Adolfo, and the Steering Committee. Spencer flagged OQ-1 as blocking. Adolfo's decomposition (identity vs. tooling) clarifies what needs to be decided. -3.
**Answer the Tier 4 questions** at any time — they are independent and don't block others. -4. **Once CQ-19 and OQ-1/CQ-22 are decided**, the Tier 2 and 3 questions can be resolved quickly since they flow from the architectural direction. +All reviewer feedback, maintainer clarifying questions, and the decision +priority analysis are tracked in [`decisions.md`](decisions.md). From cc7c74eedc316a52a9f37cf9f97036f48ce76372 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 2 Mar 2026 04:53:25 -0500 Subject: [PATCH 19/28] :seedling: Revise proposal with evidence engine framing Major proposal revision based on Steering Committee direction: - Add mission statement and "evidence engine" identity framing - Add four-step processing model and three-tier architecture - Add six design principles (evidence is the product, probes normalize diversity, UNKNOWN-first honesty, all consumers are equal, no metadata monopolies, formats are presentation) - Reframe Security Insights as metadata ingestion layer (one source among several) - Add security-baseline dependency and two-layer mapping model - Add OSCAL Assessment Results to Phase 1 output formats - Flatten ecosystem positioning: all consumers are equal - Strengthen CRA language with compliance disclaimer - Use RFC 2119 SHOULD NOT for duplicate evaluation guidance - Note source type taxonomy as future design concept - Note full MVVSR as follow-up deliverable Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .../osps-baseline-conformance/proposal.md | 239 ++++++++++++++---- 1 file changed, 187 insertions(+), 52 deletions(-) diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index a1e77ec84c0..a1e11e8389a 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -2,9 +2,23 @@ ## Summary -Add OSPS Baseline conformance evaluation to Scorecard, making it a credible tool 
for determining whether open source projects meet the security requirements defined by the Open Source Project Security (OSPS) Baseline specification. This is the central initiative for Scorecard's 2026 roadmap.
-
-This is fundamentally a **product-level shift**: Scorecard today answers "how well does this repo follow best practices?" (graded 0-10 heuristics). OSPS conformance requires answering "does this project meet these MUST statements at this maturity level?" (PASS/FAIL/UNKNOWN/NOT_APPLICABLE per control, with evidence). The two models coexist — existing checks and scores are unchanged — but the conformance layer is a new product surface.
+**Mission:** Scorecard produces trusted, structured security evidence for the
+open source ecosystem. _(Full MVVSR to be developed as a follow-up deliverable
+for Steering Committee review.)_
+
+Scorecard is an **open source security evidence engine**. It accepts diverse
+inputs about a project's security practices, normalizes them through probe-based
+analysis, and packages the resulting evidence in interoperable formats for
+downstream tools to act on. OSPS Baseline conformance is the first use case that
+proves this architecture, and the central initiative for Scorecard's 2026
+roadmap.
+
+This is fundamentally a **product-level shift**: Scorecard today answers "how
+well does this repo follow best practices?" (graded 0-10 heuristics). OSPS
+conformance requires answering "does this project meet these MUST statements at
+this maturity level?" (PASS/FAIL/UNKNOWN/NOT_APPLICABLE per control, with
+evidence). Check scores and conformance labels are parallel evaluation layers
+over the same probe evidence — existing checks and scores are unchanged.

 ## Motivation

@@ -16,7 +30,7 @@ This is fundamentally a **product-level shift**: Scorecard today answers "how

 3. **ORBIT WG alignment.** Scorecard sits within the OpenSSF alongside the ORBIT WG. The ORBIT charter's mission is "to develop and maintain interoperable resources related to the identification and presentation of security-relevant data." Scorecard producing interoperable conformance results is a natural fit.

-4. **Regulatory pressure.** The EU Cyber Resilience Act (CRA) and similar regulatory frameworks increasingly expect evidence-based security posture documentation. OSPS Baseline conformance output positions Scorecard as a tool that produces CRA-relevant evidence artifacts.
+4. **Regulatory pressure.** The EU Cyber Resilience Act (CRA) and similar regulatory frameworks increasingly expect evidence-based security posture documentation. Scorecard produces structured evidence that downstream tools and processes may use when evaluating regulatory readiness. Scorecard does not itself guarantee CRA compliance or compliance with any other regulatory framework.

 ### What Scorecard brings that others don't

@@ -47,21 +61,36 @@ Several tools operate in adjacent spaces. Understanding their capabilities clari

 ```mermaid
 flowchart LR
-    Scorecard["Scorecard<br/>(Measure)"] -->|checks| Allstar["Allstar<br/>(Enforce on GitHub)"]
-    Scorecard -->|findings| Minder["Minder<br/>(Enforce + Remediate)"]
-    Scorecard -->|attestations| AMPEL["AMPEL<br/>(Attestation-based<br/>policy enforcement)"]
-    Scorecard -->|findings| Darnit["Darnit<br/>(Audit + Remediate)"]
-    Darnit -->|compliance attestation| AMPEL
-    Scorecard -->|conformance evidence| Privateer["Privateer Plugin<br/>(Baseline evaluation)"]
+    Scorecard["Scorecard<br/>(Evidence Engine)"] -->|checks| Allstar["Allstar<br/>(Enforce on GitHub)"]
+    Scorecard -->|evidence| Privateer["Privateer<br/>(Baseline evaluation)"]
+    Scorecard -->|evidence| Minder["Minder<br/>(Enforce + Remediate)"]
+    Scorecard -->|evidence| AMPEL["AMPEL<br/>(Attestation-based<br/>policy enforcement)"]
+    Scorecard -->|evidence| Darnit["Darnit<br/>(Audit + Remediate)"]
+    Darnit -->|attestation| AMPEL
 ```

-Scorecard is the **data source** (measures repository security). [Allstar](https://github.com/ossf/allstar) is a Scorecard sub-project that continuously monitors GitHub organizations and enforces Scorecard check results as policies (opening issues or auto-remediating settings). [Minder](https://github.com/mindersec/minder) consumes Scorecard findings to enforce policies and auto-remediate across repositories. [AMPEL](https://github.com/carabiner-dev/ampel) validates Scorecard attestations against policies and gates CI/CD pipelines — it already maintains [production policies consuming Scorecard probe results](https://github.com/carabiner-dev/policies/tree/main/scorecard) and [OSPS Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline). Darnit audits compliance and remediates. The Privateer plugin evaluates Baseline conformance. They are complementary, not competing.
-
-### What Scorecard must not do
-
-- **Duplicate the Privateer plugin's role.** The [Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner) is the Baseline evaluator in the ORBIT ecosystem. Scorecard should complement it with deeper analysis and interoperable output, not fork the evaluation model.
-- **Duplicate policy enforcement or remediation.** [Minder](https://github.com/mindersec/minder) (OpenSSF Sandbox project, ORBIT WG) consumes Scorecard findings and enforces security policies across repositories with auto-remediation. [AMPEL](https://github.com/carabiner-dev/ampel) (production v1.0.0) validates Scorecard attestations against policies and gates CI/CD pipelines — it already maintains [Scorecard-consuming policies](https://github.com/carabiner-dev/policies/tree/main/scorecard) and [OSPS Baseline mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline). Scorecard *produces* findings and attestations for Minder and AMPEL to consume.
-- **Duplicate compliance auditing.** Darnit handles compliance auditing and automated remediation (PR creation, file generation, AI-assisted fixes). Scorecard is read-only.
+Scorecard is the **evidence engine** (produces structured security evidence).
+All downstream tools consume Scorecard evidence on equal terms through published
+output formats. [Allstar](https://github.com/ossf/allstar) is a Scorecard
+sub-project that enforces Scorecard check results as policies.
+[Minder](https://github.com/mindersec/minder) enforces security policies across
+repositories. [AMPEL](https://github.com/carabiner-dev/ampel) validates
+attestations against policies in CI/CD pipelines — it already maintains
+[policies consuming Scorecard probe results](https://github.com/carabiner-dev/policies/tree/main/scorecard)
+and [OSPS Baseline policy mappings](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline).
+[Darnit](https://github.com/kusari-oss/darnit) audits compliance and
+remediates. [Privateer](https://github.com/ossf/pvtr-github-repo-scanner)
+evaluates Baseline conformance. They are complementary, not competing.
+
+### What Scorecard SHOULD NOT do
+
+Scorecard SHOULD NOT (per [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119))
+duplicate evaluation that downstream tools handle. There may be scenarios where
+overlapping evaluation makes sense (e.g., Scorecard brings deeper analysis or
+different evidence sources), but the default posture is complementarity.
+
+- **Duplicate policy enforcement or remediation.** Downstream tools — [Privateer](https://github.com/ossf/pvtr-github-repo-scanner), [Minder](https://github.com/mindersec/minder), [AMPEL](https://github.com/carabiner-dev/ampel), [Darnit](https://github.com/kusari-oss/darnit), and others — consume Scorecard evidence through published output formats.
Scorecard *produces* findings and attestations; downstream tools enforce, remediate, and audit.
+- **Privilege any downstream consumer.** All tools consume Scorecard output on equal terms. No tool has a special integration relationship.
- **Turn OSPS controls into Scorecard checks.** OSPS conformance is a layer that consumes existing Scorecard signals, not 59 new checks.

## Current state

@@ -74,7 +103,7 @@ A fresh analysis of Scorecard's current coverage against OSPS Baseline v2026.02.

- **Checks** produce 0-10 scores — useful as signal but not conformance results
- **Probes** produce structured boolean findings — the right granularity for control mapping
-- **Output formats** (JSON, SARIF, probe, in-toto) — OSPS output is a new format alongside these
+- **Output formats** (JSON, SARIF, probe, in-toto) — conformance evidence is delivered through these and new formats (Gemara, OSCAL)
- **[Allstar](https://github.com/ossf/allstar)** (Scorecard sub-project) — continuously monitors GitHub organizations and enforces Scorecard checks as policies with auto-remediation. Allstar already enforces several controls aligned with OSPS Baseline (branch protection, security policy, binary artifacts, dangerous workflows). OSPS conformance output could enable Allstar to enforce Baseline conformance at the organization level.
- **Multi-repo scanning** (`--repos`, `--org`) — needed for OSPS-QA-04.02 (subproject conformance)
- **Serve mode** — HTTP surface for pipeline integration

@@ -88,26 +117,109 @@ only). Eddie Knight, Adolfo García Veytia, and Mike Lieberman provided ORBIT WG
feedback on output formats, mapping file ownership, and architectural direction.

-**The central open question is CQ-19: should Scorecard build its own
-conformance engine (current proposal), adopt a shared plugin model with
-Privateer, or take a hybrid approach?** This decision gates most other
-open questions.
+The central architectural question (CQ-19) has been resolved: Scorecard takes
+the hybrid approach (Option C), designed so that scaling back to Option A remains
+straightforward if needed. Scorecard owns all probe execution and conformance
+evaluation logic. Interoperability is purely at the output layer. See the
+Architecture section below and [`decisions.md`](decisions.md) for details.

For the full list of questions, reviewer feedback, maintainer responses, and
decision priority analysis, see [`decisions.md`](decisions.md).

+## Architecture
+
+### Processing model
+
+Scorecard's processing model has four steps:
+
+1. **Ingest** — Accept diverse signals about a project (repository APIs,
+   metadata files, platform signals, external services)
+2. **Analyze** — Normalize signals through probes that understand multiple
+   ways to satisfy the same outcome
+3. **Evaluate** — Produce parallel assessments: check scores (0-10) and
+   conformance labels (PASS/FAIL/UNKNOWN)
+4. **Deliver** — Package evidence in interoperable formats (JSON, in-toto,
+   Gemara, SARIF, OSCAL) for downstream consumption
+
+### Three-tier evaluation model
+
+```
+Evidence layer:     Probe findings (atomic boolean measurements)
+                        |
+Evaluation layers:  Check scoring (0-10, existing)
+                    Conformance evaluation (PASS/FAIL/UNKNOWN, new)
+                        |
+Output formats:     JSON, in-toto, Gemara, SARIF, OSCAL, probe, default
+```
+
+Check scores and conformance labels are *parallel interpretations* of the same
+probe evidence, not competing modes. Both can appear in the same output.
+
+### Architectural constraints
+
+1. Scorecard owns all probe execution (non-negotiable core competency)
+2. Scorecard owns its own conformance evaluation logic (mapping, PASS/FAIL,
+   applicability engine all live in Scorecard)
+3. Interoperability is purely at the output layer — Gemara, in-toto, SARIF,
+   OSCAL are presentation formats, not architectural dependencies
+4. Evaluation logic is self-contained — Scorecard can produce conformance
+   results using its own probes and mappings, independent of external
+   evaluation engines
+
+**Dependency guidance:** Only adopt reasonably stable dependencies when needed.
+The [security-baseline](https://github.com/ossf/security-baseline) repo is an
+acceptable data dependency for control definitions (see Scope).
+
+**Flexibility:** Under this structure, scaling back to a fully independent model
+(Option A) remains straightforward — deprioritize or drop specific output
+formatters without affecting the evaluation layer.
+
+### Design principles
+
+1. **Evidence is the product.** Scorecard's core output is structured,
+   normalized probe findings. Check scores and conformance labels are parallel
+   evaluation layers over the same evidence.
+2. **Probes normalize diversity.** Each probe understands multiple ways a
+   control outcome can be satisfied. A source type taxonomy (file-based,
+   API-based, metadata-based, external-service, convention-based) guides probe
+   design.
+3. **UNKNOWN-first honesty.** If Scorecard cannot observe a control, the
+   status is UNKNOWN with an explanation — never a false PASS or FAIL.
+4. **All consumers are equal.** Downstream tools — Privateer, AMPEL, Minder,
+   Darnit, and others — consume Scorecard evidence through published output
+   formats.
+5. **No metadata monopolies.** Probes may evaluate multiple sources for the
+   same data. No single metadata file is required for meaningful results,
+   though they may enrich results.
+6. **Formats are presentation.** Output formats (JSON, in-toto, Gemara, SARIF,
+   OSCAL) are views over the evidence model, optimized for different consumer
+   types. No single format is privileged.
+
+The following are implementation constraints (not top-level principles):
+**Additive, not breaking** — existing checks, probes, scores, and output formats
+do not change behavior. **Data-driven mapping** — the mapping between OSPS
+controls and Scorecard probes is a versioned data file, not hard-coded logic.
+

## Scope

### In scope

1. **OSPS conformance engine** — new package that maps controls to Scorecard probes, evaluates per-control status, handles applicability
-2. **OSPS output format** — `--format=osps` producing a JSON conformance report
-3. **Versioned mapping file** — data-driven YAML mapping OSPS control IDs to Scorecard probes, applicability rules, and evaluation logic
-4. **Applicability engine** — detects preconditions (e.g., "has made a release") and outputs NOT_APPLICABLE
-5. **Security Insights ingestion** — reads `security-insights.yml` to satisfy metadata-dependent controls, aligning with the ORBIT ecosystem data plane; provides degraded-but-useful evaluation when absent
-6. **Attestation mechanism (v1)** — accepts repo-local metadata for non-automatable controls (pending OQ-1 resolution)
-7. **Scorecard control catalog extraction** — plan and mechanism to make Scorecard's control definitions consumable by other tools
-8. **New probes and probe enhancements** for gap controls:
+2. **Evidence model and output formats** — the evidence model is the core deliverable; output formats are presentation layers over it:
+   - Enriched JSON (Scorecard-native, no external dependency)
+   - In-toto predicates (SVR first; track [Baseline Predicate PR #502](https://github.com/in-toto/attestation/pull/502))
+   - Gemara output (transitive dependency via security-baseline)
+   - OSCAL Assessment Results (using [go-oscal](https://github.com/defenseunicorns/go-oscal))
+   - Existing Scorecard predicate type (`scorecard.dev/result/v0.1`) preserved; new predicate types added as options
+3. **Two-layer mapping model** — data-driven mappings at two levels:
+   - *Upstream* ([security-baseline](https://github.com/ossf/security-baseline) repo): Check-level relations — "OSPS-AC-03 relates to Scorecard's Branch-Protection check." Scorecard maintainers contribute via PR. Uses "informs" / "provides evidence toward" language (not "satisfies" / "demonstrates compliance with" — see [security-baseline PR #476](https://github.com/ossf/security-baseline/pull/476)).
+   - *Internal* (Scorecard repo): Probe-level mappings — "OSPS-AC-03.01 is evaluated by probes X + Y with logic Z." Depends on probe implementation details.
+4. **security-baseline dependency** — `github.com/ossf/security-baseline` as a data dependency for control definitions, Gemara types, and OSCAL catalog models
+5. **Applicability engine** — detects preconditions (e.g., "has made a release") and outputs NOT_APPLICABLE
+6. **Metadata ingestion layer** — supports Security Insights as one source among several for metadata-dependent controls (OSPS-BR-03.01, BR-03.02, QA-04.01). Architecture invites contributions for alternative sources (SBOMs, VEX, platform APIs). No single metadata file is required for meaningful results.
+7. **Attestation mechanism (v1)** — accepts repo-local metadata for non-automatable controls (pending OQ-1 resolution)
+8. **Scorecard control catalog extraction** — plan and mechanism to make Scorecard's control definitions consumable by other tools
+9. **New probes and probe enhancements** for gap controls:
  - Secrets detection (OSPS-BR-07.01)
  - Governance/docs presence (OSPS-GV-02.01, GV-03.01, DO-01.01, DO-02.01)
  - Dependency manifest presence (OSPS-QA-02.01)
@@ -115,9 +227,19 @@ decision priority analysis, see [`decisions.md`](decisions.md).
  - Release asset inspection (multiple L2/L3 controls)
  - Signed manifest support (OSPS-BR-06.01)
  - Enforcement detection (OSPS-VM-05.*, VM-06.* — pending OQ-2 resolution)
-9. **CI gating** — `--fail-on=fail` exit code for pipeline integration
-10. **Multi-repo project-level conformance** (OSPS-QA-04.02)
-11. **Gemara SDK integration** — output structurally compatible with ORBIT assessment result schemas; invest in Gemara SDK for multi-tool consumption
+10. **CI gating** — `--fail-on=fail` exit code for pipeline integration
+11. **Multi-repo project-level conformance** (OSPS-QA-04.02)
+
+### Future design concepts
+
+The following concepts are stated as design direction but deferred for detailed
+design:
+
+- **Source type taxonomy** — Probes could be designed with a source type
+  taxonomy (file-based, API-based, metadata-based, external-service,
+  convention-based) that guides probe design and helps contributors understand
+  where to add new detection paths. The probe interface should be designed to
+  accept multiple sources from the start, with the option to add sources later.

### Out of scope

@@ -134,8 +256,14 @@ Phases are ordered by outcome, not calendar quarter. Maintainer bandwidth dictat

**Outcome:** Scorecard produces a useful OSPS Baseline Level 1 conformance report for any public GitHub repository, available across CLI, Action, and API surfaces.

-- OSPS output format with `--format=osps`
-- Versioned mapping file for OSPS Baseline v2026.02.19
+- Evidence model and output formats:
+  - Enriched JSON (Scorecard-native)
+  - In-toto predicates ([SVR](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md); track [Baseline Predicate PR #502](https://github.com/in-toto/attestation/pull/502))
+  - Gemara output (transitive via [security-baseline](https://github.com/ossf/security-baseline) dependency)
+  - OSCAL Assessment Results (via [go-oscal](https://github.com/defenseunicorns/go-oscal); complements security-baseline's OSCAL Catalog export)
+- Two-layer mapping model for OSPS Baseline v2026.02.19:
+  - Check-level relations contributed upstream to security-baseline
+  - Probe-level mappings maintained in Scorecard
- Applicability engine (detect "has made a release" and other preconditions)
- Map existing probes to OSPS controls where coverage exists today
- New probes for Level 1 gaps (prioritized by coverage impact):
@@ -143,9 +271,8 @@ Phases are ordered by outcome, not calendar quarter. Maintainer bandwidth dictat
  - Dependency manifest presence (QA-02.01)
  - Security policy deepening (VM-02.01, VM-03.01, VM-01.01)
  - Secrets detection (BR-07.01) — consume platform signals (e.g., GitHub secret scanning API) where possible
-- Security Insights ingestion v1 (BR-03.01, BR-03.02, QA-04.01) with degraded-but-useful evaluation when absent
+- Metadata ingestion layer v1 — Security Insights as first supported source (BR-03.01, BR-03.02, QA-04.01); architecture supports additional metadata sources
- CI gating: `--fail-on=fail` + coverage summary
-- Design + document ORBIT interop commitments (Security Insights, Gemara compatibility, Privateer complementarity)
- Scorecard control catalog extraction plan (enabling other tools to consume Scorecard's control definitions)

### Phase 2: Release integrity + Level 2 core

- Signed manifest support (BR-06.01)
- Release notes/changelog detection (BR-04.01)
- Attestation mechanism v1 for non-automatable controls (pending OQ-1 resolution)
-- Evidence bundle output v1 (OSPS result JSON + in-toto statement + SARIF for failures)
-- Gemara SDK integration for interoperable output
+- Evidence bundle output v1 (conformance results + in-toto statement + SARIF for failures)
+- Additional metadata sources for the ingestion layer

### Phase 3: Enforcement detection + Level 3 + multi-repo

@@ -180,7 +307,7 @@ flowchart TD
    subgraph Evaluation["Evaluation"]
        Privateer["Privateer GitHub Plugin<br/>(LFX Insights driver)"]
        subgraph ScorecardEcosystem["Scorecard Ecosystem"]
-           Scorecard["OpenSSF Scorecard<br/>(deep analysis, conformance output,<br/>multi-platform, large install base)"]
+           Scorecard["OpenSSF Scorecard<br/>(evidence engine:<br/>deep analysis, multi-platform)"]
            Allstar["Allstar<br/>(GitHub policy enforcement,<br/>Scorecard sub-project)"]
        end
    end

@@ -203,25 +330,33 @@ flowchart TD
    SI -->|provides metadata| Scorecard
    SI -->|provides metadata| Minder
    Scorecard -->|checks| Allstar
-   Scorecard -->|conformance evidence| Privateer
-   Scorecard -->|findings| Minder
-   Scorecard -->|attestations| AMPEL
-   Scorecard -->|findings| Darnit
-   Darnit -->|compliance attestation| AMPEL
+   Scorecard -->|evidence| Privateer
+   Scorecard -->|evidence| Minder
+   Scorecard -->|evidence| AMPEL
+   Scorecard -->|evidence| Darnit
+   Darnit -->|attestation| AMPEL
```

-**Scorecard's role**: Produce deep, probe-based conformance evidence that the Privateer plugin, Minder, AMPEL, and downstream consumers can use. Scorecard reads Security Insights (shared data plane), outputs interoperable results (shared schema), and fills analysis gaps where the Privateer plugin has `NotImplemented` steps.
+**Scorecard's role**: Produce deep, probe-based security evidence that
+downstream tools can consume through published output formats. Scorecard ingests
+diverse signals, normalizes them through probes, and delivers evidence in
+interoperable formats (JSON, in-toto, Gemara, SARIF, OSCAL).
+
+**All consumers are equal.** Privateer, AMPEL, Minder, Darnit, and future tools
+consume Scorecard evidence on the same terms through published output formats.

-**What Scorecard does NOT do**: Replace the Privateer plugin, enforce policies or remediate (Minder's and AMPEL's role), or perform compliance auditing and remediation (Darnit's role).
+**What Scorecard does NOT do**: Enforce policies or remediate (Minder's and
+AMPEL's role), perform compliance auditing and remediation (Darnit's role), or
+guarantee compliance with any regulatory framework.

## Success criteria

-1. `scorecard --format=osps --osps-level=1` produces a valid conformance report for any public GitHub repository
-2. OSPS Baseline Level 1 conformance is achieved (Phase 1 outcome)
-3. OSPS output is available across CLI, Action, and API surfaces
-4. OSPS output is consumable by the Privateer plugin, AMPEL, and Minder as supplementary evidence (validated with ORBIT WG)
-5. All four open questions (OQ-1 through OQ-4) are resolved with documented decisions
-6. No changes to existing check scores or behavior
+1. Scorecard produces a valid OSPS Baseline Level 1 conformance report for any public GitHub repository across CLI, Action, and API surfaces
+2. Evidence model supports multiple output formats (enriched JSON, in-toto, Gemara, OSCAL) — each validated with at least one downstream consumer
+3. Conformance evidence is consumable by any downstream tool through published output formats (validated with ORBIT WG)
+4. All open questions (OQ-1 through OQ-4) are resolved with documented decisions
+5. No changes to existing check scores or behavior
+6. Additive, not breaking: existing checks, probes, scores, and output formats do not change behavior

## Approval process

From 93325990e62ef01b640d839cca8f580e1fa482f9 Mon Sep 17 00:00:00 2001
From: Stephen Augustus
Date: Mon, 2 Mar 2026 04:53:34 -0500
Subject: [PATCH 20/28] :seedling: Add Steering Committee responses to decisions.md

Add Stephen's responses to gating and downstream questions:

- CQ-19: Option C (hybrid), designed so that scaling back to Option A
  remains straightforward if needed
- CQ-18: Enriched JSON + in-toto + Gemara + OSCAL (no custom "OSPS format")
- CQ-17/CQ-23: Two-layer mapping model (security-baseline + Scorecard)
- CQ-13: All consumers equal, RFC 2119 SHOULD NOT duplicate
- CQ-21: Some duplication acceptable under Option C
- CQ-22: OQ-1 decomposed into identity vs.
  tooling per Adolfo
- CQ-1 update: parallel evaluation layers, not two modes
- CQ-3 update: Security Insights reframed as metadata ingestion layer
- Decision priority analysis updated with resolved status

Co-Authored-By: Claude
Signed-off-by: Stephen Augustus
---
 .../osps-baseline-conformance/decisions.md | 94 ++++++++++++++-----
 1 file changed, 71 insertions(+), 23 deletions(-)

diff --git a/openspec/changes/osps-baseline-conformance/decisions.md b/openspec/changes/osps-baseline-conformance/decisions.md
index 18d54c43872..19fde884243 100644
--- a/openspec/changes/osps-baseline-conformance/decisions.md
+++ b/openspec/changes/osps-baseline-conformance/decisions.md
@@ -109,6 +109,8 @@ The proposal frames this as a "product-level shift" where Scorecard gains a seco

I believe the scoring model will continue to be useful to consumers and it should be maintained. For now, both modes should coexist. There is no need to make a decision about this for the current iteration of the proposal.

+**Update:** Check scores and conformance labels are *parallel evaluation layers* over the same probe evidence, not two competing "modes." Both can appear in the same output. The three-tier architecture model (evidence layer → evaluation layers → output formats) replaces the original "two modes" framing. OSPS conformance is *one goal*, not *the* goal — Scorecard's broader identity is as an open source security evidence engine.
+
#### CQ-2: OSPS Baseline version targeting

The roadmap previously targeted OSPS Baseline v2025-10-10. The Privateer GitHub plugin targets v2025-02-25. The Baseline is a living spec with periodic releases. How should Scorecard handle version drift? Options:

@@ -139,6 +141,8 @@ This also raises a broader adoption question: most projects today don't have a `

We should provide a degraded, but still-useful evaluation without a Security Insights file, especially since our probes today can already cover a lot of ground without it.
It would be good for us to eventually support alternative metadata sources, but this should not be an immediate goal.

+**Update:** Reframed as a "metadata ingestion layer" that supports Security Insights as one source among several. SI is not privileged. Architecture invites contributions for alternative sources (SBOMs, VEX, platform APIs). No single metadata file is required for meaningful results, though metadata files may enrich results.
+
#### CQ-4: PVTR relationship — complement vs. converge

The proposal positions Scorecard as complementary to the Privateer plugin. But there's a deeper question: should this stay as two separate tools indefinitely, or is the long-term goal convergence (e.g., the Privateer plugin consuming Scorecard as a library, or Scorecard becoming a Privateer plugin itself)? Your position on this affects how tightly we couple the output formats and whether we invest in Gemara SDK integration.

@@ -269,6 +273,12 @@ Questions:

**Stephen's response:**

+All downstream tools — Privateer, AMPEL, Minder, Darnit, and others — are equal consumers of Scorecard's output formats. The output formats should serve different tool types equally (policy, remediation, dashboarding).
+
+Scorecard SHOULD NOT (per [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119)) duplicate evaluation that downstream tools handle, but this is not a MUST NOT. There could be scenarios where overlapping evaluation makes sense (e.g., Scorecard brings deeper analysis or different evidence sources).
+
+Coordinate with downstream tool maintainers during Phase 1 to validate that output formats are consumable.
+
#### CQ-14: Darnit vs. Minder delineation

@@ -456,6 +466,16 @@ _Note that this question is negated if consolidating check logic within `pvtr-gi

**Stephen's response:**

+**Decision: Option 3 (both) — two-layer mapping model.**
+
+- *Upstream* ([security-baseline](https://github.com/ossf/security-baseline) repo): Check-level relations — "OSPS-AC-03 relates to Scorecard's Branch-Protection check." Scorecard maintainers contribute via PR. The Baseline repo already has `guideline-mappings` referencing Scorecard in 9 controls (mapping to 7 checks). Scorecard can PR the missing ones.
+- *Internal* (Scorecard repo): Probe-level mappings — "OSPS-AC-03.01 is evaluated by probes X + Y with logic Z." These depend on probe implementation details and must live in Scorecard.
+
+**Language nuance** (per [security-baseline PR #476](https://github.com/ossf/security-baseline/pull/476)): Mappings were renamed to "relations" to guard against legal issues. Use "informs" / "provides evidence toward" rather than "satisfies" / "demonstrates compliance with."
+
+Taking a dependency on `github.com/ossf/security-baseline` is acceptable — it is a shared OpenSSF project with useful connectors.
+
+**Go module concern:** go.mod lives in cmd/ but module path is repo root. Import from cmd/pkg/ is unusual. Called out as potential concern, not blocking.

#### CQ-18: Output format — `--format=osps` vs. ecosystem formats

@@ -474,6 +494,15 @@ Which approach do you prefer?

**Stephen's response:**

+**Decision: Option 5 (combination) — the evidence model is the core deliverable; output formats are presentation layers.**
+
+Phase 1 ships:
+- **Enriched JSON** (Scorecard-native, no external dependency)
+- **In-toto predicates** — SVR first; track [Baseline Predicate PR #502](https://github.com/in-toto/attestation/pull/502). Multiple predicate types supported simultaneously. Existing Scorecard predicate type (`scorecard.dev/result/v0.1`) preserved for backwards compatibility.
+- **Gemara output** — dependency already transitive via `github.com/ossf/security-baseline` (gemara v0.7.0). The existing formatter pattern (`As()` methods) makes adding this straightforward.
+- **OSCAL Assessment Results** — using [go-oscal](https://github.com/defenseunicorns/go-oscal). The security-baseline repo already exports OSCAL Catalog format (control definitions) via go-oscal v0.6.3. Scorecard would produce OSCAL Assessment Results (findings per control for a given repo) — a complementary OSCAL model. AMPEL has native OSCAL support.
+
+There is no "OSPS output format" (confirming Eddie's, Adolfo's, and Mike's feedback). The `--format=osps` flag is replaced by the specific format flags above.

#### CQ-19: Architectural direction — build vs. integrate

@@ -506,6 +535,18 @@ Which option do you prefer? What are your concerns about taking a dependency on

**Stephen's response:**

+**Decision: Option C (hybrid), designed so that scaling back to Option A remains straightforward if needed.**
+
+The architecture must ensure:
+1. Scorecard owns all probe execution (non-negotiable core competency)
+2. Scorecard owns its own conformance evaluation logic (mapping, PASS/FAIL, applicability engine all live in Scorecard)
+3. Interoperability is purely at the output layer — Gemara, in-toto, SARIF, OSCAL are presentation formats, not architectural dependencies
+4. Evaluation logic is self-contained — Scorecard can produce conformance results using its own probes and mappings, independent of external evaluation engines
+
+**Dependency guidance:** Only adopt reasonably stable dependencies when needed. `github.com/ossf/security-baseline` is an acceptable data dependency for control definitions.
+
+**Flexibility:** Under this structure, scaling back to a fully independent model (Option A) remains straightforward — deprioritize or drop specific output formatters without affecting the evaluation layer.

#### CQ-20: Catalog extraction — what does it mean concretely?

@@ -533,6 +574,8 @@ Is some duplication acceptable if it means Scorecard retains architectural indep

**Stephen's response:**

+Resolved by CQ-19 decision. Option C (hybrid) accepts that some evaluation overlap may occur. Scorecard SHOULD NOT duplicate evaluation that downstream tools handle (RFC 2119 SHOULD NOT, not MUST NOT). Scorecard retains architectural independence — interoperability is at the output layer, not the evaluation layer.
+
#### CQ-22: Attestation decomposition — identity vs. tooling

@@ -549,6 +592,8 @@ Adolfo has offered to discuss this in depth.

**Stephen's response:**

+Acknowledged. OQ-1 should be decomposed into identity and tooling sub-questions as Adolfo suggests. The design should allow different identity/tooling combinations per control. Detailed resolution deferred to discussion with Adolfo and Spencer.
+
#### CQ-23: Mapping registry — where should the canonical mapping live?

@@ -566,6 +611,8 @@ This extends CQ-17 with Adolfo's context about the demoted Baseline registry. Sh

**Stephen's response:**

+Resolved by the two-layer mapping model (see CQ-17). Check-level relations are contributed upstream to `ossf/security-baseline` via PR, using the existing `guideline-mappings` structure. Probe-level mappings live in Scorecard. This approach works with the current state of the security-baseline repo without requiring restoration of the demoted mapping registry.
+
---

@@ -574,30 +621,30 @@

The open questions have dependencies between them. Answering them in the
wrong order will result in rework. The recommended sequence follows.

-### Tier 1 — Gating decisions (answer before all others)
+### Tier 1 — Gating decisions

-| Question | Why it gates | Who decides |
-|----------|-------------|-------------|
-| **CQ-19** | Architectural direction (build vs. integrate vs. hybrid). If answered as Option B (shared plugin), CQ-17, CQ-18, CQ-20, and CQ-21 are either resolved or fundamentally reframed. | Stephen + Spencer + Eddie Knight + Steering Committee |
-| **OQ-1** | Attestation identity model. Spencer flagged as blocking. Determines how non-automatable controls are handled across all phases. | Spencer + Stephen + Steering Committee |
+| Question | Status | Resolution |
+|----------|--------|------------|
+| **CQ-19** | **RESOLVED** | Option C (hybrid), designed so that scaling back to Option A remains straightforward. Scorecard owns probe execution and evaluation; interoperability at output layer only. |
+| **OQ-1** | **OPEN** | Attestation identity model. Spencer flagged as blocking. CQ-22 decomposes into identity vs. tooling. |

-### Tier 2 — Downstream of CQ-19 (answer once Tier 1 is resolved)
+### Tier 2 — Downstream of CQ-19

-| Question | Dependency | Who decides |
-|----------|-----------|-------------|
-| **CQ-18** | Output format depends on CQ-19's architectural direction. Three reviewers flagged "no OSPS output format." In-toto predicates (AP-2) and Gemara SDK are concrete alternatives. | Stephen + Spencer + Eddie Knight + Adolfo |
-| **CQ-17/CQ-23** | Mapping file location depends on CQ-19 (negated if Option B). Adolfo adds context: Baseline registry was demoted (AP-1). | Stephen + Eddie Knight + Adolfo + Baseline maintainers |
-| **CQ-22** | Attestation decomposition (identity vs. tooling). Refines OQ-1. | Stephen + Spencer + Adolfo + Eddie Knight |
-| **OQ-2** | Enforcement detection scope. Affects Phase 3 scope. | Spencer + Stephen + Steering Committee |
+| Question | Status | Resolution |
+|----------|--------|------------|
+| **CQ-18** | **RESOLVED** | Enriched JSON + in-toto predicates + Gemara + OSCAL Assessment Results. No "OSPS output format." |
+| **CQ-17/CQ-23** | **RESOLVED** | Two-layer mapping model: check-level relations in security-baseline, probe-level mappings in Scorecard. |
+| **CQ-22** | **PARTIALLY RESOLVED** | OQ-1 decomposed into identity vs. tooling sub-questions (per Adolfo). Detailed resolution deferred to discussion with Adolfo and Spencer. |
+| **OQ-2** | **OPEN** | Enforcement detection scope. Affects Phase 3 scope. Needs Spencer + Stephen + Steering Committee. |

### Tier 3 — Important but non-blocking for Phase 1 start

-| Question | Notes | Who decides |
-|----------|-------|-------------|
-| **CQ-20** | Catalog extraction scope. Flows from CQ-19. | Stephen + Eddie Knight |
-| **CQ-21** | Code duplication tolerance. Flows from CQ-19. | Stephen + Spencer + Eddie Knight |
-| **CQ-13** | Minder/AMPEL integration surface. Affects ecosystem positioning. | Stephen + Minder maintainers + Adolfo |
-| **CQ-11** | Output stability guarantees. Depends on CQ-18. | Stephen + Spencer |
+| Question | Status | Notes |
+|----------|--------|-------|
+| **CQ-20** | **OPEN** | Catalog extraction scope. Flows from CQ-19 (now resolved). |
+| **CQ-21** | **RESOLVED** | Some duplication acceptable. RFC 2119 SHOULD NOT, not MUST NOT. |
+| **CQ-13** | **RESOLVED** | All consumers equal. RFC 2119 SHOULD NOT duplicate evaluation. |
+| **CQ-11** | **OPEN** | Output stability guarantees. CQ-18 now resolved; this can proceed. |

### Tier 4 — Stephen can answer alone (any time)

@@ -615,11 +662,12 @@ wrong order will result in rework. The recommended sequence follows.

|----------|-----------|
| **OQ-3** | Drop `scan_scope` from the schema (Spencer's feedback). |
| **OQ-4** | Evidence is probe-based only, not check-based (adopted). |
-| **CQ-10** | Partially superseded by CQ-17 (same topic with Eddie's context). |
+| **CQ-10** | Superseded by CQ-17 (two-layer mapping model). |

### Recommended next steps

1. **Schedule a discussion with Eddie Knight and Adolfo García Veytia** to resolve CQ-19. Bring Spencer. This is the fork in the road — everything downstream depends on it.
2. **Resolve OQ-1/CQ-22** with Spencer, Adolfo, and the Steering Committee.
Spencer flagged OQ-1 as blocking. Adolfo's decomposition (identity vs. tooling) clarifies what needs to be decided. -3. **Answer the Tier 4 questions** at any time — they are independent and don't block others. -4. **Once CQ-19 and OQ-1/CQ-22 are decided**, the Tier 2 and 3 questions can be resolved quickly since they flow from the architectural direction. +1. **Resolve OQ-1/CQ-22** with Spencer, Adolfo, and the Steering Committee. Spencer flagged OQ-1 as blocking. Adolfo's decomposition (identity vs. tooling) clarifies what needs to be decided. Adolfo has offered to discuss. +2. **Resolve CQ-20** (catalog extraction scope) — now unblocked by CQ-19 resolution. +3. **Resolve CQ-11** (output stability guarantees) — now unblocked by CQ-18 resolution. +4. **Answer the Tier 4 questions** at any time — they are independent and don't block others. +5. **Begin Phase 1 implementation** — the gating architectural decisions (CQ-19, CQ-18, CQ-17) are resolved. From b203a86de7a6722155ced9e5198a8fca7853f6b3 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 2 Mar 2026 04:53:42 -0500 Subject: [PATCH 21/28] :seedling: Update ROADMAP.md with evidence engine framing Apply strategic direction to public-facing roadmap: - Rebrand theme to "Open Source Security Evidence Engine" - Add mission statement - Replace design principles with six new principles - Update Phase 1 deliverables: evidence model with multiple output formats (enriched JSON, in-toto, Gemara, OSCAL), two-layer mapping model, metadata ingestion layer - Move Gemara from Phase 2 to Phase 1 (transitive dependency) - Flatten ecosystem positioning: all consumers are equal - Use RFC 2119 SHOULD NOT for duplicate evaluation - Remove resolved evidence format open question Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- docs/ROADMAP.md | 99 +++++++++++++++++++++++++++++-------------------- 1 file changed, 59 insertions(+), 40 deletions(-) diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 
11a2e87d5c6..f2f3117318e 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -2,18 +2,22 @@ ## 2026 -### Theme: OSPS Baseline Conformance +### Theme: Open Source Security Evidence Engine + +**Mission:** Scorecard produces trusted, structured security evidence for the +open source ecosystem. Scorecard's primary initiative for 2026 is adding -[OSPS Baseline](https://baseline.openssf.org/) conformance evaluation, -enabling Scorecard to answer the question: _does this project meet the -security requirements defined by the OSPS Baseline at a given maturity level?_ +[OSPS Baseline](https://baseline.openssf.org/) conformance evaluation as the +first use case that proves this architecture. Scorecard accepts diverse inputs +about a project's security practices, normalizes them through probe-based +analysis, and packages the resulting evidence in interoperable formats for +downstream tools to act on. -This is a new product surface alongside Scorecard's existing 0-10 scoring -model. Existing checks, probes, and scores are unchanged. The conformance -layer consumes existing Scorecard signals and adds a per-control -PASS/FAIL/UNKNOWN/NOT_APPLICABLE/ATTESTED output aligned with the -[ORBIT WG](https://github.com/ossf/wg-orbit) ecosystem. +Check scores (0-10) and conformance labels (PASS/FAIL/UNKNOWN) are parallel +evaluation layers over the same probe evidence. Existing checks, probes, and +scores are unchanged. The conformance layer is a new product surface aligned +with the [ORBIT WG](https://github.com/ossf/wg-orbit) ecosystem. **Target Baseline version:** [v2026.02.19](https://baseline.openssf.org/versions/2026-02-19) @@ -32,8 +36,16 @@ API surfaces. 
Deliverables: -- OSPS output format (`--format=osps`) -- Versioned mapping file (YAML) mapping OSPS controls to Scorecard probes +- Evidence model and output formats: + - Enriched JSON (Scorecard-native) + - In-toto predicates (SVR; track Baseline Predicate) + - Gemara output (via [security-baseline](https://github.com/ossf/security-baseline) + dependency) + - OSCAL Assessment Results (via + [go-oscal](https://github.com/defenseunicorns/go-oscal)) +- Two-layer mapping model for OSPS Baseline v2026.02.19: + - Check-level relations contributed upstream to security-baseline + - Probe-level mappings maintained in Scorecard - Applicability engine detecting preconditions (e.g., "has made a release") - New probes for Level 1 gaps: - Governance and documentation presence (OSPS-GV-02.01, GV-03.01, @@ -42,7 +54,9 @@ Deliverables: - Security policy deepening (OSPS-VM-02.01, VM-03.01) - Secrets detection (OSPS-BR-07.01) — consuming platform signals where available -- Security Insights ingestion (OSPS-BR-03.01, BR-03.02, QA-04.01) +- Metadata ingestion layer — Security Insights as first supported source + (OSPS-BR-03.01, BR-03.02, QA-04.01); architecture supports additional + metadata sources - CI gating via `--fail-on=fail` - Scorecard control catalog extraction plan @@ -57,8 +71,8 @@ Deliverables: - Signed manifest support (OSPS-BR-06.01) - Release notes and changelog detection (OSPS-BR-04.01) - Attestation mechanism for non-automatable controls -- Evidence bundle output (OSPS result JSON + in-toto statement) -- Gemara SDK integration for interoperable output +- Evidence bundle output (conformance results + in-toto statement) +- Additional metadata sources for the ingestion layer #### Phase 3: Enforcement detection, Level 3, and multi-repo @@ -74,49 +88,54 @@ Deliverables: ### Ecosystem alignment -Scorecard operates within the ORBIT WG ecosystem as a measurement and -evidence tool. 
[Allstar](https://github.com/ossf/allstar), a Scorecard -sub-project, continuously monitors GitHub organizations and enforces -Scorecard check results as policies. OSPS conformance output could enable -Allstar to enforce Baseline conformance at the organization level. +Scorecard operates within the ORBIT WG ecosystem as an evidence engine. All +downstream tools consume Scorecard evidence on equal terms through published +output formats. + +[Allstar](https://github.com/ossf/allstar), a Scorecard sub-project, +continuously monitors GitHub organizations and enforces Scorecard check +results as policies. OSPS conformance output could enable Allstar to enforce +Baseline conformance at the organization level. -Scorecard does not duplicate: +Scorecard SHOULD NOT (per [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119)) +duplicate evaluation that downstream tools handle: +- **[Privateer](https://github.com/ossf/pvtr-github-repo-scanner)** — Baseline evaluation powered by Gemara and Security Insights - **[Minder](https://github.com/mindersec/minder)** — Policy enforcement and remediation platform (OpenSSF Sandbox, ORBIT WG) -- **[Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner)** — Baseline evaluation powered by Gemara and Security Insights -- **[Darnit](https://github.com/kusari-oss/darnit)** — Compliance audit and remediation - **[AMPEL](https://github.com/carabiner-dev/ampel)** — Attestation-based policy enforcement; already consumes Scorecard probe results via [policy library](https://github.com/carabiner-dev/policies/tree/main/scorecard) +- **[Darnit](https://github.com/kusari-oss/darnit)** — Compliance audit and remediation -Scorecard's role is to produce deep, probe-based conformance evidence that -these tools and downstream consumers can use. 
Both Minder and AMPEL already -consume Scorecard findings today — Minder to enforce security policies -across repositories, and AMPEL to validate Scorecard attestations against -[OSPS Baseline policies](https://github.com/carabiner-dev/policies/tree/main/groups/osps-baseline) -in CI/CD pipelines. +Scorecard's role is to produce deep, probe-based security evidence that these +tools and downstream consumers can use through interoperable output formats +(JSON, in-toto, Gemara, SARIF, OSCAL). ### Design principles -1. **UNKNOWN-first honesty.** If Scorecard cannot observe a control, the +1. **Evidence is the product.** Scorecard's core output is structured, + normalized probe findings. Check scores and conformance labels are parallel + evaluation layers over the same evidence. +2. **Probes normalize diversity.** Each probe understands multiple ways a + control outcome can be satisfied. +3. **UNKNOWN-first honesty.** If Scorecard cannot observe a control, the status is UNKNOWN with an explanation — never a false PASS or FAIL. -2. **Probes are the evidence unit.** OSPS evidence references probes and - their findings, not check-level scores. -3. **Additive, not breaking.** Existing checks, probes, scores, and output - formats do not change behavior. -4. **Data-driven mapping.** The mapping between OSPS controls and Scorecard - probes is a versioned YAML file, not hard-coded logic. -5. **Degraded-but-useful without Security Insights.** Projects without a - `security-insights.yml` still get a meaningful (if incomplete) report. +4. **All consumers are equal.** Downstream tools consume Scorecard evidence + through published output formats. +5. **No metadata monopolies.** Probes may evaluate multiple sources for the + same data. No single metadata file is required for meaningful results, + though they may enrich results. +6. **Formats are presentation.** Output formats (JSON, in-toto, Gemara, + SARIF, OSCAL) are views over the evidence model. No single format is + privileged. 
### Open questions The following design questions are under active discussion among maintainers: - **Attestation identity model** — How non-automatable controls are attested - (repo-local metadata vs. signed attestations via Sigstore/OIDC) + (repo-local metadata vs. signed attestations via Sigstore/OIDC). Decomposed + into identity (who signs) and tooling (what generates, when) sub-questions. - **Enforcement detection scope** — How Scorecard detects enforcement mechanisms without being an enforcement tool itself -- **Evidence format** — Ensuring output compatibility with Gemara Layer 4 - assessment schemas ### How to contribute From 9905816487199e70db5a717e3541f9868373f166 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 2 Mar 2026 05:23:19 -0500 Subject: [PATCH 22/28] :seedling: Clean up ROADMAP and remove CI gating deliverable Remove individual OSPS Baseline control references from ROADMAP.md (covered in docs/osps-baseline-coverage.md). Add in-toto predicate links and existing probe mapping deliverable. Remove --fail-on=fail CI gating from all docs as an enforcement activity outside Scorecard's scope. 
Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- docs/ROADMAP.md | 30 +++++++++---------- .../osps-baseline-conformance/decisions.md | 2 +- .../osps-baseline-conformance/proposal.md | 4 +-- 3 files changed, 16 insertions(+), 20 deletions(-) diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index f2f3117318e..df85c6d8a44 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -38,7 +38,8 @@ Deliverables: - Evidence model and output formats: - Enriched JSON (Scorecard-native) - - In-toto predicates (SVR; track Baseline Predicate) + - In-toto predicates ([SVR](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md); + track [Baseline Predicate PR #502](https://github.com/in-toto/attestation/pull/502)) - Gemara output (via [security-baseline](https://github.com/ossf/security-baseline) dependency) - OSCAL Assessment Results (via @@ -47,17 +48,14 @@ Deliverables: - Check-level relations contributed upstream to security-baseline - Probe-level mappings maintained in Scorecard - Applicability engine detecting preconditions (e.g., "has made a release") +- Map existing probes to OSPS controls where coverage exists today - New probes for Level 1 gaps: - - Governance and documentation presence (OSPS-GV-02.01, GV-03.01, - DO-01.01, DO-02.01) - - Dependency manifest presence (OSPS-QA-02.01) - - Security policy deepening (OSPS-VM-02.01, VM-03.01) - - Secrets detection (OSPS-BR-07.01) — consuming platform signals where - available -- Metadata ingestion layer — Security Insights as first supported source - (OSPS-BR-03.01, BR-03.02, QA-04.01); architecture supports additional - metadata sources -- CI gating via `--fail-on=fail` + - Governance and documentation presence + - Dependency manifest presence + - Security policy deepening + - Secrets detection — consuming platform signals where available +- Metadata ingestion layer — Security Insights as first supported source; + architecture supports additional metadata sources - Scorecard control catalog extraction 
plan #### Phase 2: Release integrity and Level 2 core @@ -68,8 +66,8 @@ core of Level 2 and becoming useful for downstream due diligence workflows. Deliverables: - Release asset inspection layer -- Signed manifest support (OSPS-BR-06.01) -- Release notes and changelog detection (OSPS-BR-04.01) +- Signed manifest support +- Release notes and changelog detection - Attestation mechanism for non-automatable controls - Evidence bundle output (conformance results + in-toto statement) - Additional metadata sources for the ingestion layer @@ -81,9 +79,9 @@ and project-level aggregation. Deliverables: -- SCA policy and enforcement detection (OSPS-VM-05.*) -- SAST policy and enforcement detection (OSPS-VM-06.*) -- Multi-repo project-level conformance aggregation (OSPS-QA-04.02) +- SCA policy and enforcement detection +- SAST policy and enforcement detection +- Multi-repo project-level conformance aggregation - Attestation integration GA ### Ecosystem alignment diff --git a/openspec/changes/osps-baseline-conformance/decisions.md b/openspec/changes/osps-baseline-conformance/decisions.md index 19fde884243..e6b44a4c63a 100644 --- a/openspec/changes/osps-baseline-conformance/decisions.md +++ b/openspec/changes/osps-baseline-conformance/decisions.md @@ -41,7 +41,7 @@ This is a fundamental design question. Options include: This is a critical framing question. The roadmap proposes *detecting* whether enforcement exists (e.g., "are SAST results required to pass before merge?"), not *performing* enforcement. But the line between "detecting enforcement" and "being an enforcement tool" needs to be drawn clearly. -**Recommendation to discuss**: Scorecard detects and reports whether enforcement mechanisms are in place. It does not itself enforce. The `--fail-on=fail` CI gating is a reporting exit code, not an enforcement action — the CI system is the enforcer. This distinction should be documented explicitly. 
+**Recommendation to discuss**: Scorecard detects and reports whether enforcement mechanisms are in place. It does not itself enforce. This distinction should be documented explicitly. ### OQ-3: `scan_scope` field in output schema diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index a1e11e8389a..630302682a1 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -227,8 +227,7 @@ controls and Scorecard probes is a versioned data file, not hard-coded logic. - Release asset inspection (multiple L2/L3 controls) - Signed manifest support (OSPS-BR-06.01) - Enforcement detection (OSPS-VM-05.*, VM-06.* — pending OQ-2 resolution) -10. **CI gating** — `--fail-on=fail` exit code for pipeline integration -11. **Multi-repo project-level conformance** (OSPS-QA-04.02) +10. **Multi-repo project-level conformance** (OSPS-QA-04.02) ### Future design concepts @@ -272,7 +271,6 @@ Phases are ordered by outcome, not calendar quarter. Maintainer bandwidth dictat - Security policy deepening (VM-02.01, VM-03.01, VM-01.01) - Secrets detection (BR-07.01) — consume platform signals (e.g., GitHub secret scanning API) where possible - Metadata ingestion layer v1 — Security Insights as first supported source (BR-03.01, BR-03.02, QA-04.01); architecture supports additional metadata sources -- CI gating: `--fail-on=fail` + coverage summary - Scorecard control catalog extraction plan (enabling other tools to consume Scorecard's control definitions) ### Phase 2: Release integrity + Level 2 core From 5122d8760216917a99c436cfb5bcaa4282602df9 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 2 Mar 2026 05:34:37 -0500 Subject: [PATCH 23/28] :seedling: Fix diagrams in proposal.md Convert three-tier evaluation model from ASCII to Mermaid to show parallel fan-out from probes to evaluation layers. 
Fix ORBIT ecosystem diagram: move Darnit into Enforcement & Audit
subgraph, add missing Baseline-to-Darnit arrow, differentiate AMPEL
relationship (informs policies vs defines controls), remove inaccurate
SI-to-Minder arrow.

Co-Authored-By: Claude
Signed-off-by: Stephen Augustus
---
 .../osps-baseline-conformance/proposal.md | 23 +++++++++----------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md
index 630302682a1..90fc7a64791 100644
--- a/openspec/changes/osps-baseline-conformance/proposal.md
+++ b/openspec/changes/osps-baseline-conformance/proposal.md
@@ -143,13 +143,13 @@ Scorecard's processing model has four steps:
 
 ### Three-tier evaluation model
 
-```
-Evidence layer:     Probe findings (atomic boolean measurements)
-                      |
-Evaluation layers:  Check scoring (0-10, existing)
-                    Conformance evaluation (PASS/FAIL/UNKNOWN, new)
-                      |
-Output formats:     JSON, in-toto, Gemara, SARIF, OSCAL, probe, default
+```mermaid
+flowchart TD
+    Probes["Probe findings<br/>(atomic boolean measurements)"]
+    Probes --> Checks["Check scoring<br/>(0-10, existing)"]
+    Probes --> Conformance["Conformance evaluation<br/>(PASS/FAIL/UNKNOWN, new)"]
+    Checks --> Formats["Output formats<br/>(JSON, in-toto, Gemara,<br/>SARIF, OSCAL, probe, default)"]
+    Conformance --> Formats
 ```
 
 Check scores and conformance labels are *parallel interpretations* of the same
@@ -310,23 +310,22 @@ flowchart TD
         end
     end
 
-    subgraph Enforcement["Policy Enforcement"]
+    subgraph Enforcement["Enforcement & Audit"]
         Minder["Minder<br/>(enforce + remediate)"]
         AMPEL["AMPEL<br/>(attestation-based<br/>policy enforcement)"]
+        Darnit["Darnit<br/>(audit + remediate)"]
     end
-    end
-
-    Darnit["Darnit<br/>(audit + remediate)"]
     end
 
     Baseline -->|defines controls| Privateer
     Baseline -->|defines controls| Scorecard
     Baseline -->|defines controls| Minder
-    Baseline -->|defines controls| AMPEL
+    Baseline -->|defines controls| Darnit
+    Baseline -->|informs policies| AMPEL
     Gemara -->|provides schemas| Privateer
     Gemara -->|provides schemas| Scorecard
     SI -->|provides metadata| Privateer
     SI -->|provides metadata| Scorecard
-    SI -->|provides metadata| Minder
     Scorecard -->|checks| Allstar
     Scorecard -->|evidence| Privateer
     Scorecard -->|evidence| Minder

From c5c167889a10cfd6a61e312d079b303b9bac68a6 Mon Sep 17 00:00:00 2001
From: Stephen Augustus
Date: Tue, 3 Mar 2026 01:13:45 -0500
Subject: [PATCH 24/28] :seedling: Integrate AMPEL maintainer feedback (AP-5
 through AP-8)

Add framework-agnostic conformance language, probe composition model
(1:1 and many-to-1 mappings), bidirectional catalog framing, and future
design concepts (framework CLI option, probe-level predicate type). Log
feedback and responses in decisions.md.

Co-Authored-By: Claude
Signed-off-by: Stephen Augustus
---
 .../osps-baseline-conformance/decisions.md | 44 +++++++++++++++++++
 .../osps-baseline-conformance/proposal.md  | 19 +++++++-
 2 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/openspec/changes/osps-baseline-conformance/decisions.md b/openspec/changes/osps-baseline-conformance/decisions.md
index e6b44a4c63a..f8d0a8ac52d 100644
--- a/openspec/changes/osps-baseline-conformance/decisions.md
+++ b/openspec/changes/osps-baseline-conformance/decisions.md
@@ -420,6 +420,50 @@ AMPEL is an active consumer of Scorecard data today:
 
 This validates the proposal's direction of making Scorecard's probe results and control mappings available to the broader ecosystem.
 
+#### AP-5: Probe composition supports framework-agnostic evaluation
+
+> "This point is key. Baseline looks for outcomes. Compliance can be supported by Scorecard probe data.
+> +> The baseline control can be a 1:1 map to a probe's data, other times it will be a composite set of probes. If you add new probes to look for something new that's useful to test a baseline control, we just need to add another composition definition to say _OSPS-XX-XXX can be [probe X] or [probe set 1] or [probe set 2]_. +> +> This is akin to the way checks work now, but by generalizing it, the probe data can inform other framework testing tools, beyond baseline." +> — on proposal.md, "What Scorecard SHOULD NOT do" section + +Adolfo validates the probe composition model and identifies the key generalization: the same pattern used by existing checks (`probes/entries.go`) can be extended to OSPS Baseline and other frameworks. A control maps to one or more probe compositions, and new detection paths can be added without changing the composition structure. + +**Stephen's response:** Agreed. "Probe sets" or "compositions" is the right vocabulary, without introducing additional layers of complexity. The existing check composition pattern in `probes/entries.go` is the model. + +#### AP-6: Conformance engine should be framework-agnostic + +> "I'm assuming _conformance_ here means 'framework compliance'. +> +> This is cool, but also ensure that Scorecard's view of the world can be used at the check and probe level to enable projects and organizations to evaluate adherence to other frameworks. Especially useful for internal/unpublished variants (profiles) of frameworks that organizations define." +> — on proposal.md, architectural constraints section + +Adolfo requests that the conformance engine not be hard-wired to OSPS Baseline. Organizations may want to evaluate against internal or unpublished framework variants (profiles). + +**Stephen's response:** Agreed — the conformance engine should support arbitrary frameworks and organizational profiles. The probe findings are framework-agnostic by design; OSPS Baseline is the first (non-"checks") evaluation layer over them. 
Made explicit in the proposal. + +#### AP-7: Predicate types for check and probe evaluations + +> "The current predicate type is the full scorecard run evaluation. For completeness' sake, it would be nice to have one type for a list of check evaluations and one for probe evaluations. +> +> These are only useful, though, if they have more data than what an SVR has to offer, so I would wait until there is an actual need for them." +> — on proposal.md, output formats section + +Adolfo suggests dedicated in-toto predicate types for check-level and probe-level results, but self-qualifies that they should wait for concrete need beyond SVR. + +**Stephen's response:** Agreed. Probe-level findings are available via `--format=probe` but have no in-toto wrapper today. Worth adding when there's concrete need. This may suggest a `--framework` or `--evaluation` CLI option to select evaluation layers and determine output shape. Added as a future design concept. + +#### AP-8: Scorecard as consumer of control catalogs + +> "From reading the proposal, wouldn't Scorecard rather become a _consumer_ of control catalogs?" +> — on proposal.md, Scorecard control catalog extraction plan + +Adolfo challenges the "catalog extraction" framing, suggesting Scorecard should position itself as a consumer of external control catalogs rather than a publisher of its own. + +**Stephen's response:** Both directions — Scorecard *consumes* the OSPS Baseline catalog (via security-baseline) for conformance evaluation, and Scorecard's own probe definitions (`probes/*/def.yml`) are already machine-readable YAML with structured metadata. The "extraction plan" is about packaging those existing definitions for consumption so that external tools like AMPEL can discover what Scorecard evaluates and compose mappings against it. Clarified in the proposal. + ### Mike Lieberman's feedback The following feedback was provided by Mike Lieberman (@mlieberman85) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). 
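The composition model described in AP-5 — "OSPS-XX-XXX can be [probe X] or [probe set 1] or [probe set 2]" — reduces to an OR over ANDs of probe outcomes. A minimal sketch, with hypothetical probe names and data layout (the real mapping would live in a versioned data file, and real evaluation would also have to propagate UNKNOWN and NOT_APPLICABLE, omitted here for brevity):

```python
# Hypothetical composition entry: the control passes if ANY listed probe
# set has ALL of its probes reporting a True outcome. Probe names are
# made up for illustration.
COMPOSITIONS = {
    "OSPS-XX-XXX": [
        ["probeX"],                      # 1:1 mapping
        ["probeA", "probeB"],            # composite set 1
        ["probeC", "probeD", "probeE"],  # composite set 2
    ],
}

def control_passes(control_id, findings):
    """findings maps probe name -> outcome string ("True"/"False"/...)."""
    return any(
        all(findings.get(p) == "True" for p in probe_set)
        for probe_set in COMPOSITIONS[control_id]
    )
```

Adding a new detection path for a control is then a data change (append another probe set), not a logic change — the generalization Adolfo points to.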
diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 90fc7a64791..8e23768f01d 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -155,6 +155,12 @@ flowchart TD Check scores and conformance labels are *parallel interpretations* of the same probe evidence, not competing modes. Both can appear in the same output. +The conformance evaluation layer is framework-agnostic by design. OSPS Baseline +is the first use case, but the same probe evidence can be composed differently +for other frameworks and organizational profiles. Probe findings carry no +framework-specific semantics — only the mapping definitions (which probes +compose into which control outcomes) are framework-specific. + ### Architectural constraints 1. Scorecard owns all probe execution (non-negotiable core competency) @@ -213,12 +219,12 @@ controls and Scorecard probes is a versioned data file, not hard-coded logic. - Existing Scorecard predicate type (`scorecard.dev/result/v0.1`) preserved; new predicate types added as options 3. **Two-layer mapping model** — data-driven mappings at two levels: - *Upstream* ([security-baseline](https://github.com/ossf/security-baseline) repo): Check-level relations — "OSPS-AC-03 relates to Scorecard's Branch-Protection check." Scorecard maintainers contribute via PR. Uses "informs" / "provides evidence toward" language (not "satisfies" / "demonstrates compliance with" — see [security-baseline PR #476](https://github.com/ossf/security-baseline/pull/476)). - - *Internal* (Scorecard repo): Probe-level mappings — "OSPS-AC-03.01 is evaluated by probes X + Y with logic Z." Depends on probe implementation details. + - *Internal* (Scorecard repo): Probe-level mappings — "OSPS-AC-03.01 is evaluated by probes X + Y with logic Z." Depends on probe implementation details. 
A control may map to a single probe (1:1) or a composition of probes with evaluation logic (many-to-1). This follows the same composition pattern used by [existing checks](https://github.com/ossf/scorecard/blob/main/probes/entries.go). 4. **security-baseline dependency** — `github.com/ossf/security-baseline` as a data dependency for control definitions, Gemara types, and OSCAL catalog models 5. **Applicability engine** — detects preconditions (e.g., "has made a release") and outputs NOT_APPLICABLE 6. **Metadata ingestion layer** — supports Security Insights as one source among several for metadata-dependent controls (OSPS-BR-03.01, BR-03.02, QA-04.01). Architecture invites contributions for alternative sources (SBOMs, VEX, platform APIs). No single metadata file is required for meaningful results. 7. **Attestation mechanism (v1)** — accepts repo-local metadata for non-automatable controls (pending OQ-1 resolution) -8. **Scorecard control catalog extraction** — plan and mechanism to make Scorecard's control definitions consumable by other tools +8. **Scorecard probe catalog** — Scorecard *consumes* external control catalogs (OSPS Baseline via security-baseline) for conformance evaluation. The catalog extraction plan packages Scorecard's own probe definitions (`probes/*/def.yml`) as a consumable artifact so external tools (e.g., AMPEL) can discover what Scorecard evaluates and compose their own mappings against it. 9. **New probes and probe enhancements** for gap controls: - Secrets detection (OSPS-BR-07.01) - Governance/docs presence (OSPS-GV-02.01, GV-03.01, DO-01.01, DO-02.01) @@ -239,6 +245,15 @@ design: convention-based) that guides probe design and helps contributors understand where to add new detection paths. The probe interface should be designed to accept multiple sources from the start, with the option to add sources later. 
+- **Framework selection CLI option** — A `--framework` or `--evaluation` + option could let users select which evaluation layer(s) to run (checks, + OSPS Baseline, or a custom framework profile) and determine the output + shape (e.g., check-based vs. probe-based predicate type). +- **Probe-level in-toto predicate type** — The existing + `scorecard.dev/result/v0.1` predicate wraps check-level results. + A dedicated probe-level predicate type could wrap flat probe findings for + framework evaluation tools. Worth adding when there is concrete need + beyond what SVR provides. ### Out of scope From 0ae83ed4e70f5b0bb65ff915534458d01a4a789a Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Thu, 5 Mar 2026 21:43:50 -0500 Subject: [PATCH 25/28] :seedling: Frame proposal as Scorecard v6; add user feedback Add Scorecard v6 framing with "Why v6" section, single-run architectural constraint, confidence scoring future concept, and Scorecard user feedback section (FL-1 through FL-4) from community meeting. Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- docs/ROADMAP.md | 12 +++-- .../osps-baseline-conformance/decisions.md | 54 +++++++++++++++++++ .../osps-baseline-conformance/proposal.md | 41 +++++++++++--- 3 files changed, 96 insertions(+), 11 deletions(-) diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index df85c6d8a44..a515e8bc7d8 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -2,12 +2,13 @@ ## 2026 -### Theme: Open Source Security Evidence Engine +### Theme: Scorecard v6 — Open Source Security Evidence Engine **Mission:** Scorecard produces trusted, structured security evidence for the open source ecosystem. -Scorecard's primary initiative for 2026 is adding +Scorecard v6 evolves Scorecard from a scoring tool to an evidence engine. The +primary initiative for 2026 is adding [OSPS Baseline](https://baseline.openssf.org/) conformance evaluation as the first use case that proves this architecture. 
Scorecard accepts diverse inputs about a project's security practices, normalizes them through probe-based @@ -15,9 +16,10 @@ analysis, and packages the resulting evidence in interoperable formats for downstream tools to act on. Check scores (0-10) and conformance labels (PASS/FAIL/UNKNOWN) are parallel -evaluation layers over the same probe evidence. Existing checks, probes, and -scores are unchanged. The conformance layer is a new product surface aligned -with the [ORBIT WG](https://github.com/ossf/wg-orbit) ecosystem. +evaluation layers over the same probe evidence, produced in a single run. +Existing checks, probes, and scores are unchanged — v6 is additive. The +conformance layer is a new product surface aligned with the +[ORBIT WG](https://github.com/ossf/wg-orbit) ecosystem. **Target Baseline version:** [v2026.02.19](https://baseline.openssf.org/versions/2026-02-19) diff --git a/openspec/changes/osps-baseline-conformance/decisions.md b/openspec/changes/osps-baseline-conformance/decisions.md index f8d0a8ac52d..0349048dac3 100644 --- a/openspec/changes/osps-baseline-conformance/decisions.md +++ b/openspec/changes/osps-baseline-conformance/decisions.md @@ -658,6 +658,60 @@ This extends CQ-17 with Adolfo's context about the demoted Baseline registry. Sh Resolved by the two-layer mapping model (see CQ-17). Check-level relations are contributed upstream to `ossf/security-baseline` via PR, using the existing `guideline-mappings` structure. Probe-level mappings live in Scorecard. This approach works with the current state of the security-baseline repo without requiring restoration of the demoted mapping registry. +--- + +## Scorecard user feedback + +### Felix Lange's feedback (Scorecard community meeting) + +The following feedback was provided by Felix Lange during the Scorecard +community meeting on 2026-03-05. 
+ +#### FL-1: Confidence scoring instead of binary UNKNOWN + +Felix suggested generalizing the UNKNOWN-first model into a confidence score +that captures partial certainty, referencing [SAP Fosstars](https://sap.github.io/fosstars-rating-core/confidence.html). +In the Fosstars model, a confidence score (0-10) accompanies every rating; if +confidence falls below a threshold, the label becomes UNCLEAR regardless of the +numeric score. + +**Stephen's response:** Interesting direction. The probe evidence model already +provides the raw data for confidence derivation (each probe's outcome is +independently observable). Added as a future design concept — formal confidence +scoring may be added when consumer demand warrants it. + +#### FL-2: Single run for all output + +Output should allow consumers to obtain OSPS conformance evaluations and check +details (like Maintained) without having to run Scorecard twice. The API +(api.scorecard.dev) should also avoid requiring multiple requests. + +**Stephen's response:** Agreed. Added as architectural constraint #5 — a single +Scorecard run produces both check scores and conformance results. This applies +to CLI, Action, and API surfaces. + +#### FL-3: Existing checks should remain prominent + +Checks like Maintained help users identify abandoned projects and are valuable +for risk assessment even when they don't map directly to OSPS controls. These +should be preserved in a prominent manner. + +**Stephen's response:** Existing checks are fully preserved — check scores and +conformance labels are parallel evaluation layers. All checks continue to +produce scores as they do today, regardless of whether their probes map to OSPS +controls. No check is elevated or deprioritized relative to others based on its +OSPS Baseline coverage. 
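The Fosstars-style idea in FL-1 can be sketched as a linear confidence score over resolved probe outcomes. The function names and the threshold of 5 are illustrative assumptions, not a committed design:

```go
package main

import "fmt"

// confidence scales the share of resolved (non-UNKNOWN) probe outcomes
// to a 0-10 score, following the Fosstars-style model referenced in FL-1.
func confidence(resolved, total int) float64 {
	if total == 0 {
		return 0
	}
	return 10 * float64(resolved) / float64(total)
}

// label reports UNCLEAR when confidence falls below a threshold,
// regardless of the underlying conformance verdict.
func label(base string, conf, threshold float64) string {
	if conf < threshold {
		return "UNCLEAR"
	}
	return base
}

func main() {
	c := confidence(3, 4) // 3 of 4 mapped probes returned findings
	fmt.Printf("%s with confidence %.1f/10\n", label("PASS", c, 5.0), c)
}
```

With 3 of 4 mapped probes resolved, linear scaling yields 7.5; presenting that as "7/10" is a rounding choice left to the output layer.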
+ +#### FL-4: Simple for consumers to bring alternative frameworks + +It should be straightforward for consumers to evaluate against frameworks other +than OSPS Baseline, including internal or unpublished variants. + +**Stephen's response:** Aligns with Adolfo's feedback (AP-6). The conformance +engine is framework-agnostic by design — mapping definitions are the only +framework-specific component. A `--framework` CLI option is noted as a future +design concept. + --- ## Decision priority analysis diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 8e23768f01d..8fc9d11e2e7 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -13,12 +13,30 @@ downstream tools to act on. OSPS Baseline conformance is the first use case that proves this architecture, and the central initiative for Scorecard's 2026 roadmap. -This is fundamentally a **product-level shift**: Scorecard today answers "how -well does this repo follow best practices?" (graded 0-10 heuristics). OSPS -conformance requires answering "does this project meet these MUST statements at -this maturity level?" (PASS/FAIL/UNKNOWN/NOT_APPLICABLE per control, with -evidence). Check scores and conformance labels are parallel evaluation layers -over the same probe evidence — existing checks and scores are unchanged. +This is fundamentally a **product-level shift** — the defining change for +**Scorecard v6**. Scorecard today answers "how well does this repo follow best +practices?" (graded 0-10 heuristics). OSPS conformance requires answering "does +this project meet these MUST statements at this maturity level?" +(PASS/FAIL/UNKNOWN/NOT_APPLICABLE per control, with evidence). Check scores and +conformance labels are parallel evaluation layers over the same probe evidence — +existing checks and scores are unchanged. 
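The single-run constraint in FL-2 amounts to deriving both evaluation layers from one set of probe results. A minimal sketch, with all names hypothetical:

```go
package main

import "fmt"

// ProbeRun is one execution's worth of probe results: probe name -> finding
// observed. Both evaluation layers below read from it; nothing is re-run.
type ProbeRun map[string]bool

// checkScore derives a 0-10 check-style score from the share of passing probes.
func checkScore(run ProbeRun, probes []string) int {
	passed := 0
	for _, p := range probes {
		if run[p] {
			passed++
		}
	}
	return 10 * passed / len(probes)
}

// conformance derives a PASS/FAIL label over the same probe evidence.
func conformance(run ProbeRun, probes []string) string {
	for _, p := range probes {
		if !run[p] {
			return "FAIL"
		}
	}
	return "PASS"
}

func main() {
	run := ProbeRun{"probeA": true, "probeB": false} // single probe execution
	probes := []string{"probeA", "probeB"}
	fmt.Println(checkScore(run, probes), conformance(run, probes))
}
```

Both functions consume the same `ProbeRun`, so a CLI, Action, or API response can carry check scores and conformance labels from one execution.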
+ +### Why v6 + +Scorecard v6 represents a major evolution: from a scoring tool to an evidence +engine. The key changes that warrant a major version: + +1. **New evaluation layer** — conformance labels (PASS/FAIL/UNKNOWN) alongside + existing check scores (0-10), produced in a single run +2. **Framework-agnostic architecture** — probe evidence can be composed against + OSPS Baseline or other frameworks via pluggable mapping definitions +3. **Interoperable output formats** — in-toto, Gemara, OSCAL Assessment Results + alongside existing JSON and SARIF +4. **Probe catalog as public interface** — probe definitions become a consumable + artifact for external tools + +Existing checks, probes, scores, and output formats are preserved. v6 is +additive — no breaking changes to existing surfaces. ## Motivation @@ -171,6 +189,9 @@ compose into which control outcomes) are framework-specific. 4. Evaluation logic is self-contained — Scorecard can produce conformance results using its own probes and mappings, independent of external evaluation engines +5. A single Scorecard run produces both check scores and conformance results — + users MUST NOT need to run Scorecard twice or make separate API requests + to obtain both evaluation layers **Dependency guidance:** Only adopt reasonably stable dependencies when needed. The [security-baseline](https://github.com/ossf/security-baseline) repo is an @@ -254,6 +275,14 @@ design: A dedicated probe-level predicate type could wrap flat probe findings for framework evaluation tools. Worth adding when there is concrete need beyond what SVR provides. +- **Confidence scoring** — The current model produces binary conformance + labels (PASS/FAIL/UNKNOWN). A confidence score (inspired by + [Fosstars](https://sap.github.io/fosstars-rating-core/confidence.html)) + could express partial certainty — e.g., "PASS with confidence 7/10" when + 3 of 4 mapped probes returned findings. 
The probe evidence model already + provides the raw data for confidence derivation (each probe's outcome is + independently observable). A formal confidence score may be added when + consumer demand warrants it. ### Out of scope From 3d3deb76d539326329c8f6780814e0ea4ca84a41 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Sat, 21 Mar 2026 20:05:07 +0100 Subject: [PATCH 26/28] :seedling: Integrate Spencer and Adam's PR feedback; refine predicate strategy - Add Spencer's feedback section (SS-1 through SS-12) to decisions.md - Add Adam's feedback section (AK-1 through AK-8) to decisions.md proposal.md changes: - Expand MVVSR acronym on first use - Add "Why framework conformance in Scorecard?" section - Define "downstream tools" with examples - Add ORBIT WG context (not part of ORBIT, but ecosystem interop) - Replace SVR with new scorecard.dev/evidence/v0.1 predicate - Update to unified framework abstraction (drop "two-layer mapping") - Connect Processing model (temporal) and Three-tier model (structural) - Rename "Architectural constraints" to "Architectural target state" - Remove "Option A" references; inline architectural description - Clarify conformance layer includes both evaluation logic and formatting - Clarify catalog extraction as in-project control framework - Note that existing result/v0.1 predicate preserved (evidence/v0.1 is additive) - Defer cron to Phase 2+ (CLI + Action in Phase 1) - Add success criteria clarification (proposal acceptance = Phase 1 delivery) ROADMAP.md changes: - Add ORBIT WG context explanation - Update to unified framework abstraction - Update predicate references to evidence/v0.1 - Defer attestation mechanism to Phase 3 - Clarify enforcement boundary (detect not enforce) Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- docs/ROADMAP.md | 23 +- .../osps-baseline-conformance/decisions.md | 246 ++++++++++++++++++ .../osps-baseline-conformance/proposal.md | 90 +++++-- 3 files changed, 322 insertions(+), 37 deletions(-) 
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index a515e8bc7d8..0aeae695654 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -19,7 +19,9 @@ Check scores (0-10) and conformance labels (PASS/FAIL/UNKNOWN) are parallel evaluation layers over the same probe evidence, produced in a single run. Existing checks, probes, and scores are unchanged — v6 is additive. The conformance layer is a new product surface aligned with the -[ORBIT WG](https://github.com/ossf/wg-orbit) ecosystem. +[ORBIT WG](https://github.com/ossf/wg-orbit) ecosystem. (While Scorecard is not +part of ORBIT WG, ecosystem interoperability with ORBIT tools is an overarching +OpenSSF goal, and Scorecard interoperates through published output formats.) **Target Baseline version:** [v2026.02.19](https://baseline.openssf.org/versions/2026-02-19) @@ -40,15 +42,14 @@ Deliverables: - Evidence model and output formats: - Enriched JSON (Scorecard-native) - - In-toto predicates ([SVR](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md); - track [Baseline Predicate PR #502](https://github.com/in-toto/attestation/pull/502)) + - In-toto evidence predicate (`scorecard.dev/evidence/v0.1`) - Gemara output (via [security-baseline](https://github.com/ossf/security-baseline) dependency) - OSCAL Assessment Results (via [go-oscal](https://github.com/defenseunicorns/go-oscal)) -- Two-layer mapping model for OSPS Baseline v2026.02.19: - - Check-level relations contributed upstream to security-baseline - - Probe-level mappings maintained in Scorecard +- Unified framework abstraction for OSPS Baseline v2026.02.19: + - Checks and OSPS Baseline both use the same internal probe composition interface + - Probe-to-control mappings maintained in Scorecard - Applicability engine detecting preconditions (e.g., "has made a release") - Map existing probes to OSPS controls where coverage exists today - New probes for Level 1 gaps: @@ -70,10 +71,13 @@ Deliverables: - Release asset inspection layer - Signed manifest 
support - Release notes and changelog detection -- Attestation mechanism for non-automatable controls - Evidence bundle output (conformance results + in-toto statement) - Additional metadata sources for the ingestion layer +**Note:** Phase 1 focuses on automatically verifiable controls. Design of +attestation mechanisms (for non-automatable controls) is deferred to Phase 3 or +beyond. + #### Phase 3: Enforcement detection, Level 3, and multi-repo **Outcome:** Scorecard covers Level 3 controls including enforcement detection @@ -81,9 +85,10 @@ and project-level aggregation. Deliverables: -- SCA policy and enforcement detection -- SAST policy and enforcement detection +- SCA policy and enforcement detection (Scorecard detects enforcement mechanisms without being an enforcement tool itself) +- SAST policy and enforcement detection (Scorecard detects enforcement mechanisms without being an enforcement tool itself) - Multi-repo project-level conformance aggregation +- Attestation mechanism for non-automatable controls (deferred from Phase 2) - Attestation integration GA ### Ecosystem alignment diff --git a/openspec/changes/osps-baseline-conformance/decisions.md b/openspec/changes/osps-baseline-conformance/decisions.md index 0349048dac3..a37bfb2905a 100644 --- a/openspec/changes/osps-baseline-conformance/decisions.md +++ b/openspec/changes/osps-baseline-conformance/decisions.md @@ -308,6 +308,252 @@ Should we adopt these existing issues as the starting work items for Phase 1, or **Stephen's response:** +--- + +## Scorecard Maintainer Feedback: Spencer Schrock + +The following feedback was provided by Spencer Schrock (Scorecard Steering Committee member and maintainer) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +### SS-1: Conformance layer definition (ROADMAP.md:19) + +**Comment**: "to be clear, in this situation 'evaluation' or 'conformance' layer, just means output format?" 
+ +**Response**: The conformance layer includes **both** evaluation logic (probe→control mapping, status determination, applicability detection) and output formatting (enriched JSON, in-toto, Gemara, OSCAL). It composes probe findings into control verdicts, just as checks compose them into 0-10 scores—same probes, different evaluation surfaces, single run. + +**Action**: Added clarification to proposal.md where conformance layer is first introduced. + +**Status**: RESOLVED + +--- + +### SS-2: Mapping layers clarification (ROADMAP.md:51) + +**Comment**: "What's the value in upstreaming check-level relations? I think mapping probes to baseline controls is fine." + +**Response**: Checks today are effectively a framework in their own right (probe compositions). The unified framework abstraction means both checks and OSPS Baseline use the same internal representation — probe compositions mapped to framework controls. No "two layers" or "upstreaming" needed; it's all internal to Scorecard. + +**Action**: Updated proposal.md and ROADMAP.md to describe unified framework abstraction instead of two-layer mapping model. + +**Status**: RESOLVED + +--- + +### SS-3: Catalog extraction target (ROADMAP.md:61) + +**Comment**: "extracting to where?" + +**Response**: The catalog extraction means extracting Scorecard checks into an in-project control framework representation that uses the same unified framework abstraction as OSPS Baseline. This enables checks and OSPS controls to be treated uniformly within the evaluation layer. Not publishing external artifacts. + +**Action**: Clarified catalog extraction description in Phase 1 deliverables. + +**Status**: RESOLVED + +--- + +### SS-4: In-toto predicate compatibility (ROADMAP.md:74) + +**Comment**: "will this change the intoto format we offer now. which is wrapped around our check-based JSON output?" + +**Response**: No. 
The existing in-toto statement with predicate type `scorecard.dev/result/v0.1` (wrapping check-based JSON output) is **preserved unchanged**. The new evidence predicate (`scorecard.dev/evidence/v0.1`) is **additive**, not a replacement. Both predicates coexist; users choose via CLI flags. + +**Action**: Added explicit note about predicate preservation to output formats section. + +**Status**: RESOLVED + +--- + +### SS-5: Attestation scope and evidence engine (ROADMAP.md:136) + +**Comment**: "As an evidence engine, do we even need to attest to this data? Or is this for data produced by the cron or the action?" + +**Response**: Phase 1 focuses on automatically verifiable controls only. Discussion and design of attestation mechanisms (both inbound for non-automatable controls and outbound for signing Scorecard's own output) is deferred beyond Phase 1. When attestation is designed, it would apply to cron/action output. + +**Action**: Moved attestation deliverable from Phase 2 to Phase 3/TBD. Added note to Phase 1 scope. + +**Status**: DEFERRED + +--- + +### SS-6: Enforcement drift concern (osps-baseline-coverage.md:207) + +**Comment**: "Is this going to drift into policy/enforcement? Does this conflict with our goal of evidence only?" + +**Response**: No drift into enforcement. Scorecard detects signals of enforcement (e.g., "SCA tool is configured") but does not enforce policies. The boundary is: Scorecard observes and reports; downstream tools (Minder, AMPEL, Allstar) enforce. + +**Action**: Reinforced enforcement boundary in Phase 3 description and design principles. + +**Status**: RESOLVED + +--- + +### SS-7: Elevated access observability (osps-baseline-coverage.md:36) + +**Comment**: "I'd say this could be observable if it just needs the right token. If this is run in the context of an OSPO self-observation I think it's fine." 
+ +**Response**: Controls requiring elevated access (org admin tokens, GitHub Apps) are marked as **observable** with access requirements noted. When elevated access is unavailable, these controls return status `UNKNOWN` with reason "Requires elevated repository access." This supports OSPO self-assessment scenarios. + +**Action**: Updated coverage analysis guidance to mark elevated-access controls as observable with notation. + +**Status**: RESOLVED + +--- + +### SS-8: Cron deployment costs (decisions.md:203) + +**Comment**: "I would say the cron has additional barriers, cost of writing/serving more data. I have no concerns with the action" + +**Response**: Phase 1 conformance evaluation includes CLI and GitHub Action, but defers cron to Phase 2+ due to BigQuery storage/serving costs for conformance data across 1M+ repos. Action users manage their own storage (no cost to Scorecard infrastructure). + +**Action**: Added explicit cron deferral note to Phase 1 scope. + +**Status**: RESOLVED + +--- + +### SS-9: Metadata signing (decisions.md:30) + +**Comment**: "the metadata could still be signed by a maintainer, just involves some manual effort on their part" + +**Response**: Noted as part of the broader attestation design discussion, deferred to post-Phase 1. + +**Action**: No immediate action; attestation design deferred. + +**Status**: NOTED + +--- + +### SS-10: Mapping file repository location (decisions.md:218) + +**Comment**: "this mapping file lives in which repo?" + +**Response**: Probe-to-control mappings for OSPS Baseline will live in the Scorecard repository, as part of the unified framework abstraction. Checks and OSPS Baseline both use internal probe composition definitions. + +**Action**: Clarified in unified framework abstraction description. + +**Status**: RESOLVED + +--- + +### SS-11: Design principles endorsement (proposal.md:205) + +**Comment**: "I agree with these principles" + +**Response**: Noted. No action needed. 
+ +**Status**: ACKNOWLEDGED + +--- + +### SS-12: AGENTS.md relevance (AGENTS.md:3) + +**Comment**: "This seems unrelated to this change. Other than I assume you using it to generate some of these docs?" + +**Response**: AGENTS.md provides AI collaboration guidelines for the proposal development process. Now removed from git history and .gitignored per Steering Committee discussion. + +**Action**: AGENTS.md removed from git history in rebase, .gitignored. + +**Status**: RESOLVED + +--- + +## Scorecard Maintainer Feedback: Adam Korczynski + +The following feedback was provided by Adam Korczynski (Scorecard maintainer) on [PR #4952](https://github.com/ossf/scorecard/pull/4952). + +### AK-1: MVVSR acronym expansion (proposal.md:6) + +**Comment**: "Suggestion: Expand MVVSR here." + +**Response**: MVVSR = Mission, Vision, Values, Strategy, and Roadmap. Expanded on first use for clarity. Reference OpenSSF's MVVSR at https://openssf.org/about/ for potential alignment. + +**Action**: Changed to "Mission, Vision, Values, Strategy, and Roadmap (MVVSR) to be developed as a follow-up deliverable..." + +**Status**: RESOLVED + +--- + +### AK-2: Rationale for embedded conformance (proposal.md:16) + +**Comment**: "A bit of a general question: What are the reasons of adding framework conformance to Scorecard itself instead of having a standalone tool to which we can feed Scorecard findings and where the standalone tool then gives a verdict about framework conformance?" + +**Response**: Scorecard already performs the core evaluation work needed for framework conformance: probe execution, evidence collection, and probe composition. The conformance layer builds on Scorecard's existing architecture rather than duplicating capabilities in a separate tool. Scorecard did a lot of what we needed for evaluation, so there's no need for a new tool. + +**Action**: Added "Why framework conformance in Scorecard?" section to proposal explaining architectural rationale. 
+ +**Status**: RESOLVED + +--- + +### AK-3: Downstream tools definition (proposal.md:106) + +**Comment**: "Would be good with a definition of downstream tools here." + +**Response**: Added definition: "Downstream tools are tools that consume Scorecard's output to make policy decisions, enforce requirements, or aggregate security posture." With examples: AMPEL, Minder, Privateer, Darnit, LFX Insights, Allstar. + +**Action**: Added definition where "downstream tools" first appears. + +**Status**: RESOLVED + +--- + +### AK-4: Processing model vs Three-tier model (proposal.md:149) + +**Comment**: "Is this ('Processing model') the current dataflow and the following section 'Three-tier evaluation model' the intended?" + +**Response**: Both are complementary views of the same architecture. Processing model (Ingest → Analyze → Evaluate → Deliver) is the temporal data flow view. Three-tier model (Evidence → Evaluation → Presentation) is the structural layers view. Neither is "current" vs "intended"—they describe different aspects of the v6 architecture. + +**Action**: Added connecting sentence explaining relationship between the two models. + +**Status**: RESOLVED + +--- + +### AK-5: Architectural constraints framing (proposal.md:189) + +**Comment**: "Not sure if this is entirely correct. Currently, I wouldn't say that Scorecard can produce conformance results, but perhaps I am understanding the context of 'constraints' incorrectly here; Are these current constraints or are they constraints that should exist with the conformance layer in Scorecard?" + +**Response**: These are the architectural target state we're building toward, not current state constraints. The section heading "Architectural constraints" was misleading. + +**Action**: Renamed section to "Architectural target state" or "Architectural principles" to clarify these describe the v6 design, not current limitations. 
+ +**Status**: RESOLVED + +--- + +### AK-6: Option A reference (proposal.md:201) + +**Comment**: "Where is Option A?" + +**Response**: "Option A" exists only in decisions.md (architecture options discussion). Referencing it in proposal.md creates confusion. + +**Action**: Removed "Option A" mention from proposal.md, inlined the actual description of what we're accomplishing (unified framework abstraction). + +**Status**: RESOLVED + +--- + +### AK-7: ORBIT WG context (ROADMAP.md:91) + +**Comment**: "Why does Scorecard need to operate within the ORBIT WG ecosystem? Perhaps add a bit about what the ORBIT WG ecosystem is - that may clear it up." + +**Response**: Scorecard is **not** part of ORBIT WG. Ecosystem interoperability with ORBIT tools is an overarching OpenSSF goal, and Scorecard interoperates through published output formats. Added one-sentence clarification. + +**Action**: Added explanation after first ORBIT mention in both proposal.md and ROADMAP.md. + +**Status**: RESOLVED + +--- + +### AK-8: Success criteria ambiguity (proposal.md:393) + +**Comment**: "Would be nice to make this more explicit: What is the success criteria for here? The proposal or the implementation?" + +**Response**: These are **proposal acceptance criteria**. The proposal is accepted when Phase 1 implementation successfully delivers the described outcomes: Level 1 conformance reports, validated output formats, downstream consumer validation, open questions resolved, and no breaking changes to existing functionality. 
+ +**Action**: Add clarifying note to success criteria section: "The following criteria define proposal acceptance (successful Phase 1 implementation):" + +**Status**: RESOLVED + --- ## ORBIT WG feedback diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 8fc9d11e2e7..231b411e977 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -3,15 +3,18 @@ ## Summary **Mission:** Scorecard produces trusted, structured security evidence for the -open source ecosystem. _(Full MVVSR to be developed as a follow-up deliverable -for Steering Committee review.)_ +open source ecosystem. _(Full Mission, Vision, Values, Strategy, and Roadmap +(MVVSR) to be developed as a follow-up deliverable for Steering Committee +review.)_ Scorecard is an **open source security evidence engine**. It accepts diverse inputs about a project's security practices, normalizes them through probe-based analysis, and packages the resulting evidence in interoperable formats for -downstream tools to act on. OSPS Baseline conformance is the first use case that -proves this architecture, and the central initiative for Scorecard's 2026 -roadmap. +downstream tools to act on. (**Downstream tools** are tools that consume +Scorecard's output to make policy decisions, enforce requirements, or aggregate +security posture—examples: AMPEL, Minder, Privateer, Darnit, LFX Insights, +Allstar.) OSPS Baseline conformance is the first use case that proves this +architecture, and the central initiative for Scorecard's 2026 roadmap. This is fundamentally a **product-level shift** — the defining change for **Scorecard v6**. Scorecard today answers "how well does this repo follow best @@ -21,6 +24,12 @@ this project meet these MUST statements at this maturity level?" 
conformance labels are parallel evaluation layers over the same probe evidence — existing checks and scores are unchanged. +The conformance layer includes **both** evaluation logic (probe→control mapping, +status determination, applicability detection) and output formatting (enriched +JSON, in-toto, Gemara, OSCAL). It composes probe findings into control verdicts, +just as checks compose them into 0-10 scores—same probes, different evaluation +surfaces, single run. + ### Why v6 Scorecard v6 represents a major evolution: from a scoring tool to an evidence @@ -42,7 +51,7 @@ additive — no breaking changes to existing surfaces. ### Why now -1. **OSPS Baseline is the emerging standard.** The OSPS Baseline (v2026.02.19) defines controls across 3 maturity levels. It is maintained within the ORBIT Working Group and is becoming the reference framework for open source project security posture. See the [OSPS Baseline maintenance process](https://baseline.openssf.org/maintenance.html) for the versioning cadence. +1. **OSPS Baseline is the emerging standard.** The OSPS Baseline (v2026.02.19) defines controls across 3 maturity levels. It is maintained within the ORBIT Working Group and is becoming the reference framework for open source project security posture. (While Scorecard is not part of ORBIT WG, ecosystem interoperability with ORBIT tools is an overarching OpenSSF goal, and Scorecard interoperates through published output formats.) See the [OSPS Baseline maintenance process](https://baseline.openssf.org/maintenance.html) for the versioning cadence. 2. **The ecosystem is moving.** The [Privateer plugin for GitHub repositories](https://github.com/ossf/pvtr-github-repo-scanner) already evaluates 39 of 52 control requirements and powers LFX Insights security results. The OSPS Baseline GitHub Action can upload SARIF. Best Practices Badge is staging Baseline-phase work. Scorecard's large install base is an advantage, but only if it ships a conformance surface. 
@@ -50,6 +59,17 @@ additive — no breaking changes to existing surfaces. 4. **Regulatory pressure.** The EU Cyber Resilience Act (CRA) and similar regulatory frameworks increasingly expect evidence-based security posture documentation. Scorecard produces structured evidence that downstream tools and processes may use when evaluating regulatory readiness. Scorecard does not itself guarantee CRA compliance or any other regulatory compliance. +### Why framework conformance in Scorecard? + +Scorecard already performs the core evaluation work needed for framework conformance: probe execution, evidence collection, and probe composition. Rather than create a separate tool that duplicates this capability, the conformance layer builds on Scorecard's existing architecture: + +- **Probe execution**: Scorecard's probes already collect the evidence needed for control evaluation +- **Composition model**: Checks demonstrate probe composition (e.g., the Binary-Artifacts check composes multiple binary detection probes); OSPS controls use the same composition pattern +- **Evidence normalization**: Probes already normalize diverse signals (GitHub API, file analysis, etc.) into structured findings +- **Applicability detection**: Scorecard already evaluates preconditions (e.g., "has made a release") for checks; controls need the same capability + +The conformance layer is a natural extension of Scorecard's probe-based architecture, not a bolted-on feature. Scorecard did a lot of what we needed for evaluation, so there's no need for a new tool. + ### What Scorecard brings that others don't - **Deep automated analysis.** 50+ probes with structured results provide granular evidence that the Privateer GitHub plugin's shallower checks cannot match (e.g., per-workflow token permission analysis, detailed branch protection rule inspection, CI/CD injection pattern detection). @@ -135,11 +155,11 @@ only). 
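The applicability detection mentioned above can be sketched as a precondition gate that short-circuits to NOT_APPLICABLE before any probe composition runs; `Repo` and `evaluateReleaseControl` are hypothetical names, not Scorecard types:

```go
package main

import "fmt"

// Repo carries the precondition signal the applicability engine would detect.
type Repo struct {
	HasRelease bool
}

// evaluateReleaseControl gates a release-related control on its precondition
// ("has made a release") before consulting probe evidence at all.
func evaluateReleaseControl(r Repo, probePassed bool) string {
	if !r.HasRelease {
		return "NOT_APPLICABLE" // precondition unmet; probes never consulted
	}
	if probePassed {
		return "PASS"
	}
	return "FAIL"
}

func main() {
	fmt.Println(evaluateReleaseControl(Repo{HasRelease: false}, true))
	fmt.Println(evaluateReleaseControl(Repo{HasRelease: true}, true))
}
```

Gating before probe composition keeps NOT_APPLICABLE distinct from UNKNOWN: the former means the control does not apply, the latter that applicable evidence could not be resolved.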
Eddie Knight, Adolfo García Veytia, and Mike Lieberman provided ORBIT WG feedback on output formats, mapping file ownership, and architectural direction. -The central architectural question (CQ-19) has been resolved: Scorecard takes -the hybrid approach (Option C), designed so that scaling back to Option A remains -straightforward if needed. Scorecard owns all probe execution and conformance -evaluation logic. Interoperability is purely at the output layer. See the -Architecture section below and [`decisions.md`](decisions.md) for details. +The central architectural question (CQ-19) has been resolved: Scorecard owns +all probe execution and conformance evaluation logic, with interoperability +purely at the output layer. This hybrid approach enables scaling back to a fully +independent model if needed. See the Architecture section below and +[`decisions.md`](decisions.md) for details. For the full list of questions, reviewer feedback, maintainer responses, and decision priority analysis, see [`decisions.md`](decisions.md). @@ -179,7 +199,13 @@ for other frameworks and organizational profiles. Probe findings carry no framework-specific semantics — only the mapping definitions (which probes compose into which control outcomes) are framework-specific. -### Architectural constraints +The three-tier model describes the structural layers of the architecture. The +Processing model (described earlier in this section) provides the temporal data +flow view, showing how data moves through these layers (Ingest → Analyze → +Evaluate → Deliver). These are complementary perspectives on the same +architecture. + +### Architectural target state 1. Scorecard owns all probe execution (non-negotiable core competency) 2. Scorecard owns its own conformance evaluation logic (mapping, PASS/FAIL, @@ -197,10 +223,6 @@ compose into which control outcomes) are framework-specific. 
The [security-baseline](https://github.com/ossf/security-baseline) repo is an acceptable data dependency for control definitions (see Scope). -**Flexibility:** Under this structure, scaling back to a fully independent model -(Option A) remains straightforward — deprioritize or drop specific output -formatters without affecting the evaluation layer. - ### Design principles 1. **Evidence is the product.** Scorecard's core output is structured, @@ -234,13 +256,16 @@ controls and Scorecard probes is a versioned data file, not hard-coded logic. 1. **OSPS conformance engine** — new package that maps controls to Scorecard probes, evaluates per-control status, handles applicability 2. **Evidence model and output formats** — the evidence model is the core deliverable; output formats are presentation layers over it: - Enriched JSON (Scorecard-native, no external dependency) - - In-toto predicates (SVR first; track [Baseline Predicate PR #502](https://github.com/in-toto/attestation/pull/502)) + - In-toto evidence predicate (`scorecard.dev/evidence/v0.1`) — Scorecard-owned, framework-agnostic predicate supporting OSPS Baseline, SLSA, and custom frameworks with probe-level evidence. Rationale: Scorecard controls the predicate's evolution, avoids depending on unmerged external PRs, and supports bring-your-own (BYO) frameworks. - Gemara output (transitive dependency via security-baseline) - OSCAL Assessment Results (using [go-oscal](https://github.com/defenseunicorns/go-oscal)) - - Existing Scorecard predicate type (`scorecard.dev/result/v0.1`) preserved; new predicate types added as options -3. **Two-layer mapping model** — data-driven mappings at two levels: - - *Upstream* ([security-baseline](https://github.com/ossf/security-baseline) repo): Check-level relations — "OSPS-AC-03 relates to Scorecard's Branch-Protection check." Scorecard maintainers contribute via PR. 
Uses "informs" / "provides evidence toward" language (not "satisfies" / "demonstrates compliance with" — see [security-baseline PR #476](https://github.com/ossf/security-baseline/pull/476)). - - *Internal* (Scorecard repo): Probe-level mappings — "OSPS-AC-03.01 is evaluated by probes X + Y with logic Z." Depends on probe implementation details. A control may map to a single probe (1:1) or a composition of probes with evaluation logic (many-to-1). This follows the same composition pattern used by [existing checks](https://github.com/ossf/scorecard/blob/main/probes/entries.go). + - **Note:** Existing `scorecard.dev/result/v0.1` predicate (check-based JSON) preserved unchanged. The new evidence predicate is additive, not a replacement. Both coexist; users choose via CLI flags. +3. **Unified framework abstraction for OSPS Baseline v2026.02.19** — Checks and OSPS Baseline both use the same internal interface/representation (probe compositions): + - Probe-to-control mappings maintained in Scorecard for OSPS Baseline controls + - Framework evaluation layer produces conformance results (PASS/FAIL/UNKNOWN/NOT_APPLICABLE) + - A control may map to a single probe (1:1) or a composition of probes with evaluation logic (many-to-1) + - This follows the same composition pattern used by [existing checks](https://github.com/ossf/scorecard/blob/main/probes/entries.go) + - Checks themselves are effectively a framework (the "Scorecard framework"); OSPS Baseline is another framework over the same probe evidence 4. **security-baseline dependency** — `github.com/ossf/security-baseline` as a data dependency for control definitions, Gemara types, and OSCAL catalog models 5. **Applicability engine** — detects preconditions (e.g., "has made a release") and outputs NOT_APPLICABLE 6. **Metadata ingestion layer** — supports Security Insights as one source among several for metadata-dependent controls (OSPS-BR-03.01, BR-03.02, QA-04.01). 
Architecture invites contributions for alternative sources (SBOMs, VEX, platform APIs). No single metadata file is required for meaningful results. @@ -297,16 +322,23 @@ Phases are ordered by outcome, not calendar quarter. Maintainer bandwidth dictat ### Phase 1: Conformance foundation + Level 1 coverage -**Outcome:** Scorecard produces a useful OSPS Baseline Level 1 conformance report for any public GitHub repository, available across CLI, Action, and API surfaces. +**Outcome:** Scorecard produces a useful OSPS Baseline Level 1 conformance report for any public GitHub repository, available across CLI and GitHub Action. + +**Deployment surfaces for Phase 1:** +- ✅ CLI (local execution) +- ✅ GitHub Action (repository-specific results, no storage cost to Scorecard project) +- ❌ Cron service deferred to Phase 2+ (storage/serving cost for conformance data across 1M+ repos needs evaluation) + +Phase 1 still delivers value: organizations can self-assess via Action or CLI without waiting for cron coverage. 
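To make the evidence predicate concrete: a statement using the Scorecard-owned `scorecard.dev/evidence/v0.1` predicate type described earlier might look like the sketch below. The envelope fields (`_type`, `subject`, `predicateType`, `predicate`) follow the in-toto Statement v1 layout; everything inside `predicate` — field names, the probe name, the status vocabulary — is illustrative, not a committed schema:

```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "github.com/example/repo",
      "digest": { "gitCommit": "abc123" }
    }
  ],
  "predicateType": "scorecard.dev/evidence/v0.1",
  "predicate": {
    "framework": { "id": "osps-baseline", "version": "2026.02.19" },
    "results": [
      {
        "control": "OSPS-AC-03",
        "status": "PASS",
        "evidence": [
          { "probe": "blocksForcePushOnBranches", "outcome": "True" }
        ]
      }
    ]
  }
}
```

Because the predicate carries probe-level evidence rather than check scores, the same envelope can wrap results for other frameworks by swapping the `framework` identifier.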
- Evidence model and output formats: - Enriched JSON (Scorecard-native) - - In-toto predicates ([SVR](https://github.com/in-toto/attestation/blob/main/spec/predicates/svr.md); track [Baseline Predicate PR #502](https://github.com/in-toto/attestation/pull/502)) + - In-toto evidence predicate (`scorecard.dev/evidence/v0.1`) - Gemara output (transitive via [security-baseline](https://github.com/ossf/security-baseline) dependency) - - OSCAL Assessment Results (via [go-oscal](https://github.com/defenseunicorns/go-oscal); complements security-baseline's OSCAL Catalog export) -- Two-layer mapping model for OSPS Baseline v2026.02.19: - - Check-level relations contributed upstream to security-baseline - - Probe-level mappings maintained in Scorecard + - OSCAL Assessment Results (via [go-oscal](https://github.com/defenseunicorns/go-oscal)) +- Unified framework abstraction for OSPS Baseline v2026.02.19: + - Probe-to-control mappings maintained in Scorecard + - Checks and OSPS Baseline use same internal probe composition interface - Applicability engine (detect "has made a release" and other preconditions) - Map existing probes to OSPS controls where coverage exists today - New probes for Level 1 gaps (prioritized by coverage impact): @@ -315,7 +347,7 @@ Phases are ordered by outcome, not calendar quarter. Maintainer bandwidth dictat - Security policy deepening (VM-02.01, VM-03.01, VM-01.01) - Secrets detection (BR-07.01) — consume platform signals (e.g., GitHub secret scanning API) where possible - Metadata ingestion layer v1 — Security Insights as first supported source (BR-03.01, BR-03.02, QA-04.01); architecture supports additional metadata sources -- Scorecard control catalog extraction plan (enabling other tools to consume Scorecard's control definitions) +- Scorecard control catalog extraction — Extract Scorecard checks into an in-project control framework representation that uses the same unified framework abstraction as OSPS Baseline. 
This enables checks and OSPS Baseline controls to be treated uniformly within the evaluation layer. ### Phase 2: Release integrity + Level 2 core @@ -392,6 +424,8 @@ guarantee compliance with any regulatory framework. ## Success criteria +The following criteria define proposal acceptance (successful Phase 1 implementation): + 1. Scorecard produces a valid OSPS Baseline Level 1 conformance report for any public GitHub repository across CLI, Action, and API surfaces 2. Evidence model supports multiple output formats (enriched JSON, in-toto, Gemara, OSCAL) — each validated with at least one downstream consumer 3. Conformance evidence is consumable by any downstream tool through published output formats (validated with ORBIT WG) From d4596bfb185ed0b7e3580a4d381a1d3b3ac787ae Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Sat, 21 Mar 2026 21:00:05 +0100 Subject: [PATCH 27/28] :seedling: Make proposal.md self-contained; remove dependencies on decisions.md - Replace "Open questions" section with inline design decision explanations - Remove all "pending OQ-1" and "pending OQ-2" references from Phase deliverables - Clarify attestation mechanism deferred to Phase 3 - Clarify enforcement detection boundary (detect not enforce) - Document predicate strategy, architecture, and unified framework abstraction inline - Add full reviewer attribution (Spencer, Adam, Eddie, Adolfo, Mike, Felix) Co-Authored-By: Claude Signed-off-by: Stephen Augustus --- .../osps-baseline-conformance/proposal.md | 38 +++++++++---------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/openspec/changes/osps-baseline-conformance/proposal.md b/openspec/changes/osps-baseline-conformance/proposal.md index 231b411e977..8e54a86425d 100644 --- a/openspec/changes/osps-baseline-conformance/proposal.md +++ b/openspec/changes/osps-baseline-conformance/proposal.md @@ -146,23 +146,23 @@ A fresh analysis of Scorecard's current coverage against OSPS Baseline v2026.02. 
- **Multi-repo scanning** (`--repos`, `--org`) — needed for OSPS-QA-04.02 (subproject conformance) - **Serve mode** — HTTP surface for pipeline integration -## Open questions +## Open questions and design decisions -Several design questions are under active discussion. Spencer (Steering -Committee) raised questions about attestation identity (OQ-1), enforcement -detection scope (OQ-2), and the evidence model (OQ-4, resolved — probe-based -only). Eddie Knight, Adolfo García Veytia, and Mike Lieberman provided -ORBIT WG feedback on output formats, mapping file ownership, and architectural -direction. +The following design questions have been addressed through maintainer review and Steering Committee discussion: -The central architectural question (CQ-19) has been resolved: Scorecard owns -all probe execution and conformance evaluation logic, with interoperability -purely at the output layer. This hybrid approach enables scaling back to a fully -independent model if needed. See the Architecture section below and -[`decisions.md`](decisions.md) for details. +**Attestation mechanism for non-automatable controls** — Phase 1 focuses on automatically verifiable controls only. Discussion and design of attestation mechanisms (both inbound for non-automatable controls and outbound for signing Scorecard's own output) are deferred to Phase 3 or beyond. This avoids blocking on identity model questions (OIDC vs. repo-local metadata vs. platform-native signals) while making progress on controls Scorecard can definitively evaluate. -For the full list of questions, reviewer feedback, maintainer responses, and -decision priority analysis, see [`decisions.md`](decisions.md). +**Enforcement detection boundary** — Scorecard detects signals of enforcement (e.g., "SCA tool is configured," "SAST results required before merge") but does not itself enforce policies. The boundary is: Scorecard observes and reports; downstream tools (Minder, AMPEL, Allstar) enforce. 
Phase 3 includes enforcement detection for SCA and SAST policies, but Scorecard remains an evidence engine, not an enforcement tool. + +**Predicate strategy** — The Steering Committee rejected in-toto SVR (too minimal, no probe-level evidence) and decided to create a new Scorecard-owned, framework-agnostic predicate type (`scorecard.dev/evidence/v0.1`). This supports OSPS Baseline, SLSA, and custom frameworks with probe-level evidence. The existing `scorecard.dev/result/v0.1` predicate (check-based) is preserved unchanged. + +**Architecture** — Scorecard owns all probe execution and conformance evaluation logic, with interoperability purely at the output layer (in-toto, Gemara, OSCAL). This hybrid approach enables scaling back to a fully independent model if needed. + +**Unified framework abstraction** — Checks and OSPS Baseline both use the same internal probe composition interface. No separate "mapping layers" or upstream contributions needed; probe-to-control mappings are maintained in Scorecard. + +**Evidence model** — Probe-based only, not check-based. Conformance results reference probe findings as evidence, not check scores. + +For the full review history including feedback from Spencer Schrock, Adam Korczynski, Eddie Knight (ORBIT WG), Adolfo García Veytia (AMPEL), Mike Lieberman, and Felix Lange, see [`decisions.md`](decisions.md). ## Architecture @@ -269,7 +269,7 @@ controls and Scorecard probes is a versioned data file, not hard-coded logic. 4. **security-baseline dependency** — `github.com/ossf/security-baseline` as a data dependency for control definitions, Gemara types, and OSCAL catalog models 5. **Applicability engine** — detects preconditions (e.g., "has made a release") and outputs NOT_APPLICABLE 6. **Metadata ingestion layer** — supports Security Insights as one source among several for metadata-dependent controls (OSPS-BR-03.01, BR-03.02, QA-04.01). Architecture invites contributions for alternative sources (SBOMs, VEX, platform APIs). 
No single metadata file is required for meaningful results. -7. **Attestation mechanism (v1)** — accepts repo-local metadata for non-automatable controls (pending OQ-1 resolution) +7. **Attestation mechanism (v1)** — accepts repo-local metadata for non-automatable controls 8. **Scorecard probe catalog** — Scorecard *consumes* external control catalogs (OSPS Baseline via security-baseline) for conformance evaluation. The catalog extraction plan packages Scorecard's own probe definitions (`probes/*/def.yml`) as a consumable artifact so external tools (e.g., AMPEL) can discover what Scorecard evaluates and compose their own mappings against it. 9. **New probes and probe enhancements** for gap controls: - Secrets detection (OSPS-BR-07.01) @@ -278,7 +278,7 @@ controls and Scorecard probes is a versioned data file, not hard-coded logic. - Security policy deepening (OSPS-VM-02.01, VM-03.01, VM-01.01) - Release asset inspection (multiple L2/L3 controls) - Signed manifest support (OSPS-BR-06.01) - - Enforcement detection (OSPS-VM-05.*, VM-06.* — pending OQ-2 resolution) + - Enforcement detection (OSPS-VM-05.*, VM-06.*) 10. **Multi-repo project-level conformance** (OSPS-QA-04.02) ### Future design concepts @@ -356,7 +356,7 @@ Phase 1 still delivers value: organizations can self-assess via Action or CLI wi - Release asset inspection layer (detect compiled assets, SBOMs, licenses with releases) - Signed manifest support (BR-06.01) - Release notes/changelog detection (BR-04.01) -- Attestation mechanism v1 for non-automatable controls (pending OQ-1 resolution) +- Attestation mechanism v1 for non-automatable controls - Evidence bundle output v1 (conformance results + in-toto statement + SARIF for failures) - Additional metadata sources for the ingestion layer @@ -364,8 +364,8 @@ Phase 1 still delivers value: organizations can self-assess via Action or CLI wi **Outcome:** Scorecard covers Level 3 controls including enforcement detection and project-level aggregation. 
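One way the probe-to-control mappings described above could be carried is as an annotation in each probe's `def.yml`, alongside the probe catalog extraction work. The sketch below mirrors the `osps_baseline` field proposed elsewhere in this patch series; the exact field names and layout are illustrative, not a committed schema:

```yaml
# Hypothetical annotation in probes/blocksForcePushOnBranches/def.yml
# (schema illustrative; mapping types and maturity levels follow the
# OSPS Baseline integration spec in this patch series)
id: blocksForcePushOnBranches
osps_baseline:
  - control: OSPS-AC-03.01
    mapping: direct        # one of: direct | partial | informational
    maturity_level: 1      # OSPS Baseline level 1, 2, or 3
```

Keeping the mapping next to the probe definition means the generated mapping document and the probe catalog artifact can both be derived from a single source of truth.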
-- SCA policy + enforcement detection (VM-05.* — pending OQ-2 resolution) -- SAST policy + enforcement detection (VM-06.* — pending OQ-2 resolution) +- SCA policy + enforcement detection (VM-05.*) +- SAST policy + enforcement detection (VM-06.*) - Multi-repo project-level conformance aggregation (QA-04.02) - Attestation integration GA From 1889ff5fac41f3ee4caebdf655adbcbdebfef20c Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 30 Mar 2026 13:07:30 -0400 Subject: [PATCH 28/28] ROADMAP: Add Best Practices Badge automation proposals integration Signed-off-by: Stephen Augustus Co-Authored-By: David A. Wheeler --- docs/ROADMAP.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 0aeae695654..d266a94cb0f 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -47,6 +47,7 @@ Deliverables: dependency) - OSCAL Assessment Results (via [go-oscal](https://github.com/defenseunicorns/go-oscal)) + - OpenSSF Best Practices Badge [Automation Proposals](https://github.com/coreinfrastructure/best-practices-badge/blob/main/docs/automation-proposals.md) - Unified framework abstraction for OSPS Baseline v2026.02.19: - Checks and OSPS Baseline both use the same internal probe composition interface - Probe-to-control mappings maintained in Scorecard
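The phases above all rest on the unified framework abstraction: probe outcomes composed into per-control conformance statuses (PASS/FAIL/UNKNOWN/NOT_APPLICABLE). A minimal Go sketch of that composition follows; type and function names are hypothetical, not Scorecard's actual API:

```go
package main

import "fmt"

// Outcome is a probe's finding, mirroring Scorecard's tri-state model.
type Outcome int

const (
	OutcomeFalse Outcome = iota
	OutcomeTrue
	OutcomeNotAvailable
)

// Status is a per-control conformance result.
type Status string

const (
	StatusPass          Status = "PASS"
	StatusFail          Status = "FAIL"
	StatusUnknown       Status = "UNKNOWN"
	StatusNotApplicable Status = "NOT_APPLICABLE"
)

// evaluateControl composes probe outcomes into one control status.
// A control whose precondition does not hold (e.g., "has made a
// release") is NOT_APPLICABLE; otherwise every mapped probe must
// report True for PASS, any False yields FAIL, and missing or
// unavailable evidence yields UNKNOWN.
func evaluateControl(applicable bool, outcomes ...Outcome) Status {
	if !applicable {
		return StatusNotApplicable
	}
	if len(outcomes) == 0 {
		return StatusUnknown
	}
	status := StatusPass
	for _, o := range outcomes {
		switch o {
		case OutcomeFalse:
			return StatusFail // any failing probe fails the control
		case OutcomeNotAvailable:
			status = StatusUnknown // evidence gap; keep scanning for failures
		}
	}
	return status
}

func main() {
	// Hypothetical many-to-1 composition: one control, two probes.
	fmt.Println(evaluateControl(true, OutcomeTrue, OutcomeTrue))  // PASS
	fmt.Println(evaluateControl(true, OutcomeTrue, OutcomeFalse)) // FAIL
	fmt.Println(evaluateControl(false))                           // NOT_APPLICABLE
}
```

The same evaluation function serves both "frameworks" in the unified abstraction — existing checks and OSPS Baseline controls — differing only in which probes each mapping composes.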