📖 Scorecard v6: OSPS Baseline conformance proposal and 2026 roadmap #4952
justaugustus merged 28 commits into ossf:main
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #4952 +/- ##
==========================================
+ Coverage 66.80% 69.67% +2.87%
==========================================
Files 230 251 +21
Lines 16602 15654 -948
==========================================
- Hits 11091 10907 -184
+ Misses 4808 3873 -935
- Partials 703 874 +171
justaugustus left a comment:
@ossf/scorecard-maintainers @ossf/scorecard-fe-maintainers @eddie-knight @puerco @evankanderson @mlieberman85 — based on conversations from this week with various WG ORBIT-adjacent maintainers, I'm tossing this early draft up for review.
Feel free to comment away while I work through this!
Hey @justaugustus, thanks for leading this collaboration! Looking forward to hammering this out. Some things to clarify:
An alternative plan would be for us to spend a week consolidating checks/probes into the pvtr plugin (with relevant CODEOWNERS), then update Scorecard to selectively execute the plugin under the covers. This would allow us to:
Add "ORBIT WG feedback" section documenting Eddie Knight's feedback from PR ossf#4952. Eddie is the ORBIT WG TSC Chair and maintainer of Gemara, Privateer, and OSPS Baseline.
Five feedback items documented as EK-1 through EK-5:
- EK-1: Mapping file could live in Baseline repo with CODEOWNERS
- EK-2: No "OSPS output format" exists; use Gemara SDK formats
- EK-3: Current proposal duplicates Privateer despite stating otherwise
- EK-4: Catalog extraction needs concrete implementation plan
- EK-5: Alternative architecture (shared plugin model)
Add five new clarifying questions (CQ-17 through CQ-21) for Steering Committee decisions:
- CQ-17: Mapping file location (Scorecard repo vs shared)
- CQ-18: Output format (--format=osps vs Gemara SDK)
- CQ-19: Build vs integrate (own engine vs shared plugin)
- CQ-20: Catalog extraction scope
- CQ-21: Privateer code duplication acceptability
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
@eddie-knight: Thanks for the thoughtful response, and yes! Looking forward to working on this with you all! I've integrated your feedback as open questions in 2ee759f. Can you take a quick look and see if that commit accurately captures your questions and concerns before I continue?
puerco left a comment:
This is really cool to see. Some initial thoughts.
The scope of this work is OSPS Baseline conformance within the ORBIT ecosystem — Privateer/PVTR interoperability is one aspect, not the whole story. Signed-off-by: Stephen Augustus <foo@auggie.dev> Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
Complete rewrite of the proposal and spec to cover the full scope of the 2026 roadmap, not just Privateer/PVTR interoperability:
- Conformance engine producing PASS/FAIL/UNKNOWN/NOT_APPLICABLE/ATTESTED
- OSPS output format (--format=osps)
- Versioned control-to-probe mapping files
- Applicability engine for precondition detection
- Security Insights ingestion for ORBIT ecosystem interop
- Attestation mechanism for non-automatable controls
- Gemara Layer 4 compatibility
- CI gating support
- Phased delivery aligned with quarterly milestones
- ORBIT ecosystem positioning (complement PVTR, don't duplicate)
Highlights Spencer's review notes as numbered open questions (OQ-1 through OQ-4):
- OQ-1: Attestation identity model (OIDC? tokens? workflows?)
- OQ-2: Enforcement detection vs. being an enforcement tool
- OQ-3: scan_scope field usefulness in output schema
- OQ-4: Evidence should be probe-based only, not check-based
Renames spec subdirectory from pvtr-baseline to osps-conformance.
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
- Add openspec/specs/core-checks/spec.md and openspec/specs/probes/spec.md documenting existing Scorecard architecture for spec-driven development
- Update .gitignore to exclude roadmap drafting notes
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Stephen's responses to clarifying questions (CQ-1 through CQ-8) and feedback on the proposal draft:
- Both scoring and conformance modes coexist; no deprecation needed now
- Target OSPS Baseline v2026.02.19 (latest), align with maintenance cadence
- Provide degraded-but-useful evaluation without Security Insights
- Invest in Gemara SDK integration for multi-tool consumption
- Prioritize Level 1 conformance; consume external signals where possible
- Approval requires Stephen + Spencer + 1 non-Steering maintainer
- Q2 outcome should be OSPS Baseline Level 1 conformance
- Land capabilities across all surfaces (CLI, Action, API)
Key changes requested:
- Correct PVTR references (it's the Privateer plugin, not a separate tool)
- Add Darnit and AMPEL comparison
- Replace quarterly timelines with phase-based outcomes
- Plan to extract Scorecard's control catalog for other tools
- Use Mermaid for diagrams
- Create separate OSPS Baseline coverage analysis in docs/
- Create docs/ROADMAP.md for public consumption
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Changes based on Stephen's review:
- Replace all "PVTR" references with "Privateer plugin for GitHub repositories" (it's the Privateer plugin, not a separate tool)
- Add ecosystem tooling comparison section covering Darnit (compliance audit + remediation), AMPEL (attestation-based policy enforcement), Privateer plugin (Baseline evaluation), and Scorecard (measurement)
- Replace quarterly timeline (Q1-Q4) with phase-based delivery (Phase 1-3) focused on outcomes, not calendar dates
- Update OSPS Baseline version from v2025-10-10 to v2026.02.19
- Convert ASCII ecosystem diagram to Mermaid
- Add Scorecard control catalog extraction to scope
- Add Gemara SDK integration to scope
- Update coverage snapshot to reference docs/osps-baseline-coverage.md (to be created with fresh analysis)
- Add approval process section based on governance answers
- Update Security Insights requirement to degraded-but-useful mode
- Add integration pipeline diagram (Scorecard -> Darnit -> AMPEL)
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Create docs/osps-baseline-coverage.md with a control-by-control analysis of Scorecard's current probe coverage against OSPS Baseline v2026.02.19. Coverage summary: 8 COVERED, 17 PARTIAL, 31 GAP, 3 NOT_OBSERVABLE across 59 controls. Create docs/ROADMAP.md with a publicly-consumable 2026 roadmap organized into three phases: conformance foundation + Level 1, release integrity + Level 2, and enforcement detection + Level 3. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
…h CQ-12 Remove reference to docs/roadmap-ideas.md from the coverage analysis document since it is not committed to the repo. Add four new clarifying questions to the proposal: NOT_OBSERVABLE controls in Phase 1 (CQ-9), mapping file ownership (CQ-10), OSPS output schema stability guarantees (CQ-11), and Phase 1 probe gap prioritization (CQ-12). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
Replace \n with <br/> in Mermaid node labels so line breaks render correctly in GitHub's Mermaid renderer. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
Replace remaining "Darn" references with "Darnit" throughout the proposal. Add Minder to the ecosystem comparison table, integration diagram, and "What Scorecard must not do" section. Minder is an OpenSSF Sandbox project in the ORBIT WG that consumes Scorecard findings for policy enforcement and auto-remediation. Add CQ-13 (Minder integration surface) and CQ-14 (Darnit vs. Minder delineation) as new clarifying questions. Update docs/ROADMAP.md ecosystem alignment to include Minder. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
Add a new section to docs/osps-baseline-coverage.md listing existing Scorecard issues and PRs that are directly relevant to closing OSPS Baseline coverage gaps, including:
- ossf#2305 / ossf#2479 (Security Insights)
- #30 (secrets scanning)
- ossf#1476 / ossf#2605 (SBOM)
- ossf#4824 (changelog)
- ossf#2465 (private vulnerability reporting)
- ossf#4080 / ossf#4823 / ossf#2684 / ossf#1417 (signed releases)
- ossf#2142 (threat model)
- ossf#4723 (Minder/Rego integration, closed)
Add CQ-15 asking whether existing issues should be adopted as Phase 1 work items or whether new issues should reference them.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Remove openspec system specs (core-checks, platform-clients, probes) that were scaffolding for documenting existing Scorecard architecture. These are not part of the OSPS conformance proposal and can be recreated if needed. Remove docs/roadmap-ideas.md from .gitignore. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
Add Allstar (Scorecard sub-project) to the ecosystem comparison table, integration flow diagram, and ORBIT ecosystem diagram. Allstar continuously monitors GitHub orgs and enforces Scorecard checks as policies with auto-remediation, and already enforces controls aligned with OSPS Baseline (branch protection, security policy, binary artifacts, dangerous workflows). Add Allstar to "Existing Scorecard surfaces that matter" section and to docs/ROADMAP.md ecosystem alignment. Add CQ-16 asking whether Allstar should be an explicit Phase 1 consumer of OSPS conformance output, and whether it is considered part of the enforcement boundary Scorecard does not cross. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
These are clarifying edits to ensure we've captured the recommendations and open questions correctly. Co-authored-by: Eddie Knight <knight@linux.com> Signed-off-by: Stephen Augustus <justaugustus@users.noreply.github.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
Add stakeholders to each open question and clarifying question in the proposal. Add a new Decision Priority Analysis section that organizes questions into tiers based on dependencies. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
Integrate PR review feedback from Adolfo García Veytia (AMPEL maintainer) and Mike Lieberman:
- Add AP-1 through AP-4 (Adolfo) and ML-1 (Mike) feedback sections
- Create CQ-22 (attestation decomposition) and CQ-23 (mapping registry) from Adolfo's feedback
- Elevate AMPEL alongside Minder throughout proposal and roadmap
- Remove premature spec.md that was causing reviewer confusion
- Update ecosystem tooling comparison table
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Split the proposal into two documents to improve readability:
- proposal.md: core proposal (motivation, scope, phased delivery, ecosystem positioning)
- decisions.md: reviewer feedback, open questions, maintainer responses, and decision priority analysis
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Major proposal revision based on Steering Committee direction:
- Add mission statement and "evidence engine" identity framing
- Add four-step processing model and three-tier architecture
- Add six design principles (evidence is the product, probes normalize diversity, UNKNOWN-first honesty, all consumers are equal, no metadata monopolies, formats are presentation)
- Reframe Security Insights as metadata ingestion layer (one source among several)
- Add security-baseline dependency and two-layer mapping model
- Add OSCAL Assessment Results to Phase 1 output formats
- Flatten ecosystem positioning: all consumers are equal
- Strengthen CRA language with compliance disclaimer
- Use RFC 2119 SHOULD NOT for duplicate evaluation guidance
- Note source type taxonomy as future design concept
- Note full MVVSR as follow-up deliverable
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Add Stephen's responses to gating and downstream questions:
- CQ-19: Option C (hybrid), designed so that scaling back to Option A remains straightforward if needed
- CQ-18: Enriched JSON + in-toto + Gemara + OSCAL (no custom "OSPS format")
- CQ-17/CQ-23: Two-layer mapping model (security-baseline + Scorecard)
- CQ-13: All consumers equal, RFC 2119 SHOULD NOT duplicate
- CQ-21: Some duplication acceptable under Option C
- CQ-22: OQ-1 decomposed into identity vs. tooling per Adolfo
- CQ-1 update: parallel evaluation layers, not two modes
- CQ-3 update: Security Insights reframed as metadata ingestion layer
- Decision priority analysis updated with resolved status
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Apply strategic direction to public-facing roadmap:
- Rebrand theme to "Open Source Security Evidence Engine"
- Add mission statement
- Replace design principles with six new principles
- Update Phase 1 deliverables: evidence model with multiple output formats (enriched JSON, in-toto, Gemara, OSCAL), two-layer mapping model, metadata ingestion layer
- Move Gemara from Phase 2 to Phase 1 (transitive dependency)
- Flatten ecosystem positioning: all consumers are equal
- Use RFC 2119 SHOULD NOT for duplicate evaluation
- Remove resolved evidence format open question
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Remove individual OSPS Baseline control references from ROADMAP.md (covered in docs/osps-baseline-coverage.md). Add in-toto predicate links and existing probe mapping deliverable. Remove --fail-on=fail CI gating from all docs as an enforcement activity outside Scorecard's scope. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
Convert three-tier evaluation model from ASCII to Mermaid to show parallel fan-out from probes to evaluation layers. Fix ORBIT ecosystem diagram: move Darnit into Enforcement & Audit subgraph, add missing Baseline-to-Darnit arrow, differentiate AMPEL relationship (informs policies vs defines controls), remove inaccurate SI-to-Minder arrow. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
Add framework-agnostic conformance language, probe composition model (1:1 and many-to-1 mappings), bidirectional catalog framing, and future design concepts (framework CLI option, probe-level predicate type). Log feedback and responses in decisions.md. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
Add Scorecard v6 framing with "Why v6" section, single-run architectural constraint, confidence scoring future concept, and Scorecard user feedback section (FL-1 through FL-4) from community meeting. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Stephen Augustus <foo@auggie.dev>
- Add Spencer's feedback section (SS-1 through SS-12) to decisions.md
- Add Adam's feedback section (AK-1 through AK-8) to decisions.md
proposal.md changes:
- Expand MVVSR acronym on first use
- Add "Why framework conformance in Scorecard?" section
- Define "downstream tools" with examples
- Add ORBIT WG context (not part of ORBIT, but ecosystem interop)
- Replace SVR with new scorecard.dev/evidence/v0.1 predicate
- Update to unified framework abstraction (drop "two-layer mapping")
- Connect Processing model (temporal) and Three-tier model (structural)
- Rename "Architectural constraints" to "Architectural target state"
- Remove "Option A" references; inline architectural description
- Clarify conformance layer includes both evaluation logic and formatting
- Clarify catalog extraction as in-project control framework
- Note that existing result/v0.1 predicate preserved (evidence/v0.1 is additive)
- Defer cron to Phase 2+ (CLI + Action in Phase 1)
- Add success criteria clarification (proposal acceptance = Phase 1 delivery)
ROADMAP.md changes:
- Add ORBIT WG context explanation
- Update to unified framework abstraction
- Update predicate references to evidence/v0.1
- Defer attestation mechanism to Phase 3
- Clarify enforcement boundary (detect not enforce)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
- Replace "Open questions" section with inline design decision explanations
- Remove all "pending OQ-1" and "pending OQ-2" references from Phase deliverables
- Clarify attestation mechanism deferred to Phase 3
- Clarify enforcement detection boundary (detect not enforce)
- Document predicate strategy, architecture, and unified framework abstraction inline
- Add full reviewer attribution (Spencer, Adam, Eddie, Adolfo, Mike, Felix)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Signed-off-by: Stephen Augustus <foo@auggie.dev> Co-Authored-By: David A. Wheeler <dwheeler@dwheeler.com>
What kind of change does this PR introduce?
Documentation: Scorecard v6 proposal and 2026 roadmap.
Scorecard v6 evolves Scorecard from a scoring tool to an open source security
evidence engine. The primary initiative for 2026 is adding
OSPS Baseline conformance evaluation as the
first use case that proves this architecture.
Mission: Scorecard produces trusted, structured security evidence for the
open source ecosystem.
Scorecard accepts diverse inputs about a project's security practices,
normalizes them through probe-based analysis, and packages the resulting
evidence in interoperable formats for downstream tools to act on. Check scores
(0-10) and conformance labels (PASS/FAIL/UNKNOWN) are parallel evaluation
layers over the same probe evidence, produced in a single run. v6 is additive —
existing checks, probes, scores, and output formats are preserved.
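
To make the "parallel evaluation layers" idea concrete, here is a minimal sketch, not Scorecard's actual implementation. It derives a 0-10 score and a PASS/FAIL/UNKNOWN label from one shared set of probe findings. The probe names, the scoring formula, and the rule that missing evidence yields UNKNOWN are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Result of one probe. The member names are illustrative."""
    TRUE = "True"
    FALSE = "False"
    NOT_AVAILABLE = "NotAvailable"


@dataclass(frozen=True)
class Finding:
    probe: str        # hypothetical probe identifier
    outcome: Outcome


def check_score(findings, probes):
    """Check layer: share of satisfied probes scaled to 0-10 (assumed formula)."""
    relevant = [f for f in findings if f.probe in probes]
    if not relevant:
        return None  # no evidence, so no score
    satisfied = sum(1 for f in relevant if f.outcome is Outcome.TRUE)
    return round(10 * satisfied / len(relevant))


def conformance_label(findings, probes):
    """Conformance layer: one label per control, computed from the same findings."""
    by_probe = {f.probe: f.outcome for f in findings}
    outcomes = [by_probe.get(p, Outcome.NOT_AVAILABLE) for p in probes]
    if any(o is Outcome.FALSE for o in outcomes):
        return "FAIL"
    if any(o is Outcome.NOT_AVAILABLE for o in outcomes):
        return "UNKNOWN"  # missing evidence is reported, not guessed
    return "PASS"


# One run produces the findings once; both layers read them.
findings = [
    Finding("hasSecurityPolicy", Outcome.TRUE),
    Finding("hasBranchProtection", Outcome.FALSE),
]
score = check_score(findings, {"hasSecurityPolicy", "hasBranchProtection"})
policy = conformance_label(findings, ["hasSecurityPolicy"])
sbom = conformance_label(findings, ["hasSecurityPolicy", "hasSBOM"])
print(score, policy, sbom)
```

Neither layer runs the probes itself, which is what keeps the two outputs consistent with each other within a single run.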
The goal of this PR is to create a collaboration/decision-making nexus for
Scorecard and WG ORBIT tooling maintainers to ensure that we build interfaces
that easily interact with other tools and minimize duplication of work across
our maintainers and others in the OpenSSF ecosystem.
Key changes that warrant a major version:
OSPS Baseline or other frameworks via pluggable mapping definitions
alongside existing JSON and SARIF
artifact for external tools
Key design decisions addressed in the proposal:
- Architecture: Scorecard owns probe execution and conformance evaluation;
  interoperability at the output layer only
- Predicate strategy: New scorecard.dev/evidence/v0.1 predicate type
  (framework-agnostic, probe-based evidence) alongside preserved
  scorecard.dev/result/v0.1 (existing check-based scores)
- Unified framework abstraction: Checks and OSPS Baseline both use the
  same internal probe composition interface
- Attestation mechanism: Phase 1 focuses on automatically verifiable
  controls only; attestation design deferred to Phase 3
- Enforcement detection boundary: Scorecard detects signals of enforcement
  (e.g., "SCA tool configured," "SAST results required") but does not itself
  enforce policies
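
As a rough illustration of the predicate strategy, the sketch below wraps probe evidence in an in-toto Statement v1 envelope under the new predicate type. The envelope fields (`_type`, `subject`, `predicateType`, `predicate`) follow the in-toto Statement layout. Everything inside `predicate`, the repository name, and the commit digest are hypothetical placeholders, and the predicate type string is copied as written above without assuming a URI scheme.

```python
import json

# Predicate type names as they appear in the PR description.
EVIDENCE_PREDICATE = "scorecard.dev/evidence/v0.1"
RESULT_PREDICATE = "scorecard.dev/result/v0.1"


def build_statement(repo, commit, predicate_type, predicate):
    """Wrap a predicate body in an in-toto Statement v1 envelope."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": repo, "digest": {"gitCommit": commit}}],
        "predicateType": predicate_type,
        "predicate": predicate,
    }


# Hypothetical predicate body: the real schema is defined by the proposal,
# not by this sketch.
evidence = {
    "findings": [
        {"probe": "hasSecurityPolicy", "outcome": "True"},
    ],
}

statement = build_statement(
    repo="github.com/example/project",   # placeholder repository
    commit="0" * 40,                     # placeholder commit digest
    predicate_type=EVIDENCE_PREDICATE,
    predicate=evidence,
)
print(json.dumps(statement, indent=2))
```

Because the envelope is the same for both predicate types, a consumer can keep reading existing check-based results and opt in to probe-based evidence by switching on `predicateType`.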
PR title follows the guidelines defined in our pull request documentation
What is the current behavior?
Scorecard produces 0-10 check scores and structured probe findings. There is no
OSPS Baseline conformance evaluation capability and no public 2026 roadmap.
What is the new behavior (if this is a feature change)?
This PR adds documentation only (no code changes):
- docs/ROADMAP.md: Public 2026 roadmap with phased delivery plan
- openspec/changes/osps-baseline-conformance/proposal.md: Detailed proposal
  covering architecture, scope, phased delivery, and ecosystem positioning
- openspec/changes/osps-baseline-conformance/decisions.md: Reviewer feedback,
  maintainer responses, and decision rationale
Tests for the changes have been added (for bug fixes/features)
N/A — documentation only.
Which issue(s) this PR fixes
NONE
Special notes for your reviewer
The proposal (proposal.md) is self-contained and includes all key design
decisions inline. The decisions.md file provides supplementary context with
the full reviewer feedback log and detailed decision rationale.
The control-by-control coverage analysis
is maintained separately.
Feedback from Eddie Knight (ORBIT WG TSC Chair), Adolfo García Veytia (AMPEL),
Mike Lieberman, Felix Lange, Spencer Schrock, and Adam Korczynski has been
incorporated. See decisions.md for the complete feedback log and responses.
Does this PR introduce a user-facing change?