User story: OSV developer #13

@boblord

Description

In the form of: "As a role performing task, I struggle with problem because reason."

As a developer on the OSV open source vulnerability tracking program performing data conversion from CVE records to OSV records, I struggle with inconsistent and low-quality CVE data because the information is represented in many different ways and often placed in the wrong fields.

The OSV development team’s goal is to make vulnerability data matchable, so people maintaining open source projects can automatically determine whether their software is affected. However, the wide variation in how CNAs format and populate CVE records creates major obstacles. For example, affected version information can appear in multiple fields—or be expressed as nonsense like “Novemeber 15” (typo and all). Some researchers mistakenly paste links to their proof-of-concept exploits into the “repo” field instead of linking to the repository for the vulnerable codebase.

To handle these inconsistencies, the team maintains conversion libraries that transform CVE data into OSV format. These libraries include numerous guardrails, but there are limits to what code can fix automatically. The conversion can fail entirely when critical fields are missing or malformed. Manual correction at scale is not feasible, so they can only make best-effort attempts to detect and adjust for common data problems.
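As a minimal sketch of the kind of guardrail described above (the field names, helper, and heuristics here are hypothetical illustrations, not OSV's actual conversion code), a pre-conversion check might reject records whose repository or commit fields are obviously unusable:

```python
import re

# Hex string between 7 and 40 characters, i.e. an abbreviated or full git SHA-1.
GIT_SHA = re.compile(r"^[0-9a-f]{7,40}$")

def looks_convertible(record: dict) -> bool:
    """Best-effort guardrail: detect records whose repo/commit fields are
    obviously unusable before attempting a full CVE-to-OSV conversion.
    (Hypothetical input format, for illustration only.)"""
    repo = record.get("repo", "")
    commit = record.get("introduced_commit", "")
    if not repo.startswith(("https://", "http://")):
        return False  # e.g. free text or a PoC note pasted into the repo field
    if not GIT_SHA.fullmatch(commit.lower()):
        return False  # e.g. a date like "Novemeber 15" instead of a commit hash
    return True

print(looks_convertible({"repo": "https://github.com/example/proj",
                         "introduced_commit": "a" * 40}))   # True
print(looks_convertible({"repo": "https://github.com/example/proj",
                         "introduced_commit": "Novemeber 15"}))  # False
```

A check like this cannot repair a bad record, but it lets the pipeline fail early and flag the record rather than emit a misleading OSV entry.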

When the CVE record includes reliable indicators—such as a valid repository URL and a specific commit hash—the conversion process usually works well. But when records are incomplete or inaccurate, OSV’s downstream consumers suffer: their vulnerability scanners and dependency management tools may require manual investigation to verify whether projects are affected. This slows response times and increases the risk that maintainers will wrongly conclude they are not vulnerable, putting both themselves and their downstream users at risk.

Because OSV is widely used across the open-source ecosystem, poor CVE data quality cascades outward, amplifying harm. If the OSV team could fix just one thing, it would be ensuring that version ranges are consistently and accurately represented—that single improvement would dramatically increase the reliability of automated vulnerability matching. Other attributes such as CVSS scores and CWE identifiers are useful for triage and classification, but consistent version information is the foundation on which all other automation depends.
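To illustrate what a consistent version range buys, here is a small sketch that builds the machine-matchable target format. The `type`, `events`, `introduced`, and `fixed` field names follow the public OSV schema; the helper function and the sample package are hypothetical, for illustration only:

```python
# Sketch of a clean CVE-style introduced/fixed version pair mapped into an
# OSV "affected" range. Field names follow the public OSV schema
# (ossf.github.io/osv-schema); the helper and sample data are hypothetical.

def to_osv_range(introduced: str, fixed: str) -> dict:
    """Build an OSV SEMVER range from an introduced/fixed version pair."""
    return {
        "type": "SEMVER",
        "events": [
            {"introduced": introduced},
            {"fixed": fixed},
        ],
    }

affected = {
    "package": {"ecosystem": "PyPI", "name": "example-lib"},  # hypothetical
    "ranges": [to_osv_range("1.2.0", "1.4.3")],
}
```

Given data in this shape, a dependency scanner can decide "is my pinned version inside the affected range?" with a simple version comparison—precisely the automated matching that free-text or misfiled version information makes impossible.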
