Add schema fingerprinting to Tool and Flow #48

@dgenio

Context

Split from #22 (schema fingerprinting, drift detection, and flow status management). This issue covers the schema fingerprinting utility and the Tool.schema_hash property, along with the flow-level hash snapshot and compatibility validator built on them.

What to do

1. Schema fingerprinting utility

Add a utility function to compute a deterministic hash of a Pydantic model's JSON Schema:

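# chainweaver/compat.py
import hashlib
import json
from typing import Type

from pydantic import BaseModel


def schema_fingerprint(model: Type[BaseModel]) -> str:
    """Compute a SHA-256 fingerprint of a Pydantic model's JSON Schema."""
    schema = model.model_json_schema()
    # Sorted keys + compact separators give one canonical string per schema.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    # Keep the first 16 hex chars (64 bits) of the digest.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]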
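To illustrate why the canonical JSON step matters, here is a small dependency-free sketch. It hashes plain schema dicts rather than Pydantic models, and the helper name `fingerprint_schema_dict` is an assumption made for this example, not part of the proposal. It shows that key order does not affect the result while a renamed field does.

```python
import hashlib
import json


def fingerprint_schema_dict(schema: dict) -> str:
    """Hypothetical stand-in for schema_fingerprint that takes a schema dict directly."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Same schema, keys inserted in a different order.
schema_a = {"type": "object", "properties": {"x": {"type": "integer"}}}
schema_b = {"properties": {"x": {"type": "integer"}}, "type": "object"}
# Field renamed from "x" to "y".
schema_c = {"type": "object", "properties": {"y": {"type": "integer"}}}

same_hash = fingerprint_schema_dict(schema_a) == fingerprint_schema_dict(schema_b)
renamed_differs = fingerprint_schema_dict(schema_a) != fingerprint_schema_dict(schema_c)
```

Note that `sort_keys=True` only reorders dict keys (at every nesting level). List values keep their order, so a list that is emitted in a different order would still change the fingerprint.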

2. Tool schema hash properties

Add computed properties to Tool:

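# chainweaver/tools.py
import hashlib

from chainweaver.compat import schema_fingerprint


class Tool:
    # ... existing fields ...

    @property
    def input_schema_hash(self) -> str:
        return schema_fingerprint(self.input_schema)

    @property
    def output_schema_hash(self) -> str:
        return schema_fingerprint(self.output_schema)

    @property
    def schema_hash(self) -> str:
        """Combined hash of input + output schemas."""
        # Both parts are fixed-length (16 hex chars), so concatenation is unambiguous.
        combined = self.input_schema_hash + self.output_schema_hash
        return hashlib.sha256(combined.encode("utf-8")).hexdigest()[:16]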
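As a rough check of the combined-hash idea, the sketch below uses a minimal stand-in class with plain dict schemas. `ToolSketch` and `_fingerprint` are hypothetical names for this example only; the real Tool would hold Pydantic models. It shows that a change to the output schema alone is enough to change the combined hash.

```python
import hashlib
import json
from dataclasses import dataclass


def _fingerprint(schema: dict) -> str:
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass(frozen=True)
class ToolSketch:
    """Hypothetical stand-in for Tool; schemas are plain dicts here."""

    name: str
    input_schema: dict
    output_schema: dict

    @property
    def schema_hash(self) -> str:
        # Input hash first, output hash second: the order is part of the identity.
        combined = _fingerprint(self.input_schema) + _fingerprint(self.output_schema)
        return hashlib.sha256(combined.encode("utf-8")).hexdigest()[:16]


inp = {"type": "object", "properties": {"q": {"type": "string"}}}
out_v1 = {"type": "object", "properties": {"n": {"type": "integer"}}}
out_v2 = {"type": "object", "properties": {"n": {"type": "number"}}}

v1 = ToolSketch("search", inp, out_v1)
v2 = ToolSketch("search", inp, out_v2)
output_change_detected = v1.schema_hash != v2.schema_hash
```

Because the properties recompute on every access, a real implementation might cache them if schemas are immutable after construction; that is a design choice the issue leaves open.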

3. Flow-level schema snapshot

When a flow is compiled/registered with validation, store the tool schema hashes:

class Flow(BaseModel):
    # ... existing fields ...
    tool_schema_hashes: Optional[dict[str, str]] = None  # tool_name → schema_hash
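One way the snapshot could be populated at registration time is sketched below. The issue does not say how a flow's steps are represented or which function fills the field, so `snapshot_tool_hashes`, the list of step tool names, and the `registry_hashes` mapping are all assumptions for illustration.

```python
def snapshot_tool_hashes(
    step_tool_names: list[str],
    registry_hashes: dict[str, str],
) -> dict[str, str]:
    """Return {tool_name: schema_hash} for every tool the flow's steps use.

    Hypothetical helper: raises KeyError if a step names an unregistered tool.
    """
    missing = [name for name in step_tool_names if name not in registry_hashes]
    if missing:
        raise KeyError(f"unregistered tools: {sorted(set(missing))}")
    # dict.fromkeys de-duplicates while preserving first-seen order.
    return {name: registry_hashes[name] for name in dict.fromkeys(step_tool_names)}


# Example registry: tool name -> current schema_hash (placeholder values).
registry = {"fetch": "a" * 16, "parse": "b" * 16, "store": "c" * 16}
snapshot = snapshot_tool_hashes(["fetch", "parse", "fetch"], registry)
```

The result would be assigned to `Flow.tool_schema_hashes`. Only tools the flow actually uses are recorded, so unrelated registry changes do not affect the snapshot.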

4. Compatibility validator

def check_flow_compatibility(
    flow: Flow,
    tools: dict[str, Tool],
    expected_hashes: Optional[dict[str, str]] = None,
) -> list[CompatibilityIssue]:
    """
    Check that each step's tool exists and its schema matches expectations.
    Returns a list of issues (empty = compatible).
    """

Files to create/modify

  • chainweaver/compat.py — new module with fingerprinting + validator
  • chainweaver/tools.py — add hash properties to Tool
  • chainweaver/flow.py — add tool_schema_hashes to Flow
  • tests/test_compat.py — new test file

Acceptance Criteria

  • schema_fingerprint(model) produces a deterministic, stable hash string
  • Same model always produces the same fingerprint
  • Different models (different fields, types, or constraints) produce different fingerprints
  • Tool.schema_hash property exists and is computed from input + output schemas
  • check_flow_compatibility() detects: missing tools, schema hash mismatches
  • Flow.tool_schema_hashes stores a snapshot at registration/compile time
  • At least 6 test cases: same schema → same hash, field added → different hash, field renamed → different hash, compatibility pass, compatibility fail, missing tool

Out of Scope

  • Drift detection on tool re-registration (see split issue)
  • Flow status management (see split issue)
  • Automatic schema migration
  • Semantic compatibility (e.g., int compatible with float)

Notes

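  • The 16-char hex prefix keeps 64 bits of the SHA-256 digest. By the birthday bound, an accidental collision only becomes likely at around 2^32 (billions of) distinct schemas, so it is sufficient for collision avoidance in practical scenarios.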
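  • Pydantic v2's model_json_schema() output is stable for the same model definition, and sort_keys=True guards against any variation in dict key ordering. The generated schema may still differ between Pydantic releases, so hashes are best compared when produced under the same Pydantic version.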
  • This pairs with #15 ("Add flow and tool schema versioning with compatibility checks"): schema hashes are the mechanism behind version compatibility checks.
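For a rough sense of how safe a 64-bit prefix is, the standard birthday-bound approximation can be evaluated directly. This is a back-of-the-envelope sketch that assumes SHA-256 output behaves like uniform random bits; the function name is chosen for this example.

```python
import math


def collision_probability(n_schemas: int, bits: int = 64) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    # -expm1(x) equals 1 - exp(x) and stays accurate for tiny probabilities.
    return -math.expm1(-n_schemas * (n_schemas - 1) / 2 ** (bits + 1))


# Chance of any accidental collision among one million distinct schemas.
p_million = collision_probability(1_000_000)
```

With these assumptions, one million distinct schemas give a collision probability on the order of 1e-8, and the probability only reaches about one half at roughly five billion schemas.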

Split from #22. See also: Flow status management issue, Schema drift detection issue.
