ship 0.2.0: single-agent MVP with LangChain quickstart under 10 minutes #18

@cchinchilla-dev

Description

Master tracking issue for the 0.2.0 release. Scope is the single-agent MVP: from a user's perspective, pip install agentanvil + a contract YAML + a LangChain agent = a passing test run in under 10 minutes, with a reproducible report.

Current version: 0.1.x (foundation patches) → Target: 0.2.0.

The version jump is justified by the scope:

  • Static consistency analyzer on the contract model
  • Analyzer, Generator, DockerRunner, Reporter — the full orchestration loop lands.
  • Evaluator layers 1 + 2 (single judge) — covers all objective metrics of Layer 1 and a single LLM-as-judge for Layer 2 (ensemble deferred to 0.3.0).
  • AgentLoomBackend guarded by agentloom[contracts,observability]>=0.5.0 optional dependency.
  • Record/replay envelope complete for single-agent flows with determinism CI job.
  • Quickstart LangChain example ≤ 10 min verified in CI.
  • Portability invariant CI job certifies pip install agentanvil (no extras) runs end-to-end via DirectBackend.
  • mkdocs-material docs site scaffolded.
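
The optional-dependency guard and the no-extras fallback described in the list above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `has_module` and `select_backend` are hypothetical names, the backends are represented by plain strings, and the sketch only checks that `agentloom` is importable, not that it meets the `>=0.5.0` floor.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module with a broken spec.
        return False


def select_backend(preferred: str = "agentloom") -> str:
    """Pick a backend name (hypothetical helper, assumed for illustration).

    Falls back to DirectBackend whenever the optional extra is absent,
    which is the behaviour the portability invariant relies on.
    """
    if preferred == "agentloom" and has_module("agentloom"):
        return "AgentLoomBackend"
    return "DirectBackend"


if __name__ == "__main__":
    print(select_backend())
```

On a bare `pip install agentanvil` environment this sketch would always resolve to `DirectBackend`, which is the path the portability CI job is meant to exercise.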

Out of scope for 0.2.0 (deferred to 0.3.0 and later) is listed at the bottom.

How to use this issue

  • Each item is a coherent body of work with shared dependencies.
  • Items within a phase are parallelizable unless explicitly noted.

Phase A — Contract model completion

The static consistency analyzer is the key deliverable of this phase for 0.2.0.

Parallelization: #4 is independent of #5 and #6. #5 and #6 are independent once contract types are stable.

Phase B — Runners and backends

Parallelization: all three in parallel. #7 depends on Runner ABC from 0.1.x; #8 depends on DirectBackend skeleton from 0.1.x; #9 depends on AgentLoom 0.5.0 availability.

Phase C — Evaluator and reporter

Parallelization: #10 and #11 in parallel after contract types are stable. #12 depends on evaluator output shape.

Phase D — CLI, record/replay, quickstart, CI

Parallelization: #13 depends on all earlier modules. #14 depends on #13. #15 can start once #14 lands. #16 is independent.

Cross-phase dependency map

Phase A (Contract + Analyzer + Generator)
  |
  +--> Phase C (Evaluator + Reporter)
  |     |
  |     +--> Phase D (CLI, Quickstart, CI, Docs)
Phase B (Runners + Backends)
  |
  +--> Phase C
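
As a rough sketch, the map above can be turned into an execution order with the standard-library `graphlib` module. The `deps` dictionary is a transcription of the diagram (A and B feed C, C feeds D); the `waves` helper is a hypothetical name introduced here to group phases that can run in parallel.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases it depends on, per the diagram.
deps = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
}


def waves(graph):
    """Group nodes into waves; every node in a wave can start together."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


phase_waves = waves(deps)
print(phase_waves)  # [['A', 'B'], ['C'], ['D']]
```

The first wave confirms that Phases A and B have no dependency on each other, so they can proceed in parallel before Phase C starts.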

What is deliberately not in 0.2.0

The deferred items are tracked but do not gate 0.2.0:

Multi-agent and A2A (defer to 0.3.0):

  • Multi-agent topologies, ConversationGraph, pathology detection.
  • A2A spec pinning, conformance suite, differential testing.

Corpus and statistical analysis (defer to 0.3.0):

  • Dataset ingestion adapters.
  • Ensemble Agent-as-Judge.
  • Active sampling.
  • Five inferential contrasts.

Transversal (defer to 0.4.0):

  • Regression detection.
  • Security coverage (prompt injection, PII, secrets).
  • CI/CD integration templates.
  • K8sRunner.
  • Execution of ≥ 15 case studies.

0.5.0 release (defer to 0.5.0):

  • Replication package generator.
  • SBOM + Sigstore.
  • Zenodo DOI.

What 0.2.0 unlocks

After 0.2.0 ships:

  • First Tier A single-agent case studies are runnable: contract + scenario + recording + report.
  • Portability invariant is CI-enforced: pip install agentanvil (no extras) works end-to-end via DirectBackend.
  • Determinism is CI-enforced: record → replay × 100 yields identical reports.
  • Docs site is live: getting started, contract reference, backends, evaluators.
  • The formal-contract reference can be drafted from the static analyzer implementation.
  • Quickstart LangChain < 10 min is verifiable on a fresh venv.
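
The determinism check mentioned in the list above (record → replay × 100 yields identical reports) could look something like the sketch below. It assumes reports are JSON-serializable dictionaries and compares them by hashing a canonical serialization. `replay`, `report_digest`, and `is_deterministic` are hypothetical stand-ins, not agentanvil APIs, and the recording format is invented for illustration.

```python
import hashlib
import json


def report_digest(report: dict) -> str:
    """Hash a canonical JSON form so key order cannot affect the result."""
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay(recording: list) -> dict:
    """Hypothetical replay: build a report purely from recorded steps."""
    passed = sum(1 for step in recording if step["ok"])
    return {"steps": len(recording), "passed": passed}


def is_deterministic(recording: list, runs: int = 100) -> bool:
    """Replay `runs` times and require a single distinct report digest."""
    digests = {report_digest(replay(recording)) for _ in range(runs)}
    return len(digests) == 1


# Assumed recording shape, for illustration only.
recording = [{"tool": "search", "ok": True}, {"tool": "answer", "ok": False}]

if __name__ == "__main__":
    print(is_deterministic(recording))
```

Comparing digests rather than raw report files keeps the CI assertion to a single set-size check; a real job would also need to exclude or freeze fields such as timestamps before hashing.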

Issue inventory (14 total)

Phase                                      Count  Issues
A — Contract completion                    3      #4, #5, #6
B — Runners + backends                     3      #7, #8, #9
C — Evaluator + reporter                   3      #10, #11, #12
D — CLI, quickstart, CI, docs, case study  5      #13, #14, #15, #16, #17

Total in 0.2.0: 14 issues (12 enhancement + 1 chore + 1 docs).
