Skip to content

ship 0.5.0: bug-fix sweep and foundational agent capabilities #133

@cchinchilla-dev

Description

@cchinchilla-dev

Description

Master tracking issue for the 0.5.0 release. Scope is deliberately narrow: the security and correctness fixes from the recent code review, plus the foundational LLM capabilities that transform AgentLoom from a "YAML workflow orchestrator with LLM calls" into a framework that can express real agents.

Current version: 0.4.0 → Target: 0.5.0.

The version jump is justified by the scope:

  • Security: closing the router RCE and sandbox bypass surface (fix router expression sandbox to prevent arbitrary code execution #104, harden tool sandbox against command, path, and URL-scheme bypasses #105).
  • Correctness: gateway resilience, record/replay race, DAG skip propagation, provider adapter normalization, state lock bypass, subworkflow observability — eight bug-fix issues in total.
  • Cost correctness: reasoning tokens (o1/o3/Claude thinking) are unbilled today; fixing the accounting changes reported costs.
  • Foundational LLM primitives: tool calling, structured output, embeddings, conversation history, Agent primitive — none of which exist today, and all of which are prerequisites for any modern agent workflow.
  • Observability schema: align with OTel GenAI semantic conventions, centralize span/metric names, and expose custom dimensions needed by downstream trace consumers.

Out of scope for 0.5.0 (deferred to 0.6.x or later) is listed at the bottom.

How to use this issue

  • Each phase is a coherent body of work with shared dependencies.
  • Issues within a phase are parallelizable unless explicitly noted.
  • GitHub renders the task lists below as tracked tasks: checkboxes update automatically when each linked issue closes, and the parent shows aggregate progress.

Phase 0 — Bug-fix sweep and cost correctness

The bug-fix backlog from the security and correctness review, plus the reasoning-tokens cost bug. Everything downstream depends on a clean baseline.

Parallelization: all 9 simultaneously. Suggested split — security duo (#104, #105), providers/resilience trio (#106, #107, #109), core trio (#108, #110, #111), correctness solo (#127).


Phase 1 — Internal hygiene refactor

Foundation work for the LLM-primitive phase — collaborator protocols and shared retry primitives so subsequent issues (#116 tool calling, #120 Agent) have stable seams to plug into.

Sequencing: single PR. Scope is the typing protocols, heapq topo sort, retry-primitive sharing, and retryable_status_codes. The collaborator-class split (LayerRunner / RouterActivation / StepRetryLoop / BudgetEnforcer) is deferred — the current engine.py size is acceptable until #116 / #120 prove a refactor is needed.

Parallelization with Phase 0: can start once #108 lands — the typing work uses the iterative DFS rewrite as its baseline.


Phase 2 — Observability foundation

Stabilize the observability contract before adding features that emit data. Every feature landed after this phase will emit with the correct schema from day one — no second migration later. This phase also unblocks downstream trace consumers that need custom dimensions like version or run tags.

Parallelization: #125 lands the schema.py module first; #77 and #59 follow once the namespace exists.


Phase 3 — LLM core primitives

The three capabilities AgentLoom currently lacks structurally. Foundation for every agentic workflow.

Parallelization: 3 independent PRs in parallel. #116 is the largest (likely 4 sub-PRs: one per provider plus the unified surface).


Phase 4 — Composition primitives

Step types and conversation primitive. Most are independent of Phase 3; #119 strictly depends on #116.

Parallelization: #38, #39, #43 in parallel from day one. #119 starts when #116 closes.


Phase 5 — Agent + typed state + templating

The capability layer. #120 is the deliverable that makes AgentLoom express real agents end-to-end: system prompt plus tools plus memory plus a bounded decision loop plus typed output.

Parallelization: #128 and #129 in parallel; #120 last.


Phase 6 — Testing infrastructure

Primitives that make workflows testable at scale: deterministic failure replay, stress-test chaos injection, cross-provider benchmarking, and a deterministic inter-agent replay contract.

Parallelization: #62 and #124 immediately. #123 after Phase 0 (#106, #107). #135 after #107.


Cross-phase dependency map

Phase 0 (parallel sweep)
  |
  +--> Phase 1 (#112)
  |
  +--> Phase 2 (#125 -> #77, #59)
  |      |
  |      +--> Phase 3 (#116, #117, #118 parallel)
  |             |
  |             +--> #119 (#116)
  |             +--> #128 (#117)
  |             +--> #124 (#118)
  |
  +--> Phase 4 (parallel: #38, #39, #43)
  |
  +--> Phase 5 (#128, #129 parallel; #120 closes)
  |
  +--> Phase 6 (#62, #124 anytime; #123 and #135 after Phase 0)

What is deliberately not in 0.5.0

The deferred items are tracked but not gated by 0.5.0:

Provider catalog (defer to 0.6.0):

Production scale beyond cost correctness (defer to 0.6.x):

Security hardening for regulated industries (defer to 0.6.x or 0.7.x):

DX polish (defer to 0.6.x+):

Step types and primitives not on the 0.5.0 critical path:

What 0.5.0 unlocks

After 0.5.0 ships:

  • AgentLoom is production-deployable: all known critical bugs closed, cost reporting correct.
  • Agent workflows are expressible natively: tool calling, structured output, conversation state, and the Agent primitive cover the modern agent pattern end-to-end.
  • Comparative benchmarks possible: agentloom bench enables side-by-side evaluation across provider configs with unified reporting.
  • Observability is contract-stable: schemas centralized in schema.py, gen_ai.* aligned, custom dimensions propagating. External consumers can build trace-parsing code that survives future releases without breakage.

The 32 deferred issues remain valuable but are not on the critical path for those four outcomes.

Issue inventory (27 total)

Phase Count Issues
0 — Bug fix sweep 9 #104, #105, #106, #107, #108, #109, #110, #111, #127
1 — Internal hygiene 1 #112
2 — Observability 3 #125, #77, #59
3 — LLM core 3 #116, #117, #118
4 — Composition 4 #38, #39, #43, #119
5 — Agent + typing 3 #128, #129, #120
6 — Testing infra 4 #123, #62, #124, #135

Total in 0.5.0: 27 issues (8 critical bugs + 1 cost bug + 1 refactor + 17 features).
Deferred to 0.6.x+: 32 issues.

Notes

  • This is a release-tracking issue, not an implementation issue. Each child issue retains its own discussion, scope, and PR linkage.
  • Progress is visible at the top of this issue once any child closes.
  • Adjust phase membership freely as scope clarifies — this is a living plan, not a contract.
  • If a deferred issue turns out to be blocking a consumer after all, promote it into the appropriate phase; if a 0.5.0 issue turns out to be smaller than expected, ship it early.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestreleaseVersion bump and release prep

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions