Master tracking issue for the 0.5.0 release. Scope is deliberately narrow: the security and correctness fixes from the recent code review, plus the foundational LLM capabilities that transform AgentLoom from a "YAML workflow orchestrator with LLM calls" into a framework that can express real agents.
Correctness: gateway resilience, record/replay race, DAG skip propagation, provider adapter normalization, state lock bypass, subworkflow observability — eight bug-fix issues in total.
Cost correctness: reasoning tokens (o1/o3/Claude thinking) are missing from cost accounting today; fixing the accounting will change reported costs.
Foundational LLM primitives: tool calling, structured output, embeddings, conversation history, Agent primitive — none of which exist today, and all of which are prerequisites for any modern agent workflow.
Observability schema: align with OTel GenAI semantic conventions, centralize span/metric names, and expose custom dimensions needed by downstream trace consumers.
Out of scope for 0.5.0 (deferred to 0.6.x or later) is listed at the bottom.
How to use this issue
Each phase is a coherent body of work with shared dependencies.
Issues within a phase are parallelizable unless explicitly noted.
GitHub renders the task lists below as tracked tasks: checkboxes update automatically when each linked issue closes, and the parent shows aggregate progress.
AgentLoom is production-deployable: all known critical bugs closed, cost reporting correct.
Agent workflows are expressible natively: tool calling, structured output, conversation state, and the Agent primitive cover the modern agent pattern end-to-end.
Comparative benchmarks possible: agentloom bench enables side-by-side evaluation across provider configs with unified reporting.
Observability is contract-stable: schemas centralized in schema.py, gen_ai.* aligned, custom dimensions propagating. External consumers can build trace-parsing code that survives future releases without breakage.
The 32 deferred issues remain valuable but are not on the critical path for those four outcomes.
Notes
This is a release-tracking issue, not an implementation issue. Each child issue retains its own discussion, scope, and PR linkage.
Progress is visible at the top of this issue once any child closes.
Adjust phase membership freely as scope clarifies — this is a living plan, not a contract.
If a deferred issue turns out to be blocking a consumer after all, promote it into the appropriate phase; if a 0.5.0 issue turns out to be smaller than expected, ship it early.
Description
Current version: 0.4.0 → Target: 0.5.0.
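The version jump is justified by the scope: eight correctness fixes, reasoning-token cost accounting, the foundational LLM primitives, and a stable observability schema.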
Phase 0 — Bug-fix sweep and cost correctness
The bug-fix backlog from the security and correctness review, plus the reasoning-tokens cost bug. Everything downstream depends on a clean baseline.
Parallelization: all 9 simultaneously. Suggested split — security duo (#104, #105), providers/resilience trio (#106, #107, #109), core trio (#108, #110, #111), correctness solo (#127).
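To make the reasoning-tokens cost bug concrete, here is a minimal sketch of accounting that counts reasoning tokens exactly once. The `Usage` type, its field names, and the per-million-token price parameters are hypothetical and are not AgentLoom's actual API. It also assumes reasoning tokens bill at the output rate, which should be checked against each provider's pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical normalized token usage for one LLM call."""

    input_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0
    # Some providers already fold reasoning tokens into the output count;
    # the adapter has to record which convention it normalized from.
    reasoning_included_in_output: bool = False


def billable_output_tokens(usage: Usage) -> int:
    """Output-side tokens to bill, counting reasoning tokens exactly once."""
    if usage.reasoning_included_in_output:
        return usage.output_tokens
    return usage.output_tokens + usage.reasoning_tokens


def call_cost(
    usage: Usage,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one call, assuming reasoning tokens bill at the output rate."""
    input_cost = usage.input_tokens * input_price_per_mtok / 1_000_000
    output_cost = billable_output_tokens(usage) * output_price_per_mtok / 1_000_000
    return input_cost + output_cost
```

The explicit `reasoning_included_in_output` flag is there because adding reasoning tokens on top of an output count that already contains them would swap an under-count for a double count.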
Phase 1 — Internal hygiene refactor
Foundation work for the LLM-primitive phase — collaborator protocols and shared retry primitives so subsequent issues (#116 tool calling, #120 Agent) have stable seams to plug into.
Sequencing: single PR. Scope is the typing protocols, heapq topo sort, retry-primitive sharing, and retryable_status_codes. The collaborator-class split (LayerRunner / RouterActivation / StepRetryLoop / BudgetEnforcer) is deferred; the current engine.py size is acceptable until #116 / #120 prove a refactor is needed.
Parallelization with Phase 0: can start once #108 lands, because the typing work uses the iterative DFS rewrite as its baseline.
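For readers unfamiliar with the "heapq topo sort" item, the sketch below shows one common shape for it: Kahn's algorithm with a min-heap as the ready queue, so ties resolve by step id and the order is deterministic. The function name and the `deps` mapping are illustrative assumptions; the engine's real data structures may differ.

```python
import heapq
from collections.abc import Mapping, Sequence


def topo_order(deps: Mapping[str, Sequence[str]]) -> list[str]:
    """Return step ids so that every step follows its dependencies.

    `deps` maps a step id to the ids it depends on. Ties are broken by
    step id via a min-heap, so the order is stable across runs.
    """
    indegree = {step: 0 for step in deps}
    dependents: dict[str, list[str]] = {step: [] for step in deps}
    for step, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise KeyError(f"{step!r} depends on unknown step {parent!r}")
            indegree[step] += 1
            dependents[parent].append(step)

    # Steps with no unmet dependencies are ready to run.
    ready = [step for step, count in indegree.items() if count == 0]
    heapq.heapify(ready)

    order: list[str] = []
    while ready:
        step = heapq.heappop(ready)
        order.append(step)
        for child in dependents[step]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, child)

    # Any step left unplaced sits on a cycle.
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order
```

For example, `topo_order({"c": ["a", "b"], "b": ["a"], "a": [], "d": []})` places `a` before `b` and `b` before `c`, and a cyclic input raises `ValueError` instead of looping.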
Phase 2 — Observability foundation
Stabilize the observability contract before adding features that emit data. Every feature landed after this phase will emit with the correct schema from day one — no second migration later. This phase also unblocks downstream trace consumers that need custom dimensions like version or run tags.
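As an illustration of what a centralized schema could look like, the sketch below keeps attribute keys in one place and namespaces custom dimensions. The four `gen_ai.*` keys come from the OTel GenAI semantic conventions; the `agentloom.` prefix, the helper name, and its parameters are assumptions made for this example, not the project's actual schema.py.

```python
from __future__ import annotations

from typing import Final

# Attribute keys defined by the OTel GenAI semantic conventions.
GEN_AI_OPERATION_NAME: Final = "gen_ai.operation.name"
GEN_AI_REQUEST_MODEL: Final = "gen_ai.request.model"
GEN_AI_USAGE_INPUT_TOKENS: Final = "gen_ai.usage.input_tokens"
GEN_AI_USAGE_OUTPUT_TOKENS: Final = "gen_ai.usage.output_tokens"

# Hypothetical project namespace for custom dimensions (version, run tags).
CUSTOM_PREFIX: Final = "agentloom."


def llm_span_attributes(
    model: str,
    input_tokens: int,
    output_tokens: int,
    custom: dict[str, str] | None = None,
) -> dict[str, str | int]:
    """Build span attributes for one chat call from the shared key set."""
    attributes: dict[str, str | int] = {
        GEN_AI_OPERATION_NAME: "chat",
        GEN_AI_REQUEST_MODEL: model,
        GEN_AI_USAGE_INPUT_TOKENS: input_tokens,
        GEN_AI_USAGE_OUTPUT_TOKENS: output_tokens,
    }
    # Custom dimensions live under their own prefix so they can never
    # shadow a standard gen_ai.* key.
    for key, value in (custom or {}).items():
        attributes[CUSTOM_PREFIX + key] = value
    return attributes
```

Because every emitter builds its attributes through the same constants, a key rename happens in one module, which is the property downstream trace consumers rely on.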
Parallelization: #125 lands the schema.py module first; #77 and #59 follow once the namespace exists.
Phase 3 — LLM core primitives
The three capabilities AgentLoom currently lacks structurally. Foundation for every agentic workflow.
Parallelization: 3 independent PRs in parallel. #116 is the largest (likely 4 sub-PRs: one per provider plus the unified surface).
Phase 4 — Composition primitives
Step types and conversation primitive. Most are independent of Phase 3; #119 strictly depends on #116.
Parallelization: #38, #39, #43 in parallel from day one. #119 starts when #116 closes.
Phase 5 — Agent + typed state + templating
The capability layer. #120 is the deliverable that makes AgentLoom express real agents end-to-end: system prompt plus tools plus memory plus a bounded decision loop plus typed output.
Parallelization: #128 and #129 in parallel; #120 last.
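The "bounded decision loop" can be pictured with the following sketch. It is not the design of #120. `ToolCall`, `FinalAnswer`, `run_agent`, and the `decide` callable (a stand-in for a model call) are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    """The model asks for a tool to be run."""

    name: str
    argument: str


@dataclass
class FinalAnswer:
    """The model is done; this is the typed output."""

    text: str


Decision = Union[ToolCall, FinalAnswer]
History = list[tuple[str, str]]  # (role, content) pairs acting as memory


def run_agent(
    decide: Callable[[History], Decision],
    tools: dict[str, Callable[[str], str]],
    system_prompt: str,
    user_input: str,
    max_steps: int = 5,
) -> str:
    """Run the decide/act loop until a final answer or the step bound."""
    history: History = [("system", system_prompt), ("user", user_input)]
    for _ in range(max_steps):
        decision = decide(history)
        if isinstance(decision, FinalAnswer):
            return decision.text
        tool = tools.get(decision.name)
        # Unknown tools are reported back to the model instead of crashing.
        result = tool(decision.argument) if tool else f"unknown tool: {decision.name}"
        history.append(("tool", f"{decision.name} -> {result}"))
    raise RuntimeError(f"no final answer within {max_steps} steps")
```

The hard `max_steps` bound is what keeps a model that never emits a final answer from looping, and spending, indefinitely.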
Phase 6 — Testing infrastructure
Primitives that make workflows testable at scale: deterministic failure replay, stress-test chaos injection, cross-provider benchmarking, and a deterministic inter-agent replay contract.
Parallelization: #62 and #124 immediately. #123 after Phase 0 (#106, #107). #135 after #107.
Cross-phase dependency map
What is deliberately not in 0.5.0
The deferred items are tracked but not gated by 0.5.0:
Provider catalog (defer to 0.6.0):
Production scale beyond cost correctness (defer to 0.6.x):
Security hardening for regulated industries (defer to 0.6.x or 0.7.x):
DX polish (defer to 0.6.x+):
Step types and primitives not on the 0.5.0 critical path:
What 0.5.0 unlocks
After 0.5.0 ships:
Comparative benchmarks possible: agentloom bench enables side-by-side evaluation across provider configs with unified reporting.
Observability is contract-stable: schemas centralized in schema.py, gen_ai.* aligned, custom dimensions propagating. External consumers can build trace-parsing code that survives future releases without breakage.
The 32 deferred issues remain valuable but are not on the critical path for those four outcomes.
Issue inventory (27 total)
Total in 0.5.0: 27 issues (8 critical bugs + 1 cost bug + 1 refactor + 17 features).
Deferred to 0.6.x+: 32 issues.