Summary
Create a robust versioning system for graphs that enables zero-downtime updates by ensuring running workflows continue with their original graph version while new triggers use the latest version.
Problem Statement
Currently, Exosphere's graph templates are mutable entities identified only by (namespace, name). When a graph is updated via the upsert endpoint, the existing template is directly overwritten. This creates several critical issues:
Running workflows may break - If a graph is updated while workflows are executing, the in-flight states may reference nodes or inputs that no longer exist
No version history - There's no audit trail of graph changes
No rollback capability - If a bad graph version is deployed, there's no quick way to revert
Coupling between deployment and execution - Updates require coordination to ensure no workflows are running
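To make the first issue concrete, here is a minimal sketch of the overwrite behaviour. It assumes templates live in one dict keyed by (namespace, name) and that an in-flight state only records the node it is waiting on. The store, the node names, and the `overwrite_template` / `resolve_node` helpers are hypothetical illustrations, not the state-manager's actual models or upsert controller.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mutable store: exactly one template per (namespace, name), no version.
templates: Dict[Tuple[str, str], List[str]] = {}


def overwrite_template(namespace: str, name: str, nodes: List[str]) -> None:
    # Mirrors the behaviour described above: the stored template is replaced in place.
    templates[(namespace, name)] = nodes


def resolve_node(namespace: str, name: str, node: str) -> Optional[str]:
    # An in-flight state can only consult whatever template is stored *now*.
    nodes = templates.get((namespace, name), [])
    return node if node in nodes else None


overwrite_template("prod", "etl", ["extract", "transform", "load"])
waiting_on = "transform"  # a run is currently executing this node

# A deploy renames the node while that run is still in flight.
overwrite_template("prod", "etl", ["extract", "clean", "load"])

print(resolve_node("prod", "etl", waiting_on))  # None: the node no longer exists
```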
Goals
Zero-downtime updates: Deploy new graph versions without affecting running workflows
Version isolation: Each workflow run is pinned to a specific graph version
Simple updates: Developers can push updates without worrying about in-flight executions
Auditability: Complete history of graph versions with timestamps and metadata
Rollback support: Quickly revert to a previous known-good version
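The isolation, auditability, and rollback goals can be illustrated with a small in-memory sketch. It assumes versions are immutable, append-only, and numbered from 1, and that a single per-graph pointer selects the version new triggers use. `Version`, `VersionHistory`, and its `publish` / `rollback` / `trigger` methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    number: int
    nodes: Tuple[str, ...]
    created_at: datetime  # audit metadata: when this version was published


@dataclass
class VersionHistory:
    # Append-only: published versions are never edited or deleted.
    versions: List[Version] = field(default_factory=list)
    # Number of the version that new triggers resolve to (None until first publish).
    active: Optional[int] = None

    def publish(self, nodes: List[str]) -> int:
        number = len(self.versions) + 1
        self.versions.append(Version(number, tuple(nodes), datetime.now(timezone.utc)))
        self.active = number
        return number

    def rollback(self, number: int) -> None:
        # Rolling back only moves the pointer; runs already pinned elsewhere are untouched.
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"unknown version {number}")
        self.active = number

    def trigger(self) -> int:
        # A new run captures the active version number once, at trigger time.
        if self.active is None:
            raise RuntimeError("no version has been published")
        return self.active

    def get(self, number: int) -> Version:
        return self.versions[number - 1]


history = VersionHistory()
history.publish(["extract", "transform", "load"])
run_a = history.trigger()  # pinned to v1
history.publish(["extract", "clean", "load"])
run_b = history.trigger()  # pinned to v2
history.rollback(1)  # v2 turns out to be bad
run_c = history.trigger()  # new runs use v1 again

print(run_a, run_b, run_c)  # 1 2 1
print(history.get(run_b).nodes)  # ('extract', 'clean', 'load')
```

Because rollback is just a pointer move, it is as cheap as a deploy, and the full history (with timestamps) remains available for auditing.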
Non-Goals
Real-time migration of running workflows to new versions
Proposed Solution
High-Level Architecture
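The proposed split can be sketched with a few dataclasses. The sketch assumes GraphTemplateVersion records are immutable and keyed by (namespace, name, version), GraphTemplate holds only a latest_version pointer, and each State copies the version it was triggered under. The dataclasses, the `graph_version` and `node_name` fields, and the `upsert` / `trigger` / `graph_for` helpers are hypothetical stand-ins, not the state-manager's actual models or controllers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GraphTemplateVersion:
    # Immutable snapshot, identified by (namespace, name, version).
    namespace: str
    name: str
    version: int
    nodes: Tuple[str, ...]


@dataclass
class GraphTemplate:
    # One mutable entry per (namespace, name); only the pointer changes on deploy.
    namespace: str
    name: str
    latest_version: int


@dataclass
class State:
    namespace: str
    graph_name: str
    graph_version: int  # pinned when the run is triggered, never updated afterwards
    node_name: str


versions: Dict[Tuple[str, str, int], GraphTemplateVersion] = {}
templates: Dict[Tuple[str, str], GraphTemplate] = {}


def upsert(namespace: str, name: str, nodes: List[str]) -> int:
    # Next number is one past the highest stored version (not the pointer),
    # so numbers stay unique even if the pointer was moved back by a rollback.
    existing = [v for (ns, n, v) in versions if (ns, n) == (namespace, name)]
    version = max(existing, default=0) + 1
    # Store the new version before moving the pointer, so the pointer never
    # refers to a version that has not been written yet.
    versions[(namespace, name, version)] = GraphTemplateVersion(namespace, name, version, tuple(nodes))
    templates[(namespace, name)] = GraphTemplate(namespace, name, version)
    return version


def trigger(namespace: str, name: str, first_node: str) -> State:
    # New runs resolve the pointer once and copy the version into the State.
    pinned = templates[(namespace, name)].latest_version
    return State(namespace, name, pinned, first_node)


def graph_for(state: State) -> GraphTemplateVersion:
    # Execution always loads the pinned version, never the latest one.
    return versions[(state.namespace, state.graph_name, state.graph_version)]


upsert("prod", "etl", ["extract", "transform", "load"])
running = trigger("prod", "etl", "transform")
upsert("prod", "etl", ["extract", "clean", "load"])  # deploy while `running` is in flight
fresh = trigger("prod", "etl", "extract")

print(running.graph_version, fresh.graph_version)  # 1 2
print("transform" in graph_for(running).nodes)  # True
```

Since the version number travels with each State, the executor never consults the pointer after trigger time. This is what lets a deploy proceed while older runs finish on their original graph.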
    flowchart TB
        subgraph "Current State"
            GT1[GraphTemplate<br/>name + namespace]
            S1[State] --> |references| GT1
        end
        subgraph "Proposed State"
            GTV[GraphTemplateVersion<br/>name + namespace + version]
            GTL[GraphTemplate<br/>name + namespace<br/>latest_version pointer]
            GTL --> |points to| GTV
            S2[State] --> |pinned to| GTV
            GTV1[Version 1]
            GTV2[Version 2]
            GTV3[Version 3 - Latest]
            GTL --> GTV3
        end
The core idea is to:
… GraphTemplate) to the active version
Edge Cases and Considerations
… latest_valid_version
Security Considerations
Observability
New Metrics
graph_versions_total{namespace, graph_name} - Total versions per graph
graph_version_active{namespace, graph_name, version} - Currently active version
runs_by_version{namespace, graph_name, version} - Runs per version
Dashboard Enhancements
Open Questions
References
GraphTemplate model: state-manager/app/models/db/graph_template_model.py
State model: state-manager/app/models/db/state.py
state-manager/app/controller/upsert_graph_template.py
state-manager/app/controller/trigger_graph.py
Goals
Design and plan a simple, effective system for supporting zero-downtime deployment of graph templates. The solution should: