From 3daa0e9d858f3af24c9c968473cf53b3462cdcef Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Wed, 15 Apr 2026 11:24:30 +0000 Subject: [PATCH 01/12] docs: add blog post primer comparing workflow engines (Airflow, Prefect, Temporal, Inngest, Windmill) Co-Authored-By: Claude Opus 4.6 (1M context) --- .../index.mdx | 1088 +++++++++++++++++ 1 file changed, 1088 insertions(+) create mode 100644 blog/2026-04-15-workflow-engines-primer/index.mdx diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx new file mode 100644 index 000000000..334f082af --- /dev/null +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -0,0 +1,1088 @@ +--- +slug: workflow-engines-primer +authors: [rubenfiszel] +tags: ['Benchmarks', 'Workflow Engines'] +description: 'A deep dive comparing Airflow, Prefect, Temporal, Inngest, and Windmill — how they work internally, their trade-offs, and real benchmarks. Plus Restate, DBOS, and Hatchet.' +title: 'From Cron to Durable Execution: A Primer on Workflow Engines' +--- + +import DocCard from '@site/src/components/DocCard'; + +A deep dive comparing Airflow, Prefect, Temporal, Inngest, and Windmill — how they work internally, their trade-offs, and real benchmarks. Plus honorable mentions for Restate, DBOS, and Hatchet. + +{/* truncate */} + +--- + +## Why Workflow Engines Exist + +Every backend eventually grows a function like this: + +```typescript +async function processOrder(order: Order) { + const validated = await validateInventory(order); + const payment = await chargePayment(validated); + const shipment = await createShipment(payment); + await sendConfirmationEmail(shipment); +} +``` + +This works until it doesn't. What happens when the server crashes after `chargePayment` but before `createShipment`? The customer was charged, but nothing shipped. Do you retry? You'd charge them twice. Do you skip? They paid but get nothing. 
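The obvious patch, wrapping the whole function in a retry loop, actually makes things worse: every retry replays the side-effects that already succeeded. A minimal sketch (`withRetry` is a hypothetical helper, shown only to illustrate the failure mode):

```typescript
// Hypothetical naive retry wrapper. This is the anti-pattern, not the fix:
// it re-runs the WHOLE function, including steps that already succeeded.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // swallow and retry from the top
    }
  }
  throw lastError;
}

// If createShipment fails twice before succeeding, chargePayment runs
// three times: the customer is charged once per attempt.
```

Retrying at a finer granularity than "the whole function" is exactly the problem the engines below solve.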
+ +The fundamental problem: **a sequence of side-effects spread across time and network boundaries cannot be made atomic.** You can wrap two database writes in a transaction, but you can't wrap "call Stripe" + "call FedEx" + "call SendGrid" in one. + +Every workflow engine is a different answer to the same question: **how do you coordinate multiple fallible side-effects so that the overall process makes progress, even when individual steps fail?** + +The answers cluster into three generations, each with a different core abstraction. + +--- + +## The Three Generations + +``` +Generation 1: DAG Schedulers Airflow, Prefect + "Define a graph of tasks, + a scheduler runs them in order" + +Generation 2: Durable Execution Temporal, Inngest, Windmill WAC + "Write normal code, the runtime + makes it survive crashes" + +Hybrid: Visual Flow Builder Windmill Flows + "Drag-and-drop steps, + JSON-defined DAG with code steps" +``` + +The shift from Gen 1 to Gen 2 mirrors a broader shift in computer science: from **declarative** (describe the computation) to **imperative** (write the computation, let the infrastructure handle durability). Neither is universally better — they solve different problems. + +--- + +## Generation 1: DAG Schedulers + +### The Abstraction + +A DAG scheduler separates **what to do** (your task code) from **when and where to do it** (the scheduler's job). You declare tasks and their dependencies as a directed acyclic graph. The scheduler inspects the graph, determines which tasks are ready, and dispatches them. + +The key property: **tasks are independent units of work.** They don't share memory. They don't know about each other. They communicate through external storage. The scheduler is the only component that understands the full picture. + +``` +┌──────────────────────────────────────────────────────┐ +│ DAG Scheduler Model │ +│ │ +│ You define: Scheduler does: │ +│ │ +│ [Task A] ──┐ 1. Parse graph │ +│ ├──→ 2. Poll: which tasks are ready? 
│ +│ [Task B] ──┘ 3. Dispatch ready tasks │ +│ │ 4. Wait for completion │ +│ ▼ 5. Repeat from 2 │ +│ [Task C] │ +│ │ +│ Data passes via external storage (DB, S3, XCom) │ +│ Tasks are independent processes │ +└──────────────────────────────────────────────────────┘ +``` + +### Airflow: The Incumbent + +Airflow (Airbnb, 2014) is the canonical DAG scheduler. You write Python files that define DAGs: + +```python +from airflow.decorators import dag, task +from datetime import datetime + +@task +def extract(): + return {"data": [1, 2, 3]} + +@task +def transform(raw): + return [x * 2 for x in raw["data"]] + +@task +def load(transformed): + db.insert(transformed) + +@dag(schedule="@hourly", start_date=datetime(2024, 1, 1)) +def etl_pipeline(): + raw = extract() + transformed = transform(raw) + load(transformed) +``` + +**The fundamental misunderstanding about Airflow**: this looks like Python calling functions, but it isn't. At parse time, no functions execute. Airflow builds a dependency graph from the return value annotations. The actual execution happens later — possibly minutes later, on a different machine. + +#### How the Scheduler Works + +The Airflow scheduler is a **polling loop over a relational database**: + +``` +Every ~5 seconds: + 1. Parse all DAG Python files (discover tasks, dependencies) + 2. Query DB: which DagRuns need new TaskInstances? + 3. Query DB: which TaskInstances are ready to run? + 4. Enter critical section (SELECT ... FOR UPDATE) + 5. Check pool limits, concurrency limits + 6. Enqueue ready tasks to the executor +``` + +Each task passes through a state machine stored in the database: + +``` + none → scheduled → queued → running → success + └──→ failed → up_for_retry → scheduled → ... +``` + +Every state transition is a database write. The scheduler owns `scheduled → queued`. The executor owns `queued → running`. The worker owns `running → success/failed`. 
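The ownership rules above can be sketched as a small transition table (TypeScript here for consistency with the rest of this post; in Airflow itself each transition is a row update in the metadata database):

```typescript
// Sketch of a task-instance state machine. Each component may only perform
// the transitions it owns; everything else is rejected.
type TaskState =
  | "none" | "scheduled" | "queued" | "running"
  | "success" | "failed" | "up_for_retry";

const allowed: Record<TaskState, TaskState[]> = {
  none: ["scheduled"],            // scheduler: task is ready per the DAG
  scheduled: ["queued"],          // scheduler: passed pool/concurrency checks
  queued: ["running"],            // executor: dispatched to a worker
  running: ["success", "failed"], // worker: reports the outcome
  failed: ["up_for_retry"],       // scheduler: retries remain
  up_for_retry: ["scheduled"],    // scheduler: retry delay elapsed
  success: [],
};

function transition(from: TaskState, to: TaskState): TaskState {
  if (!allowed[from].includes(to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to; // in Airflow: one database write per transition
}
```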
+ +#### Data Passing: The XCom Problem + +Since tasks run in separate processes (possibly different machines), data must be serialized to shared storage. Airflow calls this "XCom" (cross-communication): + +```python +# Task A: push data +@task +def extract(): + return {"data": [1, 2, 3]} # Serialized to DB as a JSON blob + +# Task B: receives via function argument (deserialized from DB) +@task +def transform(raw): # raw = the JSON blob from extract() + return [x * 2 for x in raw["data"]] +``` + +Under the hood, this is `INSERT INTO xcom (key, value, ...) VALUES (...)` and `SELECT value FROM xcom WHERE ...`. There's typically a size limit (48KB in Postgres by default). For anything larger, you must use external storage (S3) and pass references. + +This is not a limitation of Airflow's implementation — it's inherent to the DAG scheduler model. If tasks are independent processes, they can't share memory, so all inter-task data flows through external storage. + +#### The Executor Layer + +Airflow's executor is pluggable — one of its best design decisions: + +- **LocalExecutor**: forks a subprocess per task. Simple, single-machine. +- **CeleryExecutor**: sends tasks to a message broker (Redis/RabbitMQ). Celery workers pick them up. Most common production setup. +- **KubernetesExecutor**: spins up a fresh Kubernetes pod per task. Maximum isolation, ~10-30s cold start per task. + +Each executor makes a different trade-off between isolation, latency, and operational complexity. But all share the fundamental constraint: **each task is an independent execution unit**. + +#### What Airflow Gets Right + +- **Massive ecosystem**: hundreds of "operators" (pre-built integrations) for AWS, GCP, databases, Spark, dbt, etc. +- **Scheduling**: sophisticated time-based scheduling with backfill, catchup, data intervals. +- **Monitoring**: built-in UI showing DAG runs, task statuses, logs, Gantt charts. +- **Battle-tested**: runs at Airbnb, Google, PayPal, thousands of companies. 
You will find answers on StackOverflow. + +#### What Airflow Gets Wrong + +**Latency.** A task takes 2-40 seconds to start (scheduler loop + executor dispatch + cold start), even if the actual work takes 1ms. This is fine for ETL pipelines where tasks run for minutes, but makes Airflow useless for real-time workflows. + +**Static DAGs.** The dependency graph is fixed at parse time. You can't say "if the result of Task A is X, skip Task B" in a truly dynamic way. Airflow 2.x added `@task.branch` and dynamic task mapping, but these are limited — you're still declaring branches upfront, not writing arbitrary control flow. + +**No durable execution.** If a task crashes mid-execution, all progress within that task is lost. Airflow retries the entire task from the beginning. There's no concept of "resume from where it left off." + +**Parse overhead.** The scheduler re-parses all Python DAG files periodically. With thousands of DAGs, this alone can consume significant CPU and cause scheduling delays. + +--- + +### Prefect: The Pythonic Successor + +Prefect (2018) was built explicitly as "Airflow, but for Python developers who want less ceremony." Its core insight: **use Python's native execution model instead of fighting it.** + +```python +from prefect import flow, task + +@task +def extract(): + return [1, 2, 3] + +@task +def transform(data): + return [x * 2 for x in data] + +@task +def load(results): + db.insert(results) + +@flow +def etl_pipeline(): + data = extract() + transformed = transform(data) + load(transformed) +``` + +This looks almost identical to Airflow, but with a crucial difference: **the code actually runs as Python.** When `etl_pipeline()` is called, `extract()` really executes `extract()`. There's no graph construction phase — the DAG is implicit from the call order. + +#### The Hybrid Execution Model + +Prefect sits between Generation 1 and Generation 2. 
Tasks execute in the same process as the flow (by default), so there's no XCom problem — data passes through Python variables. But each task run is tracked by the Prefect server via a REST API: + +``` +@task runs: + 1. POST /task_runs → server creates TaskRun with state Pending + 2. PUT /task_runs/{id}/state → Running + 3. function body executes (in same Python process) + 4. PUT /task_runs/{id}/state → Completed (with result) +``` + +Every state transition is an HTTP call to the Prefect API server, which persists it in Postgres. + +#### Concurrency via Futures + +Prefect uses Python's native async/futures for parallelism: + +```python +@flow +def parallel_pipeline(): + futures = [transform.submit(item) for item in items] # Submit all + results = [f.result() for f in futures] # Collect +``` + +`.submit()` creates a future (using Python's `concurrent.futures` or a task runner). The function call runs in a thread/process pool. This is simpler than Airflow's DAG-level parallelism but limited by Python's GIL for CPU-bound work. + +#### What Prefect Gets Right + +- **Zero new concepts for Python developers.** Decorators on regular functions. Python control flow. Python data passing. +- **Dynamic workflows.** Since the code is real Python, you can use `if/else`, `for` loops, `try/except` — anything. The "DAG" is whatever Python actually executes. +- **Lower ceremony than Airflow.** No scheduler process. No DAG file parsing. Just run the flow. + +#### What Prefect Gets Wrong + +- **No durable execution.** Like Airflow, if the process crashes mid-task, work is lost. Task-level retries restart the task from the beginning. +- **State-tracking overhead.** Every task run creates multiple HTTP calls + DB writes for state transitions (Pending → Running → Completed). For workflows with hundreds of short tasks, this overhead dominates. +- **Python-only.** The server is Python (FastAPI). The workers are Python. The SDK is Python. 
If your workflow involves non-Python code, Prefect can shell out, but there's no native multi-language support. +- **No server-side sleep.** `time.sleep(60)` in a flow holds the worker process for 60 seconds. There's no "schedule me to wake up in 60 seconds" primitive (unlike Temporal or Windmill). + +--- + +### The DAG Scheduler Trade-off + +Both Airflow and Prefect share the same fundamental model: **tasks are tracked externally, data passes through storage, and the orchestrator drives execution.** The workflow code describes what to do, but doesn't directly control how it's executed. + +``` +Pro: Simple mental model. Tasks are independent. Easy to monitor. +Pro: Mature ecosystems (especially Airflow). +Pro: Natural fit for scheduled batch processing. + +Con: No durable execution within a task. +Con: High per-task overhead (state transitions, data serialization). +Con: Static or weakly dynamic control flow (Airflow worse, Prefect better). +Con: Data passing is clunky (XCom / serialization / size limits). +``` + +For scheduled ETL pipelines where tasks run for minutes, these trade-offs are excellent. For real-time, latency-sensitive, or long-running workflows, they're not. + +--- + +## Generation 2: Durable Execution + +### The Abstraction + +Durable execution inverts the DAG scheduler model: **instead of an external orchestrator driving tasks, the workflow code drives itself, and the runtime makes the code survive crashes.** + +You write what looks like a normal program: + +```typescript +async function processOrder(order) { + const payment = await chargePayment(order); + const shipment = await createShipment(payment); + await sendConfirmation(shipment); +} +``` + +The runtime intercepts each `await` and ensures that: +1. The result is **durably persisted** before execution continues +2. On crash, the function **resumes from where it left off** — already-completed steps are not re-executed +3. 
Side effects happen **at least once** (and ideally exactly once)
+
+The key insight: **the `await` keyword is the persistence boundary.** Everything between two `await`s is either fully completed or fully retried — never partially executed.
+
+But the implementations differ wildly in how they achieve this.
+
+### Temporal: Event Sourcing + Deterministic Replay
+
+Temporal (2019, ex-Uber Cadence team) is the most well-known durable execution engine. Its core abstraction: **record every state change as an immutable event, then replay events to reconstruct state.**
+
+```typescript
+// Workflow — runs in a deterministic sandbox
+export async function processOrder(orderId: string) {
+  const order = await activities.getOrder(orderId);
+  const payment = await activities.chargePayment(order);
+  await activities.shipOrder(payment);
+}
+
+// Activity — runs in normal Node.js, can do anything
+export async function chargePayment(order: Order) {
+  return stripe.charges.create({ amount: order.total });
+}
+```
+
+#### The Workflow / Activity Split
+
+Temporal enforces a strict separation:
+
+- **Workflow code** runs in a **deterministic sandbox**. No I/O, no randomness, no clock access. The TypeScript SDK achieves this with a stripped-down V8 isolate that blocks non-deterministic APIs. You cannot call `fetch()`, `Math.random()`, or `Date.now()` inside a workflow.
+- **Activity code** runs in normal Node.js / Python / Go. It can do anything — call APIs, write to databases, generate random numbers.
+
+This split exists because of Temporal's replay mechanism.
+
+#### How Replay Works
+
+Every time a workflow makes a decision (schedule an activity, start a timer, send a signal), Temporal records it as an **event** in an immutable **event history** stored in the database.
+ +When the workflow needs to resume (after an activity completes, after a crash, after a timer fires), the **entire workflow function re-executes from the beginning.** But this time, the SDK checks the event history: + +``` +Execution 1: run → await getOrder → [no event] → schedule activity → YIELD +Execution 2: run → await getOrder → [event: completed(order)] → return recorded result + → await chargePayment → [no event] → schedule activity → YIELD +Execution 3: run → await getOrder → [event] → skip + → await chargePayment → [event] → skip + → await shipOrder → [no event] → schedule activity → YIELD +``` + +Each execution replays all previous steps (returning results from the event history) and then advances one step. This is **event sourcing applied to code execution**. + +#### Concrete Example: Event History + +For the 3-step workflow above, here's what Temporal actually stores in Postgres: + +``` +Event# EventType Details +────── ──────────────────────────── ───────────────────────────── + 1 WorkflowExecutionStarted {input: orderId} + 2 WorkflowTaskScheduled {taskQueue: "main"} + 3 WorkflowTaskStarted {worker: "w1"} + 4 WorkflowTaskCompleted {commands: [ScheduleActivity("getOrder")]} + 5 ActivityTaskScheduled {type: "getOrder"} + 6 ActivityTaskStarted {worker: "w1"} + 7 ActivityTaskCompleted {result: {id: 123, total: 99}} + 8 WorkflowTaskScheduled {taskQueue: "main"} + 9 WorkflowTaskStarted {worker: "w1"} + 10 WorkflowTaskCompleted {commands: [ScheduleActivity("chargePayment")]} + 11 ActivityTaskScheduled {type: "chargePayment"} + 12 ActivityTaskStarted {worker: "w1"} + 13 ActivityTaskCompleted {result: {receipt: "ch_xxx"}} + 14 WorkflowTaskScheduled {taskQueue: "main"} + 15 WorkflowTaskStarted {worker: "w1"} + 16 WorkflowTaskCompleted {commands: [ScheduleActivity("shipOrder")]} + 17 ActivityTaskScheduled {type: "shipOrder"} + 18 ActivityTaskStarted {worker: "w1"} + 19 ActivityTaskCompleted {result: {tracking: "FDX123"}} + 20 WorkflowTaskScheduled {taskQueue: 
"main"} + 21 WorkflowTaskStarted {worker: "w1"} + 22 WorkflowTaskCompleted {commands: [CompleteWorkflow]} + 23 WorkflowExecutionCompleted {result: "ok"} +``` + +**23 events for 3 steps.** Each activity generates ~7 events. This is the write amplification cost of event sourcing. But you get a complete, queryable audit trail of exactly what happened and when. + +#### The Determinism Requirement + +Since the workflow function is replayed from the beginning on every resume, it **must produce the same sequence of commands on every execution.** If you used `Math.random()` to decide whether to call activity A or B, replay would make a different choice and Temporal would throw a **non-determinism error**. + +This is the most common source of developer pain with Temporal. You must learn to think about which code is "workflow" (deterministic orchestration) and which is "activity" (actual work). Third-party libraries that use randomness or timestamps silently break. + +```typescript +// ❌ BROKEN — non-deterministic +export async function myWorkflow() { + if (Math.random() > 0.5) { // Different on replay! + await activities.pathA(); + } else { + await activities.pathB(); + } +} + +// ✅ CORRECT — decision based on activity result +export async function myWorkflow() { + const coin = await activities.flipCoin(); // Recorded in history + if (coin > 0.5) { + await activities.pathA(); + } +} +``` + +#### Architecture + +Temporal's server is 4 services (Frontend, History, Matching, Worker) backed by PostgreSQL or Cassandra. Workers connect via gRPC and long-poll for tasks. This is the highest operational complexity of any engine in this comparison. 
+ +``` + Workflow Worker Temporal Server (4 services) Activity Worker + │ │ │ + │◀── gRPC WorkflowTask ───│ │ + │ (with event history) │ │ + │ │ │ + │ replay, hit new await │ │ + │ │ │ + │── gRPC Command ────────▶│── append events ─▶ Postgres │ + │ ScheduleActivityTask │── enqueue on task queue ──────▶│ + │ │ │ + │ │ execute fn() + │ │ │ + │ │◀── gRPC result ────────────────│ + │ │── append events ─▶ Postgres │ + │ │ │ + │◀── gRPC WorkflowTask ───│ │ + │ (updated history) │ │ + │ replay all, advance │ │ +``` + +#### What Temporal Gets Right + +- **True durable execution.** Workflows can run for months. Crash anywhere, resume exactly where you left off. +- **Full audit trail.** Every event is recorded. You can inspect and replay any workflow. +- **Multi-language SDKs.** TypeScript, Go, Java, Python, .NET, PHP. +- **Rich primitives.** Signals, queries, child workflows, timers, cancellation, search attributes. + +#### What Temporal Gets Wrong + +- **Operational complexity.** 4 server services + database + optionally Elasticsearch. Many moving parts. +- **Determinism tax.** Developers must constantly think about what's deterministic. Subtle bugs from non-deterministic libraries. +- **Write amplification.** 7+ events per activity. A 100-step workflow generates 700+ database writes. +- **Replay cost.** Each workflow task replays from the beginning. Mitigated by sticky execution (caching state on the same worker), but cold replay of long histories is expensive. 
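The replay mechanism described above can be reduced to a toy model: re-run the workflow from the top, serve recorded results while history remains, and execute only the first step with no history entry. (A sketch, far simpler than the real SDK: one flat list of completed results, no commands, timers, or signals.)

```typescript
// Toy history-based replay. Real Temporal histories also record scheduling
// and lifecycle events; here an event is just a completed step result.
type HistoryEvent = { step: string; result: unknown };

function replayContext(history: HistoryEvent[]) {
  let cursor = 0;
  return {
    async activity<T>(step: string, fn: () => Promise<T>): Promise<T> {
      if (cursor < history.length) {
        const recorded = history[cursor++];
        // Replay must request the same steps in the same order. This check
        // is the source of Temporal's non-determinism errors.
        if (recorded.step !== step) {
          throw new Error(`non-determinism: expected ${recorded.step}, got ${step}`);
        }
        return recorded.result as T; // served from history, fn never runs
      }
      const result = await fn(); // first un-recorded step: actually execute
      history.push({ step, result });
      cursor++;
      return result;
    },
  };
}
```

A crash between steps loses nothing: re-running the workflow against the same history replays up to the last recorded result and continues from there.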
+ +--- + +### Inngest: HTTP Callbacks + Memoization + +Inngest (2022) took a radically different approach: **what if the execution engine was just an HTTP middleware?** + +```typescript +export const processOrder = inngest.createFunction( + { id: "process-order" }, + { event: "order/created" }, + async ({ event, step }) => { + const order = await step.run("get-order", () => + db.orders.findById(event.data.orderId) + ); + const payment = await step.run("charge", () => + stripe.charges.create({ amount: order.total }) + ); + await step.run("ship", () => + shipping.dispatch(order) + ); + } +); +``` + +#### The HTTP Round-Trip Model + +Inngest's execution model is unlike anything else. Your code runs as a **stateless HTTP endpoint**. The Inngest server orchestrates execution by making HTTP calls to your endpoint: + +``` +Request 1 (no steps completed): + Server POST → your endpoint + Code runs: step.run("get-order", fn) → fn executes → returns order + Response: { step_result: "get-order", data: order } + Server stores result. + +Request 2 (get-order completed): + Server POST → your endpoint (with memoized results) + Code runs: step.run("get-order", fn) → memoized, returns stored result + Code runs: step.run("charge", fn) → fn executes → returns receipt + Response: { step_result: "charge", data: receipt } + Server stores result. + +Request 3 (get-order + charge completed): + Server POST → your endpoint (with memoized results) + Code runs: step.run("get-order") → memoized + Code runs: step.run("charge") → memoized + Code runs: step.run("ship", fn) → fn executes + Response: { step_result: "ship", data: tracking } + +Request 4 (all steps completed): + Server POST → your endpoint + All steps memoized, function returns. + Response: { complete: true, result: ... } +``` + +**Each `step.run()` = one HTTP round-trip.** The function re-executes from the top on every request, but completed steps return instantly from memoized results. + +#### Why HTTP? 
+ +This design choice has profound implications: + +**Pro: Truly stateless workers.** Your code is a regular HTTP endpoint — deploy it on Vercel, AWS Lambda, Cloudflare Workers, a Docker container, anywhere. No persistent worker process, no gRPC connection to maintain, no special runtime. The Inngest server handles all state. + +**Pro: No new infrastructure for the developer.** You add Inngest to your existing Express/Next.js/Flask app. No separate worker binary, no task queue, no Celery/RabbitMQ. + +**Pro: Language-agnostic by design.** Any language that can serve HTTP can be an Inngest worker. + +**Con: Highest per-step latency.** Every step = HTTP request + response + memoized replay of all previous steps. A 10-step workflow makes 10 HTTP requests, and the 10th request re-executes (and skips) all 9 previous steps before running the 10th. + +**Con: Full re-execution per step.** Like Temporal, the function re-runs from the beginning. Unlike Temporal, there's no compiled workflow bundle or V8 isolate — it's a full HTTP request with all the associated overhead (routing, middleware, JSON parsing). + +#### The Memoization Distinction + +Inngest and Temporal both re-execute code and skip completed steps, but the mechanism differs: + +- **Temporal**: The SDK intercepts `await` calls and checks an in-memory event history. If a matching event exists, the call returns instantly. This happens within a single process execution. +- **Inngest**: The server sends memoized results in the HTTP request body. The SDK checks its local cache. If found, `step.run()` returns immediately. This happens across HTTP requests. + +The practical difference: Temporal's replay is in-process (fast, ~microseconds per replayed step). Inngest's replay is across HTTP (slower, but the memoized steps are essentially free since the function body isn't called). + +#### What Inngest Gets Right + +- **Simplest deployment model.** Add it to your existing app. 
No infrastructure beyond the Inngest server (which can be self-hosted or cloud). +- **Serverless-native.** Works perfectly with Lambda/Vercel/Cloudflare. No persistent connections to maintain. +- **Event-driven.** First-class event system with fan-out, debounce, throttle. +- **Server-side sleep.** `step.sleep("1h")` doesn't hold a process — the server wakes your function after 1 hour. + +#### What Inngest Gets Wrong + +- **Latency.** Each step = HTTP round-trip. For workflows with many fast steps, the HTTP overhead dominates. +- **Re-execution cost.** The function code (parsing, importing, middleware) runs on every step, not just the new one. +- **Observability.** Debugging is harder when execution is spread across multiple HTTP requests. + +--- + +### Windmill WAC: Suspend/Resume + Checkpoint + +Windmill (2022) introduced Workflow-as-Code (WAC) with a unique mechanism: **exception-based suspend/resume with mutable checkpoints.** + +```typescript +import { task, step, workflow } from "windmill-client"; + +const getOrder = task(async (id: string) => { + return db.orders.findById(id); +}); + +export const main = workflow(async () => { + const order = await getOrder("order-123"); + + // step() executes inline — no child job, no dispatch + const total = await step("calc-total", () => + order.items.reduce((sum, i) => sum + i.price, 0) + ); + + const payment = await chargePayment(total); + return { payment }; +}); +``` + +#### Two Primitives: task() and step() + +Windmill is unique in offering two step types with very different execution models: + +**`task()`** — dispatches a child job. Separate process, separate resource limits, visible as an independent job in the UI. Like Temporal's activities. + +**`step()`** — executes inline in the same process. No child job, no queue hop. Result is checkpointed to the database. 
Like Temporal's local activities, but with a fast-path optimization: the SDK POSTs the step result directly to the API server while the script continues running. No suspend/resume cycle. + +This dual model reflects a real insight: **not all steps are equally expensive, and forcing them through the same dispatch mechanism is wasteful.** A database query and a CSV parsing step don't need the same isolation guarantees. + +#### How task() Works: The StepSuspend Exception + +When you `await task()`, the Windmill SDK uses a JavaScript trick: the `task()` function returns a **thenable** (not a full Promise) whose `.then()` method **throws a `StepSuspend` exception.** + +```typescript +// Simplified internal logic +return { + then: (): never => { + const steps = [...this.pending]; // Collect ALL unawaited tasks + throw new StepSuspend({ + mode: steps.length > 1 ? "parallel" : "sequential", + steps, + }); + }, +}; +``` + +When the worker catches `StepSuspend`: +1. **Suspend** the parent job in Postgres (`SET suspend = N` where N = child count) +2. **Save checkpoint** (completed step results as JSONB) +3. **Push child jobs** to the queue + +When all children complete, the parent is re-executed. Completed steps return from the checkpoint. + +#### How step() Works: The Inline Fast Path + +With `step()`, there's no suspend/resume at all. The SDK: +1. Executes the function body in-process +2. POSTs the result to `POST /wac/inline_checkpoint/{job_id}` on the API server +3. The API server writes a single JSONB delta to Postgres +4. The script continues immediately — no process restart + +This means a 100-step workflow using `step()` runs as **one continuous Bun process** making 100 HTTP POSTs. There's no re-execution, no replay, no suspend/resume. 
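The `step()` contract reduces to a few lines (a sketch; the real SDK's persistence call is the API POST described above, represented here by a pluggable `persist` callback so the logic is testable):

```typescript
// Toy inline-step runner: run the body once, checkpoint the result, and on
// a re-run after a crash serve the result from the checkpoint instead.
type Checkpoint = Record<string, unknown>;
type Persist = (name: string, result: unknown) => Promise<void>;

function makeStep(checkpoint: Checkpoint, persist: Persist) {
  return async function step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (name in checkpoint) {
      return checkpoint[name] as T; // resume path: no re-execution
    }
    const result = await fn();
    checkpoint[name] = result;
    await persist(name, result); // Windmill: one JSONB delta per step
    return result;
  };
}
```

Because there is no replay, nothing upstream re-executes on resume; the checkpoint is the only state that must survive the crash.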
+ +#### How Parallelism Works + +The `StepSuspend` exception enables elegant parallelism: + +```typescript +const [a, b] = await Promise.all([taskA(), taskB()]); +``` + +Both `taskA()` and `taskB()` return thenables that accumulate in a `pending` array. When `Promise.all` resolves, the first `.then()` triggers `StepSuspend` with **both tasks** in the dispatch info. Both child jobs are pushed to the queue in one batch. + +#### The Checkpoint Model + +Unlike Temporal's append-only event history, Windmill uses a **mutable JSONB checkpoint**: + +```json +{ + "source_hash": "a1b2c3d4", + "completed_steps": { + "get-order": {"id": 123, "total": 99}, + "calc-total": 99 + }, + "pending_steps": null, + "job_ids": {"get-order": "uuid-1"} +} +``` + +This is O(completed_steps) regardless of how many replays. Temporal's history is O(total_events), which includes scheduling, starting, and completing events for every step. + +The trade-off: checkpoints don't give you an audit trail of *when* each step ran. Temporal's history does. For debugging, Temporal wins. For storage efficiency, Windmill wins. + +#### What Windmill WAC Gets Right + +- **Dual step model (task/step).** Choose the right cost/isolation trade-off per step. +- **Fast inline steps.** `step()` with the inline fast path: ~0.5ms per step, no process restart. +- **Efficient parallelism.** Batch dispatch of parallel tasks via StepSuspend. +- **Compact checkpoints.** JSONB, O(completed_steps), not O(events). +- **No determinism requirement.** Unlike Temporal, you can use `Math.random()` between steps. The checkpoint stores results, not replay commands. + +#### What Windmill WAC Gets Wrong + +- **Per-workflow cold start.** Each workflow spawns a new Bun process (~12ms). Temporal's workers are persistent. This is the main throughput bottleneck for short workflows. +- **One job per worker.** Workers process one workflow at a time. Temporal handles 200+ concurrent activities per worker. 
+- **Workers talk to Postgres directly.** No server-mediated batching. Each step result is an individual PG transaction. + +--- + +## The Hybrid: Visual Flow Builder + +### Windmill Flows: JSON DAG + Code Steps + +Windmill also offers a traditional flow builder — a visual drag-and-drop editor that produces JSON-defined DAGs: + +```json +{ + "modules": [ + { + "id": "a", + "value": { + "type": "rawscript", + "language": "bun", + "content": "export function main() { return fetch('...').then(r => r.json()); }" + } + }, + { + "id": "b", + "value": { + "type": "rawscript", + "language": "python3", + "content": "def main(data): return [x * 2 for x in data]" + }, + "input_transforms": { + "data": { "type": "javascript", "expr": "results.a" } + } + } + ] +} +``` + +This is closer to Airflow's model — each step is an independent execution unit dispatched to a worker. But unlike Airflow: + +- **Steps can be in different languages** (TypeScript, Python, Go, Bash, SQL, etc.) within the same flow +- **Data passes via `results` context** (in-memory JSON between steps, not XCom) +- **The flow executor runs as a state machine** in the Windmill worker, not as a separate scheduler process +- **Branching, loops, error handlers, and approval steps** are built-in flow constructs +- **Each step can be a full Windmill script** with auto-generated UIs, schedules, etc. + +The flow builder is not programmatic — it's a UI. This makes it more accessible to non-developers but less flexible than code-based approaches. It sits between Airflow (Python DAGs) and Temporal (pure code) in the abstraction spectrum. 
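The `input_transforms` wiring can be sketched as a resolver over the flow's `results` context. (A simplification: real Windmill evaluates arbitrary JavaScript expressions; this toy handles only static values and bare `results.<id>` paths.)

```typescript
// Toy resolver for a flow step's inputs. Each transform is either a static
// value or an expression over previous steps' results.
type Transform =
  | { type: "static"; value: unknown }
  | { type: "javascript"; expr: string };

function resolveInputs(
  transforms: Record<string, Transform>,
  results: Record<string, unknown>,
): Record<string, unknown> {
  const args: Record<string, unknown> = {};
  for (const [name, t] of Object.entries(transforms)) {
    if (t.type === "static") {
      args[name] = t.value;
      continue;
    }
    const match = /^results\.(\w+)$/.exec(t.expr); // only `results.<id>` here
    if (!match) throw new Error(`unsupported expression: ${t.expr}`);
    args[name] = results[match[1]];
  }
  return args;
}
```

For the flow above, step `b` would receive step `a`'s output via `resolveInputs({ data: { type: "javascript", expr: "results.a" } }, results)`, held in memory rather than round-tripped through an XCom table.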
+ +--- + +## Comparing the Abstractions + +### The Same Workflow in Every Engine + +Let's implement the same 3-step workflow across all engines to see the differences in expressiveness: + +**Airflow:** +```python +@task +def get_order(id): + return db.orders.find(id) + +@task +def charge(order): + return stripe.charges.create(amount=order["total"]) + +@task +def ship(payment): + return shipping.dispatch(payment) + +@dag(schedule=None) +def process_order(): + order = get_order("123") + payment = charge(order) + ship(payment) +``` + +**Prefect:** +```python +@task +def get_order(id): + return db.orders.find(id) + +@task +def charge(order): + return stripe.charges.create(amount=order.total) + +@task +def ship(payment): + return shipping.dispatch(payment) + +@flow +def process_order(): + order = get_order("123") + payment = charge(order) + ship(payment) +``` + +**Temporal:** +```typescript +// Workflow file (deterministic sandbox) +export async function processOrder() { + const order = await activities.getOrder("123"); + const payment = await activities.charge(order); + await activities.ship(payment); +} + +// Activities file (separate, normal Node.js) +export async function getOrder(id) { return db.orders.find(id); } +export async function charge(order) { return stripe.charges.create({amount: order.total}); } +export async function ship(payment) { return shipping.dispatch(payment); } +``` + +**Inngest:** +```typescript +export const processOrder = inngest.createFunction( + { id: "process-order" }, + { event: "order/created" }, + async ({ event, step }) => { + const order = await step.run("get-order", () => db.orders.find("123")); + const payment = await step.run("charge", () => stripe.charges.create({amount: order.total})); + await step.run("ship", () => shipping.dispatch(payment)); + } +); +``` + +**Windmill WAC:** +```typescript +import { step, workflow } from "windmill-client"; + +export const main = workflow(async () => { + const order = await step("get-order", () 
=> db.orders.find("123"));
+  const payment = await step("charge", () => stripe.charges.create({amount: order.total}));
+  await step("ship", () => shipping.dispatch(payment));
+});
+```
+
+Notice how the code converges. Gen 2 engines (Temporal, Inngest, Windmill) all look like decorated async functions. The differences are:
+
+- **Temporal** forces you to split into workflow + activities files. Strictest, but enables a deterministic sandbox.
+- **Inngest** wraps each step in `step.run()`. Simplest — no build step, works in any HTTP framework.
+- **Windmill** offers `step()` (inline) and `task()` (dispatched). Most flexible per-step control.
+- **Airflow / Prefect** look similar but the execution model is fundamentally different — no durable execution within a task.
+
+### Dynamic Control Flow
+
+Where the engines truly diverge is control flow:
+
+```typescript
+// "If payment fails, send to manual review queue"
+
+// Temporal — full imperative control flow (signal + condition)
+const reviewApproved = defineSignal<[boolean]>("review-approved");
+
+export async function processOrder() {
+  let approval: boolean | undefined;
+  setHandler(reviewApproved, (ok) => { approval = ok; });
+  const order = await activities.getOrder("123");
+  try {
+    await activities.charge(order);
+  } catch {
+    await activities.sendToReview(order);
+    await condition(() => approval !== undefined); // durable wait for the signal
+    if (!approval) return;
+  }
+  await activities.ship(order);
+}
+
+// Inngest — same idea, via step.run() and step.waitForEvent()
+async ({ step }) => {
+  const order = await step.run("get-order", () => getOrder("123"));
+  let charged = false;
+  try {
+    await step.run("charge", () => charge(order));
+    charged = true;
+  } catch {}
+  if (!charged) {
+    await step.run("review", () => sendToReview(order));
+    const approval = await step.waitForEvent("wait-for-approval", {
+      event: "review/approved",
+      timeout: "7d", // resolves to null if no matching event arrives in time
+    });
+    if (!approval) return;
+  }
+  await step.run("ship", () => ship(order));
+}
+```
+
+```python
+# Airflow — you can't. DAGs are static. You can use @task.branch, but it's limited:
+@task.branch
+def check_payment(result):
+    if result["success"]:
+        return "ship_order"
+    return "send_to_review"
+# This creates a static branch in the DAG — not a try/catch with dynamic resumption.
+```
+
+This is the fundamental expressiveness gap between DAG schedulers and durable execution engines. Temporal, Inngest, and Windmill can express any control flow (loops, recursion, try/catch, dynamic branching). Airflow and Prefect are limited to what a DAG can represent.
+
+---
+
+## Theoretical Framework
+
+### The Persistence Spectrum
+
+Every workflow engine makes a choice: **when do you persist state, and at what granularity?**
+
+```
+  No persistence       Per-task          Per-step             Per-side-effect
+  (plain code)         (Airflow)         (Temporal,           (Restate-style
+                                          Inngest,             journaling)
+                                          Windmill WAC)
+
+  ◄─────────────────────────────────────────────────────────────────────────►
+  Fastest                                                        Most
+  No durability                                                  durable
+  No overhead                                                    Most overhead
+```
+
+- **Airflow/Prefect**: persist after each task completes. If a task has 100 lines of code, a crash at line 50 loses all 50 lines of work.
+- **Temporal/Inngest/Windmill**: persist after each step (activity/step.run/step). Crashes only lose the current in-flight step.
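The middle of this spectrum — persist after each step — is small enough to sketch in a few lines. Here is a toy model of per-step checkpointing, with a plain dict standing in for the engine's database (illustrative names, not any engine's real API):

```python
checkpoint = {}   # step name -> persisted result (stand-in for the engine's DB)
executions = []   # which step bodies actually ran

def step(name, fn):
    if name in checkpoint:       # result already durable: skip re-execution
        return checkpoint[name]
    result = fn()                # only the in-flight step can be lost to a crash
    checkpoint[name] = result    # one durable write per completed step
    return result

def get_order():
    executions.append("get-order")
    return {"total": 42}

def charge(order):
    executions.append("charge")
    return {"amount": order["total"]}

def ship(payment):
    executions.append("ship")
    return {"shipped": payment["amount"]}

def process_order():
    order = step("get-order", get_order)
    payment = step("charge", lambda: charge(order))
    return step("ship", lambda: ship(payment))

first = process_order()    # all three step bodies execute
second = process_order()   # simulated resume: every result comes from the checkpoint
```

Calling `process_order()` a second time stands in for a resume after a crash: no step body re-runs, yet the workflow returns the same result — the essence of the per-step column.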
+ +### The State Representation Trade-off + +How you represent workflow state determines what you can query, how much storage you use, and how fast recovery is: + +| Representation | Engine | Storage growth | Queryability | Recovery | +|---|---|---|---|---| +| **State machine in DB** | Airflow | O(tasks) | Full SQL | Instant (read current state) | +| **Event log** | Temporal | O(total_events) | Full (events + search attributes) | Replay from history | +| **HTTP memoization** | Inngest | O(completed_steps) | Via runs API | Re-invoke with memo cache | +| **JSONB checkpoint** | Windmill WAC | O(completed_steps) | Limited (checkpoint blob) | Re-execute from checkpoint | + +### The Worker Architecture Question + +How workers relate to the coordination layer fundamentally determines throughput: + +``` + Workers → DB directly Workers → Server → DB Workers = Runtime + (Airflow, Windmill) (Temporal, Inngest) (Restate) + + Each step = DB round-trip Server mediates + batches No external DB + ~1-5ms per step ~0.3-1ms per step ~0.01ms per step +``` + +Temporal's workers never touch the database directly — they communicate with the Temporal server via gRPC, and the server handles all database access. This is a key architectural advantage: the server can batch writes, cache state, and optimize queries. Windmill's workers currently talk to Postgres directly, paying a full round-trip per step. 
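These per-step costs translate directly into single-worker throughput ceilings for sequential workflows. A back-of-envelope model — the numbers are illustrative picks from the ranges in the diagram, ignoring parallelism and queueing:

```python
def workflows_per_sec(steps, per_step_ms, cold_start_ms=0.0):
    # sequential workflow on one worker: total time = cold start + steps * per-step cost
    total_ms = cold_start_ms + steps * per_step_ms
    return 1000.0 / total_ms

# 100 sequential steps at roughly the per-step costs above:
direct_db = workflows_per_sec(100, per_step_ms=1.0)   # worker -> DB round-trip per step
mediated  = workflows_per_sec(100, per_step_ms=0.3)   # server-mediated, batched writes
local     = workflows_per_sec(100, per_step_ms=0.01)  # co-located storage
```

At 100 steps, ~1ms per step caps a worker near 10 workflows/sec, ~0.3ms near 33, and ~0.01ms near 1,000 — per-step persistence cost, not compute, sets the ceiling.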
+ +--- + +## Performance Characteristics + +We benchmarked equivalent workflows (using `step()` / local activities — inline execution, no dispatch) on the same hardware: + +| Workflow | Windmill WAC | Temporal | Notes | +|----------|------------:|--------:|-------| +| seq_2 (2 steps) | 80 wf/s | 124 wf/s | Temporal 1.5x — persistent worker vs Bun cold start | +| seq_3 (3 steps) | 77 wf/s | 110 wf/s | Adding steps barely costs in Windmill (fast path) | +| seq_100 (100 steps) | 12.6 wf/s | 29 wf/s | Per-step: Windmill 0.8ms, Temporal 0.3ms | +| par_2 (2 parallel) | 79 wf/s | 60 wf/s | **Windmill wins** — batch dispatch | +| fan_out_10 (10-way parallel) | 80 wf/s | 45 wf/s | **Windmill wins +79%** — StepSuspend batch | + +*Windmill v1.683 (CE, single worker, Docker). Temporal v1.29 (single worker, Docker).* + +**Key observations:** + +- **Temporal is faster on sequential workflows** due to persistent workers (no cold start) and server-mediated DB writes (batched events vs individual PG round-trips). +- **Windmill is faster on parallel workflows** due to batch dispatch. `StepSuspend` collects all parallel tasks and pushes them in one operation. Temporal schedules activities individually. +- **Windmill's step() fast path scales well** — seq_2 (80) and seq_3 (77) are nearly identical because the script stays alive and each step is just an HTTP POST. +- **At 100 steps, per-step cost dominates** over cold start. The 2.3x gap (0.8ms vs 0.3ms per step) reflects Windmill writing each step as an individual PG transaction vs Temporal's server batching event writes. + +### Where the Remaining Gap Comes From + +For Windmill, two factors explain the performance gap with Temporal on sequential workloads: + +1. **Per-workflow Bun cold start (~12ms).** Each workflow spawns a new Bun process. Temporal's worker is a persistent Node.js process. With dedicated workers (persistent Bun process), this drops to zero. + +2. 
**Per-step: individual PG writes vs server-batched writes.** Each `step()` call POSTs to the API server, which does one PG transaction. Temporal's server accumulates events and writes them in batches. This is a ~2.5x difference per step (0.8ms vs 0.3ms).
+
+The fix for both is architectural:
+- Dedicated workers → eliminate cold start
+- Server-mediated dispatch → batch PG writes across concurrent workflows
+
+---
+
+## Trade-off Matrix
+
+| | Airflow | Prefect | Temporal | Inngest | Windmill WAC | Windmill Flows |
+|---|---|---|---|---|---|---|
+| **Model** | DAG scheduler | DAG + Python | Event sourcing | HTTP callbacks | Checkpoint | JSON DAG |
+| **Durable execution** | No | No | Yes | Yes | Yes | Yes (per step) |
+| **Dynamic control flow** | Limited | Full Python | Full code | Full code | Full code | Limited (visual) |
+| **Per-step latency** | ~2-40s | ~50-200ms | ~5-20ms (activity), ~1ms (local) | ~10-50ms (HTTP) | ~0.5ms (step), ~200ms (task) | ~5-50ms |
+| **Determinism required** | N/A | N/A | Yes (sandboxed in TS, by convention in Go/Java) | No | No | N/A |
+| **Worker model** | 1 task/process | N tasks/process | N tasks/process | Stateless HTTP | 1 job/process | 1 step/process |
+| **Languages** | Python | Python | TS, Go, Java, Python, .NET | TS, Go, Python | TS, Python, Go, Bash, +10 | Any (per step) |
+| **Server-side sleep** | No | No | Yes | Yes | Yes | Yes |
+| **Self-hosted** | Yes (OSS) | Yes (OSS) | Yes (OSS) | Yes (dev server) | Yes (OSS) | Yes (OSS) |
+| **Visual builder** | No (code DAGs) | No | No | No | No | **Yes** |
+| **Operational complexity** | Medium | Low | **High** (4 services) | Low | Low | Low |
+
+---
+
+## Choosing the Right Abstraction
+
+**You need scheduled batch ETL:**
+→ Airflow (ecosystem) or Prefect (simplicity)
+
+**You need durable long-running business processes with audit trails:**
+→ Temporal (event sourcing gives you complete visibility)
+
+**You need durable execution with minimal infrastructure:**
+→ Inngest (add to existing app) or Windmill WAC
(self-hosted platform) + +**You need a mix of visual workflows and code:** +→ Windmill (Flows for visual, WAC for code, both in one platform) + +**You need the lowest per-step latency:** +→ Windmill WAC with `step()` (0.5ms/step inline) — or Temporal with local activities (1-2ms/step) + +**You need the best parallel throughput:** +→ Windmill WAC with `task()` (batch dispatch via StepSuspend) + +**You need serverless/edge deployment:** +→ Inngest (HTTP-native, works with Lambda/Vercel/Cloudflare) + +--- + +## Honorable Mentions: Restate, DBOS, Hatchet + +The five engines above cover the main archetypes, but three other projects are worth knowing — each represents a genuinely different point in the design space. + +### Restate: Co-Located Storage (No External DB) + +[Restate](https://restate.dev) (3.7K stars) is the most architecturally radical engine in this space. Built by the team behind Apache Flink, it eliminates the external database entirely. + +**How it works:** Your code runs as a normal HTTP service. Restate sits between the client and your service as a proxy, intercepting every side-effect (`ctx.run()`) via a bidirectional HTTP/2 stream. Each side-effect result is journaled to an embedded replicated log (Bifrost, backed by RocksDB) — not Postgres, not MySQL, not any external database. + +```typescript +const service = restate.service({ + name: "orders", + handlers: { + process: async (ctx: restate.Context, orderId: string) => { + const order = await ctx.run("get", () => db.getOrder(orderId)); + await ctx.run("charge", () => stripe.charges.create({amount: order.total})); + await ctx.run("ship", () => shipping.dispatch(order)); + }, + }, +}); +``` + +**Why it's fast:** The commit point for "a step happened" is a local RocksDB write + quorum replication across nodes — no network round-trip to an external database. In our benchmarks, Restate achieved **4,600-6,700 workflows/sec** on the same hardware where Temporal did ~100 and Windmill did ~80. 
That's a **50x advantage** over Temporal. + +**The insight:** This 50x gap isn't an implementation detail — it's a fundamental consequence of storage topology. Every engine that routes through an external Postgres pays ~1ms per step (network + SQL + WAL sync). Restate pays ~0.01-0.1ms (local disk + quorum ACK). Remove the network hop and you gain an order of magnitude. + +**The trade-off:** No SQL access to workflow state. No familiar Postgres tooling for backup/replication — you rely on Restate's built-in mechanisms (S3 snapshots, log-based replication). Younger ecosystem (SDK for TypeScript, Java, Rust, Go). And for durable execution specifically: no separate activity/workflow distinction — all code runs in the handler, and `ctx.run()` is your only escape hatch for side-effects. + +**Unique feature:** Virtual Objects — keyed handlers with exclusive access and durable state. Essentially the actor model (Akka/Erlang) with persistence and exactly-once semantics. No other engine offers this. + +### DBOS: Durable Execution as a Library + +[DBOS](https://dbos.dev) (1.1K stars TS, 1.3K stars Python) takes the opposite approach from Restate: instead of replacing Postgres, it **leans into it.** DBOS is not a server — it's a library you import into your application. Postgres IS the durable execution engine. + +```typescript +import { DBOS } from "@dbos-inc/dbos-sdk"; + +class OrderWorkflow { + @DBOS.workflow() + static async processOrder(orderId: string) { + const order = await OrderWorkflow.getOrder(orderId); + await OrderWorkflow.charge(order); + await OrderWorkflow.ship(order); + } + + @DBOS.step() + static async getOrder(id: string) { return db.orders.find(id); } + + @DBOS.step() + static async charge(order: Order) { return stripe.charges.create({...}); } + + @DBOS.step() + static async ship(order: Order) { return shipping.dispatch(order); } +} +``` + +**How it works:** The `@DBOS.workflow()` and `@DBOS.step()` decorators instrument your code. 
Each step result is written to Postgres in a transaction. On crash, the library reads the completed step results from Postgres and replays the workflow, skipping completed steps.
+
+**The appeal:** No server to deploy, no infrastructure to manage. Just `npm install @dbos-inc/dbos-sdk` and use the Postgres you already have. For teams that are allergic to adding infrastructure, this is compelling.
+
+**The insight for Windmill:** DBOS proves that "durable execution on Postgres" can be simple and fast. Their approach — a library that uses Postgres transactions directly — is the lightest-weight option possible. It's the same insight behind Windmill's WAC `step()` fast path: use the database you already have, minimize ceremony around it.
+
+**The trade-off:** Being a library means no centralized UI, no built-in monitoring, no visual flow builder. You get durability but not orchestration platform features. Also, being Postgres-bound means the same per-step latency ceiling as any Postgres-backed engine (~1ms per step).
+
+### Hatchet: DAG Steps with a Go Engine
+
+[Hatchet](https://hatchet.run) (6.8K stars) is a Go-based workflow engine that uses an explicit DAG model for steps — you declare parent dependencies, and the engine handles scheduling:
+
+```typescript
+const workflow = hatchet.workflow("process-order");
+workflow.step("get-order", async (ctx) => { return db.getOrder(ctx.input.id); });
+workflow.step("charge", async (ctx) => { return stripe.charge(ctx.stepOutput("get-order")); },
+  { parents: ["get-order"] });
+workflow.step("ship", async (ctx) => { return shipping.dispatch(ctx.stepOutput("charge")); },
+  { parents: ["charge"] });
+```
+
+**How it works:** Steps declare their parent dependencies explicitly. The Hatchet engine (Go) builds the DAG and dispatches steps via Postgres + RabbitMQ. Workers connect via gRPC. No deterministic replay — steps execute once, results are persisted.
+
+**Where it sits:** Between Airflow (static DAG) and Temporal (imperative code).
You get DAG-level parallelism (steps with no parents run in parallel automatically) without the determinism constraints of Temporal. But you lose the ability to use arbitrary control flow (if/else, loops) within the workflow — the DAG structure is fixed at registration time. + +--- + +## The Landscape at a Glance + +``` + ┌─────────────────────────────────────────┐ + │ GitHub Stars (Apr 2026) │ + │ │ + Airflow ████████████████████████████████████████████████ 45,050 │ + Prefect ████████████████████████ 22,177 │ + Temporal ███████████████████ 19,598 │ + Windmill ████████████████ 16,241 │ + Hatchet ██████ 6,826 │ + Inngest █████ 5,202 │ + Restate ███ 3,729 │ + DBOS █ 1,137 (TS) / 1,267 (Python) │ + └─────────────────────────────────────────┘ +``` + +Stars don't measure quality, but they measure mindshare. Airflow's dominance reflects a decade of production use. Temporal's growth reflects the industry shift toward durable execution. Windmill's position reflects being both a workflow engine and a broader platform. + +The interesting trend: **every engine launched after 2020 supports durable execution.** The market has decided that "crash → retry entire task" (Airflow's model) is not good enough. The question is now *how* to implement durable execution — event sourcing vs checkpoints vs journals vs HTTP memoization vs Postgres transactions — not *whether* to implement it. + +--- + +## The Future: Convergence + +All workflow engines are converging on the same insight from different directions: **move state closer to compute, reduce I/O hops, batch writes.** + +- Airflow added connection pooling, batch scheduling, the KubernetesExecutor. +- Temporal put a server between workers and the database. Workers never touch Postgres directly. +- Inngest optimized its HTTP protocol to reduce round-trips. +- Windmill added the inline-persist fast path, eliminating suspend/resume for `step()` calls. 
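All of these optimizations chase the same arithmetic: a durable commit has a roughly fixed cost, so covering many step results with one commit amortizes it. A toy sketch of the batching idea (illustrative, not any engine's internals):

```python
class BatchedJournal:
    """Toy journal: one durable commit can cover many step results."""
    def __init__(self):
        self.pending = []   # step results awaiting a durable write
        self.durable = []   # step results already covered by a commit
        self.commits = 0    # durable writes actually performed

    def record(self, workflow_id, step_name, result):
        self.pending.append((workflow_id, step_name, result))

    def flush(self):
        # a single commit makes every pending step result durable at once
        self.durable.extend(self.pending)
        self.pending.clear()
        self.commits += 1

journal = BatchedJournal()
for wf_id in range(50):                 # 50 concurrent workflows each finish a step...
    journal.record(wf_id, "charge", {"ok": True})
journal.flush()                         # ...made durable by one commit

# at ~1ms per commit, the amortized commit cost per step:
amortized_ms = 1.0 * journal.commits / len(journal.durable)
```

This is the lever server-mediated writes pull: fifty steps, one commit, ~0.02ms of commit cost per step instead of ~1ms.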
+ +The theoretical limit is: **per-step cost = one durable write.** Whether that write goes to Postgres, a replicated log, or an embedded database determines how close you get. + +We're not there yet. But the gap between engines is shrinking, and the choice increasingly comes down to which model fits your team's mental model — not which one is fundamentally faster. + +--- + +*All benchmark code is [open source](https://github.com/windmill-labs/windmill/tree/main/benchmarks/comparison) and reproducible. We tried to be fair — if you think we weren't, the code is there. File an issue.* From 8c13681bc177f4018c536043df28a014fd9feeb9 Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Wed, 15 Apr 2026 11:33:10 +0000 Subject: [PATCH 02/12] docs: rename "gets right/wrong" to "pros/cons" Co-Authored-By: Claude Opus 4.6 (1M context) --- .../index.mdx | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index 334f082af..b9c12cbb3 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -165,14 +165,14 @@ Airflow's executor is pluggable — one of its best design decisions: Each executor makes a different trade-off between isolation, latency, and operational complexity. But all share the fundamental constraint: **each task is an independent execution unit**. -#### What Airflow Gets Right +#### Pros - **Massive ecosystem**: hundreds of "operators" (pre-built integrations) for AWS, GCP, databases, Spark, dbt, etc. - **Scheduling**: sophisticated time-based scheduling with backfill, catchup, data intervals. - **Monitoring**: built-in UI showing DAG runs, task statuses, logs, Gantt charts. - **Battle-tested**: runs at Airbnb, Google, PayPal, thousands of companies. You will find answers on StackOverflow. 
-#### What Airflow Gets Wrong +#### Cons **Latency.** A task takes 2-40 seconds to start (scheduler loop + executor dispatch + cold start), even if the actual work takes 1ms. This is fine for ETL pipelines where tasks run for minutes, but makes Airflow useless for real-time workflows. @@ -239,13 +239,13 @@ def parallel_pipeline(): `.submit()` creates a future (using Python's `concurrent.futures` or a task runner). The function call runs in a thread/process pool. This is simpler than Airflow's DAG-level parallelism but limited by Python's GIL for CPU-bound work. -#### What Prefect Gets Right +#### Pros - **Zero new concepts for Python developers.** Decorators on regular functions. Python control flow. Python data passing. - **Dynamic workflows.** Since the code is real Python, you can use `if/else`, `for` loops, `try/except` — anything. The "DAG" is whatever Python actually executes. - **Lower ceremony than Airflow.** No scheduler process. No DAG file parsing. Just run the flow. -#### What Prefect Gets Wrong +#### Cons - **No durable execution.** Like Airflow, if the process crashes mid-task, work is lost. Task-level retries restart the task from the beginning. - **State-tracking overhead.** Every task run creates multiple HTTP calls + DB writes for state transitions (Pending → Running → Completed). For workflows with hundreds of short tasks, this overhead dominates. @@ -426,14 +426,14 @@ Temporal's server is 4 services (Frontend, History, Matching, Worker) backed by │ replay all, advance │ │ ``` -#### What Temporal Gets Right +#### Pros - **True durable execution.** Workflows can run for months. Crash anywhere, resume exactly where you left off. - **Full audit trail.** Every event is recorded. You can inspect and replay any workflow. - **Multi-language SDKs.** TypeScript, Go, Java, Python, .NET, PHP. - **Rich primitives.** Signals, queries, child workflows, timers, cancellation, search attributes. 
-#### What Temporal Gets Wrong +#### Cons - **Operational complexity.** 4 server services + database + optionally Elasticsearch. Many moving parts. - **Determinism tax.** Developers must constantly think about what's deterministic. Subtle bugs from non-deterministic libraries. @@ -520,14 +520,14 @@ Inngest and Temporal both re-execute code and skip completed steps, but the mech The practical difference: Temporal's replay is in-process (fast, ~microseconds per replayed step). Inngest's replay is across HTTP (slower, but the memoized steps are essentially free since the function body isn't called). -#### What Inngest Gets Right +#### Pros - **Simplest deployment model.** Add it to your existing app. No infrastructure beyond the Inngest server (which can be self-hosted or cloud). - **Serverless-native.** Works perfectly with Lambda/Vercel/Cloudflare. No persistent connections to maintain. - **Event-driven.** First-class event system with fan-out, debounce, throttle. - **Server-side sleep.** `step.sleep("1h")` doesn't hold a process — the server wakes your function after 1 hour. -#### What Inngest Gets Wrong +#### Cons - **Latency.** Each step = HTTP round-trip. For workflows with many fast steps, the HTTP overhead dominates. - **Re-execution cost.** The function code (parsing, importing, middleware) runs on every step, not just the new one. @@ -633,7 +633,7 @@ This is O(completed_steps) regardless of how many replays. Temporal's history is The trade-off: checkpoints don't give you an audit trail of *when* each step ran. Temporal's history does. For debugging, Temporal wins. For storage efficiency, Windmill wins. -#### What Windmill WAC Gets Right +#### Pros - **Dual step model (task/step).** Choose the right cost/isolation trade-off per step. - **Fast inline steps.** `step()` with the inline fast path: ~0.5ms per step, no process restart. 
@@ -641,7 +641,7 @@ The trade-off: checkpoints don't give you an audit trail of *when* each step ran - **Compact checkpoints.** JSONB, O(completed_steps), not O(events). - **No determinism requirement.** Unlike Temporal, you can use `Math.random()` between steps. The checkpoint stores results, not replay commands. -#### What Windmill WAC Gets Wrong +#### Cons - **Per-workflow cold start.** Each workflow spawns a new Bun process (~12ms). Temporal's workers are persistent. This is the main throughput bottleneck for short workflows. - **One job per worker.** Workers process one workflow at a time. Temporal handles 200+ concurrent activities per worker. From b9440ed455459bf3b4dec2d6038ab6bd870a4ac2 Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Wed, 15 Apr 2026 11:35:51 +0000 Subject: [PATCH 03/12] =?UTF-8?q?fix:=20correct=20Temporal=20determinism?= =?UTF-8?q?=20claims=20=E2=80=94=20sandbox=20is=20TS-only,=20Go/Java=20are?= =?UTF-8?q?=20by=20convention?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.6 (1M context) --- blog/2026-04-15-workflow-engines-primer/index.mdx | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index b9c12cbb3..8d4ac64cf 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -303,7 +303,7 @@ But the implementations differ wildly in how they achieve this. Temporal (2019, ex-Uber Cadence team) is the most well-known durable execution engine. 
Its core abstraction: **record every state change as an immutable event, then replay events to reconstruct state.** ```typescript -// Workflow — runs in a deterministic sandbox +// Workflow — must be deterministic (sandboxed in TS, by convention in Go/Java) export async function processOrder(orderId: string) { const order = await activities.getOrder(orderId); const payment = await activities.chargePayment(order); @@ -320,7 +320,10 @@ export async function chargePayment(order: Order): Promise { Temporal enforces a strict separation: -- **Workflow code** runs in a **deterministic sandbox**. No I/O, no randomness, no clock access. The TypeScript SDK achieves this with a stripped-down V8 isolate that blocks non-deterministic APIs. You cannot call `fetch()`, `Math.random()`, or `Date.now()` inside a workflow. +- **Workflow code** must be deterministic — no I/O, no randomness, no direct clock access. How strictly this is enforced depends on the SDK: + - **TypeScript**: the strictest. Workflows run in a V8 isolate with `Math.random()`, `Date()`, `setTimeout()` replaced by deterministic versions. Node.js APIs (`fs`, `http`, `fetch`) are blocked at the bundler level. + - **Python**: a sandbox using proxy objects and a custom module importer restricts most non-deterministic access at runtime. + - **Go and Java**: **no sandbox.** Determinism is enforced by convention — developers are told not to use goroutines/threads, system clocks, or randomness. Violations are only caught at replay time (non-determinism error), not at compile time. - **Activity code** runs in normal Node.js / Python / Go. It can do anything — call APIs, write to databases, generate random numbers. This split exists because of Temporal's replay mechanism. 
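The replay loop this split enables is simple enough to sketch. Here is a toy model with a Python list standing in for the event history — illustrative, not SDK code:

```python
# history plays the role of Temporal's event history: recorded activity
# results, in order. On every resume the workflow function re-runs from
# the top, and recorded results are fed back positionally.
history = []

def run(workflow_fn, execute_activity):
    cursor = 0
    def activity(name, *args):
        nonlocal cursor
        if cursor < len(history):
            result = history[cursor]              # replay: answer from history
        else:
            result = execute_activity(name, *args)
            history.append(result)                # live: execute once, record
        cursor += 1
        return result
    return workflow_fn(activity)

def process_order(activity):
    order = activity("get_order", "123")
    payment = activity("charge", order)
    return activity("ship", payment)

executed = []
def execute(name, *args):
    executed.append(name)
    return {"step": name}

first = run(process_order, execute)    # live run: all three activities execute
resumed = run(process_order, execute)  # replay: zero activities re-execute
```

Results are matched to activity calls purely by position, which is exactly why the workflow function must issue the same calls in the same order on every execution — the determinism requirement above.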
@@ -380,7 +383,7 @@ Event# EventType Details Since the workflow function is replayed from the beginning on every resume, it **must produce the same sequence of commands on every execution.** If you used `Math.random()` to decide whether to call activity A or B, replay would make a different choice and Temporal would throw a **non-determinism error**. -This is the most common source of developer pain with Temporal. You must learn to think about which code is "workflow" (deterministic orchestration) and which is "activity" (actual work). Third-party libraries that use randomness or timestamps silently break. +This is the most common source of developer pain with Temporal. You must learn to think about which code is "workflow" (deterministic orchestration) and which is "activity" (actual work). In TypeScript, the sandbox catches most violations immediately. In Go or Java, a third-party library that calls `time.Now()` or `Math.random()` will silently work until replay fails — potentially in production, weeks after deployment. ```typescript // ❌ BROKEN — non-deterministic From 8da7bcd002ebab44c3424d7f3a789b770a44cf1c Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Wed, 15 Apr 2026 11:43:07 +0000 Subject: [PATCH 04/12] fix: clarify dedicated workers as existing solution for cold start Co-Authored-By: Claude Opus 4.6 (1M context) --- blog/2026-04-15-workflow-engines-primer/index.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index 8d4ac64cf..c6329d0fb 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -646,7 +646,7 @@ The trade-off: checkpoints don't give you an audit trail of *when* each step ran #### Cons -- **Per-workflow cold start.** Each workflow spawns a new Bun process (~12ms). Temporal's workers are persistent. 
This is the main throughput bottleneck for short workflows. +- **Per-workflow cold start (without dedicated workers).** By default, each workflow spawns a new Bun process (~12ms). This is the main throughput bottleneck for short workflows. Windmill offers [dedicated workers](/docs/core_concepts/dedicated_workers) — persistent Bun processes assigned to specific scripts or workspaces — which eliminate this cold start entirely and bring execution closer to Temporal's persistent worker model. - **One job per worker.** Workers process one workflow at a time. Temporal handles 200+ concurrent activities per worker. - **Workers talk to Postgres directly.** No server-mediated batching. Each step result is an individual PG transaction. @@ -913,7 +913,7 @@ We benchmarked equivalent workflows (using `step()` / local activities — inlin For Windmill, two factors explain the performance gap with Temporal on sequential workloads: -1. **Per-workflow Bun cold start (~12ms).** Each workflow spawns a new Bun process. Temporal's worker is a persistent Node.js process. With dedicated workers (persistent Bun process), this drops to zero. +1. **Per-workflow Bun cold start (~12ms).** Without dedicated workers, each workflow spawns a new Bun process. Temporal's worker is a persistent Node.js process. Windmill's [dedicated workers](/docs/core_concepts/dedicated_workers) — persistent Bun processes assigned to specific scripts or workspaces — eliminate this cold start entirely. 2. **Per-step: individual PG writes vs server-batched writes.** Each `step()` call POSTs to the API server, which does one PG transaction. Temporal's server accumulates events and writes them in batches. This is a ~2.5x difference per step (0.8ms vs 0.3ms). 
From 75d3ea614a72a80b7e04875693bb623da8b58099 Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Wed, 15 Apr 2026 11:43:43 +0000 Subject: [PATCH 05/12] fix: mention agent mode as existing non-direct-DB worker option Co-Authored-By: Claude Opus 4.6 (1M context) --- blog/2026-04-15-workflow-engines-primer/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index c6329d0fb..a062bf01b 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -648,7 +648,7 @@ The trade-off: checkpoints don't give you an audit trail of *when* each step ran - **Per-workflow cold start (without dedicated workers).** By default, each workflow spawns a new Bun process (~12ms). This is the main throughput bottleneck for short workflows. Windmill offers [dedicated workers](/docs/core_concepts/dedicated_workers) — persistent Bun processes assigned to specific scripts or workspaces — which eliminate this cold start entirely and bring execution closer to Temporal's persistent worker model. - **One job per worker.** Workers process one workflow at a time. Temporal handles 200+ concurrent activities per worker. -- **Workers talk to Postgres directly.** No server-mediated batching. Each step result is an individual PG transaction. +- **Workers talk to Postgres directly by default.** No server-mediated batching — each step result is an individual PG transaction. Windmill also supports an [agent mode](/docs/advanced/worker_groups#agent-workers) where workers communicate with the server over HTTP/WebSocket instead of connecting to Postgres directly, which is useful for remote/edge deployments but doesn't yet batch writes. 
--- From 9977879b931f876cd54be6fb1afd0f6aaf527a56 Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Wed, 15 Apr 2026 14:24:43 +0000 Subject: [PATCH 06/12] fix: correct broken link to agent workers docs Co-Authored-By: Claude Opus 4.6 (1M context) --- blog/2026-04-15-workflow-engines-primer/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index a062bf01b..35833e6b3 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -648,7 +648,7 @@ The trade-off: checkpoints don't give you an audit trail of *when* each step ran - **Per-workflow cold start (without dedicated workers).** By default, each workflow spawns a new Bun process (~12ms). This is the main throughput bottleneck for short workflows. Windmill offers [dedicated workers](/docs/core_concepts/dedicated_workers) — persistent Bun processes assigned to specific scripts or workspaces — which eliminate this cold start entirely and bring execution closer to Temporal's persistent worker model. - **One job per worker.** Workers process one workflow at a time. Temporal handles 200+ concurrent activities per worker. -- **Workers talk to Postgres directly by default.** No server-mediated batching — each step result is an individual PG transaction. Windmill also supports an [agent mode](/docs/advanced/worker_groups#agent-workers) where workers communicate with the server over HTTP/WebSocket instead of connecting to Postgres directly, which is useful for remote/edge deployments but doesn't yet batch writes. +- **Workers talk to Postgres directly by default.** No server-mediated batching — each step result is an individual PG transaction. 
Windmill also supports an [agent mode](/docs/core_concepts/agent_workers) where workers communicate with the server over HTTP/WebSocket instead of connecting to Postgres directly, which is useful for remote/edge deployments but doesn't yet batch writes. --- From 96f79caf26511f0520b2026eddc5788380fc77b1 Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Wed, 15 Apr 2026 21:36:20 +0000 Subject: [PATCH 07/12] =?UTF-8?q?fix:=20correct=20DB=20assumptions=20?= =?UTF-8?q?=E2=80=94=20not=20all=20use=20Postgres,=20data=20passing=20thro?= =?UTF-8?q?ugh=20DB=20is=20universal?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.6 (1M context) --- .../2026-04-15-workflow-engines-primer/index.mdx | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index 35833e6b3..4822436fc 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -135,7 +135,7 @@ Each task passes through a state machine stored in the database: Every state transition is a database write. The scheduler owns `scheduled → queued`. The executor owns `queued → running`. The worker owns `running → success/failed`. -#### Data Passing: The XCom Problem +#### Data Passing Between Steps Since tasks run in separate processes (possibly different machines), data must be serialized to shared storage. Airflow calls this "XCom" (cross-communication): @@ -153,7 +153,7 @@ def transform(raw): # raw = the JSON blob from extract() Under the hood, this is `INSERT INTO xcom (key, value, ...) VALUES (...)` and `SELECT value FROM xcom WHERE ...`. There's typically a size limit (48KB in Postgres by default). For anything larger, you must use external storage (S3) and pass references. 
-This is not a limitation of Airflow's implementation — it's inherent to the DAG scheduler model. If tasks are independent processes, they can't share memory, so all inter-task data flows through external storage. +This pattern — step results stored in the database, next step reads them — is actually universal across engines where steps run in separate processes. Temporal stores activity results in the event history. Windmill stores step results in `v2_job.result` (Postgres JSONB). The difference is mostly in ergonomics: Airflow's XCom historically required explicit push/pull calls and had tight size limits, while newer engines handle serialization transparently. #### The Executor Layer @@ -266,7 +266,7 @@ Pro: Natural fit for scheduled batch processing. Con: No durable execution within a task. Con: High per-task overhead (state transitions, data serialization). Con: Static or weakly dynamic control flow (Airflow worse, Prefect better). -Con: Data passing is clunky (XCom / serialization / size limits). +Con: Data passing goes through the database (all engines share this when steps are separate processes, but Airflow's XCom has historically been the most limited in size and ergonomics). ``` For scheduled ETL pipelines where tasks run for minutes, these trade-offs are excellent. For real-time, latency-sensitive, or long-running workflows, they're not. 
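The "step results stored in the database, next step reads them" pattern discussed in this hunk can be sketched in a few lines. Everything below is illustrative: the in-memory `xcom_store` dict stands in for a real table such as Airflow's `xcom`, and the 48KB guard mirrors the size limit the post mentions.

```python
# Minimal sketch of the universal pattern: a step serializes its result to
# shared storage, and the next step (possibly another process, minutes later)
# deserializes it. Names are invented for illustration.

import json

xcom_store = {}  # stands in for a database table

def push(task_id, value, limit=48 * 1024):
    blob = json.dumps(value)
    if len(blob) > limit:
        # Mirrors the documented escape hatch: store in S3, pass a reference.
        raise ValueError("too large: store externally and pass a reference")
    xcom_store[task_id] = blob

def pull(task_id):
    return json.loads(xcom_store[task_id])

# "extract" runs in one process and persists its result...
push("extract", {"data": [1, 2, 3]})
# ...and "transform" later rehydrates it from storage.
raw = pull("extract")
print([x * 2 for x in raw["data"]])  # [2, 4, 6]
```

Whether the ergonomics feel like explicit push/pull calls or transparent function arguments, this serialize-store-deserialize hop is what every cross-process engine pays.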
@@ -347,7 +347,7 @@ Each execution replays all previous steps (returning results from the event hist #### Concrete Example: Event History -For the 3-step workflow above, here's what Temporal actually stores in Postgres: +For the 3-step workflow above, here's what Temporal actually stores in its database (Postgres, MySQL, or Cassandra depending on deployment): ``` Event# EventType Details @@ -416,13 +416,13 @@ Temporal's server is 4 services (Frontend, History, Matching, Worker) backed by │ │ │ │ replay, hit new await │ │ │ │ │ - │── gRPC Command ────────▶│── append events ─▶ Postgres │ + │── gRPC Command ────────▶│── append events ─▶ DB │ │ ScheduleActivityTask │── enqueue on task queue ──────▶│ │ │ │ │ │ execute fn() │ │ │ │ │◀── gRPC result ────────────────│ - │ │── append events ─▶ Postgres │ + │ │── append events ─▶ DB │ │ │ │ │◀── gRPC WorkflowTask ───│ │ │ (updated history) │ │ @@ -687,7 +687,7 @@ Windmill also offers a traditional flow builder — a visual drag-and-drop edito This is closer to Airflow's model — each step is an independent execution unit dispatched to a worker. But unlike Airflow: - **Steps can be in different languages** (TypeScript, Python, Go, Bash, SQL, etc.) within the same flow -- **Data passes via `results` context** (in-memory JSON between steps, not XCom) +- **Data passes via `results` context** — step results are stored in Postgres (like every engine where steps are separate jobs), but accessed transparently via `results.step_name` expressions in the flow definition - **The flow executor runs as a state machine** in the Windmill worker, not as a separate scheduler process - **Branching, loops, error handlers, and approval steps** are built-in flow constructs - **Each step can be a full Windmill script** with auto-generated UIs, schedules, etc. 
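The replay mechanic behind the event history above can be modeled in a toy synchronous loop. This is an assumption-laden sketch, not Temporal's SDK: the async scheduling machinery is collapsed, and `Suspend`, `make_ctx`, and the driver loop are invented. The point it demonstrates is real, though: the workflow function re-runs from the top on every wake-up, recorded events answer completed awaits without re-executing side-effects, and the first unrecorded activity runs, appends its event, and suspends.

```python
# Toy replay model: history answers completed steps; the first missing step
# executes once, records its event, and suspends the workflow.

class Suspend(Exception):
    pass

executed = []  # side-effects actually performed, across all replays

def make_ctx(history):
    calls = iter(history)  # replay consumes recorded events in order
    def execute(activity, fn):
        try:
            return next(calls)       # recorded result: skip re-execution
        except StopIteration:
            executed.append(activity)
            history.append(fn())     # "append event" for next replay
            raise Suspend()
    return execute

def workflow(execute):
    order = execute("getOrder", lambda: {"id": 1})
    payment = execute("charge", lambda: "paid")
    return (order, payment)

history = []
for _ in range(3):                   # each wake-up replays from the start
    try:
        result = workflow(make_ctx(history))
        break
    except Suspend:
        pass

print(executed)  # ['getOrder', 'charge'] -> each side-effect ran exactly once
print(result)
```

Three executions of the function body, two side-effects performed, one final result: exactly the shape of the Execution 1/2/3 sequence described for Temporal.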
@@ -991,7 +991,7 @@ const service = restate.service({ **Why it's fast:** The commit point for "a step happened" is a local RocksDB write + quorum replication across nodes — no network round-trip to an external database. In our benchmarks, Restate achieved **4,600-6,700 workflows/sec** on the same hardware where Temporal did ~100 and Windmill did ~80. That's a **50x advantage** over Temporal. -**The insight:** This 50x gap isn't an implementation detail — it's a fundamental consequence of storage topology. Every engine that routes through an external Postgres pays ~1ms per step (network + SQL + WAL sync). Restate pays ~0.01-0.1ms (local disk + quorum ACK). Remove the network hop and you gain an order of magnitude. +**The insight:** This 50x gap isn't an implementation detail — it's a fundamental consequence of storage topology. Every engine that routes through an external database (Postgres, MySQL, Cassandra) pays ~1ms per step (network + query + WAL sync). Restate pays ~0.01-0.1ms (local disk + quorum ACK). Remove the network hop and you gain an order of magnitude. **The trade-off:** No SQL access to workflow state. No familiar Postgres tooling for backup/replication — you rely on Restate's built-in mechanisms (S3 snapshots, log-based replication). Younger ecosystem (SDK for TypeScript, Java, Rust, Go). And for durable execution specifically: no separate activity/workflow distinction — all code runs in the handler, and `ctx.run()` is your only escape hatch for side-effects. 
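The `ctx.run()` escape hatch mentioned in the Restate trade-offs can be sketched as journal replay. This is a simplified simulation, not Restate's SDK: `Ctx`, the journal list, and the crash/retry driver are all invented. What it shows is the guarantee the post describes: the commit point is a journal append, so a retried handler replays the entry instead of re-executing the side-effect.

```python
# Sketch of journaled side-effects: run() executes a side-effect at most once
# per invocation; on retry, the persisted journal answers instead.

class Ctx:
    def __init__(self, journal):
        self.journal, self.cursor = journal, 0

    def run(self, fn):
        if self.cursor < len(self.journal):   # already journaled: replay
            result = self.journal[self.cursor]
        else:
            result = fn()                     # first time: execute...
            self.journal.append(result)       # ...then append (the commit point)
        self.cursor += 1
        return result

charges = []  # the external side-effect we must not repeat

def handler(ctx, attempt):
    charge_id = ctx.run(lambda: charges.append("ch_1") or "ch_1")
    if attempt == 0:
        raise RuntimeError("crash after charging")  # simulated failure
    return charge_id

journal = []  # survives the crash (persisted by the runtime)
try:
    handler(Ctx(journal), attempt=0)
except RuntimeError:
    pass
result = handler(Ctx(journal), attempt=1)  # retry replays the journal

print(result, charges)  # ch_1 ['ch_1'] -> charged exactly once
```

In Restate the append is a local RocksDB write plus quorum replication rather than a Python list mutation, which is where the latency advantage described above comes from.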
From 96e42a1ec21cd074bb2409f4273e949c28b60acf Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Wed, 15 Apr 2026 21:39:46 +0000 Subject: [PATCH 08/12] =?UTF-8?q?fix:=20rework=20Airflow=20cons=20to=20foc?= =?UTF-8?q?us=20on=20real=20issues=20=E2=80=94=20latency,=20Python-only,?= =?UTF-8?q?=20no=20visual=20editor?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.6 (1M context) --- .../index.mdx | 28 ++++++------------- 1 file changed, 8 insertions(+), 20 deletions(-) diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index 4822436fc..2578dcfb5 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -137,23 +137,7 @@ Every state transition is a database write. The scheduler owns `scheduled → qu #### Data Passing Between Steps -Since tasks run in separate processes (possibly different machines), data must be serialized to shared storage. Airflow calls this "XCom" (cross-communication): - -```python -# Task A: push data -@task -def extract(): - return {"data": [1, 2, 3]} # Serialized to DB as a JSON blob - -# Task B: receives via function argument (deserialized from DB) -@task -def transform(raw): # raw = the JSON blob from extract() - return [x * 2 for x in raw["data"]] -``` - -Under the hood, this is `INSERT INTO xcom (key, value, ...) VALUES (...)` and `SELECT value FROM xcom WHERE ...`. There's typically a size limit (48KB in Postgres by default). For anything larger, you must use external storage (S3) and pass references. - -This pattern — step results stored in the database, next step reads them — is actually universal across engines where steps run in separate processes. Temporal stores activity results in the event history. Windmill stores step results in `v2_job.result` (Postgres JSONB). 
The difference is mostly in ergonomics: Airflow's XCom historically required explicit push/pull calls and had tight size limits, while newer engines handle serialization transparently. +Since tasks run in separate processes (possibly different machines), data must be serialized to shared storage. Airflow calls this "XCom" (cross-communication). All engines where steps are separate jobs share this pattern — Temporal stores results in event history, Windmill in Postgres JSONB — but Airflow's XCom has historically had the worst developer experience: tight size limits (48KB default), and in older versions, explicit `xcom_push`/`xcom_pull` calls. Newer Airflow versions with the `@task` decorator make this more transparent, but the size limits remain. #### The Executor Layer @@ -174,11 +158,15 @@ Each executor makes a different trade-off between isolation, latency, and operat #### Cons -**Latency.** A task takes 2-40 seconds to start (scheduler loop + executor dispatch + cold start), even if the actual work takes 1ms. This is fine for ETL pipelines where tasks run for minutes, but makes Airflow useless for real-time workflows. +**Latency and cold start.** A task takes 2-40 seconds to start (scheduler polling loop + executor dispatch + worker cold start), even if the actual work takes 1ms. The scheduler loop alone adds 1-5 seconds of delay. With the KubernetesExecutor, cold start can be 10-30 seconds per task (pod creation). This makes Airflow unsuitable for anything latency-sensitive. + +**Python-only.** DAGs are Python files. Tasks are Python functions. If your pipeline needs a TypeScript transform or a Go data processor, you shell out or use a BashOperator — no first-class polyglot support. + +**No visual editor.** Airflow has a monitoring UI (DAG view, Gantt charts, logs), but no visual flow builder. You define workflows in Python code, which is powerful but excludes non-developers from authoring workflows. 
-**Static DAGs.** The dependency graph is fixed at parse time. You can't say "if the result of Task A is X, skip Task B" in a truly dynamic way. Airflow 2.x added `@task.branch` and dynamic task mapping, but these are limited — you're still declaring branches upfront, not writing arbitrary control flow. +**Static DAGs.** The dependency graph is fixed at parse time. Airflow 2.x added `@task.branch` and dynamic task mapping, but you're still declaring branches upfront, not writing arbitrary runtime control flow. -**No durable execution.** If a task crashes mid-execution, all progress within that task is lost. Airflow retries the entire task from the beginning. There's no concept of "resume from where it left off." +**No durable execution.** If a task crashes mid-execution, all progress within that task is lost. Airflow retries the entire task from the beginning. **Parse overhead.** The scheduler re-parses all Python DAG files periodically. With thousands of DAGs, this alone can consume significant CPU and cause scheduling delays. From 9b65199996642c6ac41aa62e93e5a7e924b4acac Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Wed, 15 Apr 2026 21:41:26 +0000 Subject: [PATCH 09/12] docs: add concrete Airflow benchmark numbers and explain why cold start is high Co-Authored-By: Claude Opus 4.6 (1M context) --- blog/2026-04-15-workflow-engines-primer/index.mdx | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index 2578dcfb5..833d9e170 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -158,7 +158,12 @@ Each executor makes a different trade-off between isolation, latency, and operat #### Cons -**Latency and cold start.** A task takes 2-40 seconds to start (scheduler polling loop + executor dispatch + worker cold start), even if the actual work takes 1ms. 
The scheduler loop alone adds 1-5 seconds of delay. With the KubernetesExecutor, cold start can be 10-30 seconds per task (pod creation). This makes Airflow unsuitable for anything latency-sensitive. +**Latency and cold start.** In [our benchmarks](https://www.windmill.dev/docs/misc/benchmarks/competitors/airflow), Airflow took **56 seconds to run 40 lightweight tasks** (~0.7 tasks/sec). Windmill completed the same workload in 2.4 seconds (~16.5 tasks/sec) — a **23x difference.** The overhead comes from multiple layers: +- **Scheduler polling loop**: the scheduler queries the DB every few seconds to find ready tasks. There's a built-in idle sleep between scheduling cycles. This alone adds 1-5 seconds of delay per task. +- **Python cold start**: each task forks a subprocess that loads the DAG file and all Python imports. A simple task can take 1-2 seconds just to start. Compare with Windmill's Bun cold start (~12ms) or dedicated workers (0ms). +- **Multi-hop dispatch**: task goes scheduler → DB → executor → message broker (Redis/RabbitMQ for Celery) → worker. Each hop adds latency. Windmill workers pull directly from Postgres (`SELECT ... FOR UPDATE SKIP LOCKED`). + +With the KubernetesExecutor, cold start grows to 10-30 seconds per task (pod creation). This makes Airflow unsuitable for anything latency-sensitive. **Python-only.** DAGs are Python files. Tasks are Python functions. If your pipeline needs a TypeScript transform or a Go data processor, you shell out or use a BashOperator — no first-class polyglot support. 
From a3061de06e0451cd406f12a2c7366bb0a9bdc981 Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Wed, 15 Apr 2026 21:43:06 +0000 Subject: [PATCH 10/12] fix: focus Airflow latency analysis on architecture (3-hop vs 1-hop), not language Co-Authored-By: Claude Opus 4.6 (1M context) --- blog/2026-04-15-workflow-engines-primer/index.mdx | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index 833d9e170..fccedcc37 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -158,10 +158,10 @@ Each executor makes a different trade-off between isolation, latency, and operat #### Cons -**Latency and cold start.** In [our benchmarks](https://www.windmill.dev/docs/misc/benchmarks/competitors/airflow), Airflow took **56 seconds to run 40 lightweight tasks** (~0.7 tasks/sec). Windmill completed the same workload in 2.4 seconds (~16.5 tasks/sec) — a **23x difference.** The overhead comes from multiple layers: -- **Scheduler polling loop**: the scheduler queries the DB every few seconds to find ready tasks. There's a built-in idle sleep between scheduling cycles. This alone adds 1-5 seconds of delay per task. -- **Python cold start**: each task forks a subprocess that loads the DAG file and all Python imports. A simple task can take 1-2 seconds just to start. Compare with Windmill's Bun cold start (~12ms) or dedicated workers (0ms). -- **Multi-hop dispatch**: task goes scheduler → DB → executor → message broker (Redis/RabbitMQ for Celery) → worker. Each hop adds latency. Windmill workers pull directly from Postgres (`SELECT ... FOR UPDATE SKIP LOCKED`). +**Latency and cold start.** In [our benchmarks](https://www.windmill.dev/docs/misc/benchmarks/competitors/airflow), Airflow took **56 seconds to run 40 lightweight tasks** (~0.7 tasks/sec). 
Windmill completed the same workload in 2.4 seconds (~16.5 tasks/sec) — a **23x difference.** The overhead comes from architectural differences: +- **Three-hop dispatch**: in Airflow, a task goes scheduler (polls DB, resolves dependencies, checks pool limits) → DB state update → executor → message broker (Redis/RabbitMQ for Celery) → worker. Three separate components, each with their own polling interval and latency. In Windmill, the worker polls Postgres directly with `SELECT ... FOR UPDATE SKIP LOCKED` — one component, one hop. +- **Scheduler overhead**: Airflow's scheduler is a Python process that re-parses DAG files, evaluates dependencies, and checks concurrency limits — all in Python — before a task can even be enqueued. This adds 1-5 seconds per scheduling cycle. Windmill has no separate scheduler; workers self-schedule by pulling from the queue. +- **Cold start per task**: each Airflow task forks a subprocess that loads the entire DAG file + Airflow framework imports. Even for a trivial task, this can take 1-2 seconds. Windmill's cold start is lighter (~26ms for Python, ~12ms for Bun), and with dedicated workers it's 0ms — the process stays alive across jobs. With the KubernetesExecutor, cold start grows to 10-30 seconds per task (pod creation). This makes Airflow unsuitable for anything latency-sensitive. 
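The one-hop pull in the first bullet can be simulated in memory. The SQL in the comment is a paraphrase of the `FOR UPDATE SKIP LOCKED` pattern, not Windmill's exact query, and the Python list stands in for what Postgres row locking guarantees: two workers polling the same queue never claim the same job.

```python
# In-memory sketch of one-hop queue pulling. Real workers issue roughly:
#   UPDATE queue SET state = 'running', worker = $1
#   WHERE id = (SELECT id FROM queue WHERE state = 'queued'
#               ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1)
# Here, the "queued" check plays the role of skipping locked/claimed rows.

queue = [{"id": i, "state": "queued"} for i in range(4)]

def pull_one(worker):
    for job in queue:
        if job["state"] == "queued":   # a row locked by another worker is skipped
            job["state"] = "running"
            job["worker"] = worker
            return job
    return None                        # queue drained

claimed = []
workers = ["worker-a", "worker-b"]
while (job := pull_one(workers[len(claimed) % 2])) is not None:
    claimed.append((job["id"], job["worker"]))

print(claimed)  # every job claimed exactly once, alternating workers
```

No scheduler, no broker: the worker's claim query is both the dispatch decision and the state transition, which is why the dispatch path is one hop instead of three.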
From bbe47b4c7980c3352d984734458de017781dc784 Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Thu, 16 Apr 2026 18:05:24 +0000 Subject: [PATCH 11/12] blog: address feedback on workflow engines primer - Reframe "Three Generations" as two generations plus a visual UX over Gen 1 (flow builders are an authoring mode, not a distinct generation) - Wrap the 5-engine code comparison and dynamic-control-flow comparison in Tabs so readers don't have to scroll top-to-bottom - Add a Benchmark Methodology section explaining why only Temporal is compared and defining seq_N / par_N / fan_out_N; note repro repo is planned - Replace most ASCII box/sequence diagrams with React components: DAG scheduler model, Airflow state machine, Temporal replay sequence, Inngest HTTP roundtrips, persistence spectrum, worker architecture, GitHub stars bar chart - Reduce em dashes across the post (periods/colons/commas where they were lazy sentence joiners; kept genuine parentheticals) Co-Authored-By: Claude Opus 4.6 (1M context) --- .../_diagrams.tsx | 360 ++++++++++++++++++ .../index.mdx | 301 +++++++-------- 2 files changed, 497 insertions(+), 164 deletions(-) create mode 100644 blog/2026-04-15-workflow-engines-primer/_diagrams.tsx diff --git a/blog/2026-04-15-workflow-engines-primer/_diagrams.tsx b/blog/2026-04-15-workflow-engines-primer/_diagrams.tsx new file mode 100644 index 000000000..d1afb7fb8 --- /dev/null +++ b/blog/2026-04-15-workflow-engines-primer/_diagrams.tsx @@ -0,0 +1,360 @@ +import React from 'react'; + +// Shared styles +const box = "rounded-md border border-gray-200 dark:border-slate-700 bg-white dark:bg-slate-900 p-3 text-sm"; +const label = "text-xs font-mono text-gray-500 dark:text-gray-400 uppercase tracking-wide mb-2"; +const arrow = "flex items-center justify-center text-gray-400 dark:text-gray-500"; + +// --- 1. 
DAG Scheduler Model --- +export function DAGSchedulerModel() { + const tasks = ['Task A', 'Task B', 'Task C']; + const steps = [ + 'Parse graph', + 'Poll: which tasks are ready?', + 'Dispatch ready tasks', + 'Wait for completion', + 'Repeat', + ]; + return ( +
+
+ DAG Scheduler Model +
+
+
+
You define
+
+ {tasks.map((t, i) => ( +
+ + {String.fromCharCode(65 + i)} + + {t} +
+ ))} +
+
+
+
Scheduler does
+
    + {steps.map((s, i) => ( +
  1. + + {i + 1} + + {s} +
  2. + ))} +
+
+
+
+ Data passes via external storage (DB, S3, XCom). Tasks are independent processes. +
+
+ ); +} + +// --- 2. Airflow State Machine --- +export function AirflowStateMachine() { + const states = [ + { name: 'none', color: 'gray' }, + { name: 'scheduled', color: 'blue' }, + { name: 'queued', color: 'indigo' }, + { name: 'running', color: 'amber' }, + { name: 'success', color: 'green' }, + ]; + const colorMap: Record = { + gray: 'bg-gray-100 dark:bg-slate-800 text-gray-700 dark:text-gray-300 border-gray-300 dark:border-slate-600', + blue: 'bg-blue-100 dark:bg-blue-950 text-blue-700 dark:text-blue-300 border-blue-300 dark:border-blue-800', + indigo: 'bg-indigo-100 dark:bg-indigo-950 text-indigo-700 dark:text-indigo-300 border-indigo-300 dark:border-indigo-800', + amber: 'bg-amber-100 dark:bg-amber-950 text-amber-700 dark:text-amber-300 border-amber-300 dark:border-amber-800', + green: 'bg-green-100 dark:bg-green-950 text-green-700 dark:text-green-300 border-green-300 dark:border-green-800', + red: 'bg-red-100 dark:bg-red-950 text-red-700 dark:text-red-300 border-red-300 dark:border-red-800', + }; + return ( +
+
+ {states.map((s, i) => ( + + {s.name} + {i < states.length - 1 && } + + ))} +
+
+ running + + failed + + up_for_retry + + scheduled + → ... +
+
+ ); +} + +// --- 3. Persistence Spectrum --- +export function PersistenceSpectrum() { + const points = [ + { label: 'No persistence', sub: 'plain code', pos: 0 }, + { label: 'Per-task', sub: 'Airflow', pos: 33 }, + { label: 'Per-step', sub: 'Temporal, Inngest, Windmill WAC', pos: 66 }, + { label: 'Per-side-effect', sub: 'Restate-style journaling', pos: 100 }, + ]; + return ( +
+
+ {points.map((p) => ( +
+
+
+
{p.label}
+
{p.sub}
+
+
+ ))} +
+
+
+
Fastest
+
No durability
+
No overhead
+
+
+
Most durable
+
Most overhead
+
+
+
+ ); +} + +// --- 4. Worker Architecture --- +export function WorkerArchitecture() { + const columns = [ + { + title: 'Workers → DB directly', + examples: 'Airflow, Windmill', + cost: '~1-5ms per step', + note: 'Each step = DB round-trip', + color: 'amber', + }, + { + title: 'Workers → Server → DB', + examples: 'Temporal, Inngest', + cost: '~0.3-1ms per step', + note: 'Server mediates + batches', + color: 'blue', + }, + { + title: 'Workers = Runtime', + examples: 'Restate', + cost: '~0.01ms per step', + note: 'No external DB', + color: 'green', + }, + ]; + const bar: Record = { + amber: 'bg-amber-200 dark:bg-amber-900', + blue: 'bg-blue-200 dark:bg-blue-900', + green: 'bg-green-200 dark:bg-green-900', + }; + return ( +
+ {columns.map((c) => ( +
+
{c.title}
+
{c.examples}
+
+
{c.cost}
+
{c.note}
+
+ ))} +
+ ); +} + +// --- 5. GitHub Stars Bar Chart --- +export function GithubStarsChart() { + const engines = [ + { name: 'Airflow', stars: 45050, color: 'bg-sky-500' }, + { name: 'Prefect', stars: 22177, color: 'bg-indigo-500' }, + { name: 'Temporal', stars: 19598, color: 'bg-violet-500' }, + { name: 'Windmill', stars: 16241, color: 'bg-emerald-500' }, + { name: 'Hatchet', stars: 6826, color: 'bg-orange-500' }, + { name: 'Inngest', stars: 5202, color: 'bg-pink-500' }, + { name: 'Restate', stars: 3729, color: 'bg-rose-500' }, + { name: 'DBOS (TS+Py)', stars: 2404, color: 'bg-amber-500' }, + ]; + const max = Math.max(...engines.map((e) => e.stars)); + return ( +
+
+ GitHub Stars (April 2026) +
+
+ {engines.map((e) => ( +
+
+ {e.name} +
+
+
+ {e.stars.toLocaleString()} +
+
+
+ ))} +
+
+ DBOS combined: 1,137 TypeScript + 1,267 Python +
+
+ ); +} + +// --- 6. Replay Sequence Diagram (Temporal) --- +export function TemporalReplaySequence() { + const runs = [ + { + label: 'Execution 1', + steps: [ + { name: 'await getOrder', status: 'run', note: 'no event' }, + { name: 'schedule activity', status: 'yield', note: '' }, + ], + }, + { + label: 'Execution 2', + steps: [ + { name: 'await getOrder', status: 'cached', note: 'event: completed(order)' }, + { name: 'await chargePayment', status: 'run', note: 'no event' }, + { name: 'schedule activity', status: 'yield', note: '' }, + ], + }, + { + label: 'Execution 3', + steps: [ + { name: 'await getOrder', status: 'cached', note: 'event' }, + { name: 'await chargePayment', status: 'cached', note: 'event' }, + { name: 'await shipOrder', status: 'run', note: 'no event' }, + { name: 'schedule activity', status: 'yield', note: '' }, + ], + }, + ]; + const statusStyle: Record = { + cached: 'bg-green-100 dark:bg-green-950 text-green-700 dark:text-green-300 border-green-200 dark:border-green-900', + run: 'bg-blue-100 dark:bg-blue-950 text-blue-700 dark:text-blue-300 border-blue-200 dark:border-blue-900', + yield: 'bg-amber-100 dark:bg-amber-950 text-amber-700 dark:text-amber-300 border-amber-200 dark:border-amber-900', + }; + const statusLabel: Record = { + cached: 'skipped (replay)', + run: 'executed', + yield: 'suspend', + }; + return ( +
+ {runs.map((run) => ( +
+
{run.label}
+
+ {run.steps.map((s, i) => ( + +
+
{s.name}
+
+ {statusLabel[s.status]}{s.note && ` · ${s.note}`} +
+
+ {i < run.steps.length - 1 && } +
+ ))} +
+
+ ))} +
+ ); +} + +// --- 7. Inngest HTTP Roundtrips --- +export function InngestHttpRoundtrips() { + const requests = [ + { + label: 'Request 1', + memo: [], + body: [ + { name: 'get-order', status: 'run' }, + ], + response: 'step_result: get-order', + }, + { + label: 'Request 2', + memo: ['get-order'], + body: [ + { name: 'get-order', status: 'cached' }, + { name: 'charge', status: 'run' }, + ], + response: 'step_result: charge', + }, + { + label: 'Request 3', + memo: ['get-order', 'charge'], + body: [ + { name: 'get-order', status: 'cached' }, + { name: 'charge', status: 'cached' }, + { name: 'ship', status: 'run' }, + ], + response: 'step_result: ship', + }, + { + label: 'Request 4', + memo: ['get-order', 'charge', 'ship'], + body: [ + { name: 'get-order', status: 'cached' }, + { name: 'charge', status: 'cached' }, + { name: 'ship', status: 'cached' }, + ], + response: 'complete: true', + }, + ]; + const statusStyle: Record = { + cached: 'bg-green-100 dark:bg-green-950 text-green-700 dark:text-green-300', + run: 'bg-blue-100 dark:bg-blue-950 text-blue-700 dark:text-blue-300 font-semibold', + }; + return ( +
+ {requests.map((r) => ( +
+
+
{r.label}
+
+ memo: [{r.memo.join(', ') || '∅'}] +
+
+
+ {r.body.map((s, i) => ( + + + {s.name} + + {i < r.body.length - 1 && } + + ))} +
+
+ ← response: {`{ ${r.response} }`} +
+
+ ))} +
+ ); +} diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index fccedcc37..eac658b92 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -6,7 +6,18 @@ description: 'A deep dive comparing Airflow, Prefect, Temporal, Inngest, and Win title: 'From Cron to Durable Execution: A Primer on Workflow Engines' --- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; import DocCard from '@site/src/components/DocCard'; +import { + DAGSchedulerModel, + AirflowStateMachine, + PersistenceSpectrum, + WorkerArchitecture, + GithubStarsChart, + TemporalReplaySequence, + InngestHttpRoundtrips, +} from './_diagrams'; A deep dive comparing Airflow, Prefect, Temporal, Inngest, and Windmill — how they work internally, their trade-offs, and real benchmarks. Plus honorable mentions for Restate, DBOS, and Hatchet. @@ -33,11 +44,11 @@ The fundamental problem: **a sequence of side-effects spread across time and net Every workflow engine is a different answer to the same question: **how do you coordinate multiple fallible side-effects so that the overall process makes progress, even when individual steps fail?** -The answers cluster into three generations, each with a different core abstraction. +The answers cluster into two generations of core abstractions, with a visual UX mode sitting on top of Gen 1. 
--- -## The Three Generations +## Two Generations (and a Visual UX) ``` Generation 1: DAG Schedulers Airflow, Prefect @@ -48,12 +59,13 @@ Generation 2: Durable Execution Temporal, Inngest, Windmill WAC "Write normal code, the runtime makes it survive crashes" -Hybrid: Visual Flow Builder Windmill Flows - "Drag-and-drop steps, - JSON-defined DAG with code steps" +Visual UX: Flow Builder Windmill Flows + (built on top of a Gen 1 engine) "Drag-and-drop steps, JSON DAG" ``` -The shift from Gen 1 to Gen 2 mirrors a broader shift in computer science: from **declarative** (describe the computation) to **imperative** (write the computation, let the infrastructure handle durability). Neither is universally better — they solve different problems. +The shift from Gen 1 to Gen 2 mirrors a broader shift in computer science: from **declarative** (describe the computation) to **imperative** (write the computation, let the infrastructure handle durability). Neither is universally better. They solve different problems. + +Visual flow builders like Windmill Flows are not a third generation. They are a different authoring experience (drag-and-drop, JSON DAG) over the same Gen 1 execution model, aimed at users who prefer a UI over code. --- @@ -65,23 +77,7 @@ A DAG scheduler separates **what to do** (your task code) from **when and where The key property: **tasks are independent units of work.** They don't share memory. They don't know about each other. They communicate through external storage. The scheduler is the only component that understands the full picture. -``` -┌──────────────────────────────────────────────────────┐ -│ DAG Scheduler Model │ -│ │ -│ You define: Scheduler does: │ -│ │ -│ [Task A] ──┐ 1. Parse graph │ -│ ├──→ 2. Poll: which tasks are ready? │ -│ [Task B] ──┘ 3. Dispatch ready tasks │ -│ │ 4. Wait for completion │ -│ ▼ 5. 
Repeat from 2 │ -│ [Task C] │ -│ │ -│ Data passes via external storage (DB, S3, XCom) │ -│ Tasks are independent processes │ -└──────────────────────────────────────────────────────┘ -``` + ### Airflow: The Incumbent @@ -110,7 +106,7 @@ def etl_pipeline(): load(transformed) ``` -**The fundamental misunderstanding about Airflow**: this looks like Python calling functions, but it isn't. At parse time, no functions execute. Airflow builds a dependency graph from the return value annotations. The actual execution happens later — possibly minutes later, on a different machine. +**The fundamental misunderstanding about Airflow**: this looks like Python calling functions, but it isn't. At parse time, no functions execute. Airflow builds a dependency graph from the return value annotations. The actual execution happens later, possibly minutes later, on a different machine. #### How the Scheduler Works @@ -128,20 +124,17 @@ Every ~5 seconds: Each task passes through a state machine stored in the database: -``` - none → scheduled → queued → running → success - └──→ failed → up_for_retry → scheduled → ... -``` + Every state transition is a database write. The scheduler owns `scheduled → queued`. The executor owns `queued → running`. The worker owns `running → success/failed`. #### Data Passing Between Steps -Since tasks run in separate processes (possibly different machines), data must be serialized to shared storage. Airflow calls this "XCom" (cross-communication). All engines where steps are separate jobs share this pattern — Temporal stores results in event history, Windmill in Postgres JSONB — but Airflow's XCom has historically had the worst developer experience: tight size limits (48KB default), and in older versions, explicit `xcom_push`/`xcom_pull` calls. Newer Airflow versions with the `@task` decorator make this more transparent, but the size limits remain. 
+Since tasks run in separate processes (possibly different machines), data must be serialized to shared storage. Airflow calls this "XCom" (cross-communication). All engines where steps are separate jobs share this pattern (Temporal stores results in event history, Windmill in Postgres JSONB), but Airflow's XCom has historically had the worst developer experience: tight size limits (48KB default), and in older versions, explicit `xcom_push`/`xcom_pull` calls. Newer Airflow versions with the `@task` decorator make this more transparent, but the size limits remain.

 #### The Executor Layer

-Airflow's executor is pluggable — one of its best design decisions:
+Airflow's executor is pluggable, one of its best design decisions:

 - **LocalExecutor**: forks a subprocess per task. Simple, single-machine.
 - **CeleryExecutor**: sends tasks to a message broker (Redis/RabbitMQ). Celery workers pick them up. Most common production setup.

@@ -158,14 +151,14 @@ Each executor makes a different trade-off between isolation, latency, and operat

 #### Cons

-**Latency and cold start.** In [our benchmarks](https://www.windmill.dev/docs/misc/benchmarks/competitors/airflow), Airflow took **56 seconds to run 40 lightweight tasks** (~0.7 tasks/sec). Windmill completed the same workload in 2.4 seconds (~16.5 tasks/sec) — a **23x difference.** The overhead comes from architectural differences:
-- **Three-hop dispatch**: in Airflow, a task goes scheduler (polls DB, resolves dependencies, checks pool limits) → DB state update → executor → message broker (Redis/RabbitMQ for Celery) → worker. Three separate components, each with their own polling interval and latency. In Windmill, the worker polls Postgres directly with `SELECT ... FOR UPDATE SKIP LOCKED` — one component, one hop.
-- **Scheduler overhead**: Airflow's scheduler is a Python process that re-parses DAG files, evaluates dependencies, and checks concurrency limits — all in Python — before a task can even be enqueued. This adds 1-5 seconds per scheduling cycle. Windmill has no separate scheduler; workers self-schedule by pulling from the queue.
-- **Cold start per task**: each Airflow task forks a subprocess that loads the entire DAG file + Airflow framework imports. Even for a trivial task, this can take 1-2 seconds. Windmill's cold start is lighter (~26ms for Python, ~12ms for Bun), and with dedicated workers it's 0ms — the process stays alive across jobs.
+**Latency and cold start.** In [our benchmarks](https://www.windmill.dev/docs/misc/benchmarks/competitors/airflow), Airflow took **56 seconds to run 40 lightweight tasks** (~0.7 tasks/sec). Windmill completed the same workload in 2.4 seconds (~16.5 tasks/sec), a **23x difference.** The overhead comes from architectural differences:
+- **Three-hop dispatch**: in Airflow, a task goes scheduler (polls DB, resolves dependencies, checks pool limits) → DB state update → executor → message broker (Redis/RabbitMQ for Celery) → worker. Three separate components, each with their own polling interval and latency. In Windmill, the worker polls Postgres directly with `SELECT ... FOR UPDATE SKIP LOCKED`: one component, one hop.
+- **Scheduler overhead**: Airflow's scheduler is a Python process that re-parses DAG files, evaluates dependencies, and checks concurrency limits, all in Python, before a task can even be enqueued. This adds 1-5 seconds per scheduling cycle. Windmill has no separate scheduler; workers self-schedule by pulling from the queue.
+- **Cold start per task**: each Airflow task forks a subprocess that loads the entire DAG file + Airflow framework imports. Even for a trivial task, this can take 1-2 seconds. Windmill's cold start is lighter (~26ms for Python, ~12ms for Bun), and with dedicated workers it's 0ms: the process stays alive across jobs.

 With the KubernetesExecutor, cold start grows to 10-30 seconds per task (pod creation). This makes Airflow unsuitable for anything latency-sensitive.
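To make the 23x figure concrete, here is the arithmetic as a runnable sketch (the 56 s and 2.4 s totals are the measurements cited above; the per-task costs are simply those totals divided by 40, not separately measured):

```python
# The benchmark above: 40 lightweight tasks, 56 s on Airflow vs 2.4 s on
# Windmill. Per-task overhead and the headline ratio follow directly.

TASKS = 40
AIRFLOW_TOTAL_S = 56.0    # measured: three-hop dispatch + subprocess cold start
WINDMILL_TOTAL_S = 2.4    # measured: single-hop queue pull

airflow_per_task = AIRFLOW_TOTAL_S / TASKS     # ~1.4 s of pure overhead
windmill_per_task = WINDMILL_TOTAL_S / TASKS   # ~60 ms

print(f"Airflow:  {airflow_per_task * 1000:.0f} ms/task, "
      f"{TASKS / AIRFLOW_TOTAL_S:.2f} tasks/s")
print(f"Windmill: {windmill_per_task * 1000:.0f} ms/task, "
      f"{TASKS / WINDMILL_TOTAL_S:.2f} tasks/s")
print(f"Speedup: {AIRFLOW_TOTAL_S / WINDMILL_TOTAL_S:.1f}x")
```

Note that this is per-task *dispatch* overhead, not compute: the tasks themselves are no-ops, so essentially all of the ~1.4 s per task is scheduler cycles, broker hops, and subprocess cold start.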
-**Python-only.** DAGs are Python files. Tasks are Python functions. If your pipeline needs a TypeScript transform or a Go data processor, you shell out or use a BashOperator — no first-class polyglot support.
+**Python-only.** DAGs are Python files. Tasks are Python functions. If your pipeline needs a TypeScript transform or a Go data processor, you shell out or use a BashOperator. There is no first-class polyglot support.

 **No visual editor.** Airflow has a monitoring UI (DAG view, Gantt charts, logs), but no visual flow builder. You define workflows in Python code, which is powerful but excludes non-developers from authoring workflows.

@@ -203,11 +196,11 @@ def etl_pipeline():
     load(transformed)
 ```

-This looks almost identical to Airflow, but with a crucial difference: **the code actually runs as Python.** When `etl_pipeline()` is called, `extract()` really executes `extract()`. There's no graph construction phase — the DAG is implicit from the call order.
+This looks almost identical to Airflow, but with a crucial difference: **the code actually runs as Python.** When `etl_pipeline()` is called, `extract()` really executes `extract()`. There's no graph construction phase. The DAG is implicit from the call order.

 #### The Hybrid Execution Model

-Prefect sits between Generation 1 and Generation 2. Tasks execute in the same process as the flow (by default), so there's no XCom problem — data passes through Python variables. But each task run is tracked by the Prefect server via a REST API:
+Prefect sits between Generation 1 and Generation 2. Tasks execute in the same process as the flow (by default), so there's no XCom problem: data passes through Python variables. But each task run is tracked by the Prefect server via a REST API:

 ```
 @task runs:
@@ -235,7 +228,7 @@ def parallel_pipeline():

 #### Pros

 - **Zero new concepts for Python developers.** Decorators on regular functions. Python control flow. Python data passing.
-- **Dynamic workflows.** Since the code is real Python, you can use `if/else`, `for` loops, `try/except` — anything. The "DAG" is whatever Python actually executes. +- **Dynamic workflows.** Since the code is real Python, you can use `if/else`, `for` loops, `try/except`, anything. The "DAG" is whatever Python actually executes. - **Lower ceremony than Airflow.** No scheduler process. No DAG file parsing. Just run the flow. #### Cons @@ -284,10 +277,10 @@ async function processOrder(order) { The runtime intercepts each `await` and ensures that: 1. The result is **durably persisted** before execution continues -2. On crash, the function **resumes from where it left off** — already-completed steps are not re-executed +2. On crash, the function **resumes from where it left off**. Already-completed steps are not re-executed 3. Side effects happen **at least once** (and ideally exactly once) -The key insight: **the `await` keyword is the persistence boundary.** Everything between two `await`s is either fully completed or fully retried — never partially executed. +The key insight: **the `await` keyword is the persistence boundary.** Everything between two `await`s is either fully completed or fully retried, never partially executed. But the implementations differ wildly in how they achieve this. @@ -296,14 +289,14 @@ But the implementations differ wildly in how they achieve this. Temporal (2019, ex-Uber Cadence team) is the most well-known durable execution engine. 
Its core abstraction: **record every state change as an immutable event, then replay events to reconstruct state.** ```typescript -// Workflow — must be deterministic (sandboxed in TS, by convention in Go/Java) +// Workflow: must be deterministic (sandboxed in TS, by convention in Go/Java) export async function processOrder(orderId: string) { const order = await activities.getOrder(orderId); const payment = await activities.chargePayment(order); await activities.shipOrder(payment); } -// Activity — runs in normal Node.js, can do anything +// Activity: runs in normal Node.js, can do anything export async function chargePayment(order: Order): Promise { return stripe.charges.create({ amount: order.total }); } @@ -313,11 +306,11 @@ export async function chargePayment(order: Order): Promise { Temporal enforces a strict separation: -- **Workflow code** must be deterministic — no I/O, no randomness, no direct clock access. How strictly this is enforced depends on the SDK: +- **Workflow code** must be deterministic: no I/O, no randomness, no direct clock access. How strictly this is enforced depends on the SDK: - **TypeScript**: the strictest. Workflows run in a V8 isolate with `Math.random()`, `Date()`, `setTimeout()` replaced by deterministic versions. Node.js APIs (`fs`, `http`, `fetch`) are blocked at the bundler level. - **Python**: a sandbox using proxy objects and a custom module importer restricts most non-deterministic access at runtime. - - **Go and Java**: **no sandbox.** Determinism is enforced by convention — developers are told not to use goroutines/threads, system clocks, or randomness. Violations are only caught at replay time (non-determinism error), not at compile time. -- **Activity code** runs in normal Node.js / Python / Go. It can do anything — call APIs, write to databases, generate random numbers. + - **Go and Java**: **no sandbox.** Determinism is enforced by convention. 
Developers are told not to use goroutines/threads, system clocks, or randomness. Violations are only caught at replay time (non-determinism error), not at compile time. +- **Activity code** runs in normal Node.js / Python / Go. It can do anything: call APIs, write to databases, generate random numbers. This split exists because of Temporal's replay mechanism. @@ -327,14 +320,7 @@ Every time a workflow makes a decision (schedule an activity, start a timer, sen When the workflow needs to resume (after an activity completes, after a crash, after a timer fires), the **entire workflow function re-executes from the beginning.** But this time, the SDK checks the event history: -``` -Execution 1: run → await getOrder → [no event] → schedule activity → YIELD -Execution 2: run → await getOrder → [event: completed(order)] → return recorded result - → await chargePayment → [no event] → schedule activity → YIELD -Execution 3: run → await getOrder → [event] → skip - → await chargePayment → [event] → skip - → await shipOrder → [no event] → schedule activity → YIELD -``` + Each execution replays all previous steps (returning results from the event history) and then advances one step. This is **event sourcing applied to code execution**. @@ -376,10 +362,10 @@ Event# EventType Details Since the workflow function is replayed from the beginning on every resume, it **must produce the same sequence of commands on every execution.** If you used `Math.random()` to decide whether to call activity A or B, replay would make a different choice and Temporal would throw a **non-determinism error**. -This is the most common source of developer pain with Temporal. You must learn to think about which code is "workflow" (deterministic orchestration) and which is "activity" (actual work). In TypeScript, the sandbox catches most violations immediately. 
In Go or Java, a third-party library that calls `time.Now()` or `Math.random()` will silently work until replay fails — potentially in production, weeks after deployment. +This is the most common source of developer pain with Temporal. You must learn to think about which code is "workflow" (deterministic orchestration) and which is "activity" (actual work). In TypeScript, the sandbox catches most violations immediately. In Go or Java, a third-party library that calls `time.Now()` or `Math.random()` will silently work until replay fails, potentially in production, weeks after deployment. ```typescript -// ❌ BROKEN — non-deterministic +// ❌ BROKEN: non-deterministic export async function myWorkflow() { if (Math.random() > 0.5) { // Different on replay! await activities.pathA(); @@ -388,7 +374,7 @@ export async function myWorkflow() { } } -// ✅ CORRECT — decision based on activity result +// ✅ CORRECT: decision based on activity result export async function myWorkflow() { const coin = await activities.flipCoin(); // Recorded in history if (coin > 0.5) { @@ -464,32 +450,7 @@ export const processOrder = inngest.createFunction( Inngest's execution model is unlike anything else. Your code runs as a **stateless HTTP endpoint**. The Inngest server orchestrates execution by making HTTP calls to your endpoint: -``` -Request 1 (no steps completed): - Server POST → your endpoint - Code runs: step.run("get-order", fn) → fn executes → returns order - Response: { step_result: "get-order", data: order } - Server stores result. - -Request 2 (get-order completed): - Server POST → your endpoint (with memoized results) - Code runs: step.run("get-order", fn) → memoized, returns stored result - Code runs: step.run("charge", fn) → fn executes → returns receipt - Response: { step_result: "charge", data: receipt } - Server stores result. 
- -Request 3 (get-order + charge completed): - Server POST → your endpoint (with memoized results) - Code runs: step.run("get-order") → memoized - Code runs: step.run("charge") → memoized - Code runs: step.run("ship", fn) → fn executes - Response: { step_result: "ship", data: tracking } - -Request 4 (all steps completed): - Server POST → your endpoint - All steps memoized, function returns. - Response: { complete: true, result: ... } -``` + **Each `step.run()` = one HTTP round-trip.** The function re-executes from the top on every request, but completed steps return instantly from memoized results. @@ -497,7 +458,7 @@ Request 4 (all steps completed): This design choice has profound implications: -**Pro: Truly stateless workers.** Your code is a regular HTTP endpoint — deploy it on Vercel, AWS Lambda, Cloudflare Workers, a Docker container, anywhere. No persistent worker process, no gRPC connection to maintain, no special runtime. The Inngest server handles all state. +**Pro: Truly stateless workers.** Your code is a regular HTTP endpoint. Deploy it on Vercel, AWS Lambda, Cloudflare Workers, a Docker container, anywhere. No persistent worker process, no gRPC connection to maintain, no special runtime. The Inngest server handles all state. **Pro: No new infrastructure for the developer.** You add Inngest to your existing Express/Next.js/Flask app. No separate worker binary, no task queue, no Celery/RabbitMQ. @@ -505,7 +466,7 @@ This design choice has profound implications: **Con: Highest per-step latency.** Every step = HTTP request + response + memoized replay of all previous steps. A 10-step workflow makes 10 HTTP requests, and the 10th request re-executes (and skips) all 9 previous steps before running the 10th. -**Con: Full re-execution per step.** Like Temporal, the function re-runs from the beginning. 
Unlike Temporal, there's no compiled workflow bundle or V8 isolate — it's a full HTTP request with all the associated overhead (routing, middleware, JSON parsing). +**Con: Full re-execution per step.** Like Temporal, the function re-runs from the beginning. Unlike Temporal, there's no compiled workflow bundle or V8 isolate. It's a full HTTP request with all the associated overhead (routing, middleware, JSON parsing). #### The Memoization Distinction @@ -521,7 +482,7 @@ The practical difference: Temporal's replay is in-process (fast, ~microseconds p - **Simplest deployment model.** Add it to your existing app. No infrastructure beyond the Inngest server (which can be self-hosted or cloud). - **Serverless-native.** Works perfectly with Lambda/Vercel/Cloudflare. No persistent connections to maintain. - **Event-driven.** First-class event system with fan-out, debounce, throttle. -- **Server-side sleep.** `step.sleep("1h")` doesn't hold a process — the server wakes your function after 1 hour. +- **Server-side sleep.** `step.sleep("1h")` doesn't hold a process. The server wakes your function after 1 hour. #### Cons @@ -545,7 +506,7 @@ const getOrder = task(async (id: string) => { export const main = workflow(async () => { const order = await getOrder("order-123"); - // step() executes inline — no child job, no dispatch + // step() executes inline: no child job, no dispatch const total = await step("calc-total", () => order.items.reduce((sum, i) => sum + i.price, 0) ); @@ -559,9 +520,9 @@ export const main = workflow(async () => { Windmill is unique in offering two step types with very different execution models: -**`task()`** — dispatches a child job. Separate process, separate resource limits, visible as an independent job in the UI. Like Temporal's activities. +**`task()`** dispatches a child job. Separate process, separate resource limits, visible as an independent job in the UI. Like Temporal's activities. -**`step()`** — executes inline in the same process. 
No child job, no queue hop. Result is checkpointed to the database. Like Temporal's local activities, but with a fast-path optimization: the SDK POSTs the step result directly to the API server while the script continues running. No suspend/resume cycle. +**`step()`** executes inline in the same process. No child job, no queue hop. Result is checkpointed to the database. Like Temporal's local activities, but with a fast-path optimization: the SDK POSTs the step result directly to the API server while the script continues running. No suspend/resume cycle. This dual model reflects a real insight: **not all steps are equally expensive, and forcing them through the same dispatch mechanism is wasteful.** A database query and a CSV parsing step don't need the same isolation guarantees. @@ -595,7 +556,7 @@ With `step()`, there's no suspend/resume at all. The SDK: 1. Executes the function body in-process 2. POSTs the result to `POST /wac/inline_checkpoint/{job_id}` on the API server 3. The API server writes a single JSONB delta to Postgres -4. The script continues immediately — no process restart +4. The script continues immediately. No process restart This means a 100-step workflow using `step()` runs as **one continuous Bun process** making 100 HTTP POSTs. There's no re-execution, no replay, no suspend/resume. @@ -639,17 +600,17 @@ The trade-off: checkpoints don't give you an audit trail of *when* each step ran #### Cons -- **Per-workflow cold start (without dedicated workers).** By default, each workflow spawns a new Bun process (~12ms). This is the main throughput bottleneck for short workflows. Windmill offers [dedicated workers](/docs/core_concepts/dedicated_workers) — persistent Bun processes assigned to specific scripts or workspaces — which eliminate this cold start entirely and bring execution closer to Temporal's persistent worker model. +- **Per-workflow cold start (without dedicated workers).** By default, each workflow spawns a new Bun process (~12ms). 
This is the main throughput bottleneck for short workflows. Windmill offers [dedicated workers](/docs/core_concepts/dedicated_workers), persistent Bun processes assigned to specific scripts or workspaces, which eliminate this cold start entirely and bring execution closer to Temporal's persistent worker model. - **One job per worker.** Workers process one workflow at a time. Temporal handles 200+ concurrent activities per worker. -- **Workers talk to Postgres directly by default.** No server-mediated batching — each step result is an individual PG transaction. Windmill also supports an [agent mode](/docs/core_concepts/agent_workers) where workers communicate with the server over HTTP/WebSocket instead of connecting to Postgres directly, which is useful for remote/edge deployments but doesn't yet batch writes. +- **Workers talk to Postgres directly by default.** No server-mediated batching. Each step result is an individual PG transaction. Windmill also supports an [agent mode](/docs/core_concepts/agent_workers) where workers communicate with the server over HTTP/WebSocket instead of connecting to Postgres directly, which is useful for remote/edge deployments but doesn't yet batch writes. --- -## The Hybrid: Visual Flow Builder +## Visual UX: Flow Builders ### Windmill Flows: JSON DAG + Code Steps -Windmill also offers a traditional flow builder — a visual drag-and-drop editor that produces JSON-defined DAGs: +Separate from its Gen-2 Workflow-as-Code engine, Windmill also offers a traditional flow builder. It is a visual drag-and-drop editor that produces JSON-defined DAGs, executed by a Gen-1 style engine: ```json { @@ -677,15 +638,15 @@ Windmill also offers a traditional flow builder — a visual drag-and-drop edito } ``` -This is closer to Airflow's model — each step is an independent execution unit dispatched to a worker. But unlike Airflow: +This is closer to Airflow's model: each step is an independent execution unit dispatched to a worker. 
But unlike Airflow: - **Steps can be in different languages** (TypeScript, Python, Go, Bash, SQL, etc.) within the same flow -- **Data passes via `results` context** — step results are stored in Postgres (like every engine where steps are separate jobs), but accessed transparently via `results.step_name` expressions in the flow definition +- **Data passes via `results` context**: step results are stored in Postgres (like every engine where steps are separate jobs), but accessed transparently via `results.step_name` expressions in the flow definition - **The flow executor runs as a state machine** in the Windmill worker, not as a separate scheduler process - **Branching, loops, error handlers, and approval steps** are built-in flow constructs - **Each step can be a full Windmill script** with auto-generated UIs, schedules, etc. -The flow builder is not programmatic — it's a UI. This makes it more accessible to non-developers but less flexible than code-based approaches. It sits between Airflow (Python DAGs) and Temporal (pure code) in the abstraction spectrum. +The flow builder is not programmatic. It's a UI. This makes it more accessible to non-developers but less flexible than code-based approaches. It sits between Airflow (Python DAGs) and Temporal (pure code) in the abstraction spectrum. --- @@ -695,7 +656,9 @@ The flow builder is not programmatic — it's a UI. 
This makes it more accessibl Let's implement the same 3-step workflow across all engines to see the differences in expressiveness: -**Airflow:** + + + ```python @task def get_order(id): @@ -716,7 +679,9 @@ def process_order(): ship(payment) ``` -**Prefect:** + + + ```python @task def get_order(id): @@ -737,7 +702,9 @@ def process_order(): ship(payment) ``` -**Temporal:** + + + ```typescript // Workflow file (deterministic sandbox) export async function processOrder() { @@ -752,7 +719,9 @@ export async function charge(order) { return stripe.charges.create({amount: orde export async function ship(payment) { return shipping.dispatch(payment); } ``` -**Inngest:** + + + ```typescript export const processOrder = inngest.createFunction( { id: "process-order" }, @@ -765,7 +734,9 @@ export const processOrder = inngest.createFunction( ); ``` -**Windmill WAC:** + + + ```typescript import { step, workflow } from "windmill-client"; @@ -776,21 +747,24 @@ export const main = workflow(async () => { }); ``` + + + Notice how the code converges. Gen 2 engines (Temporal, Inngest, Windmill) all look like decorated async functions. The differences are: - **Temporal** forces you to split into workflow + activities files. Strictest, but enables a deterministic sandbox. -- **Inngest** wraps each step in `step.run()`. Simplest — no build step, works in any HTTP framework. +- **Inngest** wraps each step in `step.run()`. Simplest: no build step, works in any HTTP framework. - **Windmill** offers `step()` (inline) and `task()` (dispatched). Most flexible per-step control. -- **Airflow / Prefect** look similar but the execution model is fundamentally different — no durable execution within a task. +- **Airflow / Prefect** look similar but the execution model is fundamentally different: no durable execution within a task. ### Dynamic Control Flow -Where the engines truly diverge is control flow: +Where the engines truly diverge is control flow. 
Consider the same requirement ("if payment fails, send to manual review queue"): -```typescript -// "If payment fails, send to manual review queue" + + -// Temporal — full imperative control flow +```typescript export async function processOrder() { const order = await activities.getOrder("123"); try { @@ -802,8 +776,14 @@ export async function processOrder() { } await activities.ship(order); } +``` + +Full imperative control flow. `try/catch`, signals, dynamic branching all work. + + + -// Inngest — same, via step.run() +```typescript async ({ step }) => { const order = await step.run("get-order", () => getOrder("123")); let charged = false; @@ -818,17 +798,27 @@ async ({ step }) => { } await step.run("ship", () => ship(order)); } +``` + +Same as Temporal, via `step.run()`. -// Airflow — you can't. + + + +```python # DAGs are static. You can use @task.branch, but it's limited: @task.branch def check_payment(result): if result["success"]: return "ship_order" return "send_to_review" -# This creates a static branch in the DAG — not a try/catch with dynamic resumption. ``` +You can't express this pattern directly. `@task.branch` creates a static branch in the DAG, not a `try/catch` with dynamic resumption. + + + + This is the fundamental expressiveness gap between DAG schedulers and durable execution engines. Temporal, Inngest, and Windmill can express any control flow (loops, recursion, try/catch, dynamic branching). Airflow and Prefect are limited to what a DAG can represent. 
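One way to see why the durable engines can run a real `try/except` is to sketch the shared mechanism in miniature. The following is a toy illustration, not the API of Temporal, Inngest, or Windmill: a checkpoint store maps step names to results, so a re-executed workflow skips completed steps and failure handling lands in the same place after a crash:

```python
# Toy durable-execution runtime (illustrative only; not any engine's real
# API). A checkpoint store maps step names to results, so a re-executed
# workflow skips completed steps. That is what lets ordinary try/except
# double as durable control flow.

checkpoints = {}      # in a real engine: an event history or a DB table
calls = []            # successful step executions, for the demo
charge_attempts = 0

def step(name, fn):
    if name in checkpoints:          # completed before the crash: skip
        return checkpoints[name]
    result = fn()                    # first run: execute, then checkpoint
    calls.append(name)
    checkpoints[name] = result
    return result

def charge():
    global charge_attempts
    charge_attempts += 1                 # never checkpoints: it always fails,
    raise RuntimeError("card declined")  # so every execution retries it

def process_order():
    order = step("get-order", lambda: {"id": "123", "total": 42})
    try:
        step("charge", charge)
    except RuntimeError:
        step("manual-review", lambda: f"queued order {order['id']}")

process_order()   # first execution: charge fails, order goes to review
process_order()   # "resumed" execution: completed steps return from the store
print(calls)      # prints ['get-order', 'manual-review']
```

Real engines differ in where the store lives (event history, server-side memoization, Postgres checkpoints) and in retry policy for the failing step, but the skip-completed-steps contract is the shared core.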
--- @@ -839,17 +829,7 @@ This is the fundamental expressiveness gap between DAG schedulers and durable ex Every workflow engine makes a choice: **when do you persist state, and at what granularity?** -``` - No persistence Per-task Per-step Per-side-effect - (plain code) (Airflow) (Temporal, (Restate-style - Inngest, journaling) - Windmill WAC) - - ◄─────────────────────────────────────────────────────────────────────────► - Fastest Most - No durability durable - No overhead Most overhead -``` + - **Airflow/Prefect**: persist after each task completes. If a task has 100 lines of code, a crash at line 50 loses all 50 lines of work. - **Temporal/Inngest/Windmill**: persist after each step (activity/step.run/step). Crashes only lose the current in-flight step. @@ -869,29 +849,35 @@ How you represent workflow state determines what you can query, how much storage How workers relate to the coordination layer fundamentally determines throughput: -``` - Workers → DB directly Workers → Server → DB Workers = Runtime - (Airflow, Windmill) (Temporal, Inngest) (Restate) + - Each step = DB round-trip Server mediates + batches No external DB - ~1-5ms per step ~0.3-1ms per step ~0.01ms per step -``` - -Temporal's workers never touch the database directly — they communicate with the Temporal server via gRPC, and the server handles all database access. This is a key architectural advantage: the server can batch writes, cache state, and optimize queries. Windmill's workers currently talk to Postgres directly, paying a full round-trip per step. +Temporal's workers never touch the database directly. They communicate with the Temporal server via gRPC, and the server handles all database access. This is a key architectural advantage: the server can batch writes, cache state, and optimize queries. Windmill's workers currently talk to Postgres directly, paying a full round-trip per step. 
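The representation choice above can be sketched in a few lines. This is a toy model (field names are invented, not any engine's real schema): an append-only event history and a step-name → result map carry the same resume information:

```python
# Toy model of the two state representations (invented field names, not any
# engine's real schema). Both carry enough to resume after a crash; the event
# history additionally preserves ordering and timing, i.e. an audit trail.

STEPS = ["get-order", "charge", "ship"]

# Temporal-style: append-only event history, replayed from the top.
history = [
    {"type": "StepCompleted", "step": "get-order", "result": {"id": "123"}},
    {"type": "StepCompleted", "step": "charge", "result": {"receipt": "r-9"}},
]

# Windmill/Inngest-style: step-name -> checkpointed result.
checkpoints = {"get-order": {"id": "123"}, "charge": {"receipt": "r-9"}}

def next_step_from_history(events):
    done = {e["step"] for e in events if e["type"] == "StepCompleted"}
    return next(s for s in STEPS if s not in done)

def next_step_from_checkpoints(cp):
    return next(s for s in STEPS if s not in cp)

print(next_step_from_history(history))          # prints ship
print(next_step_from_checkpoints(checkpoints))  # prints ship
```

The asymmetry only appears for richer questions: "when did `charge` run, and how many times?" falls out of the history for free but needs extra bookkeeping in the checkpoint map.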
 ---

 ## Performance Characteristics

 ### Methodology

 We compare Windmill WAC against Temporal only. The reason is scope: Airflow and Prefect target minute-to-hour batch workloads where per-step latency is dominated by task dispatch (seconds per task), so head-to-head throughput numbers for short sequential workflows are misleading. Inngest runs over HTTP with inherently different latency characteristics. Temporal is the closest architectural comparable to WAC: persistent workers, durable per-step state, full imperative code.

 The workload shapes:

 - **`seq_N`**: N steps executed sequentially, each a trivial inline function (no real I/O). Measures per-step overhead.
 - **`par_N`**: N steps dispatched in parallel with `Promise.all` / `workflow.Go`. Measures batch-dispatch cost.
 - **`fan_out_N`**: one parent workflow fans out to N child jobs. Measures scheduler throughput under parallel dispatch.

 All runs use a single worker, Docker, the same host. We'll publish a standalone repro repo (engine versions, Docker Compose, workload scripts, raw CSVs) so anyone can verify or extend the numbers. Until then, treat the table below as directional rather than definitive.
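For readers who want to reproduce the shapes before the repro repo lands, here are minimal stand-ins (illustrative only; the real workloads are Windmill WAC and Temporal workflows, not asyncio, and `do_step` stands in for a trivial engine step):

```python
# Sketches of the three benchmark workload shapes. These are stand-ins for
# the real Windmill/Temporal workflows; do_step is a trivial no-op step.
import asyncio

async def do_step(i: int) -> int:
    return i  # trivial inline work, no real I/O

async def seq(n: int):            # seq_N: N steps, strictly sequential
    out = []
    for i in range(n):
        out.append(await do_step(i))
    return out

async def par(n: int):            # par_N: N steps dispatched together
    return await asyncio.gather(*(do_step(i) for i in range(n)))

async def fan_out(n: int):        # fan_out_N: parent fans out to N children
    children = [asyncio.create_task(do_step(i)) for i in range(n)]
    return await asyncio.gather(*children)

print(asyncio.run(seq(3)), asyncio.run(par(2)), len(asyncio.run(fan_out(10))))
```

The distinction that matters for the results below is dispatch shape: `seq` pays per-step overhead N times in series, while `par` and `fan_out` expose how cheaply an engine can enqueue N units of work at once.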
 ### Results

 | Workflow | Windmill WAC | Temporal | Notes |
 |----------|------------:|--------:|-------|
-| seq_2 (2 steps) | 80 wf/s | 124 wf/s | Temporal 1.5x — persistent worker vs Bun cold start |
+| seq_2 (2 steps) | 80 wf/s | 124 wf/s | Temporal 1.5x (persistent worker vs Bun cold start) |
 | seq_3 (3 steps) | 77 wf/s | 110 wf/s | Adding steps barely costs in Windmill (fast path) |
 | seq_100 (100 steps) | 12.6 wf/s | 29 wf/s | Per-step: Windmill 0.8ms, Temporal 0.3ms |
-| par_2 (2 parallel) | 79 wf/s | 60 wf/s | **Windmill wins** — batch dispatch |
-| fan_out_10 (10-way parallel) | 80 wf/s | 45 wf/s | **Windmill wins +79%** — StepSuspend batch |
+| par_2 (2 parallel) | 79 wf/s | 60 wf/s | **Windmill wins** (batch dispatch) |
+| fan_out_10 (10-way parallel) | 80 wf/s | 45 wf/s | **Windmill wins +79%** (StepSuspend batch) |

 *Windmill v1.683 (CE, single worker, Docker). Temporal v1.29 (single worker, Docker).*

 - **Temporal is faster on sequential workflows** due to persistent workers (no cold start) and server-mediated DB writes (batched events vs individual PG round-trips).
 - **Windmill is faster on parallel workflows** due to batch dispatch. `StepSuspend` collects all parallel tasks and pushes them in one operation. Temporal schedules activities individually.
 - **Windmill's step() fast path scales well**: seq_2 (80) and seq_3 (77) are nearly identical because the script stays alive and each step is just an HTTP POST.
 - **At 100 steps, per-step cost dominates** over cold start. The 2.3x gap (0.8ms vs 0.3ms per step) reflects Windmill writing each step as an individual PG transaction vs Temporal's server batching event writes.
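The per-step and fixed costs quoted in the notes can be recovered from the table itself with a two-point linear fit. This treats single-worker latency as `1 / throughput` and models it as `fixed + n × per_step`, an assumption the seq rows support:

```python
# Solve latency(n) = fixed + n * per_step from two table rows (seq_2 and
# seq_100). Single worker, so latency per workflow = 1 / throughput.

def fit(wfs_at_2: float, wfs_at_100: float):
    t2, t100 = 1 / wfs_at_2, 1 / wfs_at_100       # seconds per workflow
    per_step = (t100 - t2) / (100 - 2)
    fixed = t2 - 2 * per_step
    return fixed * 1000, per_step * 1000          # milliseconds

wm_fixed, wm_step = fit(80, 12.6)    # Windmill WAC throughputs from the table
tp_fixed, tp_step = fit(124, 29)     # Temporal throughputs from the table
print(f"Windmill: fixed {wm_fixed:.1f} ms, per-step {wm_step:.2f} ms")
print(f"Temporal: fixed {tp_fixed:.1f} ms, per-step {tp_step:.2f} ms")
```

The recovered constants line up with the prose: Windmill's ~11 ms fixed cost matches the ~12 ms Bun cold start, and the ~0.7 ms vs ~0.3 ms per-step costs match the table's 0.8/0.3 ms figures within rounding.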
 ### Where the Remaining Gap Comes From

 For Windmill, two factors explain the performance gap with Temporal on sequential workloads:

-1. **Per-workflow Bun cold start (~12ms).** Without dedicated workers, each workflow spawns a new Bun process. Temporal's worker is a persistent Node.js process. Windmill's [dedicated workers](/docs/core_concepts/dedicated_workers) — persistent Bun processes assigned to specific scripts or workspaces — eliminate this cold start entirely.
+1. **Per-workflow Bun cold start (~12ms).** Without dedicated workers, each workflow spawns a new Bun process. Temporal's worker is a persistent Node.js process. Windmill's [dedicated workers](/docs/core_concepts/dedicated_workers), persistent Bun processes assigned to specific scripts or workspaces, eliminate this cold start entirely.

 2. **Per-step: individual PG writes vs server-batched writes.** Each `step()` call POSTs to the API server, which does one PG transaction. Temporal's server accumulates events and writes them in batches. This is a ~2.5x difference per step (0.8ms vs 0.3ms).

@@ -949,7 +935,7 @@ The fix for both is architectural:
 → Windmill (Flows for visual, WAC for code, both in one platform)

 **You need the lowest per-step latency:**
-→ Windmill WAC with `step()` (0.5ms/step inline) — or Temporal with local activities (1-2ms/step)
+→ Windmill WAC with `step()` (0.5ms/step inline), or Temporal with local activities (1-2ms/step)

 **You need the best parallel throughput:**
 → Windmill WAC with `task()` (batch dispatch via StepSuspend)

 ---

 ## Honorable Mentions: Restate, DBOS, Hatchet

-The five engines above cover the main archetypes, but three other projects are worth knowing — each represents a genuinely different point in the design space.
+The five engines above cover the main archetypes, but three other projects are worth knowing. Each represents a genuinely different point in the design space.
### Restate: Co-Located Storage (No External DB) [Restate](https://restate.dev) (3.7K stars) is the most architecturally radical engine in this space. Built by the team behind Apache Flink, it eliminates the external database entirely. -**How it works:** Your code runs as a normal HTTP service. Restate sits between the client and your service as a proxy, intercepting every side-effect (`ctx.run()`) via a bidirectional HTTP/2 stream. Each side-effect result is journaled to an embedded replicated log (Bifrost, backed by RocksDB) — not Postgres, not MySQL, not any external database. +**How it works:** Your code runs as a normal HTTP service. Restate sits between the client and your service as a proxy, intercepting every side-effect (`ctx.run()`) via a bidirectional HTTP/2 stream. Each side-effect result is journaled to an embedded replicated log (Bifrost, backed by RocksDB): not Postgres, not MySQL, not any external database. ```typescript const service = restate.service({ @@ -982,17 +968,17 @@ const service = restate.service({ }); ``` -**Why it's fast:** The commit point for "a step happened" is a local RocksDB write + quorum replication across nodes — no network round-trip to an external database. In our benchmarks, Restate achieved **4,600-6,700 workflows/sec** on the same hardware where Temporal did ~100 and Windmill did ~80. That's a **50x advantage** over Temporal. +**Why it's fast:** The commit point for "a step happened" is a local RocksDB write + quorum replication across nodes. No network round-trip to an external database. In our benchmarks, Restate achieved **4,600-6,700 workflows/sec** on the same hardware where Temporal did ~100 and Windmill did ~80. That's a **50x advantage** over Temporal. -**The insight:** This 50x gap isn't an implementation detail — it's a fundamental consequence of storage topology. Every engine that routes through an external database (Postgres, MySQL, Cassandra) pays ~1ms per step (network + query + WAL sync). 
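A back-of-envelope model shows why commit latency dominates here. The per-step figures are assumptions consistent with this post's numbers (~1 ms to commit a step to an external database, ~0.01-0.1 ms to a co-located replicated log), applied to the 3-step order workflow used throughout:

```python
# Throughput ceiling for a single in-order executor when every step must
# commit before the next starts. Commit latencies are assumed round figures
# (~1 ms external DB, ~0.01-0.1 ms co-located log), not measurements.

STEPS = 3  # get-order -> charge -> ship

def ceiling(commit_ms: float) -> float:
    """Max sequential workflows/sec given a per-step commit latency."""
    return 1000 / (STEPS * commit_ms)

print(f"external DB @ 1 ms/step:    {ceiling(1.0):>6.0f} wf/s")
print(f"co-located  @ 0.1 ms/step:  {ceiling(0.1):>6.0f} wf/s")
print(f"co-located  @ 0.01 ms/step: {ceiling(0.01):>6.0f} wf/s")
```

These are ceilings for one in-order executor, not predictions; real engines pipeline and batch. But the one-to-two order-of-magnitude spread in commit latency is the same spread the measured numbers show.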
Restate pays ~0.01-0.1ms (local disk + quorum ACK). Remove the network hop and you gain an order of magnitude.

-**The trade-off:** No SQL access to workflow state. No familiar Postgres tooling for backup/replication — you rely on Restate's built-in mechanisms (S3 snapshots, log-based replication). Younger ecosystem (SDK for TypeScript, Java, Rust, Go). And for durable execution specifically: no separate activity/workflow distinction — all code runs in the handler, and `ctx.run()` is your only escape hatch for side-effects.
+**The trade-off:** No SQL access to workflow state. No familiar Postgres tooling for backup/replication. You rely on Restate's built-in mechanisms (S3 snapshots, log-based replication). Younger ecosystem (SDK for TypeScript, Java, Rust, Go). And for durable execution specifically: no separate activity/workflow distinction. All code runs in the handler, and `ctx.run()` is your only escape hatch for side-effects.

-**Unique feature:** Virtual Objects — keyed handlers with exclusive access and durable state. Essentially the actor model (Akka/Erlang) with persistence and exactly-once semantics. No other engine offers this.
+**Unique feature:** Virtual Objects, keyed handlers with exclusive access and durable state. Essentially the actor model (Akka/Erlang) with persistence and exactly-once semantics. No other engine offers this.

### DBOS: Durable Execution as a Library

-[DBOS](https://dbos.dev) (1.1K stars TS, 1.3K stars Python) takes the opposite approach from Restate: instead of replacing Postgres, it **leans into it.** DBOS is not a server — it's a library you import into your application. 
Postgres IS the durable execution engine.
+[DBOS](https://dbos.dev) (1.1K stars TS, 1.3K stars Python) takes the opposite approach from Restate: instead of replacing Postgres, it **leans into it.** DBOS is not a server. It's a library you import into your application. Postgres IS the durable execution engine.

```typescript
import { DBOS } from "@dbos-inc/dbos-sdk";
@@ -1020,13 +1006,13 @@ class OrderWorkflow {

**The appeal:** No server to deploy, no infrastructure to manage. Just `npm install @dbos-inc/dbos-sdk` and use Postgres you already have. For teams that are allergic to adding infrastructure, this is compelling.

-**The insight for Windmill:** DBOS proves that "durable execution on Postgres" can be simple and fast. Their approach — a library that uses Postgres transactions directly — is the lightest possible weight. It's the same insight behind Windmill's WAC `step()` fast path: use the database you already have, minimize ceremony around it.
+**The insight for Windmill:** DBOS proves that "durable execution on Postgres" can be simple and fast. Their approach (a library that uses Postgres transactions directly) is as lightweight as durable execution gets. It's the same insight behind Windmill's WAC `step()` fast path: use the database you already have, minimize ceremony around it.

**The trade-off:** Being a library means no centralized UI, no built-in monitoring, no visual flow builder. You get durability but not orchestration platform features. Also, being Postgres-bound means the same per-step latency ceiling as any Postgres-backed engine (~1ms per step).

### Hatchet: DAG Steps with a Go Engine

-[Hatchet](https://hatchet.run) (6.8K stars) is a Go-based workflow engine that uses an explicit DAG model for steps — you declare parent dependencies, and the engine handles scheduling:
+[Hatchet](https://hatchet.run) (6.8K stars) is a Go-based workflow engine that uses an explicit DAG model for steps. 
You declare parent dependencies, and the engine handles scheduling: ```typescript const workflow = hatchet.workflow("process-order"); @@ -1037,32 +1023,19 @@ workflow.step("ship", async (ctx) => { return shipping.dispatch(ctx.stepOutput(" { parents: ["charge"] }); ``` -**How it works:** Steps declare their parent dependencies explicitly. The Hatchet engine (Go) builds the DAG and dispatches steps via Postgres + RabbitMQ. Workers connect via gRPC. No deterministic replay — steps execute once, results are persisted. +**How it works:** Steps declare their parent dependencies explicitly. The Hatchet engine (Go) builds the DAG and dispatches steps via Postgres + RabbitMQ. Workers connect via gRPC. No deterministic replay: steps execute once, results are persisted. -**Where it sits:** Between Airflow (static DAG) and Temporal (imperative code). You get DAG-level parallelism (steps with no parents run in parallel automatically) without the determinism constraints of Temporal. But you lose the ability to use arbitrary control flow (if/else, loops) within the workflow — the DAG structure is fixed at registration time. +**Where it sits:** Between Airflow (static DAG) and Temporal (imperative code). You get DAG-level parallelism (steps with no parents run in parallel automatically) without the determinism constraints of Temporal. But you lose the ability to use arbitrary control flow (if/else, loops) within the workflow. The DAG structure is fixed at registration time. 
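The scheduling rule behind that parallelism claim is simple to state: a step is ready as soon as all of its parents have finished, so parentless steps start immediately and concurrently. A self-contained toy scheduler illustrating the idea (our sketch, not Hatchet code; the `"validate"` and `"audit"` steps are hypothetical additions to the process-order example) that groups a declared DAG into waves of steps that can run in parallel:

```python
def waves(parents: dict[str, list[str]]) -> list[set[str]]:
    """Group DAG steps into waves; every step in a wave can run in parallel."""
    done: set[str] = set()
    order: list[set[str]] = []
    while len(done) < len(parents):
        # A step is ready once all of its declared parents have finished.
        ready = {step for step, deps in parents.items()
                 if step not in done and all(d in done for d in deps)}
        if not ready:
            raise ValueError("cycle in step graph")
        order.append(ready)
        done |= ready
    return order

# The process-order DAG, with two hypothetical parentless steps.
dag = {
    "validate": [],           # no parents: runs in the first wave
    "audit": [],              # no parents: runs alongside validate
    "charge": ["validate"],
    "ship": ["charge"],
}
print(waves(dag))  # three waves: {validate, audit}, then {charge}, then {ship}
```

Because `parents` is declared data rather than code, the engine can compute these waves up front; the flip side is exactly the limitation above: the graph cannot change at run time.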
--- ## The Landscape at a Glance -``` - ┌─────────────────────────────────────────┐ - │ GitHub Stars (Apr 2026) │ - │ │ - Airflow ████████████████████████████████████████████████ 45,050 │ - Prefect ████████████████████████ 22,177 │ - Temporal ███████████████████ 19,598 │ - Windmill ████████████████ 16,241 │ - Hatchet ██████ 6,826 │ - Inngest █████ 5,202 │ - Restate ███ 3,729 │ - DBOS █ 1,137 (TS) / 1,267 (Python) │ - └─────────────────────────────────────────┘ -``` + Stars don't measure quality, but they measure mindshare. Airflow's dominance reflects a decade of production use. Temporal's growth reflects the industry shift toward durable execution. Windmill's position reflects being both a workflow engine and a broader platform. -The interesting trend: **every engine launched after 2020 supports durable execution.** The market has decided that "crash → retry entire task" (Airflow's model) is not good enough. The question is now *how* to implement durable execution — event sourcing vs checkpoints vs journals vs HTTP memoization vs Postgres transactions — not *whether* to implement it. +The interesting trend: **every engine launched after 2020 supports durable execution.** The market has decided that "crash → retry entire task" (Airflow's model) is not good enough. The question is now *how* to implement durable execution (event sourcing vs checkpoints vs journals vs HTTP memoization vs Postgres transactions), not *whether* to implement it. --- @@ -1077,8 +1050,8 @@ All workflow engines are converging on the same insight from different direction The theoretical limit is: **per-step cost = one durable write.** Whether that write goes to Postgres, a replicated log, or an embedded database determines how close you get. -We're not there yet. But the gap between engines is shrinking, and the choice increasingly comes down to which model fits your team's mental model — not which one is fundamentally faster. +We're not there yet. 
But the gap between engines is shrinking, and the choice increasingly comes down to which model fits your team's mental model, not which one is fundamentally faster. --- -*All benchmark code is [open source](https://github.com/windmill-labs/windmill/tree/main/benchmarks/comparison) and reproducible. We tried to be fair — if you think we weren't, the code is there. File an issue.* +*A standalone repro repo (Docker Compose, workload definitions, raw CSVs) is in the works. Until it lands, the numbers above are directional. If you think we got something wrong, file an issue or reach out on Discord.* From 35e6e875bc9f6ac4afe63d94b086c09388db4713 Mon Sep 17 00:00:00 2001 From: Ruben Fiszel Date: Thu, 16 Apr 2026 18:13:20 +0000 Subject: [PATCH 12/12] blog: add Dagster as an honorable mention MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Dagster's ~13.5K stars put it above Hatchet/Inngest/Restate/DBOS, and its software-defined-asset model is a genuinely different authoring paradigm on top of the Gen-1 execution model — worth a short entry even though the underlying runtime is similar to Airflow/Prefect. Also update the description, intro, section title, and GitHub stars chart to include Dagster. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .../_diagrams.tsx | 1 + .../index.mdx | 34 ++++++++++++++++--- 2 files changed, 31 insertions(+), 4 deletions(-) diff --git a/blog/2026-04-15-workflow-engines-primer/_diagrams.tsx b/blog/2026-04-15-workflow-engines-primer/_diagrams.tsx index d1afb7fb8..6d6e06a78 100644 --- a/blog/2026-04-15-workflow-engines-primer/_diagrams.tsx +++ b/blog/2026-04-15-workflow-engines-primer/_diagrams.tsx @@ -188,6 +188,7 @@ export function GithubStarsChart() { { name: 'Prefect', stars: 22177, color: 'bg-indigo-500' }, { name: 'Temporal', stars: 19598, color: 'bg-violet-500' }, { name: 'Windmill', stars: 16241, color: 'bg-emerald-500' }, + { name: 'Dagster', stars: 13500, color: 'bg-cyan-500' }, { name: 'Hatchet', stars: 6826, color: 'bg-orange-500' }, { name: 'Inngest', stars: 5202, color: 'bg-pink-500' }, { name: 'Restate', stars: 3729, color: 'bg-rose-500' }, diff --git a/blog/2026-04-15-workflow-engines-primer/index.mdx b/blog/2026-04-15-workflow-engines-primer/index.mdx index eac658b92..2d260abe5 100644 --- a/blog/2026-04-15-workflow-engines-primer/index.mdx +++ b/blog/2026-04-15-workflow-engines-primer/index.mdx @@ -2,7 +2,7 @@ slug: workflow-engines-primer authors: [rubenfiszel] tags: ['Benchmarks', 'Workflow Engines'] -description: 'A deep dive comparing Airflow, Prefect, Temporal, Inngest, and Windmill — how they work internally, their trade-offs, and real benchmarks. Plus Restate, DBOS, and Hatchet.' +description: 'A deep dive comparing Airflow, Prefect, Temporal, Inngest, and Windmill — how they work internally, their trade-offs, and real benchmarks. Plus Dagster, Restate, DBOS, and Hatchet.' title: 'From Cron to Durable Execution: A Primer on Workflow Engines' --- @@ -19,7 +19,7 @@ import { InngestHttpRoundtrips, } from './_diagrams'; -A deep dive comparing Airflow, Prefect, Temporal, Inngest, and Windmill — how they work internally, their trade-offs, and real benchmarks. 
Plus honorable mentions for Restate, DBOS, and Hatchet. +A deep dive comparing Airflow, Prefect, Temporal, Inngest, and Windmill — how they work internally, their trade-offs, and real benchmarks. Plus honorable mentions for Dagster, Restate, DBOS, and Hatchet. {/* truncate */} @@ -945,9 +945,9 @@ The fix for both is architectural: --- -## Honorable Mentions: Restate, DBOS, Hatchet +## Honorable Mentions: Dagster, Restate, DBOS, Hatchet -The five engines above cover the main archetypes, but three other projects are worth knowing. Each represents a genuinely different point in the design space. +The five engines above cover the main archetypes, but four other projects are worth knowing. Each represents a genuinely different point in the design space. ### Restate: Co-Located Storage (No External DB) @@ -1027,6 +1027,32 @@ workflow.step("ship", async (ctx) => { return shipping.dispatch(ctx.stepOutput(" **Where it sits:** Between Airflow (static DAG) and Temporal (imperative code). You get DAG-level parallelism (steps with no parents run in parallel automatically) without the determinism constraints of Temporal. But you lose the ability to use arbitrary control flow (if/else, loops) within the workflow. The DAG structure is fixed at registration time. +### Dagster: Asset-Based Orchestration + +[Dagster](https://dagster.io) (~13.5K stars) sits in the same broad category as Airflow and Prefect (Python, DAG-based, no durable execution), but its core abstraction is different enough to warrant its own mention. Instead of declaring tasks and their dependencies, you declare **software-defined assets**: the tables, files, ML models, or datasets your pipeline produces. 
The DAG then emerges from the dependencies between assets:
+
+```python
+from dagster import asset
+
+@asset
+def orders():
+    return db.query("SELECT * FROM orders")
+
+@asset
+def customers():
+    return db.query("SELECT * FROM customers")
+
+@asset
+def enriched_orders(orders, customers):
+    return orders.merge(customers, on="customer_id")
+
+@asset
+def daily_revenue(enriched_orders):
+    return enriched_orders.groupby("day")["total"].sum()
+```
+
+**How it works:** You declare what each asset *is* (a function that produces it) and what other assets it depends on (function arguments). Dagster inspects the asset graph, figures out execution order, and runs whatever is needed to materialize a target asset. Everything hangs off the asset catalog: lineage, freshness checks, backfills, partitioning, data-quality assertions.
+
+**Where it sits:** Gen 1 in execution model (each asset run is a separate task, no durable execution within it) but with a higher-level authoring experience that targets data engineers specifically. It leans further into declarative/data-centric thinking than Airflow's task-centric DAGs. For ETL / ELT / analytics engineering, the asset model is genuinely a different mental frame. For general-purpose orchestration (business workflows, long-running processes, approvals), the asset framing fits less naturally.
+
+**The trade-off:** Best-in-class data lineage and catalog story. Python-only, and like Prefect/Airflow, no durable execution: a crash inside an asset computation loses all in-flight work.
+
 ---
 
 ## The Landscape at a Glance