Refactor: decouple data plane from database via HTTP API #7

Merged
faranjit merged 6 commits into main from refactor/worker-db-decoupling
Feb 13, 2026

@faranjit
Owner

This PR completely decouples the jobplane worker (Data Plane) from PostgreSQL, making it 100% stateless and API-driven. Workers now communicate exclusively with the Controller via HTTP, establishing the Control Plane as the single source of truth for all execution state transitions.

Why?

Previously, workers pulled jobs by executing SELECT FOR UPDATE SKIP LOCKED directly against the database. While fast, this hybrid approach introduced critical scaling limitations:

  1. Connection Exhaustion: At scale, direct DB polling from hundreds of workers would exhaust standard Postgres connection limits and break under PgBouncer transaction pooling.
  2. Security & Untrusted Compute: I wanted to allow users to run workers in external environments (Bring Your Own Compute) without distributing raw database credentials.

Important Changes

  • Worker HTTP Client: Replaced all store.Queue database interactions in agent.go with HTTP calls using a new resilient doWithRetry helper.
  • Dequeue Endpoint: Added POST /internal/executions/dequeue to the controller. The controller now executes the safe SKIP LOCKED query on behalf of the worker.
  • Resilience: Implemented exponential backoff with jitter for heartbeats and results to survive transient network failures.
  • Execution Results: Added PUT /internal/executions/{id}/result for reporting success and failure.

- Migrates the worker's heartbeat mechanism from direct database updates (`queue.SetVisibleAfter`) to HTTP calls against the controller (`PUT /internal/executions/{id}/heartbeat`).
- Moves `VisibilityExtension` configuration from the worker to the controller, centralizing execution timeout policies.
- Decouples the Data Plane from the core storage layer, establishing the controller as the single source of truth for execution state transitions ahead of upcoming features.
- Implements `sendResult` in the worker agent to report execution success and failure via HTTP requests.
- Updates the controller's `InternalUpdateResult` handler to safely process execution results.
- Introduces `POST /internal/executions/dequeue` in the controller.
- Implements `fetchWork` in the worker agent via HTTP, replacing direct `store.Queue.DequeueBatch` calls.
- Removes `store.Queue` dependency entirely from the worker agent.
- Achieves a 100% stateless Data Plane, enforcing the Control Plane as the sole authority for all execution lifecycles.
- Implements `doWithRetry` helper in the worker agent with exponential backoff and jitter to handle transient network failures.
- Updates `sendResult` and `runHeartbeat` to use the retry mechanism, ensuring execution state updates reach the controller.
@codecov
codecov Bot commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 72.72727% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.92%. Comparing base (7cee8ba) to head (6e022bf).
⚠️ Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/worker/agent.go | 77.64% | 11 Missing and 8 partials ⚠️ |
| cmd/worker/main.go | 0.00% | 6 Missing ⚠️ |
| internal/controller/handlers/executions.go | 85.71% | 2 Missing and 2 partials ⚠️ |
| internal/controller/server.go | 0.00% | 4 Missing ⚠️ |
| internal/config/config.go | 66.66% | 1 Missing and 1 partial ⚠️ |
| cmd/controller/main.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main       #7      +/-   ##
==========================================
+ Coverage   67.36%   67.92%   +0.56%     
==========================================
  Files          36       36              
  Lines        2108     2198      +90     
==========================================
+ Hits         1420     1493      +73     
- Misses        594      602       +8     
- Partials       94      103       +9     

@faranjit faranjit merged commit 218c3c1 into main Feb 13, 2026
6 checks passed
@faranjit faranjit deleted the refactor/worker-db-decoupling branch February 14, 2026 11:10