
feat(daemon): surface backend connectivity in daemon status#1910

Open
wingtonrbrito wants to merge 1 commit into multica-ai:main from wingtonrbrito:feat/daemon-backend-connectivity

Conversation

@wingtonrbrito

Summary

Adds a backend_connectivity field to the daemon's HealthResponse so multica daemon status can surface whether the daemon can actually reach the configured Multica server. Three states: connected, unreachable, unknown.

Why

When the Multica server is unreachable (network down, server crashed, Docker stack stopped on a self-host install), the daemon stays "running" — process is alive, listener is bound, /health returns 200 — but it can't poll for assignments. From the caller's perspective this looks identical to a fully-healthy daemon, which makes diagnosing "my issues are stuck in todo" harder than it needs to be.

I hit this directly while running self-host Multica during testing — Docker Desktop hung, the daemon kept reporting running, and there was no signal that work wasn't actually flowing through.

What changes

  • New field on HealthResponse:
    BackendConnectivity string `json:"backend_connectivity,omitempty"`
  • New helper probeBackendConnectivity(ctx, serverURL) does a 1500ms-timeout GET <serverURL>/health (the unauthenticated liveness endpoint). Returns one of:
    • connected — 2xx response
    • unreachable — request failed (refused, DNS, TLS, timeout) or the server returned a non-2xx status
    • unknown — empty ServerBaseURL
  • healthHandler calls the probe and includes the result in the response.
  • runDaemonStatus (CLI) surfaces it as a Backend: line, only when the value is connected or unreachable. Older daemons (no field) and unknown both keep existing output unchanged.

Backward compatibility

Fully compatible. The new key uses omitempty, so consumers parsing into untyped maps don't see it for older daemons or when ServerBaseURL is empty. The CLI only adds output when the value is meaningful.

Performance

The probe is synchronous in the hot path of multica daemon status, with a 1500ms timeout. In the failure case it adds at most ~1.5s of latency — acceptable for an ergonomic / observability addition. If maintainers prefer, a follow-up can move the probe to a background goroutine that updates a cached value every N seconds.

Tests

  • TestProbeBackendConnectivity — 4 sub-tests covering all branches: server up, non-2xx response, server closed, empty ServerBaseURL
  • TestHealthHandlerSurfacesBackendConnectivity — verifies the backend_connectivity JSON wire key appears + round-trips through the typed struct
$ go test ./internal/daemon/ -run "TestProbeBackendConnectivity|TestHealthHandlerSurfacesBackendConnectivity" -v
=== RUN   TestProbeBackendConnectivity
--- PASS: TestProbeBackendConnectivity (0.00s)
    --- PASS: TestProbeBackendConnectivity/returns_connected_when_backend_responds_2xx (0.00s)
    --- PASS: TestProbeBackendConnectivity/returns_unreachable_on_non-2xx_response (0.00s)
    --- PASS: TestProbeBackendConnectivity/returns_unreachable_when_server_is_closed (0.00s)
    --- PASS: TestProbeBackendConnectivity/returns_unknown_when_ServerBaseURL_is_empty (0.00s)
=== RUN   TestHealthHandlerSurfacesBackendConnectivity
--- PASS: TestHealthHandlerSurfacesBackendConnectivity (0.00s)
PASS
ok      github.com/multica-ai/multica/server/internal/daemon    0.237s

go test ./internal/daemon/ (full suite), go vet, and gofmt -l are all clean on the changed files.

@vercel

vercel Bot commented Apr 29, 2026

@wingtonrbrito is attempting to deploy a commit to the IndexLabs Team on Vercel.

A member of the Team first needs to authorize it.

wingtonrbrito added a commit to wingtonrbrito/multica that referenced this pull request May 1, 2026
This branch now carries the two open upstream PRs (multica-ai#1805 and multica-ai#1910)
merged on top of upstream/main, making it the single consumable
branch for the fork. Updated quickstart, sync recipe, and patch log
to reflect the new shape.
