fix(gateway): restart deallocates gateway causing unnecessary cold-start #47

@FL-AntoineDurand

Problem

When a gateway container is restarted via envctl.sh restart, the gateway process receives SIGTERM, triggers graceful shutdown, and notifies Ganymede to deallocate (POST /gateway/stop). This sets ended_at on the allocation row. When the gateway comes back up, it's idle — the user must trigger a fresh allocation from the frontend, causing an unnecessary cold-start delay.

Current Behavior

envctl.sh restart
  → trigger-reload.sh sends SIGTERM to Node.js
    → shutdownGateway() runs (gateway-init.ts:383)
      → POST /gateway/stop to Ganymede
        → proc_organizations_gateways_stop() sets ended_at
          → Gateway allocation ENDED
  → Node.js restarts
    → POST /gateway/config returns 404 (no active allocation)
      → Gateway comes up IDLE
        → User must reload page → POST /gateway/start → full cold-start pipeline

Expected Behavior (for restart)

The gateway should come back up and reclaim its previous allocation, avoiding the cold-start pipeline for the user.

Design Considerations

The gateway currently can't distinguish between:

  1. Permanent stop — "I'm being decommissioned, deallocate me"
  2. Restart/reload — "I'm restarting, keep my allocation"

Option A: Flag file to skip deallocation

start-app-gateway.sh already uses /tmp/gateway-reloading to detect reload vs crash. The shutdownGateway() function could check this flag and skip POST /gateway/stop when it's a reload.

// In shutdownGateway():
if (fs.existsSync('/tmp/gateway-reloading')) {
  log('Skipping deallocation (reload in progress)');
} else {
  await ganymedeClient.request({ url: '/gateway/stop', ... });
}

Option B: Ganymede-side grace period

Instead of immediately ending the allocation, Ganymede could keep the allocation alive for N seconds after /gateway/stop. If the same gateway calls /gateway/ready within that window, the allocation is restored.
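The grace-period idea can be modelled in memory as a rough sketch. Everything named here (`GraceAllocations`, `stopRequestedAt`, `graceMs`) is hypothetical and only illustrates the state transitions; the real allocation lives in a database row updated by proc_organizations_gateways_stop(), and timestamps are passed in explicitly to keep the example deterministic.

```typescript
// Hypothetical in-memory model of Option B; not Ganymede's actual schema or API.
interface Allocation {
  endedAt: number | null;         // mirrors the ended_at column (ms since epoch)
  stopRequestedAt: number | null; // assumed extra field: when /gateway/stop arrived
}

class GraceAllocations {
  private rows: Record<string, Allocation> = {};
  private graceMs: number;

  constructor(graceMs: number) {
    this.graceMs = graceMs;
  }

  // Models a fresh allocation created by POST /gateway/start.
  start(gatewayId: string): void {
    this.rows[gatewayId] = { endedAt: null, stopRequestedAt: null };
  }

  // Models POST /gateway/stop: record the request, but do not end the allocation yet.
  stop(gatewayId: string, now: number): void {
    const row = this.rows[gatewayId];
    if (row && row.endedAt === null) row.stopRequestedAt = now;
  }

  // Models POST /gateway/ready: restore the allocation if the window is still open.
  ready(gatewayId: string, now: number): boolean {
    this.expire(now);
    const row = this.rows[gatewayId];
    if (!row || row.endedAt !== null) return false;
    row.stopRequestedAt = null;
    return true;
  }

  // Ends allocations whose grace window has elapsed (a periodic sweep in practice).
  expire(now: number): void {
    for (const id of Object.keys(this.rows)) {
      const row = this.rows[id];
      const pending = row.endedAt === null && row.stopRequestedAt !== null;
      if (pending && now - (row.stopRequestedAt as number) >= this.graceMs) {
        row.endedAt = now;
      }
    }
  }

  isActive(gatewayId: string): boolean {
    const row = this.rows[gatewayId];
    return !!row && row.endedAt === null;
  }
}
```

One consequence of this design is that a genuinely decommissioned gateway keeps its allocation for up to N seconds, so N trades restart tolerance against how quickly resources are released.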

Option C: Separate stop vs restart signals

Add a /gateway/restart endpoint that preserves the allocation, distinct from /gateway/stop which ends it. The gateway calls the appropriate one based on shutdown context.
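As a minimal sketch of how the two signals might differ, assuming the gateway already knows its shutdown reason (for example from the /tmp/gateway-reloading flag in Option A): `ShutdownReason`, `shutdownRequestFor`, `applyShutdown`, and the `restarting` field are hypothetical names, and /gateway/restart is the proposed endpoint, not an existing one.

```typescript
// Hypothetical sketch of Option C; /gateway/restart does not exist yet.
type ShutdownReason = 'stop' | 'restart';

interface ShutdownRequest {
  method: 'POST';
  url: '/gateway/stop' | '/gateway/restart';
}

// Gateway side: map the shutdown context to the endpoint to notify.
function shutdownRequestFor(reason: ShutdownReason): ShutdownRequest {
  return {
    method: 'POST',
    url: reason === 'restart' ? '/gateway/restart' : '/gateway/stop',
  };
}

// Ganymede side, as an in-memory model of the allocation row.
interface AllocationRow {
  endedAt: Date | null; // mirrors ended_at
  restarting: boolean;  // assumed marker; the real schema may differ
}

// Only /gateway/stop ends the allocation; /gateway/restart preserves it.
function applyShutdown(
  row: AllocationRow,
  url: ShutdownRequest['url'],
  now: Date,
): AllocationRow {
  if (url === '/gateway/stop') {
    return { endedAt: now, restarting: false };
  }
  return { endedAt: row.endedAt, restarting: true };
}
```

Keeping the decision in a small pure function like `shutdownRequestFor` would make the stop/restart distinction easy to unit test without a running Ganymede.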

Option D: Kill without graceful shutdown on restart

envctl.sh restart could use SIGKILL instead of SIGTERM for restarts, skipping shutdownGateway() entirely. The allocation would remain active in the DB, and the restarted gateway would reclaim it via /gateway/config.
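With Option D the outcome is decided entirely on the startup path. The following is a rough sketch of that decision, assuming /gateway/config answers 404 when there is no active allocation (as in the flow under Current Behavior) and a 2xx response carrying the configuration otherwise. The response shape and the names `ConfigResponse`, `StartupState`, and `startupStateFor` are hypothetical.

```typescript
// Hypothetical sketch; the real /gateway/config response shape is not shown in this issue.
interface ConfigResponse {
  status: number;
  body?: unknown;
}

type StartupState =
  | { kind: 'idle' }                        // no active allocation: wait for /gateway/start
  | { kind: 'reclaimed'; config: unknown }; // allocation still active: skip the cold start

// Interprets the result of the POST /gateway/config call made at startup.
function startupStateFor(res: ConfigResponse): StartupState {
  if (res.status === 404) {
    return { kind: 'idle' };
  }
  if (res.status >= 200 && res.status < 300) {
    return { kind: 'reclaimed', config: res.body };
  }
  // Anything else is treated as an error rather than silently going idle.
  throw new Error('Unexpected /gateway/config status: ' + res.status);
}
```

Because SIGKILL bypasses shutdownGateway() completely, any other cleanup that function performs would also be skipped, which is worth checking before adopting this option.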

Files

  • packages/app-gateway/src/main.ts:168-194: SIGTERM handler calls shutdownGateway()
  • packages/app-gateway/src/initialization/gateway-init.ts:383-437: shutdownGateway() sends POST /gateway/stop
  • packages/app-ganymede/src/routes/gateway/index.ts:398-445: /gateway/stop handler sets ended_at
  • docker-images/backend-images/gateway/app/lib/start-app-gateway.sh: /tmp/gateway-reloading flag check
  • scripts/local-dev/envctl.sh: restart logic

Labels

  • bug (Something isn't working)
  • developer-experience (Local dev tooling and workflows)
  • enhancement (New feature or request)
  • gateway (Gateway container lifecycle and routing)
  • infrastructure (Infrastructure and DevOps related)
