
fix(gateway): Node.js crash during tarball re-extraction (ENOENT on main.sh) #45

@FL-AntoineDurand

Description

Problem

When envctl.sh restart re-extracts the gateway tarball while Node.js is still running inside the container, the periodic update-nginx-locations task (which runs every 5 seconds) may try to execute /opt/gateway/app/main.sh while that file is momentarily absent during extraction. Node.js then crashes with:

Error: Error executing [/opt/gateway/app/main.sh -r bin/update-nginx-locations.sh]:
  spawnSync /opt/gateway/app/main.sh ENOENT

The auto-restart loop in start-app-gateway.sh brings the gateway back after about 3 seconds, but the gateway is unavailable during that window.
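For context, the sketch below shows the general shape of an auto-restart loop and where the ~3s outage window comes from (the delay before respawning). It is a hypothetical illustration, not the contents of start-app-gateway.sh: the function name run_with_restart and the MAX_RESTARTS bound are assumptions added only to keep the sketch self-contained and bounded, and the real loop may also restart on a clean exit.

```sh
#!/bin/sh
# Hypothetical auto-restart loop (NOT the actual start-app-gateway.sh).
# MAX_RESTARTS is an illustrative bound so the sketch terminates.
set -u

# Usage: run_with_restart MAX_RESTARTS DELAY_SECS CMD [ARGS...]
run_with_restart() {
  max_restarts=$1 delay=$2
  shift 2
  restarts=0
  while :; do
    status=0
    "$@" || status=$?            # run the supervised command, keep its exit status
    if [ "$status" -eq 0 ]; then
      return 0                   # clean exit: stop supervising
    fi
    if [ "$restarts" -ge "$max_restarts" ]; then
      return "$status"           # give up after too many crashes
    fi
    restarts=$((restarts + 1))
    echo "exited with status $status; restarting in ${delay}s" >&2
    sleep "$delay"               # this delay is the outage window
  done
}
```

With a delay of 3, every crash costs roughly 3 seconds of downtime plus the time Node.js needs to start listening again.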

Root Cause

The reload script (trigger-reload.sh / envctl.sh restart) extracts the new tarball over the existing directory while the Node.js process is still running and executing scripts from that directory. Because update-nginx-locations fires every 5 seconds, its ticks frequently land inside the extraction window.
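The failure mode can be reproduced deterministically in a temporary directory, assuming (as the crash suggests) that extraction leaves a moment where the old main.sh is gone and the new one is not yet written. The shell reports the failed exec as exit status 127 rather than Node.js's spawnSync ENOENT, but the cause is the same: the path does not exist at the instant it is executed. The directory layout here is illustrative, not the real /opt/gateway/app.

```sh
#!/bin/sh
# Hypothetical reproduction: a script that is briefly absent, as it may be
# while a tarball is re-extracted over it, cannot be executed in that window.
set -u

app_dir=$(mktemp -d)
printf '#!/bin/sh\necho ok\n' > "$app_dir/main.sh"
chmod +x "$app_dir/main.sh"

before=$("$app_dir/main.sh")          # file present: captures "ok"

rm -f "$app_dir/main.sh"              # old file removed, new one not written yet
"$app_dir/main.sh" 2>/dev/null
missing_status=$?                     # shell's "not found" status for a missing path

printf '#!/bin/sh\necho ok\n' > "$app_dir/main.sh"   # extraction completes
chmod +x "$app_dir/main.sh"
after=$("$app_dir/main.sh")           # file present again: captures "ok"

echo "before=$before missing_status=$missing_status after=$after"
```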

Proposed Fix

Stop Node.js before extracting the tarball. The reload sequence should be:

  1. Signal Node.js to stop gracefully
  2. Wait for process exit
  3. Extract new tarball
  4. Start Node.js

The reload script already partially does this, but its timing may not be reliable.
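The four steps above could look roughly like the sketch below. The function name reload_gateway, its argument order, the PID-file convention and the SIGKILL escalation after a timeout are all assumptions for illustration; they are not the actual interface of envctl.sh or trigger-reload.sh.

```sh
#!/bin/sh
# Hypothetical sketch of the proposed reload order. Names, arguments and the
# PID-file convention are illustrative assumptions, not the real scripts' API.
set -eu

# Usage: reload_gateway PID_FILE APP_DIR TARBALL TIMEOUT_SECS START_CMD [ARGS...]
reload_gateway() {
  pid_file=$1 app_dir=$2 tarball=$3 timeout=$4
  shift 4

  # 1. Signal Node.js to stop gracefully (SIGTERM), if it is running.
  if [ -f "$pid_file" ] && pid=$(cat "$pid_file") && kill -0 "$pid" 2>/dev/null; then
    kill -TERM "$pid" 2>/dev/null || true

    # 2. Poll every 0.1 s until the process exits; escalate to SIGKILL
    #    once TIMEOUT_SECS have elapsed.
    ticks=0
    while kill -0 "$pid" 2>/dev/null; do
      if [ "$ticks" -ge $((timeout * 10)) ]; then
        kill -KILL "$pid" 2>/dev/null || true
        break
      fi
      sleep 0.1
      ticks=$((ticks + 1))
    done
  fi

  # 3. Extract only once nothing is executing scripts from APP_DIR.
  tar -xzf "$tarball" -C "$app_dir"

  # 4. Start Node.js again in the background and record the new PID.
  "$@" &
  echo "$!" > "$pid_file"
}
```

One caveat: if Node.js runs under the auto-restart loop in start-app-gateway.sh, that loop will presumably respawn it about 3 seconds after step 2, possibly while step 3 is still running. The loop would also need to be paused for the duration of the extraction, which may be why the existing handling is not reliable.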

Priority

Low: the auto-restart loop recovers, but during the ~3s outage window active WebSocket connections drop and have to reconnect.

Files

  • scripts/local-dev/envctl.sh — restart logic
  • docker-images/backend-images/gateway/app/lib/start-app-gateway.sh — auto-restart loop
  • docker-images/backend-images/gateway/app/lib/trigger-reload.sh — reload trigger

Metadata

Assignees

No one assigned

Labels

  • bug: Something isn't working
  • developer-experience: Local dev tooling and workflows
  • gateway: Gateway container lifecycle and routing
  • infrastructure: Infrastructure and DevOps related
