[Foundation F6.3] Tunnel health check + auto-reconnect #650
@claude Do a thorough review of this PR in terms of: KISS, YAGNI, DRY, Fail Fast, Secure by Default and performance/speed
Claude finished @furukama's task in 3m 31s.
PR Review: Tunnel health check + auto-reconnect
Overall the implementation is solid: the state machine is correct, tests are thorough, and the backoff logic is well-structured. The issues below are worth addressing before merge, with one critical gap (the config value is never wired in).
Critical
KISS
const before = JSON.stringify(this.status());
// ... apply fields ...
const after = JSON.stringify(this.status());
if (before !== after) { this.publishStatusChange(); }
if (
typeof timer === 'object' && timer &&
'unref' in timer && typeof timer.unref === 'function'
) { timer.unref(); }
return `tunnel_${Date.now()}_${randomUUID().slice(0, 8)}`;
Two separate import blocks from the same module ( There are two separate
async function recordDefaultTunnelAudit(input: TunnelAuditRecord): Promise<void> {
const { recordAuditEvent } = await import('../audit/audit-events.js');
recordAuditEvent(input);
}
Node.js caches dynamic imports, so this won't re-execute the module, but it re-enters the dynamic import machinery on every call (promise allocation, microtask queue, cache lookup). A top-level static import is simpler and faster.
YAGNI
This is injectable but there is no caller that overrides it — all production and test paths use the
The path
These are tunable only through tests (
DRY
private clearHealthTimer(): void {
if (!this.healthTimer) return;
clearTimeout(this.healthTimer);
this.healthTimer = null;
}
// same pattern repeated for reconnectTimer
A single "Reset to down" state update is scattered across 3 call sites
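One way to collapse the repeated timer-clearing pattern is a small shared helper. The sketch below is illustrative only: `clearTimer` and the `TimerOwner` class are hypothetical names, not code from the PR, and only the two timer field names come from the snippet above.

```typescript
type TimerHandle = ReturnType<typeof setTimeout>;

// Hypothetical helper (not in the PR): clears the timer if one is set and
// always returns null, so the caller resets its field in one assignment.
function clearTimer(timer: TimerHandle | null): null {
  if (timer) clearTimeout(timer);
  return null;
}

// Illustrative stand-in for the provider; only the two timer fields matter.
class TimerOwner {
  healthTimer: TimerHandle | null = null;
  reconnectTimer: TimerHandle | null = null;

  clearTimers(): void {
    this.healthTimer = clearTimer(this.healthTimer);
    this.reconnectTimer = clearTimer(this.reconnectTimer);
  }
}
```

Returning `null` from the helper keeps each call site to a single line without needing to pass the field by reference.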
Fail Fast
No constructor validation that
If a caller passes
if (this.listener && this.publicUrl) {
return { public_url: this.publicUrl };
}
If Initial When
Secure by Default
recordAuditEvent?: TunnelAuditRecorder | false;
Audit logging is a security control (operator visibility into tunnel lifecycle). The
Health checks hit the public internet URL
Each 30 s health check sends an HTTP GET to
Consider documenting this in the config reference so operators can tune
Performance / Speed
At 30 s intervals this is trivial, but during a reconnect storm (health check fails, reconnect loop, repeated
Even with module cache hits, the overhead is unnecessary for a code path that already exists only at module-load time.
No jitter in exponential backoff (
const delayMs = Math.min(
this.reconnectMaxBackoffMs,
this.reconnectInitialBackoffMs * 2 ** Math.max(0, attempt - 1),
);
Deterministic backoff is fine for a single instance, but adding ±10% jitter (
Minor
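The jitter suggestion from the Performance / Speed section could look roughly like the sketch below. It assumes a standalone function `backoffDelayMs` with an injectable `random` source for testing; both are hypothetical and not part of the PR. Only the capped exponential base mirrors the reviewed snippet.

```typescript
// Hypothetical helper (not in the PR). Parameter names mirror the fields in
// the reviewed snippet; jitterRatio and random are assumptions for the sketch.
function backoffDelayMs(
  attempt: number,
  initialBackoffMs: number,
  maxBackoffMs: number,
  jitterRatio = 0.1,
  random: () => number = Math.random,
): number {
  // Same capped exponential base as the reviewed code.
  const base = Math.min(
    maxBackoffMs,
    initialBackoffMs * 2 ** Math.max(0, attempt - 1),
  );
  // Map random() in [0, 1) onto a factor in [1 - jitterRatio, 1 + jitterRatio).
  const factor = 1 + (random() * 2 - 1) * jitterRatio;
  // Re-apply the cap so jitter never pushes the delay past the maximum.
  return Math.min(maxBackoffMs, Math.round(base * factor));
}
```

Injecting `random` keeps tests deterministic, and re-applying the cap after jitter preserves the existing upper bound on reconnect delay.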
Pull request overview
Adds health monitoring and auto-reconnect behavior to the built-in ngrok tunnel provider, with corresponding runtime-config support, audit events, tests, and documentation updates.
Changes:
- Extend tunnel provider status to include lifecycle/health metadata and add defaults for health check + reconnect timings.
- Implement ngrok tunnel health checks, status transitions, capped exponential backoff reconnect, and `tunnel.up`/`tunnel.down` structured audit events.
- Add `deployment.tunnel.health_check_interval_ms` to runtime config defaults/normalization, update config example/docs/changelog, and expand test coverage.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/runtime-config-deployment.test.ts | Updates expectations to include the new deployment tunnel health-check interval field. |
| tests/ngrok-tunnel.test.ts | Adds/updates tests for expanded tunnel status, audit events, health checks, and reconnect backoff behavior. |
| src/tunnel/tunnel-provider.ts | Expands TunnelStatus and introduces default constants for health checks/reconnect. |
| src/tunnel/ngrok-tunnel-provider.ts | Implements health checking, reconnect logic, status transitions, and audit event recording for ngrok tunnels. |
| src/config/runtime-config.ts | Bumps config version and adds deployment.tunnel.health_check_interval_ms default + normalization. |
| docs/development/reference/configuration.md | Documents the new deployment tunnel health-check config key (but see list-rendering comment). |
| docs/development/guides/remote-access.md | Updates remote-access guide with the new config key and behavior description. |
| docs/content/reference/configuration.md | Mirrors configuration reference doc update for the docs site (but see list-rendering comment). |
| docs/content/guides/remote-access.md | Mirrors remote-access guide update for the docs site. |
| config.example.json | Adds the new config key and bumps the example config version. |
| README.md | Updates the configuration key summary to include the new tunnel health-check interval. |
| CHANGELOG.md | Notes tunnel health checks, audit events, and reconnect behavior in Unreleased. |
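As a rough illustration of what "default + normalization" for `deployment.tunnel.health_check_interval_ms` might involve, here is a minimal sketch. The function name, the constant name, and the fall-back-to-default behavior are assumptions; the PR's actual normalization in `src/config/runtime-config.ts` is not shown here and may differ (for example, it could reject invalid values instead). Only the 30-second default comes from the docs text in this PR.

```typescript
// Default taken from the docs text in this PR: health checks every 30 seconds.
const DEFAULT_HEALTH_CHECK_INTERVAL_MS = 30_000;

// Hypothetical normalizer (an assumption, not the PR's code): accepts finite
// numbers of at least 1 ms, truncates fractions, and otherwise falls back to
// the default.
function normalizeHealthCheckIntervalMs(value: unknown): number {
  if (typeof value !== 'number' || !Number.isFinite(value) || value < 1) {
    return DEFAULT_HEALTH_CHECK_INTERVAL_MS;
  }
  return Math.floor(value);
}
```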
- `deployment.mode`, `deployment.public_url`, `deployment.tunnel.provider`, and
  `deployment.tunnel.health_check_interval_ms` for declaring whether the
  gateway runs behind a cloud URL or a local tunnel; cloud mode requires
  `deployment.public_url`, while local mode requires a tunnel provider such as
  `manual`, `ssh`, `ngrok`, `cloudflare`, or `tailscale`. The built-in ngrok
  tunnel provider reads `NGROK_AUTHTOKEN` from the encrypted runtime secret
  store and health-checks active tunnels every 30 seconds by default
The docs/static/docs.js markdown renderer closes lists unless each line starts with a list marker, so this multi-line bullet (with continuation lines that start with indentation) will render as a single bullet followed by a separate paragraph. Consider rewriting it as a single-line bullet, or splitting it into multiple `-` bullets, so it renders correctly on the docs site.
Suggested change:
- `deployment.mode`, `deployment.public_url`, `deployment.tunnel.provider`, and `deployment.tunnel.health_check_interval_ms` for declaring whether the gateway runs behind a cloud URL or a local tunnel; cloud mode requires `deployment.public_url`, while local mode requires a tunnel provider such as `manual`, `ssh`, `ngrok`, `cloudflare`, or `tailscale`. The built-in ngrok tunnel provider reads `NGROK_AUTHTOKEN` from the encrypted runtime secret store and health-checks active tunnels every 30 seconds by default
function makeTunnelAuditRunId(): string {
  return `tunnel_${Date.now()}_${randomUUID().slice(0, 8)}`;
}

async function recordDefaultTunnelAudit(
  input: TunnelAuditRecord,
): Promise<void> {
  const { recordAuditEvent } = await import('../audit/audit-events.js');
  recordAuditEvent(input);
}
The repo already exposes audit run ID generation via audit-events' makeAuditRunId(prefix); this file introduces a separate makeTunnelAuditRunId() implementation. Using the shared helper (or at least centralizing run-id generation) keeps run IDs consistent across audit events and avoids duplicating ID logic.
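To illustrate the centralization idea, here is a minimal sketch of a prefix-based run-id helper. The comment names `makeAuditRunId(prefix)` in audit-events, but its real implementation is not shown in this thread, so `makeRunId` below is a hypothetical stand-in whose body simply mirrors the format used by `makeTunnelAuditRunId()`.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical stand-in for a shared helper. The real makeAuditRunId(prefix)
// in audit-events may use a different format; this body is an assumption that
// copies the format of makeTunnelAuditRunId() from the diff.
function makeRunId(prefix: string): string {
  return `${prefix}_${Date.now()}_${randomUUID().slice(0, 8)}`;
}

// A tunnel call site would then pass its prefix instead of owning the logic.
const tunnelRunId = makeRunId('tunnel');
```

With a single helper, any future change to the run-id format applies to every audit event source at once.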
Summary
Closes #568.
Implements tunnel health monitoring for the built-in ngrok tunnel provider:
- `status()` and `onStatusChange`
- `tunnel.up` and `tunnel.down` audit events
- `deployment.tunnel.health_check_interval_ms` and updates the config example/changelog
Acceptance Criteria
Validation
- `/Users/bkoehler/src/hybridclaw-electron-mac/desktop/build/runtime-bin/node ./node_modules/vitest/vitest.mjs run --configLoader runner --config vitest.unit.config.ts tests/ngrok-tunnel.test.ts tests/runtime-config-deployment.test.ts`
- `/Users/bkoehler/src/hybridclaw-electron-mac/desktop/build/runtime-bin/node ./node_modules/typescript/bin/tsc --noEmit --noUnusedLocals --noUnusedParameters`
- `/Users/bkoehler/src/hybridclaw/node_modules/.bin/biome check --write src`
- `/Users/bkoehler/src/hybridclaw/node_modules/.bin/biome check src`