-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Description
Add error recovery for Nginx reload failures. When nginx -t (configuration test) fails after creating a new gateway config, the system should automatically clean up the bad config file and restore the previous known-good state instead of leaving Nginx in a broken state.
Context
When creating a new gateway Nginx configuration:
- Nginx config file is created
nginx -tis run to test configuration- If test passes,
nginx -s reloadis run - If test fails, error is thrown but bad config file remains
Problem: If nginx -t fails, the bad config file is left in place, potentially causing:
- Nginx to fail on next reload
- Manual intervention required to fix
- Risk of Nginx becoming completely broken
- No automatic recovery
Current State
Location: packages/app-ganymede/src/services/nginx-manager.ts
Current Implementation:
createGatewayConfig()creates config filereloadNginx()runsnginx -tthennginx -s reload- If
nginx -tfails, error is thrown but config file remains
Problem: No cleanup of bad config files on test failure.
Requirements
1. Config Test Before Reload
Test configuration before reloading:
- Current:
reloadNginx()already tests withnginx -t - Enhancement: Ensure test always runs before reload
- Error handling: Catch test failures and handle cleanup
2. Cleanup on Test Failure
Remove bad config file if test fails:
- Detect failure: Catch
nginx -tfailure - Delete config: Remove the newly created config file
- Log cleanup: Log that config was removed due to test failure
- Return error: Return descriptive error to caller
3. Restore Previous State
Optionally restore previous known-good state:
- Backup: Keep backup of previous config (optional)
- Restore: Restore previous config if new one fails (optional)
- Or: Simply remove bad config and let Nginx use existing configs
Recommended: Simply remove bad config (simpler, Nginx will use existing configs)
4. Error Messages
Provide descriptive error messages:
- Test failure: "Nginx configuration test failed: [error details]"
- Cleanup status: "Removed invalid config file: [path]"
- Recovery status: "Nginx restored to previous state"
Implementation Plan
Phase 1: Enhance reloadNginx()
- Ensure
nginx -talways runs before reload - Catch test failures
- Identify which config file caused failure
- Delete bad config file
- Log cleanup operation
Phase 2: Config Tracking
- Track which config file was just created
- Map config file to organization
- Enable targeted cleanup
Phase 3: Error Handling
- Improve error messages
- Return descriptive errors
- Log cleanup operations
- Test error scenarios
Phase 4: Testing
- Test with valid config (should work)
- Test with invalid config (should cleanup)
- Test with multiple configs (should only remove bad one)
- Test Nginx state after cleanup
Related Files
packages/app-ganymede/src/services/nginx-manager.ts- Nginx managerpackages/app-ganymede/src/routes/gateway/index.ts- Gateway allocation (uses nginx manager)
Acceptance Criteria
-
nginx -truns before reload - Test failures are caught
- Bad config files are deleted on test failure
- Cleanup is logged
- Descriptive error messages returned
- Nginx state restored (bad config removed)
- Works with multiple configs
- No manual intervention required
Questions to Resolve
- Should we backup previous config before creating new one?
- Should we restore previous config or just remove bad one?
- How do we identify which config file caused the failure?
- Should we validate config syntax before writing file?
- Should we support rollback of multiple config changes?