fix: improve socket reconnection reliability #162

Open
lastowl wants to merge 2 commits into cellcortex:master from lastowl:fix/socket-reconnect-reliability

lastowl commented Mar 10, 2026

I kept running into an issue where my bulbs would go permanently offline and the only fix was restarting the plugin. The logs
would fill up with:

Socket Closed with error "Error, retrying to connect in 5s" connect ETIMEDOUT 169.254.118.86:55443

After digging into the socket code I found a few issues causing this:

The main one: when a bulb is power cycled it sends a graceful TCP FIN, which fires the end event rather than error. The
socketClosed handler only scheduled a reconnect when called with an error — a graceful close just disconnected and never tried
again, leaving the device permanently offline.
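To illustrate the graceful-close case, here is a minimal sketch of a handler that schedules a reconnect on both `end` and `error`. It is not the plugin's actual `socketClosed` code: `SocketLike`, `Scheduler` and `attachReconnect` are hypothetical names, and the 5s delay is simply taken from the log message above.

```typescript
// Minimal shape of the socket used here; hypothetical, chosen so the sketch
// does not depend on Node typings.
interface SocketLike {
  once(event: string, listener: (error?: Error) => void): unknown;
}

// Injected so the retry timer can be replaced in tests (assumption).
type Scheduler = (task: () => void, delayMs: number) => unknown;

// Hypothetical helper: schedules at most one reconnect per socket, whether the
// peer closed gracefully ("end") or the socket failed ("error").
function attachReconnect(
  socket: SocketLike,
  reconnect: () => void,
  schedule: Scheduler = (task, delayMs) => setTimeout(task, delayMs),
  retryDelayMs: number = 5_000,
): void {
  let scheduled = false;

  const onClosed = (_error?: Error): void => {
    // "end" and "error" can both fire for the same socket; retry only once.
    if (scheduled) {
      return;
    }
    scheduled = true;
    schedule(reconnect, retryDelayMs);
  };

  // Graceful TCP FIN from the bulb (for example after a power cycle).
  socket.once("end", () => onClosed());
  // Connection failures such as ETIMEDOUT.
  socket.once("error", (error?: Error) => onClosed(error));
}
```

The point of the sketch is only that the graceful path and the error path end in the same retry scheduling, instead of the graceful path stopping after the disconnect.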

Other issues I fixed while I was in there:

  • The retry path was calling connect() instead of reconnect(), which could spawn a new socket while a previous one was still
    connecting
  • Old sockets weren't being destroyed before creating new ones, leading to duplicate event handlers
  • setInterval in onDeviceConnected had no clearInterval guard, leaking timers on rapid reconnects
  • clearOldTransactions had an inverted condition (item.timestamp > Date.now() + 60_000) that meant stale transactions were
    never cleaned up
  • The floodAlarm threshold was 180_000_000ms (~50 hours) instead of the intended 5 minutes
  • error.message access could crash if the error had no message property
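As a rough illustration of the two numeric bugs in the list above, the sketch below contrasts the inverted staleness check with the corrected one and spells out the threshold arithmetic. The `Transaction` shape and the function and constant names are assumptions made for the example, not the plugin's real types.

```typescript
// Hypothetical transaction record; only the timestamp matters here.
interface Transaction {
  id: number;
  timestamp: number; // ms since epoch, set when the command was sent
}

const TRANSACTION_TIMEOUT_MS = 60_000;

// Inverted check as described in the PR: only true for timestamps more than
// 60s in the future, so it never matches a real transaction.
function isStaleInverted(item: Transaction, now: number): boolean {
  return item.timestamp > now + TRANSACTION_TIMEOUT_MS;
}

// Corrected check: the transaction is older than 60s.
function isStale(item: Transaction, now: number): boolean {
  return now - item.timestamp > TRANSACTION_TIMEOUT_MS;
}

// Returns the transactions to keep, dropping the stale ones.
function clearOldTransactions(items: Transaction[], now: number = Date.now()): Transaction[] {
  return items.filter((item) => !isStale(item, now));
}

// Threshold arithmetic: 180_000_000 ms is 50 hours, 5 minutes is 300_000 ms.
const OLD_FLOOD_ALARM_HOURS = 180_000_000 / (60 * 60 * 1000); // 50
const FLOOD_ALARM_MS = 5 * 60 * 1000; // 300_000
```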

After this change bulbs reconnect automatically after a power cycle with no plugin restart needed.

Closes #69 (repeated socket errors flooding logs — fixed by proper reconnect on graceful close and orphaned socket cleanup)
Closes #116 (UnhandledPromiseRejection — fixed by rejecting pending transactions on disconnect)
Closes #157 (Smart Light Panels colour not working — added plate2 spec)
May fix #86 (CT slider no effect — stale transactions were silently dropping CT commands)
May fix #112, #144 (Adaptive Lighting — stale transactions and 50hr floodAlarm suspension)
May fix #154, #155, #156 (post-update regressions — floodAlarm and stale transaction fixes)

Socket / reconnection fixes:
- Reconnect on graceful TCP close (end event), not just on errors — this
  was the primary cause of devices going permanently offline after a
  bulb reboot or power cycle
- Use reconnect() instead of connect() in the retry path to avoid
  spawning orphaned sockets while a previous connection attempt is
  still pending
- Destroy and remove listeners from old socket before creating a new one
  in connect() to prevent duplicate event handler firing
- Reject all pending transactions on disconnect to prevent
  UnhandledPromiseRejection errors and hanging promises (fixes cellcortex#116)
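The cleanup described in the last two items could look roughly like the sketch below. `ClosableSocket`, `PendingTransaction` and `teardown` are hypothetical names; the only socket methods assumed are `removeAllListeners()` and `destroy()`, which Node's `net.Socket` provides.

```typescript
// Minimal socket surface needed for cleanup (assumption for this sketch).
interface ClosableSocket {
  removeAllListeners(): unknown;
  destroy(): unknown;
}

// Hypothetical record of an in-flight command awaiting the bulb's reply.
interface PendingTransaction {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
}

// Tears down the old socket and rejects everything still waiting on it.
// Returns how many transactions were rejected.
function teardown(
  socket: ClosableSocket | undefined,
  pending: Map<number, PendingTransaction>,
): number {
  if (socket) {
    // Drop handlers first so the old socket cannot fire duplicates later.
    socket.removeAllListeners();
    socket.destroy();
  }

  const rejected = pending.size;
  // Callers get a definite failure instead of a promise that never settles.
  pending.forEach((tx) => tx.reject(new Error("device disconnected")));
  pending.clear();
  return rejected;
}
```

Callers that await these promises still need their own `catch`; rejecting here only guarantees the promise settles once the connection is gone.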

Accessory fixes:
- Guard setInterval in onDeviceConnected with clearInterval to prevent
  interval leaks on rapid reconnects
- Fix inverted condition in clearOldTransactions (> Date.now() + 60s
  should be Date.now() - timestamp > 60s)
- Fix floodAlarm threshold: 180_000_000ms (50 hours) → 300_000ms (5 min)
- Guard error.message access with nullish coalescing to avoid crashes
  when error has no message property
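For the interval and `error.message` guards, a minimal sketch under assumed names (`Poller` and `describeError` are hypothetical, not the accessory's real members):

```typescript
// Hypothetical wrapper around the polling interval started on connect.
class Poller {
  private timer?: ReturnType<typeof setInterval>;

  // Safe to call on every reconnect: any previous interval is cleared first.
  start(task: () => void, periodMs: number): void {
    this.stop();
    this.timer = setInterval(task, periodMs);
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }

  get running(): boolean {
    return this.timer !== undefined;
  }
}

// Reads a message without assuming the value is an Error instance.
function describeError(error: unknown): string {
  const candidate = error as { message?: string } | null | undefined;
  return candidate?.message ?? "unknown error";
}
```

Calling `start()` twice in a row leaves a single active interval, which is the property the guard is meant to provide.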

Device support:
- Add model spec for plate2 (Yeelight Smart Light Panels) with full
  colour support and 1700-6500K CT range (fixes cellcortex#157)

lastowl force-pushed the fix/socket-reconnect-reliability branch from e5dc939 to c037da7 on March 10, 2026 20:36

- Add AdaptiveLightingController to ColorLightService when ctforcolor is
  enabled — color lights (e.g. Bulb 1S, Bedside Lamp D2) previously had
  no Adaptive Lighting support even with CT enabled (fixes cellcortex#144)
- Add model specs for color8/colora (W3 Color Lightbulb) with full
  colour and 1700-6500K CT range (fixes cellcortex#155)
- Add model spec for bslamp3 (Bedside Lamp D2) with full colour and
  1700-6500K CT range (fixes cellcortex#144)