
fix(agent): make Windows hardware collection panic-tolerant #81

Merged
bokiko merged 1 commit into main from fix/agent-windows-wmi-crash-resilience
May 4, 2026

bokiko (Owner) commented May 4, 2026

Closes the WMI-panic class of agent crashes structurally, not field-by-field.

Problem. StackExchange/wmi v1.2.1 panics inside reflect when COM marshalling produces a Go type that differs from what the struct field declares. We hit it twice in two days (Win32_Processor.Stepping, Win32_SystemEnclosure.ChassisTypes), and 5+ more fields would trip the same path on different Windows builds. Worse: the panic happens in sendHardware() BEFORE the agent's WS read loop processes inbound agent_version frames, so a bug in hardware collection permanently bricks the agent against Phase 8 auto-update.

Fix. Three layers, synthesized from review by Codex / Cursor / Gemini / Kimi:

  1. interface{} for variant-prone WMI fields (ChassisTypes, SMBIOSMemoryType, MemoryType, FormFactor, AdapterTypeID, thermal-zone temp). Library skips its panicky reflection path; data still flows via small coerceUint32 helpers.
  2. safeWmiQuery wraps every wmi.Query call with defer recover() — per-provider degradation if any future drift escapes layer 1.
  3. defer recover() around sendHardware in runAgent — outermost safety net guaranteeing the WS read loop survives any hardware-collection crash, so auto-update can always fire.
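
The coerceUint32 helpers from layer 1 are not shown in this description. A minimal sketch of what such a helper might look like, assuming the interface{} field only ever receives a scalar integer. The signature, the fromSigned/fromUnsigned helper names, and the range checks are assumptions for illustration, not the merged code:

```go
package main

import (
	"fmt"
	"math"
)

// fromSigned and fromUnsigned are hypothetical range-checking helpers.
func fromSigned(n int64) (uint32, bool) {
	if n < 0 || n > math.MaxUint32 {
		return 0, false
	}
	return uint32(n), true
}

func fromUnsigned(n uint64) (uint32, bool) {
	if n > math.MaxUint32 {
		return 0, false
	}
	return uint32(n), true
}

// coerceUint32 (assumed signature) accepts whatever Go value ended up in an
// interface{} field and reports whether it fits in a uint32. Unknown types,
// nil, and out-of-range values return ok=false instead of panicking.
func coerceUint32(v interface{}) (uint32, bool) {
	switch n := v.(type) {
	case uint8:
		return uint32(n), true
	case uint16:
		return uint32(n), true
	case uint32:
		return n, true
	case uint64:
		return fromUnsigned(n)
	case uint:
		return fromUnsigned(uint64(n))
	case int8:
		return fromSigned(int64(n))
	case int16:
		return fromSigned(int64(n))
	case int32:
		return fromSigned(int64(n))
	case int64:
		return fromSigned(n)
	case int:
		return fromSigned(int64(n))
	default:
		return 0, false
	}
}

func main() {
	// Sample values are made up; they stand in for what a provider might return.
	samples := []interface{}{uint16(10), int32(26), int32(-1), "n/a", nil}
	for _, s := range samples {
		v, ok := coerceUint32(s)
		fmt.Printf("%T(%v) -> %d, ok=%t\n", s, s, v, ok)
	}
}
```

Returning an ok flag rather than an error keeps the call sites short: a field that cannot be coerced is simply left unset in the hardware report.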

Verified on Win10 22H2: agent reaches steady-state and runs cleanly through the 30s metric tick. Auto-update path tested by simulating a downstream panic — agent stays connected and receives the agent_version frame.
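
The simulated-panic check can be approximated without a Windows host. The sketch below assumes a simplified runAgent that calls the hardware sender and then drains inbound frames from a channel. sendHardwareSafely, the channel-based frame source, and both function signatures are hypothetical stand-ins, not the agent's real code:

```go
package main

import "fmt"

// sendHardwareSafely (hypothetical name) runs send and reports whether it
// finished without panicking. recover only catches panics raised on the
// goroutine that runs the deferred function, so the wrapper must sit on the
// same goroutine as the hardware-collection call.
func sendHardwareSafely(send func()) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("hardware collection panicked, continuing:", r)
			ok = false
		}
	}()
	send()
	return true
}

// runAgent is a simplified stand-in: send hardware once, then process every
// inbound frame. The real agent reads frames from a WebSocket instead.
func runAgent(send func(), frames <-chan string) []string {
	sendHardwareSafely(send)
	var handled []string
	for f := range frames {
		handled = append(handled, f)
	}
	return handled
}

func main() {
	frames := make(chan string, 1)
	frames <- "agent_version"
	close(frames)

	handled := runAgent(func() { panic("simulated downstream panic") }, frames)
	fmt.Println("frames handled after panic:", handled)
}
```

Without the deferred recover, the panic would terminate the whole process, since an unrecovered panic on any goroutine takes down every other goroutine with it.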

Three coordinated changes that together close the WMI-panic class of
agent crashes — both the immediate symptoms and the structural fragility
that lets a single bug in pre-update code permanently brick the auto-update
mechanism.

1. interface{} for variant-prone WMI struct fields. StackExchange/wmi
   v1.2.1 panics in reflect.Value.Uint() when COM marshalling produces a
   different Go type than the struct field declares (commonly int32 on
   builds where the MOF says uint16). Declaring the field as interface{}
   makes the library skip its type-conversion path entirely — it just
   assigns the native Go value. Applies to ChassisTypes, SMBIOSMemoryType,
   MemoryType, FormFactor, AdapterTypeID, plus the thermal zone field.
   Also drops the unused Stepping field which had the same issue.

2. safeWmiQuery wrapper. Each wmi.Query call is wrapped in defer recover()
   that converts panics to errors. So if a future Windows build produces a
   type we haven't seen, the affected provider returns an error and gets
   skipped — other providers still succeed, partial hardware data still
   sends.

3. defer recover() around the sendHardware call in runAgent. Outermost
   safety net. Guarantees the WS read goroutine survives any crash in
   hardware collection — closes the chicken-and-egg loop where a pre-update
   panic prevented Phase 8 from ever delivering the fix.
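
The wrapper in item 2 can be sketched roughly as follows. The commit does not show the real signature of safeWmiQuery, so this version takes the query as a closure, which is an assumption. The provider names in main are placeholders, and the failing provider reproduces the panic class by calling reflect.Value.Uint() on an int32 rather than by calling the wmi library:

```go
package main

import (
	"fmt"
	"reflect"
)

// safeWmiQuery (assumed signature) runs query and converts any panic into an
// ordinary error via a deferred recover and a named return value.
func safeWmiQuery(name string, query func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("wmi query %s panicked: %v", name, r)
		}
	}()
	return query()
}

func main() {
	providers := []struct {
		name string
		run  func() error
	}{
		{"healthy-provider", func() error { return nil }},
		{"drifting-provider", func() error {
			// Uint() panics on a signed value, the same failure class
			// the commit message describes.
			_ = reflect.ValueOf(int32(3)).Uint()
			return nil
		}},
	}

	for _, p := range providers {
		if err := safeWmiQuery(p.name, p.run); err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println("collected:", p.name)
	}
}
```

Because the error is scoped to one provider, the loop keeps going and whatever data was collected can still be sent.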

Verified on Win10 22H2 (Drops + ai-07): agent reaches steady-state, sends
metrics on the 30s ticker, processes agent_version frame from hub.
Auto-update path operational regardless of what hardware collection does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokiko bokiko merged commit aea51e5 into main May 4, 2026
3 checks passed
@bokiko bokiko deleted the fix/agent-windows-wmi-crash-resilience branch May 4, 2026 13:15