fix(agent): make Windows hardware collection panic-tolerant #81
Merged
Three coordinated changes that together close the WMI-panic class of
agent crashes — both the immediate symptoms and the structural fragility
that lets a single bug in pre-update code permanently brick the auto-update
mechanism.
1. interface{} for variant-prone WMI struct fields. StackExchange/wmi
v1.2.1 panics in reflect.Value.Uint() when COM marshalling produces a
different Go type than the struct field declares (commonly int32 on
builds where the MOF says uint16). Declaring the field as interface{}
makes the library skip its type-conversion path entirely — it just
assigns the native Go value. Applies to ChassisTypes, SMBIOSMemoryType,
MemoryType, FormFactor, AdapterTypeID, plus the thermal zone field.
Also drops the unused Stepping field which had the same issue.
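The failure mode in item 1 can be reproduced without WMI. The sketch below is a deliberately simplified, hypothetical decoder (fillUint16Slice and tryFill are illustrative names, not the library's code) that fills a []uint16 field by calling reflect.Value.Uint() on each marshalled element. reflect.Value.Uint() panics unless the value's kind is unsigned, so a uint16 element decodes while an int32 element panics.

```go
package main

import (
	"fmt"
	"reflect"
)

// fillUint16Slice is a hypothetical, simplified stand-in for a reflection-based
// decoder: it copies marshalled array elements into a []uint16 destination by
// calling reflect.Value.Uint() on each element.
func fillUint16Slice(dst *[]uint16, elems []interface{}) {
	out := reflect.MakeSlice(reflect.TypeOf(*dst), len(elems), len(elems))
	for i, e := range elems {
		// Uint() panics unless the element's Kind is one of the unsigned kinds.
		out.Index(i).SetUint(reflect.ValueOf(e).Uint())
	}
	reflect.ValueOf(dst).Elem().Set(out)
}

// tryFill reports whether fillUint16Slice panicked for the given elements.
func tryFill(elems []interface{}) (result []uint16, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	fillUint16Slice(&result, elems)
	return result, false
}

func main() {
	ok, p1 := tryFill([]interface{}{uint16(3)}) // element kind matches the field
	_, p2 := tryFill([]interface{}{int32(3)})   // signed element for an unsigned field
	fmt.Println(ok, p1, p2)
}
```

Declaring the destination as interface{} sidesteps this because no Uint() call is needed to store the value; converting it afterwards becomes the caller's job.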
2. safeWmiQuery wrapper. Each wmi.Query call is wrapped in a function whose
deferred recover() converts panics into errors. So if a future Windows build
produces a type we haven't seen, the affected provider returns an error and
gets skipped; other providers still succeed, and partial hardware data still
gets sent.
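A minimal sketch of the wrapper, assuming it keeps wmi.Query's shape (query string, dst interface{}, connectServerArgs ...interface{}) error. To stay runnable off Windows, the query function is injected as a parameter instead of being imported from github.com/StackExchange/wmi; queryFunc, provider, collectAll, and the fake query are hypothetical.

```go
package main

import "fmt"

// queryFunc matches the shape of wmi.Query from github.com/StackExchange/wmi.
// It is injected here so the sketch compiles and runs without Windows.
type queryFunc func(query string, dst interface{}, connectServerArgs ...interface{}) error

// safeWmiQuery runs q and converts any panic raised during decoding into an
// ordinary error. The named return value lets the deferred recover set err.
func safeWmiQuery(q queryFunc, query string, dst interface{}) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("wmi query %q panicked: %v", query, r)
		}
	}()
	return q(query, dst)
}

// provider is a hypothetical hardware provider: a name plus one WMI query.
type provider struct {
	name  string
	query string
}

// collectAll runs every provider through safeWmiQuery. A failing provider is
// recorded and skipped; the others still contribute data.
func collectAll(q queryFunc, providers []provider) (ok []string, failed map[string]error) {
	failed = make(map[string]error)
	for _, p := range providers {
		var dst []struct{ Name string } // placeholder destination type
		if err := safeWmiQuery(q, p.query, &dst); err != nil {
			failed[p.name] = err
			continue
		}
		ok = append(ok, p.name)
	}
	return ok, failed
}

func main() {
	// fake stands in for wmi.Query: it panics for one class, succeeds otherwise.
	fake := func(query string, dst interface{}, _ ...interface{}) error {
		if query == "SELECT * FROM Win32_SystemEnclosure" {
			panic("simulated decode panic")
		}
		return nil
	}
	ok, failed := collectAll(fake, []provider{
		{"cpu", "SELECT * FROM Win32_Processor"},
		{"chassis", "SELECT * FROM Win32_SystemEnclosure"},
		{"memory", "SELECT * FROM Win32_PhysicalMemory"},
	})
	fmt.Println(ok, len(failed))
}
```

Because err is a named result, the deferred closure can overwrite it while the panic unwinds; that is what turns a panic into an ordinary error return.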
3. defer recover() around the sendHardware call in runAgent. Outermost
safety net. Guarantees the WS read goroutine survives any panic raised during
hardware collection (recover only catches panics on that same goroutine),
which closes the chicken-and-egg loop where a pre-update panic prevented
Phase 8 from ever delivering the fix.
Verified on Win10 22H2 (Drops + ai-07): the agent reaches steady-state, sends
metrics on the 30s ticker, and processes the agent_version frame from the hub.
The auto-update path is operational regardless of what hardware collection does.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the WMI-panic class of agent crashes structurally, not field-by-field.
Problem.
StackExchange/wmi v1.2.1 panics via reflect when COM marshalling produces a Go type different from the one the struct field declares. We hit it twice in two days (Win32_Processor.Stepping, Win32_SystemEnclosure.ChassisTypes), and 5+ more fields would trip the same path on different Windows builds. Worse: the panic happens in sendHardware() BEFORE the agent's WS read loop processes inbound agent_version frames, so a bug in hardware collection permanently bricks the agent against Phase 8 auto-update.
Fix. Three layers, synthesized from review by Codex / Cursor / Gemini / Kimi:
1. interface{} for variant-prone WMI fields (ChassisTypes, SMBIOSMemoryType, MemoryType, FormFactor, AdapterTypeID, thermal-zone temp). The library skips its panicky reflection path; data still flows via small coerceUint32 helpers.
2. safeWmiQuery wraps every wmi.Query call with defer recover(), giving per-provider degradation if any future drift escapes layer 1.
3. defer recover() around sendHardware in runAgent: the outermost safety net, guaranteeing the WS read loop survives any panic during hardware collection, so auto-update can always fire.
Verified on Win10 22H2: the agent reaches steady-state and runs cleanly through the 30s metric tick. The auto-update path was tested by simulating a downstream panic: the agent stays connected and receives the agent_version frame.