NM-279: fix random listen port on Windows#1266
NM-279: fix random listen port on Windows#1266abhishek9686 wants to merge 2 commits intorelease-v1.5.0from
Conversation
…s the port On Windows the WireGuard-nt adapter from a previous run can keep the UDP port bound, causing GetFreePort to see it as busy and assign a random port. Additionally, concurrent wgctrl access and transient driver handle contention lead to ConfigureDevice failures. - Query the adapter's actual listen port before running GetFreePort so the port owned by our own adapter is not replaced - Add retry logic with backoff to apply() on Windows for transient 'file being used by another process' errors - Protect GetPeersFromDevice, GetPeer, and UpdatePeer with wgMutex to prevent concurrent device access - Add a delay between Close and Create in resetInterfaceFunc on Windows to let the kernel driver fully release the device handle
|
Tenki Code Review - Complete Files Reviewed: 5 By Severity:
This PR correctly addresses a Windows-specific race condition where the WireGuard adapter's port falsely appears busy, adding mutex protection to previously unguarded functions and a retry loop for transient device-busy errors. One low-severity issue was found: the retry loop's fallback error message at the end of Files Reviewed (5 files) |
There was a problem hiding this comment.
Overview
This PR (NM-279) fixes a Windows-specific bug where the netclient would unnecessarily change its listen port on restart because the existing WireGuard adapter's port appeared busy to net.ListenUDP. The fix is well-structured and targets the correct layers.
Changes Reviewed
wireguard/wireguard.go— Newapply()retry loop withisDeviceBusyError()helper and newGetDeviceListenPort()function.wireguard/types.go—NCIface.UpdatePeernow holdswgMutexbefore callingapply(), consistent with other public methods likeSetPeersandConfigure.wireguard/wireguard.go—GetPeersFromDeviceandGetPeeralso gainwgMutexguards, improving thread safety.functions/daemon.go— Port assignment now checksGetDeviceListenPort()first; skips the UDP bind test when the adapter already owns the desired port.functions/mqhandlers.go/functions/uninstall.go— A 500 ms sleep on Windows betweenClose()andNewNCIface()to allow the kernel to release the adapter handle.
Finding
Unreachable fallback error in apply() retry loop (wireguard/wireguard.go)
The loop condition is attempt <= maxRetries (where maxRetries = 3), giving 4 iterations (0–3). The inner retry guards check attempt < maxRetries, so on the final iteration (attempt == 3) a device-busy error skips retrying and falls to return err at line 126. Every iteration therefore ends with either continue, return nil, or return err — the loop body never exits without returning. The return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries) at line 128 is unreachable dead code. While functionally harmless (the actual error is still returned), the intent appears to be that line 128 provides the definitive exhausted-retries message, and the current structure defeats that.
Positive Notes
- The string-matching approach in
isDeviceBusyErroris pragmatic for Windows error messages and handles the known variants. - The
portOwnedByDevicecheck indaemon.gocleanly isolates the Windows-specific logic without affecting the non-static-port code path on other platforms. - Adding
wgMutextoNCIface.UpdatePeer,GetPeersFromDevice, andGetPeerimproves consistency across the locking model.
wireguard/wireguard.go
Outdated
| for attempt := 0; attempt <= maxRetries; attempt++ { | ||
| wg, err := wgctrl.New() | ||
| if err != nil { | ||
| if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) { | ||
| slog.Warn("wgctrl busy, retrying", "attempt", attempt+1, "error", err) | ||
| time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond) | ||
| continue | ||
| } | ||
| return fmt.Errorf("wgctrl %w", err) | ||
| } | ||
| err = wg.ConfigureDevice(ifaceName, *c) | ||
| wg.Close() | ||
| if err == nil { | ||
| return nil | ||
| } | ||
| if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) { | ||
| slog.Warn("ConfigureDevice busy, retrying", "attempt", attempt+1, "error", err) | ||
| time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond) | ||
| continue | ||
| } | ||
| return err | ||
| } | ||
| defer wg.Close() | ||
| return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries) |
There was a problem hiding this comment.
🟡 Unreachable fallback error message in apply() retry loop (bug)
In wireguard/wireguard.go, the apply() function uses for attempt := 0; attempt <= maxRetries; attempt++ with maxRetries = 3, running 4 iterations (attempts 0–3). The retry guards inside the loop check attempt < maxRetries, which is false on the final iteration (attempt == 3). On that last iteration a device-busy error falls through to return err at line 126 rather than continuing. Because every iteration ends with either continue, return nil, or return err, the loop body never exits normally — making return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries) at line 128 unreachable dead code. In practice the correct error is still propagated via line 126, so this has no runtime impact, but the authoritative "retries exhausted" message is silently lost and the loop structure does not match its apparent intent.
💡 Suggestion: Change the loop condition to attempt < maxRetries and remove the redundant attempt < maxRetries guards from the inner if-checks. This makes the loop run exactly maxRetries retry attempts and allows the loop to exit naturally, falling through to the meaningful ConfigureDevice failed after N retries error at line 128.
| for attempt := 0; attempt <= maxRetries; attempt++ { | |
| wg, err := wgctrl.New() | |
| if err != nil { | |
| if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) { | |
| slog.Warn("wgctrl busy, retrying", "attempt", attempt+1, "error", err) | |
| time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond) | |
| continue | |
| } | |
| return fmt.Errorf("wgctrl %w", err) | |
| } | |
| err = wg.ConfigureDevice(ifaceName, *c) | |
| wg.Close() | |
| if err == nil { | |
| return nil | |
| } | |
| if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) { | |
| slog.Warn("ConfigureDevice busy, retrying", "attempt", attempt+1, "error", err) | |
| time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond) | |
| continue | |
| } | |
| return err | |
| } | |
| defer wg.Close() | |
| return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries) | |
| const maxRetries = 3 | |
| for attempt := 0; attempt < maxRetries; attempt++ { | |
| wg, err := wgctrl.New() | |
| if err != nil { | |
| if runtime.GOOS == "windows" && isDeviceBusyError(err) { | |
| slog.Warn("wgctrl busy, retrying", "attempt", attempt+1, "error", err) | |
| time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond) | |
| continue | |
| } | |
| return fmt.Errorf("wgctrl %w", err) | |
| } | |
| err = wg.ConfigureDevice(ifaceName, *c) | |
| wg.Close() | |
| if err == nil { | |
| return nil | |
| } | |
| if runtime.GOOS == "windows" && isDeviceBusyError(err) { | |
| slog.Warn("ConfigureDevice busy, retrying", "attempt", attempt+1, "error", err) | |
| time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond) | |
| continue | |
| } | |
| return err | |
| } | |
| return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries) |
On the final retry attempt, busy errors were returned directly instead of falling through to the exhausted-retries message. Restructure so busy errors always continue the loop and the post-loop error with the retry count is the definitive exhaustion path.
On Windows, the WireGuard-nt adapter from a previous run can keep the UDP port bound, causing GetFreePort to see it as busy and assign a random port. Additionally, concurrent wgctrl access and transient driver handle contention lead to ConfigureDevice failures.
Describe your changes
Provide Issue ticket number if applicable/not in title
Provide link to Netmaker PR if required
Provide testing steps
Checklist before requesting a review