Skip to content

NM-279: fix random listen port on Windows#1266

Open
abhishek9686 wants to merge 2 commits intorelease-v1.5.0from
NM-279-v2
Open

NM-279: fix random listen port on Windows#1266
abhishek9686 wants to merge 2 commits intorelease-v1.5.0from
NM-279-v2

Conversation

@abhishek9686
Copy link
Member

On Windows, the WireGuard-nt adapter from a previous run can keep the UDP port bound, causing GetFreePort to see it as busy and assign a random port. Additionally, concurrent wgctrl access and transient driver handle contention lead to ConfigureDevice failures.

  • Query the adapter's actual listen port before running GetFreePort so the port owned by our own adapter is not replaced
  • Add retry logic with backoff to apply() on Windows for transient 'file being used by another process' errors
  • Protect GetPeersFromDevice, GetPeer, and UpdatePeer with wgMutex to prevent concurrent device access
  • Add a delay between Close and Create in resetInterfaceFunc on Windows to let the kernel driver fully release the device handle

Describe your changes

Provide Issue ticket number if applicable/not in title

Provide link to Netmaker PR if required

Provide testing steps

Checklist before requesting a review

  • My changes affect only 10 files or less.
  • I have performed a self-review of my code and tested it.
  • If it is a new feature, I have added thorough tests, my code is <= 1450 lines.
  • If it is a bugfix, my code is <= 200 lines.
  • My functions are <= 80 lines.
  • I have had my code reviewed by a peer.
  • My unit tests pass locally.
  • Netclient & Netmaker are awesome.

…s the port

On Windows the WireGuard-nt adapter from a previous run can keep the UDP
port bound, causing GetFreePort to see it as busy and assign a random
port. Additionally, concurrent wgctrl access and transient driver handle
contention lead to ConfigureDevice failures.

- Query the adapter's actual listen port before running GetFreePort so
  the port owned by our own adapter is not replaced
- Add retry logic with backoff to apply() on Windows for transient
  'file being used by another process' errors
- Protect GetPeersFromDevice, GetPeer, and UpdatePeer with wgMutex to
  prevent concurrent device access
- Add a delay between Close and Create in resetInterfaceFunc on Windows
  to let the kernel driver fully release the device handle
@tenki-reviewer
Copy link

tenki-reviewer bot commented Mar 18, 2026

Tenki Code Review - Complete

Files Reviewed: 5
Findings: 1

By Severity:

  • 🟡 Low: 1

This PR correctly addresses a Windows-specific race condition where the WireGuard adapter's port falsely appears busy, adding mutex protection to previously unguarded functions and a retry loop for transient device-busy errors. One low-severity issue was found: the retry loop's fallback error message at the end of apply() is unreachable dead code due to an off-by-one in the loop condition.

Files Reviewed (5 files)
functions/daemon.go
functions/mqhandlers.go
functions/uninstall.go
wireguard/types.go
wireguard/wireguard.go

Copy link

@tenki-reviewer tenki-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This PR (NM-279) fixes a Windows-specific bug where the netclient would unnecessarily change its listen port on restart because the existing WireGuard adapter's port appeared busy to net.ListenUDP. The fix is well-structured and targets the correct layers.

Changes Reviewed

  • wireguard/wireguard.go — New apply() retry loop with isDeviceBusyError() helper and new GetDeviceListenPort() function.
  • wireguard/types.goNCIface.UpdatePeer now holds wgMutex before calling apply(), consistent with other public methods like SetPeers and Configure.
  • wireguard/wireguard.goGetPeersFromDevice and GetPeer also gain wgMutex guards, improving thread safety.
  • functions/daemon.go — Port assignment now checks GetDeviceListenPort() first; skips the UDP bind test when the adapter already owns the desired port.
  • functions/mqhandlers.go / functions/uninstall.go — A 500 ms sleep on Windows between Close() and NewNCIface() to allow the kernel to release the adapter handle.

Finding

Unreachable fallback error in apply() retry loop (wireguard/wireguard.go)

The loop condition is attempt <= maxRetries (where maxRetries = 3), giving 4 iterations (0–3). The inner retry guards check attempt < maxRetries, so on the final iteration (attempt == 3) a device-busy error skips retrying and falls to return err at line 126. Every iteration therefore ends with either continue, return nil, or return err — the loop body never exits without returning. The return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries) at line 128 is unreachable dead code. While functionally harmless (the actual error is still returned), the intent appears to be that line 128 provides the definitive exhausted-retries message, and the current structure defeats that.

Positive Notes

  • The string-matching approach in isDeviceBusyError is pragmatic for Windows error messages and handles the known variants.
  • The portOwnedByDevice check in daemon.go cleanly isolates the Windows-specific logic without affecting the non-static-port code path on other platforms.
  • Adding wgMutex to NCIface.UpdatePeer, GetPeersFromDevice, and GetPeer improves consistency across the locking model.

Comment on lines +106 to +128
for attempt := 0; attempt <= maxRetries; attempt++ {
wg, err := wgctrl.New()
if err != nil {
if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) {
slog.Warn("wgctrl busy, retrying", "attempt", attempt+1, "error", err)
time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)
continue
}
return fmt.Errorf("wgctrl %w", err)
}
err = wg.ConfigureDevice(ifaceName, *c)
wg.Close()
if err == nil {
return nil
}
if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) {
slog.Warn("ConfigureDevice busy, retrying", "attempt", attempt+1, "error", err)
time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)
continue
}
return err
}
defer wg.Close()
return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Unreachable fallback error message in apply() retry loop (bug)

In wireguard/wireguard.go, the apply() function uses for attempt := 0; attempt <= maxRetries; attempt++ with maxRetries = 3, running 4 iterations (attempts 0–3). The retry guards inside the loop check attempt < maxRetries, which is false on the final iteration (attempt == 3). On that last iteration a device-busy error falls through to return err at line 126 rather than continuing. Because every iteration ends with either continue, return nil, or return err, the loop body never exits normally — making return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries) at line 128 unreachable dead code. In practice the correct error is still propagated via line 126, so this has no runtime impact, but the authoritative "retries exhausted" message is silently lost and the loop structure does not match its apparent intent.

💡 Suggestion: Change the loop condition to attempt < maxRetries and remove the redundant attempt < maxRetries guards from the inner if-checks. This makes the loop run exactly maxRetries retry attempts and allows the loop to exit naturally, falling through to the meaningful ConfigureDevice failed after N retries error at line 128.

Suggested change
for attempt := 0; attempt <= maxRetries; attempt++ {
wg, err := wgctrl.New()
if err != nil {
if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) {
slog.Warn("wgctrl busy, retrying", "attempt", attempt+1, "error", err)
time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)
continue
}
return fmt.Errorf("wgctrl %w", err)
}
err = wg.ConfigureDevice(ifaceName, *c)
wg.Close()
if err == nil {
return nil
}
if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) {
slog.Warn("ConfigureDevice busy, retrying", "attempt", attempt+1, "error", err)
time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)
continue
}
return err
}
defer wg.Close()
return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries)
const maxRetries = 3
for attempt := 0; attempt < maxRetries; attempt++ {
wg, err := wgctrl.New()
if err != nil {
if runtime.GOOS == "windows" && isDeviceBusyError(err) {
slog.Warn("wgctrl busy, retrying", "attempt", attempt+1, "error", err)
time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)
continue
}
return fmt.Errorf("wgctrl %w", err)
}
err = wg.ConfigureDevice(ifaceName, *c)
wg.Close()
if err == nil {
return nil
}
if runtime.GOOS == "windows" && isDeviceBusyError(err) {
slog.Warn("ConfigureDevice busy, retrying", "attempt", attempt+1, "error", err)
time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)
continue
}
return err
}
return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries)

On the final retry attempt, busy errors were returned directly instead
of falling through to the exhausted-retries message. Restructure so
busy errors always continue the loop and the post-loop error with the
retry count is the definitive exhaustion path.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant