NM-279: fix random listen port on Windows by abhishek9686 · Pull Request #1266 · gravitl/netclient

abhishek9686 · 2026-03-18T11:50:33Z

On Windows, the WireGuard-nt adapter from a previous run can keep the UDP port bound, causing GetFreePort to see it as busy and assign a random port. Additionally, concurrent wgctrl access and transient driver handle contention lead to ConfigureDevice failures.

Query the adapter's actual listen port before running GetFreePort so the port owned by our own adapter is not replaced
Add retry logic with backoff to apply() on Windows for transient 'file being used by another process' errors
Protect GetPeersFromDevice, GetPeer, and UpdatePeer with wgMutex to prevent concurrent device access
Add a delay between Close and Create in resetInterfaceFunc on Windows to let the kernel driver fully release the device handle

Describe your changes

Provide Issue ticket number if applicable/not in title

Provide link to Netmaker PR if required

Provide testing steps

Checklist before requesting a review

My changes affect only 10 files or less.
I have performed a self-review of my code and tested it.
If it is a new feature, I have added thorough tests, my code is <= 1450 lines.
If it is a bugfix, my code is <= 200 lines.
My functions are <= 80 lines.
I have had my code reviewed by a peer.
My unit tests pass locally.
Netclient & Netmaker are awesome.

…s the port On Windows the WireGuard-nt adapter from a previous run can keep the UDP port bound, causing GetFreePort to see it as busy and assign a random port. Additionally, concurrent wgctrl access and transient driver handle contention lead to ConfigureDevice failures. - Query the adapter's actual listen port before running GetFreePort so the port owned by our own adapter is not replaced - Add retry logic with backoff to apply() on Windows for transient 'file being used by another process' errors - Protect GetPeersFromDevice, GetPeer, and UpdatePeer with wgMutex to prevent concurrent device access - Add a delay between Close and Create in resetInterfaceFunc on Windows to let the kernel driver fully release the device handle

tenki-reviewer · 2026-03-18T11:50:41Z

Tenki Code Review - Complete

Files Reviewed: 5
Findings: 1

By Severity:

🟡 Low: 1

This PR correctly addresses a Windows-specific race condition where the WireGuard adapter's port falsely appears busy, adding mutex protection to previously unguarded functions and a retry loop for transient device-busy errors. One low-severity issue was found: the retry loop's fallback error message at the end of apply() is unreachable dead code due to an off-by-one in the loop condition.

Files Reviewed (5 files)

functions/daemon.go
functions/mqhandlers.go
functions/uninstall.go
wireguard/types.go
wireguard/wireguard.go

tenki-reviewer

Overview

This PR (NM-279) fixes a Windows-specific bug where the netclient would unnecessarily change its listen port on restart because the existing WireGuard adapter's port appeared busy to net.ListenUDP. The fix is well-structured and targets the correct layers.

Changes Reviewed

wireguard/wireguard.go — New apply() retry loop with isDeviceBusyError() helper and new GetDeviceListenPort() function.
wireguard/types.go — NCIface.UpdatePeer now holds wgMutex before calling apply(), consistent with other public methods like SetPeers and Configure.
wireguard/wireguard.go — GetPeersFromDevice and GetPeer also gain wgMutex guards, improving thread safety.
functions/daemon.go — Port assignment now checks GetDeviceListenPort() first; skips the UDP bind test when the adapter already owns the desired port.
functions/mqhandlers.go / functions/uninstall.go — A 500 ms sleep on Windows between Close() and NewNCIface() to allow the kernel to release the adapter handle.

Finding

Unreachable fallback error in `apply()` retry loop (`wireguard/wireguard.go`)

The loop condition is attempt <= maxRetries (where maxRetries = 3), giving 4 iterations (0–3). The inner retry guards check attempt < maxRetries, so on the final iteration (attempt == 3) a device-busy error skips retrying and falls to return err at line 126. Every iteration therefore ends with either continue, return nil, or return err — the loop body never exits without returning. The return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries) at line 128 is unreachable dead code. While functionally harmless (the actual error is still returned), the intent appears to be that line 128 provides the definitive exhausted-retries message, and the current structure defeats that.

Positive Notes

The string-matching approach in isDeviceBusyError is pragmatic for Windows error messages and handles the known variants.
The portOwnedByDevice check in daemon.go cleanly isolates the Windows-specific logic without affecting the non-static-port code path on other platforms.
Adding wgMutex to NCIface.UpdatePeer, GetPeersFromDevice, and GetPeer improves consistency across the locking model.

tenki-reviewer · 2026-03-18T11:53:38Z

wireguard/wireguard.go

+	for attempt := 0; attempt <= maxRetries; attempt++ {
+		wg, err := wgctrl.New()
+		if err != nil {
+			if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) {
+				slog.Warn("wgctrl busy, retrying", "attempt", attempt+1, "error", err)
+				time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)
+				continue
+			}
+			return fmt.Errorf("wgctrl %w", err)
+		}
+		err = wg.ConfigureDevice(ifaceName, *c)
+		wg.Close()
+		if err == nil {
+			return nil
+		}
+		if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) {
+			slog.Warn("ConfigureDevice busy, retrying", "attempt", attempt+1, "error", err)
+			time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)
+			continue
+		}
+		return err
 	}
-	defer wg.Close()
+	return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries)


🟡 Unreachable fallback error message in apply() retry loop (bug)

In wireguard/wireguard.go, the apply() function uses for attempt := 0; attempt <= maxRetries; attempt++ with maxRetries = 3, running 4 iterations (attempts 0–3). The retry guards inside the loop check attempt < maxRetries, which is false on the final iteration (attempt == 3). On that last iteration a device-busy error falls through to return err at line 126 rather than continuing. Because every iteration ends with either continue, return nil, or return err, the loop body never exits normally — making return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries) at line 128 unreachable dead code. In practice the correct error is still propagated via line 126, so this has no runtime impact, but the authoritative "retries exhausted" message is silently lost and the loop structure does not match its apparent intent.

💡 Suggestion: Change the loop condition to attempt < maxRetries and remove the redundant attempt < maxRetries guards from the inner if-checks. This makes the loop run exactly maxRetries retry attempts and allows the loop to exit naturally, falling through to the meaningful ConfigureDevice failed after N retries error at line 128.

Suggested change

for attempt := 0; attempt <= maxRetries; attempt++ {

wg, err := wgctrl.New()

if err != nil {

if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) {

slog.Warn("wgctrl busy, retrying", "attempt", attempt+1, "error", err)

time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)

continue

}

return fmt.Errorf("wgctrl %w", err)

}

err = wg.ConfigureDevice(ifaceName, *c)

wg.Close()

if err == nil {

return nil

}

if runtime.GOOS == "windows" && attempt < maxRetries && isDeviceBusyError(err) {

slog.Warn("ConfigureDevice busy, retrying", "attempt", attempt+1, "error", err)

time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)

continue

}

return err

}

defer wg.Close()

return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries)

const maxRetries = 3

for attempt := 0; attempt < maxRetries; attempt++ {

wg, err := wgctrl.New()

if err != nil {

if runtime.GOOS == "windows" && isDeviceBusyError(err) {

slog.Warn("wgctrl busy, retrying", "attempt", attempt+1, "error", err)

time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)

continue

}

return fmt.Errorf("wgctrl %w", err)

}

err = wg.ConfigureDevice(ifaceName, *c)

wg.Close()

if err == nil {

return nil

}

if runtime.GOOS == "windows" && isDeviceBusyError(err) {

slog.Warn("ConfigureDevice busy, retrying", "attempt", attempt+1, "error", err)

time.Sleep(time.Duration(500*(attempt+1)) * time.Millisecond)

continue

}

return err

}

return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries)

On the final retry attempt, busy errors were returned directly instead of falling through to the exhausted-retries message. Restructure so busy errors always continue the loop and the post-loop error with the retry count is the definitive exhaustion path.

tenki-reviewer bot reviewed Mar 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NM-279: fix random listen port on Windows#1266

NM-279: fix random listen port on Windows#1266
abhishek9686 wants to merge 2 commits intorelease-v1.5.0from
NM-279-v2

abhishek9686 commented Mar 18, 2026

Uh oh!

tenki-reviewer bot commented Mar 18, 2026 •

edited

Loading

Uh oh!

tenki-reviewer bot left a comment

Uh oh!

tenki-reviewer bot Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

abhishek9686 commented Mar 18, 2026

Describe your changes

Provide Issue ticket number if applicable/not in title

Provide link to Netmaker PR if required

Provide testing steps

Checklist before requesting a review

Uh oh!

tenki-reviewer bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tenki-reviewer bot left a comment

Choose a reason for hiding this comment

Overview

Changes Reviewed

Finding

Unreachable fallback error in apply() retry loop (wireguard/wireguard.go)

Positive Notes

Uh oh!

tenki-reviewer bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tenki-reviewer bot commented Mar 18, 2026 •

edited

Loading

Unreachable fallback error in `apply()` retry loop (`wireguard/wireguard.go`)