Holepunching unreliable if there's network overlap #72

@asardaes

Describe the Bug

It seems like Olm can't decide between holepunching and relaying when there's network overlap:

INFO: 2026/01/02 12:56:40 Added IPv4 included route: {DestinationAddress:172.16.15.133 SubnetMask:255.255.255.255 GatewayAddress: IsDefault:false}
INFO: 2026/01/02 12:56:40 Adding route to 172.16.15.133/32 via interface olm
INFO: 2026/01/02 12:56:40 Added route for remote subnet: 172.16.15.133/32
INFO: 2026/01/02 12:56:40 Started monitoring for site 3 at 100.90.128.3:50145
INFO: 2026/01/02 12:56:40 Configured peer hGI22xrRZQJQ8weL8Ye2FIutO+vBX6IXJpCUpTqJ4W0=
INFO: 2026/01/02 12:56:40 Started monitoring peer 3
INFO: 2026/01/02 12:56:40 Started holepunch connection monitor
INFO: 2026/01/02 12:56:40 DNS proxy started on 100.96.128.1:53 (tunnelDNS=false)
INFO: 2026/01/02 12:56:40 WireGuard device created.
INFO: 2026/01/02 12:56:40 Starting rapid holepunch test for site 3 at 172.16.15.133:50144 (max 5 attempts, 400ms timeout each)
WARN: 2026/01/02 12:56:42 Rapid test: site 3 holepunch FAILED after 5 attempts, will relay
INFO: 2026/01/02 12:56:42 Rapid test failed for site 3, requesting relay
INFO: 2026/01/02 12:56:42 Sent relay message
INFO: 2026/01/02 12:56:42 Adjusted peer 3 to point to relay!
INFO: 2026/01/02 12:56:45 Holepunch to site 3 (172.16.15.133:50144) is CONNECTED (RTT: 1.033343387s)
INFO: 2026/01/02 12:56:45 Holepunch to site 3 succeeded while relayed, switching to direct connection
INFO: 2026/01/02 12:56:45 Sent unrelay message
INFO: 2026/01/02 12:56:45 Switched peer 3 back to direct connection at 172.16.15.133:50144
WARN: 2026/01/02 12:56:45 WireGuard connection to site 3 is DISCONNECTED
INFO: 2026/01/02 12:56:45 WireGuard connection to site 3 is CONNECTED (RTT: 3.102040645s)
WARN: 2026/01/02 12:56:48 Holepunch to site 3 (172.16.15.133:50144) is DISCONNECTED: timeout waiting for response
WARN: 2026/01/02 12:56:49 WireGuard connection to site 3 is DISCONNECTED
INFO: 2026/01/02 12:56:52 Holepunch to site 3 failed 3 times, triggering relay
INFO: 2026/01/02 12:56:52 Sent relay message
INFO: 2026/01/02 12:56:52 Adjusted peer 3 to point to relay!
INFO: 2026/01/02 12:56:52 WireGuard connection to site 3 is CONNECTED (RTT: 21.876093ms)
INFO: 2026/01/02 12:56:54 Holepunch to site 3 (172.16.15.133:50144) is CONNECTED (RTT: 15.023471ms)
INFO: 2026/01/02 12:56:54 Holepunch to site 3 succeeded while relayed, switching to direct connection
INFO: 2026/01/02 12:56:54 Sent unrelay message
INFO: 2026/01/02 12:56:54 Switched peer 3 back to direct connection at 172.16.15.133:50144
WARN: 2026/01/02 12:56:58 Holepunch to site 3 (172.16.15.133:50144) is DISCONNECTED: timeout waiting for response
WARN: 2026/01/02 12:56:59 WireGuard connection to site 3 is DISCONNECTED
INFO: 2026/01/02 12:57:02 Holepunch to site 3 failed 3 times, triggering relay
INFO: 2026/01/02 12:57:02 Sent relay message
INFO: 2026/01/02 12:57:02 Adjusted peer 3 to point to relay!
INFO: 2026/01/02 12:57:02 WireGuard connection to site 3 is CONNECTED (RTT: 20.854135ms)

172.16.15.133 is the Newt site's private IP; see more below.

Environment

  • OS Type & Version: Debian GNU/Linux 12 (bookworm)
  • Pangolin Version: 1.14.1
  • Gerbil Version: 1.3.0
  • Olm Version: 1.3.0

To Reproduce

I ran a somewhat unusual experiment. I have two VMs on my VPS, both in the same subnet: one for Pangolin and one for a Newt site. Pangolin has a public domain, but I made the Newt site connect through the internal subnet by manually adding an entry to its /etc/hosts:

172.16.15.101 my.domain.com

Newt is not running inside a container.

I then defined a private resource on the Newt VM. The client machine ended up being a container with Olm that's not on the VPS and is not running in host network mode. The Docker daemon on Olm's host has a pretty large IP pool (TrueNAS default): 172.17.0.0/12. As seen in the logs, the holepunch tried to use the VPS private IP, which obviously cannot work from outside the VPS. Since that IP was also valid in the Olm container's network, it looked like it could work, but it didn't.

Expected Behavior

My guess is that Olm should completely ignore private IP ranges when attempting to holepunch.
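
A rough sketch of what such a check could look like, assuming the endpoint arrives as an "ip:port" string like the ones in the logs. `holepunchable` is a hypothetical name and this is not how Olm is actually structured; it only uses the standard net/netip package:

```go
package main

import (
	"fmt"
	"net/netip"
)

// holepunchable is a hypothetical filter (not part of Olm). It returns false
// for endpoints whose address is private (RFC 1918 / IPv6 ULA), loopback,
// link-local, or unspecified, so the caller could go straight to relaying.
func holepunchable(endpoint string) (bool, error) {
	ap, err := netip.ParseAddrPort(endpoint)
	if err != nil {
		return false, fmt.Errorf("parse endpoint %q: %w", endpoint, err)
	}
	ip := ap.Addr().Unmap() // treat IPv4-mapped IPv6 as plain IPv4
	if ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
		return false, nil
	}
	return true, nil
}

func main() {
	// The first endpoint is from the logs; the second is a documentation
	// address (TEST-NET-3) standing in for a public IP.
	for _, ep := range []string{"172.16.15.133:50144", "203.0.113.10:50144"} {
		ok, err := holepunchable(ep)
		fmt.Println(ep, ok, err)
	}
}
```

Skipping every private address would also rule out direct connections between peers that really are on the same LAN, so this is only one possible policy.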
