Summary
cc @pappz — would value your input given the engine-side context from #5807 / netbirdio/android-client#152.
This is an RFC-style proposal to replace the two independent peer-connection flags (NB_FORCE_RELAY and NB_ENABLE_EXPERIMENTAL_LAZY_CONN) with a single connection-mode enum that has five explicit values, including two new modes (p2p-dynamic, p2p-dynamic-lazy) that combine an always-on relay baseline with on-demand P2P upgrade per active peer. Inactivity thresholds become explicit, configurable settings rather than a single env var.
The companion proposal #5990 extends this with per-peer/per-group server-side override of both the mode and the thresholds.
This addresses the same use case as #5589 (mobile default flip) and #4103 (UI/CLI exposure of relay-only mode) with a broader mechanism. If maintainers agree this is the right direction, the original authors of #5589 / #4103 may want to consider whether their issues are still independently needed or can be closed in favor of this work.
Background
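The peer-connection state machine in client/internal/peer/conn.go is currently controlled by two independent settings whose effects overlap on the same code path:
- NB_FORCE_RELAY / EnvKeyNBForceRelay (peer/env.go)
- NB_ENABLE_EXPERIMENTAL_LAZY_CONN / LazyConnectionEnabled (lazyconn/env.go)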
Each has its own client-side and server-side toggles, and the precedence between them is asymmetric and undocumented (see conn_mgr.go:48-82). The recently closed android-client#152 (revert ForceRelay default to false on Android) made it visible that the binary force-relay flag is too coarse for mobile defaults: turning it off costs battery on large meshes (eager ICE for unused peers, see #1354, #2138), turning it on prevents same-LAN P2P even when peers are in the same subnet (see #5589).
Proposed solution
Single enum connection-mode with five values:
| Mode | Behavior | Maps to existing |
|---|---|---|
| relay-forced | Skip ICE entirely; transport is relay only. Relay stays up indefinitely. | NB_FORCE_RELAY=true skip-ICE branch in peer/conn.go:188-203 |
| p2p | Eager worker_relay + worker_ice in parallel; hot-swap to P2P on success (conn.go:421 redirect packets from relayed conn to WireGuard). Both stay up indefinitely. | current default for non-mobile platforms |
| p2p-lazy | No connection at all until WireGuard sees outgoing traffic to the peer; then full worker_relay + worker_ice. After relay-idle-threshold without traffic, the entire connection is torn down. | client/internal/lazyconn/ package as-is |
| p2p-dynamic (new) | Eager worker_relay baseline so the connection is always "ready" with low setup latency. worker_ice is constructed but its OnNewOffer is not registered until the activity-detector fires for that peer. After ice-idle-threshold without traffic, the ICE worker is torn down (relay stays up). | new |
| p2p-dynamic-lazy (new) | Same as p2p-dynamic, plus: after a longer relay-idle-threshold without any traffic at all, the relay is also torn down. Re-activation reopens both relay and ICE on the next outgoing packet (same setup latency as p2p-lazy). Combines p2p-dynamic's low-latency-when-active behavior with p2p-lazy's zero-cost-when-truly-idle behavior. | new |
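As a rough illustration of the table above, the five values could be modeled as a string-backed enum with a few predicates. This is only a sketch: the type name ConnectionMode, the constant names, and the helper methods are hypothetical and do not exist in the NetBird codebase; only the five string values come from this proposal.

```go
package main

import "fmt"

// ConnectionMode is a hypothetical type name; the proposal fixes only the
// five string values, not how they are represented in code.
type ConnectionMode string

const (
	ModeRelayForced    ConnectionMode = "relay-forced"
	ModeP2P            ConnectionMode = "p2p"
	ModeP2PLazy        ConnectionMode = "p2p-lazy"
	ModeP2PDynamic     ConnectionMode = "p2p-dynamic"
	ModeP2PDynamicLazy ConnectionMode = "p2p-dynamic-lazy"
)

// AllModes lists the modes in the order used by the proposal's table.
var AllModes = []ConnectionMode{
	ModeRelayForced, ModeP2P, ModeP2PLazy, ModeP2PDynamic, ModeP2PDynamicLazy,
}

// ParseConnectionMode validates a user-supplied string against the five modes.
func ParseConnectionMode(s string) (ConnectionMode, error) {
	for _, m := range AllModes {
		if string(m) == s {
			return m, nil
		}
	}
	return "", fmt.Errorf("unknown connection mode %q", s)
}

// EagerRelay reports whether the relay worker is opened before any traffic
// is seen. Only p2p-lazy waits for traffic before opening anything.
func (m ConnectionMode) EagerRelay() bool { return m != ModeP2PLazy }

// EagerICE reports whether ICE starts without waiting for activity.
// Per the table, that is only the plain p2p mode.
func (m ConnectionMode) EagerICE() bool { return m == ModeP2P }

// UsesICE reports whether the mode ever attempts a direct path.
func (m ConnectionMode) UsesICE() bool { return m != ModeRelayForced }

func main() {
	for _, m := range AllModes {
		fmt.Printf("%-17s eagerRelay=%-5t eagerICE=%-5t usesICE=%t\n",
			m, m.EagerRelay(), m.EagerICE(), m.UsesICE())
	}
}
```

Keeping the enum string-backed means the same value can be used unchanged in env vars, CLI flags, and server-pushed config.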
Why these new modes
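p2p-dynamic (and its lazier sibling) addresses two structural issues that the binary ForceRelay flag cannot resolve:
- […] relay-forced (one relay TCP multiplex regardless of peer count); pappz's battery argument from android-client#152 (Revert ForceRelay default to false now that P2P issues are fixed) holds without sacrificing P2P entirely.
- […] p2p once the upgrade settles (~1s of relay-routed traffic at the start of each active session before the hot-swap; subsequent traffic is direct).
- […] DeactivatePeer is a no-op when the local manager is not in lazy mode (the lazy peer's GO_IDLE signal is silently ignored, so the eager side immediately reconnects).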
p2p-dynamic-lazy exists as its own mode rather than as a tunable variant of p2p-dynamic because the two-threshold lifecycle is qualitatively different and worth giving its own user-facing name. Mode resolution stays predictable; thresholds are an orthogonal config concern (next section).
Inactivity thresholds — explicit settings, configurable per scope
Two explicit thresholds replace today's single NB_LAZY_CONN_INACTIVITY_THRESHOLD env var:
| Setting | Default | Applies to | Effect on inactivity expiry |
|---|---|---|---|
| ice-idle-threshold | 5 min (proposal) | p2p-dynamic, p2p-dynamic-lazy | ICE worker torn down; relay stays |
| relay-idle-threshold | 1 h (proposal) | p2p-lazy, p2p-dynamic-lazy | Relay torn down (and ICE if still up); next packet re-opens everything |
Both thresholds are configurable independently and follow the same source hierarchy as the mode itself (covered in the companion proposal): account default → per-group → per-peer override, with explicit client-side override on top. Ship reasonable defaults, let admins / power users tune.
relay-forced and p2p are unaffected by either threshold — those modes are explicitly always-on by design. NB_LAZY_CONN_INACTIVITY_THRESHOLD continues to work as a backwards-compat alias for relay-idle-threshold (see backwards compatibility below).
Phased rollout — no default changes in this proposal
This proposal explicitly does NOT change any default mode for any platform. The new modes ship as opt-in choices alongside the existing three behaviors (preserved via the backwards-compat mapping below). Once the implementation is in users' hands and field telemetry exists for the new modes' real-world behavior (battery, latency, edge cases), a follow-up discussion can decide whether to make one of the new modes the new universal default — ideally a single default across all platforms rather than continuing today's mobile-vs-non-mobile split.
This phasing avoids relitigating the default-flip question while the new mechanism is unproven.
Backwards compatibility
Existing knobs continue to work and map to the new enum, with deprecation notices in --help text and docs:
- NB_FORCE_RELAY=true → connection-mode=relay-forced
- NB_FORCE_RELAY=false (or unset) + NB_ENABLE_EXPERIMENTAL_LAZY_CONN=true → connection-mode=p2p-lazy
- --enable-lazy-connection → --connection-mode=p2p-lazy
- Account-level Settings.LazyConnectionEnabled=true → equivalent to setting account-level connection-mode=p2p-lazy
- NB_LAZY_CONN_INACTIVITY_THRESHOLD → backwards-compat alias for relay-idle-threshold
No env-var or CLI removal in this change; deprecate in this minor, remove no earlier than next major.
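The env-var part of the mapping above could be sketched as a small pure function. LegacyMode and isTrue are hypothetical helpers, and parsing with strconv.ParseBool is an assumption about how the existing variables are interpreted. When neither legacy variable is set to true the function reports no override, leaving the platform default untouched, in line with the "no default changes" stance.

```go
package main

import (
	"fmt"
	"strconv"
)

// LegacyMode maps the deprecated env vars onto the new enum values.
// getenv is injected so the mapping can be exercised without touching the
// real process environment; pass os.Getenv in production code.
// ok is false when no legacy knob selects a mode.
func LegacyMode(getenv func(string) string) (mode string, ok bool) {
	// NB_FORCE_RELAY=true wins regardless of the lazy flag.
	if isTrue(getenv("NB_FORCE_RELAY")) {
		return "relay-forced", true
	}
	// Lazy only applies when force-relay is false or unset.
	if isTrue(getenv("NB_ENABLE_EXPERIMENTAL_LAZY_CONN")) {
		return "p2p-lazy", true
	}
	return "", false
}

// isTrue treats unset or unparsable values as false (assumption).
func isTrue(v string) bool {
	b, err := strconv.ParseBool(v)
	return err == nil && b
}

func main() {
	env := map[string]string{"NB_ENABLE_EXPERIMENTAL_LAZY_CONN": "true"}
	mode, ok := LegacyMode(func(k string) string { return env[k] })
	fmt.Println(mode, ok) // prints "p2p-lazy true"
}
```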
Settings-source precedence (client-side)
Replace the current asymmetric "client-ON locks server out, client cannot opt-out of server-ON" with a single explicit precedence (applies to both the mode and the thresholds):
- Client env var (highest; for debug/CI)
- Client config (CLI/UI explicit set, including the special value follow-server to clear a local override)
- Server-pushed value (default; what the server resolves for this peer)
Each layer is allowed to set any of the five modes (not just enable/disable) and to override either threshold independently, so a power-user can explicitly opt out of an account-wide setting in either direction (today not possible).
Server-side per-peer/per-group resolution that produces the value sent to the client is covered in the companion proposal.
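The three-layer precedence can be sketched as a single resolution function. Resolve and the FollowServer constant are hypothetical names, and representing "not set" as the empty string is an assumption made for brevity. The same function shape would apply to either threshold as well as to the mode.

```go
package main

import "fmt"

// FollowServer is the special client-config value that clears a local override.
const FollowServer = "follow-server"

// Resolve returns the effective value and which layer supplied it.
// An empty string means "not set at this layer" (assumption of this sketch).
func Resolve(envValue, clientConfig, serverValue string) (value, source string) {
	if envValue != "" {
		return envValue, "env" // highest priority, intended for debug/CI
	}
	if clientConfig != "" && clientConfig != FollowServer {
		return clientConfig, "client-config"
	}
	return serverValue, "server" // default: what the server resolved for this peer
}

func main() {
	fmt.Println(Resolve("", "p2p", "p2p-lazy"))             // client opts out of server-side lazy
	fmt.Println(Resolve("", FollowServer, "p2p-lazy"))      // local override cleared
	fmt.Println(Resolve("relay-forced", "p2p", "p2p-lazy")) // env var beats everything
}
```

Because every layer carries a full mode value rather than an on/off bit, a client can override the server in either direction, which the current asymmetric logic does not allow.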
Implementation notes
Most pieces already exist:
- relay-forced: existing skip-ICE branch in peer/conn.go:188.
- p2p: existing default code path.
- p2p-lazy: existing client/internal/lazyconn/ package with one threshold (relay-idle-threshold).
- p2p-dynamic: new, but reuses lazyconn/activity/ for the activity-detector and lazyconn/inactivity/manager for the tear-down. The novel piece is decoupling "open relay" from "register ICE OnNewOffer" in peer.NewConn so the relay path runs eagerly while ICE registration is deferred.
- p2p-dynamic-lazy: new; a composition of p2p-dynamic's ICE-deferral with p2p-lazy's relay-teardown, gated on the second (longer) threshold. Adds a second timer to the per-peer inactivity manager.
Subnet-router peers stay always-on via the existing ExcludePeer mechanism. Rosenpass remains mutually exclusive with p2p-lazy, p2p-dynamic, and p2p-dynamic-lazy (same constraint as today, conn_mgr.go:66). Mobile clients drop their ad-hoc UI for ForceRelay and adopt a single mode-picker; the existing Android EnvKeyNBLazyConn/EnvKeyNBInactivityThreshold exports in client/android/env_list.go are already in place for the gomobile binding.
Related issues
- […] p2p-dynamic / p2p-dynamic-lazy, which give same-LAN P2P without the eager-ICE battery cost.
- […] ForceRelay=true mobile default).