Consolidate connection-mode flags; add p2p-dynamic and p2p-dynamic-lazy modes #5989

@MichaelUray

Summary

cc @pappz — would value your input given the engine-side context from #5807 / netbirdio/android-client#152.

This is an RFC-style proposal to replace the two independent peer-connection flags (NB_FORCE_RELAY and NB_ENABLE_EXPERIMENTAL_LAZY_CONN) with a single connection-mode enum that has five explicit values, including two new modes (p2p-dynamic, p2p-dynamic-lazy) that combine an always-on relay baseline with on-demand P2P upgrade per active peer. Inactivity thresholds become explicit, configurable settings rather than a single env var.

The companion proposal #5990 extends this with per-peer/per-group server-side override of both the mode and the thresholds.

This addresses the same use case as #5589 (mobile default flip) and #4103 (UI/CLI exposure of relay-only mode) with a broader mechanism. If maintainers agree this is the right direction, the original authors of #5589 / #4103 may want to consider whether their issues are still independently needed or can be closed in favor of this work.

Background

The peer-connection state machine in client/internal/peer/conn.go is currently controlled by two independent settings, NB_FORCE_RELAY and NB_ENABLE_EXPERIMENTAL_LAZY_CONN, whose effects overlap on the same code path.

Each has its own client-side and server-side toggles, and the precedence between them is asymmetric and undocumented (see conn_mgr.go:48-82). The recently closed android-client#152 (revert ForceRelay default to false on Android) showed that the binary force-relay flag is too coarse for mobile defaults: turning it off costs battery on large meshes (eager ICE for unused peers, see #1354, #2138), while turning it on prevents P2P even between peers on the same subnet (see #5589).

Proposed solution

Single enum connection-mode with five values:

| Mode | Behavior | Maps to existing |
| --- | --- | --- |
| relay-forced | Skip ICE entirely; transport is relay only. Relay stays up indefinitely. | NB_FORCE_RELAY=true; skip-ICE branch in peer/conn.go:188-203 |
| p2p | Eager worker_relay + worker_ice in parallel; hot-swap to P2P on success (conn.go:421 redirect packets from relayed conn to WireGuard). Both stay up indefinitely. | current default for non-mobile platforms |
| p2p-lazy | No connection at all until WireGuard sees outgoing traffic to the peer; then full worker_relay + worker_ice. After relay-idle-threshold without traffic, the entire connection is torn down. | client/internal/lazyconn/ package as-is |
| p2p-dynamic (new) | Eager worker_relay baseline so the connection is always "ready" with low setup latency. worker_ice is constructed but its OnNewOffer is not registered until the activity-detector fires for that peer. After ice-idle-threshold without traffic, the ICE worker is torn down (relay stays up). | new |
| p2p-dynamic-lazy (new) | Same as p2p-dynamic, plus: after a longer relay-idle-threshold without any traffic at all, the relay is also torn down. Re-activation reopens both relay and ICE on the next outgoing packet (same setup latency as p2p-lazy). Combines p2p-dynamic's low-latency-when-active behavior with p2p-lazy's zero-cost-when-truly-idle behavior. | new |
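
To make the enum concrete, here is a minimal Go sketch of how the five values could be represented and validated. The type name ConnectionMode, the constant names, and the helpers ParseConnectionMode and UsesICE are hypothetical and not part of the existing codebase; only the string values come from the table above.

```go
package main

import "fmt"

// ConnectionMode is a hypothetical type for the proposed enum.
// Only the string values are taken from the proposal.
type ConnectionMode string

const (
	ModeRelayForced    ConnectionMode = "relay-forced"
	ModeP2P            ConnectionMode = "p2p"
	ModeP2PLazy        ConnectionMode = "p2p-lazy"
	ModeP2PDynamic     ConnectionMode = "p2p-dynamic"
	ModeP2PDynamicLazy ConnectionMode = "p2p-dynamic-lazy"
)

// ParseConnectionMode accepts exactly the five proposed values
// and rejects everything else.
func ParseConnectionMode(s string) (ConnectionMode, error) {
	switch m := ConnectionMode(s); m {
	case ModeRelayForced, ModeP2P, ModeP2PLazy, ModeP2PDynamic, ModeP2PDynamicLazy:
		return m, nil
	default:
		return "", fmt.Errorf("unknown connection mode %q", s)
	}
}

// UsesICE reports whether the mode ever attempts a P2P upgrade.
// Per the table, relay-forced is the only mode that skips ICE entirely.
func (m ConnectionMode) UsesICE() bool {
	return m != ModeRelayForced
}

func main() {
	for _, s := range []string{"p2p-dynamic", "relay-forced", "bogus"} {
		m, err := ParseConnectionMode(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s usesICE=%t\n", m, m.UsesICE())
	}
}
```

A string-backed type keeps the CLI, env var, and server-pushed representations identical, which may simplify the precedence handling described later.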

Why these new modes

p2p-dynamic (and its lazier sibling) address two structural issues that the binary ForceRelay flag cannot resolve:

p2p-dynamic-lazy exists as its own mode rather than as a tunable variant of p2p-dynamic because the two-threshold lifecycle is qualitatively different and worth giving its own user-facing name. Mode resolution stays predictable; thresholds are an orthogonal config concern (next section).

Inactivity thresholds — explicit settings, configurable per scope

Two explicit thresholds replace today's single NB_LAZY_CONN_INACTIVITY_THRESHOLD env var:

| Setting | Default | Applies to | Effect on inactivity expiry |
| --- | --- | --- | --- |
| ice-idle-threshold | 5 min (proposal) | p2p-dynamic, p2p-dynamic-lazy | ICE worker torn down; relay stays |
| relay-idle-threshold | 1 h (proposal) | p2p-lazy, p2p-dynamic-lazy | Relay torn down (and ICE if still up); next packet re-opens everything |

Both thresholds are configurable independently and follow the same source hierarchy as the mode itself (covered in the companion proposal): account default → per-group → per-peer override, with explicit client-side override on top. Ship reasonable defaults, let admins / power users tune.

relay-forced and p2p are unaffected by either threshold — those modes are explicitly always-on by design. NB_LAZY_CONN_INACTIVITY_THRESHOLD continues to work as a backwards-compat alias for relay-idle-threshold (see backwards compatibility below).

Phased rollout — no default changes in this proposal

This proposal explicitly does NOT change any default mode for any platform. The new modes ship as opt-in choices alongside the existing three behaviors (preserved via the backwards-compat mapping below). Once the implementation is in users' hands and field telemetry exists for the new modes' real-world behavior (battery, latency, edge cases), a follow-up discussion can decide whether to make one of the new modes the new universal default — ideally a single default across all platforms rather than continuing today's mobile-vs-non-mobile split.

This phasing avoids relitigating the default-flip question while the new mechanism is unproven.

Backwards compatibility

Existing knobs continue to work and map to the new enum, with deprecation notices in --help text and docs:

  • NB_FORCE_RELAY=true → connection-mode=relay-forced
  • NB_FORCE_RELAY=false (or unset) + NB_ENABLE_EXPERIMENTAL_LAZY_CONN=true → connection-mode=p2p-lazy
  • --enable-lazy-connection → --connection-mode=p2p-lazy
  • Account-level Settings.LazyConnectionEnabled=true → equivalent to setting account-level connection-mode=p2p-lazy
  • NB_LAZY_CONN_INACTIVITY_THRESHOLD → backwards-compat alias for relay-idle-threshold

No env-var or CLI removal in this change; deprecate in this minor, remove no earlier than next major.
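
The env-var part of this mapping could look roughly like the following. This is a sketch under assumptions: legacyMode is a hypothetical helper, and parsing the values with strconv.ParseBool is an assumption about how the flags are interpreted, not a statement about the current client code. When neither legacy flag is set the function reports that no mapping applies, leaving the platform default untouched.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// legacyMode maps the two existing env vars onto the proposed enum.
// getenv is injected so the mapping can be exercised without touching
// the real environment. The second return value is false when neither
// legacy flag is enabled.
func legacyMode(getenv func(string) string) (string, bool) {
	// Unset or unparsable values are treated as false (assumption).
	forceRelay, _ := strconv.ParseBool(getenv("NB_FORCE_RELAY"))
	lazy, _ := strconv.ParseBool(getenv("NB_ENABLE_EXPERIMENTAL_LAZY_CONN"))

	switch {
	case forceRelay:
		// NB_FORCE_RELAY=true maps to relay-forced regardless of the lazy flag.
		return "relay-forced", true
	case lazy:
		return "p2p-lazy", true
	default:
		return "", false
	}
}

func main() {
	if mode, ok := legacyMode(os.Getenv); ok {
		fmt.Println("legacy flags map to connection-mode =", mode)
	} else {
		fmt.Println("no legacy flag set; platform default applies")
	}
}
```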

Settings-source precedence (client-side)

Replace the current asymmetric "client-ON locks server out, client cannot opt-out of server-ON" with a single explicit precedence (applies to both the mode and the thresholds):

  1. Client env var (highest — for debug/CI)
  2. Client config (CLI/UI explicit set, including the special value follow-server to clear a local override)
  3. Server-pushed value (default — what the server resolves for this peer)

Each layer is allowed to set any of the five modes (not just enable/disable) and to override either threshold independently, so a power-user can explicitly opt out of an account-wide setting in either direction (today not possible).
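
As an illustration of the three-layer precedence, a resolver for the mode could be as small as the sketch below. The function name resolveMode and the use of an empty string for "not set" are assumptions made for this example; the same shape would apply to each threshold independently.

```go
package main

import "fmt"

// followServer is the special client-config value that clears a local override.
const followServer = "follow-server"

// resolveMode picks the effective mode from the three layers.
// An empty string means "this layer did not set a value".
func resolveMode(envVal, clientCfg, serverVal string) string {
	// 1. Client env var wins (debug/CI).
	if envVal != "" {
		return envVal
	}
	// 2. Explicit client config, unless it defers to the server.
	if clientCfg != "" && clientCfg != followServer {
		return clientCfg
	}
	// 3. Whatever the server resolved for this peer.
	return serverVal
}

func main() {
	fmt.Println(resolveMode("", "relay-forced", "p2p-dynamic"))    // relay-forced
	fmt.Println(resolveMode("", followServer, "p2p-dynamic"))      // p2p-dynamic
	fmt.Println(resolveMode("p2p", "relay-forced", "p2p-dynamic")) // p2p
}
```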

Server-side per-peer/per-group resolution that produces the value sent to the client is covered in the companion proposal.

Implementation notes

Most pieces already exist:

  • relay-forced: existing skip-ICE branch in peer/conn.go:188.
  • p2p: existing default code path.
  • p2p-lazy: existing client/internal/lazyconn/ package with one threshold (relay-idle-threshold).
  • p2p-dynamic: new — but reuses lazyconn/activity/ for the activity-detector and lazyconn/inactivity/manager for the tear-down. The novel piece is decoupling "open relay" from "register ICE OnNewOffer" in peer.NewConn so the relay path runs eagerly while ICE registration is deferred.
  • p2p-dynamic-lazy: new — composition of p2p-dynamic's ICE-deferral with p2p-lazy's relay-teardown, gated on the second (longer) threshold. Adds a second timer to the per-peer inactivity manager.
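
The decoupling described for p2p-dynamic can be illustrated with a toy model: the relay is opened at construction time, while the offer handler is only installed once activity is detected and is removed again when the ICE idle timer fires. All names here (dynamicConn, Offer, onActivity, onICEIdle, deliver) are hypothetical and do not mirror the real peer.NewConn API. How offers arriving before registration should be treated is not specified in the proposal; this sketch simply reports them as unhandled.

```go
package main

import "fmt"

// Offer is a stand-in for an incoming ICE offer (hypothetical type).
type Offer struct{ From string }

// dynamicConn models one peer connection in p2p-dynamic mode.
type dynamicConn struct {
	relayOpen  bool
	onNewOffer func(Offer) // nil until the activity detector fires
	accepted   []string    // offers that reached the ICE handler
}

// newDynamicConn opens the relay eagerly and defers ICE registration.
func newDynamicConn() *dynamicConn {
	return &dynamicConn{relayOpen: true}
}

// onActivity is called when traffic to the peer is detected;
// it registers the offer handler if it is not registered yet.
func (c *dynamicConn) onActivity() {
	if c.onNewOffer != nil {
		return
	}
	c.onNewOffer = func(o Offer) { c.accepted = append(c.accepted, o.From) }
}

// onICEIdle is called after ice-idle-threshold without traffic;
// the ICE side is dropped while the relay stays up.
func (c *dynamicConn) onICEIdle() {
	c.onNewOffer = nil
}

// deliver routes an offer to the handler and reports whether it was handled.
func (c *dynamicConn) deliver(o Offer) bool {
	if c.onNewOffer == nil {
		return false
	}
	c.onNewOffer(o)
	return true
}

func main() {
	c := newDynamicConn()
	fmt.Println(c.relayOpen, c.deliver(Offer{From: "peerA"})) // true false
	c.onActivity()
	fmt.Println(c.deliver(Offer{From: "peerA"})) // true
	c.onICEIdle()
	fmt.Println(c.relayOpen, c.deliver(Offer{From: "peerA"})) // true false
}
```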

Subnet-router peers stay always-on via the existing ExcludePeer mechanism. Rosenpass remains mutually exclusive with p2p-lazy, p2p-dynamic, and p2p-dynamic-lazy (same constraint as today, conn_mgr.go:66). Mobile clients drop their ad-hoc UI for ForceRelay and adopt a single mode-picker; the existing Android EnvKeyNBLazyConn/EnvKeyNBInactivityThreshold exports in client/android/env_list.go are already in place for the gomobile binding.

Related issues
