feat(mesh): session singleton + [mesh] optional dependency #100

Closed
cagataycali wants to merge 1 commit into strands-labs:main from cagataycali:feat/mesh-session
@cagataycali
Member

Summary

Add strands_robots/mesh_session.py — a thread-safe, ref-counted Zenoh session singleton that will be shared by all Mesh instances in the same process.

Part 1 of 6 for v0.4.0 Zenoh mesh networking. See implementation plan.

What's in this PR

New files

| File | LOC | Purpose |
| --- | --- | --- |
| strands_robots/mesh_session.py | 211 | Session lifecycle, MeshConfig, auto-mesh |
| tests/test_mesh_session.py | 258 | 22 tests, 98% coverage, all mock-based |

Modified files

| File | Change |
| --- | --- |
| pyproject.toml | Add [mesh] extra (eclipse-zenoh>=0.11.0,<0.12.0), add to [all], mypy override |

Design decisions

  1. Module, not subpackage: mesh_session.py lives at the strands_robots/ level. The mesh/ subpackage comes in PR 2, when there are multiple files to organize.

  2. MeshConfig(frozen=True): Immutable after creation for thread safety. Reads ZENOH_CONNECT, ZENOH_LISTEN, and STRANDS_MESH_PORT from the environment.

  3. Auto-mesh: When no env vars are set, the first process on a host listens on tcp/127.0.0.1:7447; subsequent processes connect as clients. Zero config required.

  4. STRANDS_MESH=false kill switch: Disables mesh globally before any zenoh import.

  5. Lazy import: eclipse-zenoh is imported via importlib.import_module() only when get_session() is called. No import-time dependency.

  6. Version pin >=0.11.0,<0.12.0: eclipse-zenoh 1.0 has API breaks in subscriber patterns. The pin is conservative for safety and is to be broadened after testing.

  7. Return Any: get_session() returns Any (not zenoh.Session) to avoid forcing zenoh as an import-time dependency. This is consistent with how the codebase handles other optional deps (torch, lerobot).
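For reviewers who want a concrete picture of decisions 2 to 4, here is a minimal sketch rather than the code in this PR. The class name `MeshConfigSketch`, the `listen` and `port` fields, the `auto_mesh` property, the `mesh_enabled` helper, and the comma-separated endpoint format are all assumptions made for illustration; only the environment variable names, the default port 7447, and the `connect` tuple come from the description above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


def _split_endpoints(raw: str) -> Tuple[str, ...]:
    """Split a comma-separated endpoint list (assumed format), dropping blanks."""
    return tuple(part.strip() for part in raw.split(",") if part.strip())


def mesh_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Kill switch: STRANDS_MESH=false (any letter case) disables the mesh."""
    return env.get("STRANDS_MESH", "true").strip().lower() != "false"


@dataclass(frozen=True)
class MeshConfigSketch:
    """Hypothetical stand-in for MeshConfig; frozen, so safe to share across threads."""

    connect: Tuple[str, ...] = ()
    listen: Tuple[str, ...] = ()
    port: int = 7447

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "MeshConfigSketch":
        """Build a config from ZENOH_CONNECT, ZENOH_LISTEN and STRANDS_MESH_PORT."""
        return cls(
            connect=_split_endpoints(env.get("ZENOH_CONNECT", "")),
            listen=_split_endpoints(env.get("ZENOH_LISTEN", "")),
            port=int(env.get("STRANDS_MESH_PORT", "7447")),
        )

    @property
    def auto_mesh(self) -> bool:
        """True when no explicit endpoints are set, i.e. auto-mesh would apply."""
        return not self.connect and not self.listen
```

Passing the environment as a mapping keeps the sketch testable without touching `os.environ`, for example `MeshConfigSketch.from_env({"STRANDS_MESH_PORT": "7450"})`.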

API

```python
from strands_robots.mesh_session import get_session, release_session, MeshConfig, session_info

# Acquire (ref +1)
session = get_session()

# Release (ref -1, closes at 0)
release_session()

# Explicit config
session = get_session(config=MeshConfig(connect=("tcp/10.0.0.5:7447",)))

# Diagnostics
info = session_info()  # {'active': True, 'refs': 1}
```
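The acquire/release semantics above can be illustrated with a small ref-counted singleton. This is a sketch under stated assumptions, not the contents of mesh_session.py: the `*_sketch` function names and the `opener` parameter are hypothetical (the injectable opener is only there so the logic can be exercised with a fake session), and the default opener assumes `zenoh.open(zenoh.Config())` is an acceptable way to open a session with default settings.

```python
import importlib
import threading
from typing import Any, Callable, Optional

_lock = threading.Lock()
_session: Optional[Any] = None
_refs = 0


def _open_zenoh_session() -> Any:
    """Import zenoh only when a session is first needed (lazy import)."""
    zenoh = importlib.import_module("zenoh")  # module shipped by eclipse-zenoh
    return zenoh.open(zenoh.Config())


def get_session_sketch(opener: Callable[[], Any] = _open_zenoh_session) -> Any:
    """Return the shared session, opening it on first use (ref +1)."""
    global _session, _refs
    with _lock:
        if _session is None:
            # If opener() raises, the count stays unchanged.
            _session = opener()
        _refs += 1
        return _session


def release_session_sketch() -> None:
    """Drop one reference (ref -1) and close the session when none remain."""
    global _session, _refs
    with _lock:
        if _refs == 0:
            return  # nothing to release; ignore unbalanced calls
        _refs -= 1
        if _refs == 0 and _session is not None:
            _session.close()
            _session = None


def session_info_sketch() -> dict:
    """Snapshot of the singleton state for diagnostics."""
    with _lock:
        return {"active": _session is not None, "refs": _refs}
```

A single lock around both the counter and the session handle is the simplest way to keep "open on first acquire" and "close at zero" atomic; the cost is that opening a session blocks other callers until it completes.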

Test results

22 passed in 0.43s — 98% coverage on mesh_session.py
250 passed (full suite), 6 skipped, 0 failures

What's next

| PR | Scope | Depends on |
| --- | --- | --- |
| PR 1 (this) | Session singleton + [mesh] extra | |
| PR 2 | Mesh class + presence + peer discovery | PR 1 |
| PR 3 | tell/send/broadcast + response correlation | PR 2 |
| PR 4 | publish_step + subscribe + on_stream | PR 2 |
| PR 5 | emergency_stop + safety audit log | PR 3 |
| PR 6 | Wire mesh into Robot.__init__ | PR 5 + v0.3.9 |

Prototype provenance: dashboard-zenoh-fixes:strands_robots/zenoh_mesh.py (691 LOC, battle-tested with 2 SO-100 arms, 10 simulated G1 humanoids, Mac↔Jetson Thor Wi-Fi at 50Hz).


🤖 AI agent response. Strands Agents. Feedback welcome!

Add strands_robots/mesh_session.py — a ref-counted Zenoh session
singleton that will be shared by all Mesh instances in the process.

Design:
- MeshConfig (frozen dataclass) reads ZENOH_CONNECT, ZENOH_LISTEN,
  STRANDS_MESH_PORT from environment
- get_session() / release_session() with thread-safe ref counting
- Auto-mesh: first process listens on localhost:7447, subsequent
  processes connect as clients
- STRANDS_MESH=false global kill switch
- Lazy import: eclipse-zenoh is not required at import time
- session_info() for dashboard diagnostics

pyproject.toml:
- Add [mesh] extra: eclipse-zenoh>=0.11.0,<0.12.0
- Add [mesh] to [all] extras
- Add mypy override for mesh_session (zenoh types are Any)

Tests: 22 tests, 98% coverage, all mock-based (no network)
- MeshConfig construction, frozen, from_env parsing
- Ref counting: acquire, release, close-at-zero, reopen
- Auto-mesh: listener mode vs client fallback
- Kill switch, zenoh-not-installed graceful degradation
- Thread safety: 4 threads × 50 acquire/release cycles

Part 1 of 6 for v0.4.0 Zenoh mesh — see cagataycali/strands-gtc-nvidia#313
@cagataycali
Member Author

Closing as a duplicate of #101. Three mesh-session PRs were opened at the same time (#100, #101, #103). Consolidating on #101, which has the most complete scope (4 files, 801 LOC including the PeerInfo registry). If reviewers prefer pieces from this PR, we will cherry-pick them before merging #101.

Canonical PR: #101
Tracking issue: #98
