## Context
Following the discussion in the [Discord thread (Feb 18)](https://discord.gg) and the merge pipeline PR (#246), this issue proposes a concrete JSON schema for `x-medkit-graph` — a SOVD vendor resource that exposes live pipeline topology with a performance overlay.
The use case driving this: Isaac ROS vision pipelines (e.g. Camera → Rectify → Resize → DNN → Postprocess) need a live diagnostic view that answers:
- "Where is the bottleneck?" — one node at 15 fps while the rest can do 30
- "Where did it break?" — a node silently stopped publishing and everything downstream is stale
As discussed with @bartosz_b_ and @mfaferek93: `greenwave_monitor` already covers per-topic Hz/latency via `/diagnostics`, and the Diagnostic Bridge already ingests it. The Fault Manager handles cascade correlation. What's missing is the structural layer on top — a DAG view that ties those flat per-topic metrics to a pipeline topology.
This issue proposes the schema for that structural layer before touching any code.
## Proposed Schema

```json
{
  "x-medkit-graph": {
    "graph_id": "perception-pipeline-graph",
    "timestamp": "2026-03-02T13:45:00Z",
    "nodes": [
      { "entity_id": "app_camera_node" },
      { "entity_id": "app_rectify_node" },
      { "entity_id": "app_dnn_inference" }
    ],
    "edges": [
      {
        "edge_id": "edge_cam_to_rect",
        "source": "app_camera_node",
        "target": "app_rectify_node",
        "topic": "/camera/image_raw",
        "transport_type": "intra_process_zero_copy",
        "metrics": {
          "source": "greenwave_monitor",
          "frequency_hz": 29.8,
          "latency_ms": 1.2,
          "drop_rate_percent": 0.0
        }
      },
      {
        "edge_id": "edge_rect_to_dnn",
        "source": "app_rectify_node",
        "target": "app_dnn_inference",
        "topic": "/camera/image_rect",
        "transport_type": "intra_process_zero_copy",
        "metrics": {
          "source": "greenwave_monitor",
          "frequency_hz": 14.5,
          "latency_ms": 45.3,
          "drop_rate_percent": 2.1
        }
      }
    ]
  }
}
```

## Design decisions
### Nodes reference existing entities by ID (SSOT)
Node health, fault state, and metadata are not duplicated here. Clients look those up via the existing entity endpoints. The graph only carries topology and the metrics overlay.
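As a sketch of what this SSOT rule means for clients (the `fetch_entity` callable here stands in for whatever entity-endpoint call the client already makes; none of these names are part of the proposal):

```python
def hydrate_nodes(graph: dict, fetch_entity) -> dict:
    """Resolve each graph node to its full entity via the existing entity
    endpoints. The graph itself carries only entity_id (topology); health,
    fault state, and metadata are never duplicated into it.

    fetch_entity: callable mapping an entity_id to that entity's current
    representation (e.g. an HTTP GET against the existing entity endpoint).
    """
    return {
        node["entity_id"]: fetch_entity(node["entity_id"])
        for node in graph["x-medkit-graph"]["nodes"]
    }
```

Because the graph never duplicates entity state, even a slightly stale graph snapshot can be joined against fresh entity data at render time.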
### Direct Node→Node edges (not bipartite)
ROS 2's internal model is bipartite (Node → Topic → Node), which is also what rqt_graph "Nodes/Topics (all)" mode shows. x-medkit-graph instead projects to direct Node→Node edges — consistent with rqt_graph "Nodes only" mode — because the goal is pipeline readability, not a full DDS topology dump.
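The projection itself is small; a hypothetical helper, assuming the pub/sub view is available as topic-to-node maps:

```python
def project_to_node_edges(publishers: dict, subscribers: dict) -> list:
    """Collapse the bipartite Node -> Topic -> Node view into direct
    Node -> Node edges, one edge per (publisher, subscriber) pair,
    as in rqt_graph "Nodes only" mode.

    publishers / subscribers: {topic_name: [node_name, ...]}
    Returns (source, target, topic) tuples.
    """
    edges = []
    for topic, pubs in publishers.items():
        for src in pubs:
            for dst in subscribers.get(topic, []):
                edges.append((src, dst, topic))
    return edges
```

Note that fan-out falls out of this naturally: one publisher with N subscribers on the same topic yields N direct edges, which is exactly the situation the fan-out section below addresses.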
### `metrics.source` field
Makes the data provenance explicit: clients can tell whether a metric came from greenwave, NITROS diagnostics, or another source. This matters for trust and for future multi-source scenarios.
### Fan-out / Fan-in handling
One design point worth resolving: ROS 2 fan-out (one publisher, N subscribers on the same topic) produces N separate edges in this schema, all sharing the same topic name.
Currently `greenwave_monitor` measures per-topic, not per-subscriber-connection. This means fan-out edges on the same topic will carry identical metrics until a per-connection data source (e.g. the NITROS introspection plugin from #112) is available.
Proposal: add an optional `shared_topic_id` field on edges that share an underlying topic, so clients can group them visually and avoid presenting misleadingly "independent" metrics.
```json
{
  "edge_id": "edge_cam_to_rectify",
  "source": "app_camera_node",
  "target": "app_rectify_node",
  "topic": "/camera/image_raw",
  "shared_topic_id": "topic_camera_image_raw",
  ...
},
{
  "edge_id": "edge_cam_to_logger",
  "source": "app_camera_node",
  "target": "app_logger_node",
  "topic": "/camera/image_raw",
  "shared_topic_id": "topic_camera_image_raw",
  ...
}
```

## Open Questions
- **metrics when no live data yet** — If a topic is declared in the manifest but greenwave hasn't seen it yet, should `metrics` be `null`, or should there be a `"metrics_available": false` flag?
- **`transport_type` enum** — Proposed values below. Any additions or corrections, especially from the NITROS plugin side?
  - `"intra_process_zero_copy"` — NITROS zero-copy
  - `"intra_process"` — standard ROS 2 intra-process
  - `"inter_process"` — same machine, separate processes (DDS)
  - `"network"` — cross-machine
  - `"unknown"` — no plugin data available (default)
- **pipeline-level aggregate health** — Should the server compute `pipeline_status` (healthy/degraded/broken) and `bottleneck_edge`, or should clients derive this from per-edge metrics? Server-side computation makes it easier to surface in `/health`, but adds logic.
- **topology source: manifest vs runtime auto-discovery** — `depends_on` and function membership are manifest-only today. Auto-discovering the DAG from pub/sub connections at runtime (via `rclcpp` graph introspection APIs) would be more practical for Isaac ROS pipelines. Is this in scope after Add layered MergePipeline for multi-source entity discovery (#246) lands, or a separate issue?
- **fan-out** — Is `shared_topic_id` the right hook, or is there a cleaner approach?
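For the aggregate-health question, the client-side variant could be as small as the following sketch. The thresholds and the healthy/degraded split are illustrative only; detecting "broken" properly would need staleness information the schema does not yet carry.

```python
def derive_pipeline_health(edges, min_hz=20.0, max_drop_pct=1.0):
    """Derive (pipeline_status, bottleneck_edge_id) from per-edge metrics.

    edges: edge dicts shaped as in the x-medkit-graph schema above.
    min_hz / max_drop_pct are placeholder thresholds; real values would be
    per-pipeline configuration.
    """
    measured = [e for e in edges if e.get("metrics")]
    if not measured:
        return "broken", None  # no edge has reported metrics (yet)
    # Bottleneck = edge with the lowest measured frequency.
    bottleneck = min(measured, key=lambda e: e["metrics"]["frequency_hz"])
    degraded = any(
        e["metrics"]["frequency_hz"] < min_hz
        or e["metrics"]["drop_rate_percent"] > max_drop_pct
        for e in measured
    )
    return ("degraded" if degraded else "healthy"), bottleneck["edge_id"]
```

On the two example edges from the proposed schema (29.8 Hz vs 14.5 Hz with 2.1% drops), this returns `("degraded", "edge_rect_to_dnn")` — which is exactly the "where is the bottleneck?" answer the issue opens with, and suggests client-side derivation may be enough.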
## Out of Scope for this issue

- **New Entity types** — `x-medkit-graph` is a vendor resource on existing Apps/Components/Functions
- **Runtime DAG auto-discovery implementation** — separate PR after Add layered MergePipeline for multi-source entity discovery (#246)
- **NITROS plugin implementation** (Discovery: Plugin system for platform-specific introspection, #112) — that feeds `transport_type` and per-connection metrics
## Relationship to existing issues / PRs

- Add layered MergePipeline for multi-source entity discovery (#246, the MergePipeline PR) — prerequisite; once merged, plugin layers provide the hook to inject graph data
- Discovery: Implement merge pipeline for discovery hybrid approach (#113, the merge pipeline issue) — implemented by #246
- Discovery: Plugin system for platform-specific introspection (#112, the NITROS introspection plugin) — will eventually populate `transport_type` and improve fan-out metrics fidelity