
feat(graph): x-medkit-graph vendor resource schema proposal for live pipeline topology #248

@eclipse0922

Description


Context

Following the discussion in the [Discord thread (Feb 18)](https://discord.gg) and the merge pipeline PR (#246), this issue proposes a concrete JSON schema for x-medkit-graph — a SOVD vendor resource that exposes live pipeline topology with performance overlay.

The use case driving this: Isaac ROS vision pipelines (e.g. Camera → Rectify → Resize → DNN → Postprocess) need a live diagnostic view that answers:

  • "Where is the bottleneck?" — one node at 15 fps while the rest can do 30
  • "Where did it break?" — a node silently stopped publishing and everything downstream is stale

As discussed with @bartosz_b_ and @mfaferek93: greenwave_monitor already covers per-topic Hz/latency via /diagnostics, and the Diagnostic Bridge already ingests it. The Fault Manager handles cascade correlation. What's missing is the structural layer on top — a DAG view that ties those flat per-topic metrics to a pipeline topology.

This issue proposes the schema for that structural layer before touching any code.


Proposed Schema

{
  "x-medkit-graph": {
    "graph_id": "perception-pipeline-graph",
    "timestamp": "2026-03-02T13:45:00Z",

    "nodes": [
      { "entity_id": "app_camera_node" },
      { "entity_id": "app_rectify_node" },
      { "entity_id": "app_dnn_inference" }
    ],

    "edges": [
      {
        "edge_id": "edge_cam_to_rect",
        "source": "app_camera_node",
        "target": "app_rectify_node",
        "topic": "/camera/image_raw",
        "transport_type": "intra_process_zero_copy",
        "metrics": {
          "source": "greenwave_monitor",
          "frequency_hz": 29.8,
          "latency_ms": 1.2,
          "drop_rate_percent": 0.0
        }
      },
      {
        "edge_id": "edge_rect_to_dnn",
        "source": "app_rectify_node",
        "target": "app_dnn_inference",
        "topic": "/camera/image_rect",
        "transport_type": "intra_process_zero_copy",
        "metrics": {
          "source": "greenwave_monitor",
          "frequency_hz": 14.5,
          "latency_ms": 45.3,
          "drop_rate_percent": 2.1
        }
      }
    ]
  }
}
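To make the "where is the bottleneck?" use case concrete, here is a minimal client-side sketch that scans the edges of an x-medkit-graph payload for the slowest edge. Field names follow the proposed schema above; the function and threshold are illustrative, not part of any existing API.

```python
# Hypothetical client-side sketch: locate the bottleneck edge in an
# x-medkit-graph payload by comparing per-edge frequency_hz against an
# expected pipeline rate. Nothing here is a real medkit API.

def find_bottleneck(graph, expected_hz=30.0):
    """Return the edge whose measured frequency falls furthest below expected_hz,
    or None if every edge meets the expected rate."""
    edges = graph["x-medkit-graph"]["edges"]
    candidates = [
        e for e in edges
        if e.get("metrics") and e["metrics"]["frequency_hz"] < expected_hz
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e["metrics"]["frequency_hz"])

# Trimmed version of the example payload above.
payload = {
    "x-medkit-graph": {
        "edges": [
            {"edge_id": "edge_cam_to_rect",
             "metrics": {"frequency_hz": 29.8}},
            {"edge_id": "edge_rect_to_dnn",
             "metrics": {"frequency_hz": 14.5}},
        ]
    }
}

bottleneck = find_bottleneck(payload)  # the 14.5 Hz edge
```

Against the example payload this flags edge_rect_to_dnn, matching the 15-fps-node scenario from the Context section. Whether this logic lives server-side (pipeline_status / bottleneck_edge) or client-side is Open Question 3.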

Design decisions

Nodes reference existing entities by ID (SSOT)
Node health, fault state, and metadata are not duplicated here. Clients look those up via the existing entity endpoints. The graph only carries topology and the metrics overlay.

Direct Node→Node edges (not bipartite)
ROS 2's internal model is bipartite (Node → Topic → Node), which is also what rqt_graph "Nodes/Topics (all)" mode shows. x-medkit-graph instead projects to direct Node→Node edges — consistent with rqt_graph "Nodes only" mode — because the goal is pipeline readability, not a full DDS topology dump.
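The bipartite→direct projection can be sketched in a few lines. In a real implementation the publisher/subscriber maps would come from graph introspection (e.g. rclpy's `get_publisher_names_and_types_by_node`); here they are plain dicts with names borrowed from the schema example, purely for illustration.

```python
# Sketch of the Node -> Topic -> Node projection into direct Node -> Node
# edges. Inputs are plain dicts (node name -> list of topic names); node and
# topic names are illustrative.

def project_to_node_edges(publishers, subscribers):
    """Collapse the bipartite pub/sub graph into (source, target, topic) edges."""
    edges = []
    for pub_node, pub_topics in publishers.items():
        for topic in pub_topics:
            for sub_node, sub_topics in subscribers.items():
                if topic in sub_topics:
                    edges.append((pub_node, sub_node, topic))
    return edges

pubs = {"app_camera_node": ["/camera/image_raw"],
        "app_rectify_node": ["/camera/image_rect"]}
subs = {"app_rectify_node": ["/camera/image_raw"],
        "app_dnn_inference": ["/camera/image_rect"]}

edges = project_to_node_edges(pubs, subs)
# camera -> rectify and rectify -> dnn, as in the schema example
```

This is exactly the rqt_graph "Nodes only" collapse: topics become edge labels rather than vertices.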

metrics.source field
Makes the data provenance explicit. Clients can know whether a metric came from greenwave, NITROS diagnostics, or another source. Important for trust and for future multi-source scenarios.


Fan-out / Fan-in handling

One design point worth resolving: ROS 2 fan-out (one publisher, N subscribers on the same topic) produces N separate edges in this schema, all sharing the same topic name.

Currently greenwave_monitor measures per-topic, not per-subscriber-connection. This means fan-out edges on the same topic will carry identical metrics until a per-connection data source (e.g. the NITROS introspection plugin from #112) is available.

Proposal: add an optional shared_topic_id field on edges that share an underlying topic, so clients can group them visually and avoid presenting misleadingly "independent" metrics.

{
  "edge_id": "edge_cam_to_rectify",
  "source": "app_camera_node",
  "target": "app_rectify_node",
  "topic": "/camera/image_raw",
  "shared_topic_id": "topic_camera_image_raw",
  ...
},
{
  "edge_id": "edge_cam_to_logger",
  "source": "app_camera_node",
  "target": "app_logger_node",
  "topic": "/camera/image_raw",
  "shared_topic_id": "topic_camera_image_raw",
  ...
}
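A client consuming this could group fan-out edges on the proposed field before rendering, so the shared metric is shown once per topic instead of N times. A minimal sketch, assuming edges are plain dicts shaped like the fragments above:

```python
# Sketch: group fan-out edges by the proposed shared_topic_id so a client can
# render them as one measured topic rather than N seemingly independent
# metrics. Edge dicts mirror the JSON fragments above.
from collections import defaultdict

def group_by_shared_topic(edges):
    """Map shared_topic_id (or edge_id for ungrouped edges) -> list of edge_ids."""
    groups = defaultdict(list)
    for e in edges:
        # An edge without shared_topic_id forms its own singleton group.
        key = e.get("shared_topic_id", e["edge_id"])
        groups[key].append(e["edge_id"])
    return dict(groups)

edges = [
    {"edge_id": "edge_cam_to_rectify", "shared_topic_id": "topic_camera_image_raw"},
    {"edge_id": "edge_cam_to_logger", "shared_topic_id": "topic_camera_image_raw"},
    {"edge_id": "edge_rect_to_dnn"},
]
grouped = group_by_shared_topic(edges)
```

The two fan-out edges collapse into one visual group, while the ungrouped edge stays independent.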

Open Questions

  1. metrics when no live data yet — If a topic is declared in the manifest but greenwave hasn't seen it yet, should metrics be null or should there be a "metrics_available": false flag?

  2. transport_type enum — Proposed values below. Any additions or corrections, especially from the NITROS plugin side?

    • "intra_process_zero_copy" — NITROS zero-copy
    • "intra_process" — standard ROS 2 intra-process
    • "inter_process" — same machine, separate processes (DDS)
    • "network" — cross-machine
    • "unknown" — no plugin data available (default)
  3. pipeline-level aggregate health — Should the server compute pipeline_status (healthy / degraded / broken) and bottleneck_edge, or should clients derive this from per-edge metrics? Server-side computation makes it easier to surface in /health, but adds logic.

  4. topology source: manifest vs runtime auto-discovery — depends_on and function membership are manifest-only today. Auto-discovering the DAG from pub/sub connections at runtime (via rclcpp graph introspection APIs) would be more practical for Isaac ROS pipelines. Is this in scope after #246 (layered MergePipeline for multi-source entity discovery) lands, or a separate issue?

  5. fan-out: is shared_topic_id the right hook, or is there a cleaner approach?
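On question 2, the proposed enum could be modeled like this, with "unknown" as the fallback when no plugin data is available. A sketch only — the class and parser are hypothetical names, not existing code:

```python
# Sketch of the proposed transport_type enum (open question 2), with
# "unknown" as the default when no plugin data is available. Values mirror
# the list above; the class and helper are illustrative.
from enum import Enum

class TransportType(str, Enum):
    INTRA_PROCESS_ZERO_COPY = "intra_process_zero_copy"  # NITROS zero-copy
    INTRA_PROCESS = "intra_process"    # standard ROS 2 intra-process
    INTER_PROCESS = "inter_process"    # same machine, separate processes (DDS)
    NETWORK = "network"                # cross-machine
    UNKNOWN = "unknown"                # no plugin data available (default)

def parse_transport(value):
    """Parse a schema string, falling back to UNKNOWN for unrecognized values."""
    try:
        return TransportType(value)
    except ValueError:
        return TransportType.UNKNOWN
```

Falling back rather than rejecting keeps the graph renderable even when a future transport value appears before clients are updated.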


Out of Scope for this issue


Relationship to existing issues / PRs

Metadata

Labels: enhancement (New feature or request)