
feat(graph): x-medkit-graph vendor resource schema proposal for live pipeline topology #248

@eclipse0922

Description


Context

Following the discussion in the [Discord thread (Feb 18)](https://discord.gg) and the merge pipeline PR (#246), this issue proposes a concrete JSON schema for x-medkit-graph — a SOVD vendor resource that exposes live pipeline topology with performance overlay.

The use case driving this: Isaac ROS vision pipelines (e.g. Camera → Rectify → Resize → DNN → Postprocess) need a live diagnostic view that answers:

  • "Where is the bottleneck?" — one node at 15 fps while the rest can do 30
  • "Where did it break?" — a node silently stopped publishing and everything downstream is stale

As discussed with @bartosz_b_ and @mfaferek93: greenwave_monitor already covers per-topic Hz/latency via /diagnostics, and the Diagnostic Bridge already ingests it. The Fault Manager handles cascade correlation. What's missing is the structural layer on top — a DAG view that ties those flat per-topic metrics to a pipeline topology.

This issue proposes the schema for that structural layer before touching any code.


Proposed Schema

{
  "x-medkit-graph": {
    "graph_id": "perception-pipeline-graph",
    "timestamp": "2026-03-02T13:45:00Z",

    "nodes": [
      { "entity_id": "app_camera_node" },
      { "entity_id": "app_rectify_node" },
      { "entity_id": "app_dnn_inference" }
    ],

    "edges": [
      {
        "edge_id": "edge_cam_to_rect",
        "source": "app_camera_node",
        "target": "app_rectify_node",
        "topic": "/camera/image_raw",
        "transport_type": "intra_process_zero_copy",
        "metrics": {
          "source": "greenwave_monitor",
          "frequency_hz": 29.8,
          "latency_ms": 1.2,
          "drop_rate_percent": 0.0
        }
      },
      {
        "edge_id": "edge_rect_to_dnn",
        "source": "app_rectify_node",
        "target": "app_dnn_inference",
        "topic": "/camera/image_rect",
        "transport_type": "intra_process_zero_copy",
        "metrics": {
          "source": "greenwave_monitor",
          "frequency_hz": 14.5,
          "latency_ms": 45.3,
          "drop_rate_percent": 2.1
        }
      }
    ]
  }
}
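To make the "where is the bottleneck?" use case concrete, here is a minimal client-side sketch that scans the edges of an x-medkit-graph payload for the slowest edge. Field names follow the proposed schema above; the function and threshold are illustrative, not part of any existing API.

```python
# Hypothetical client-side sketch: locate the bottleneck edge in an
# x-medkit-graph payload by comparing per-edge frequency_hz against an
# expected pipeline rate. Nothing here is a real medkit API.

def find_bottleneck(graph, expected_hz=30.0):
    """Return the edge whose measured frequency falls furthest below expected_hz,
    or None if every edge meets the expected rate."""
    edges = graph["x-medkit-graph"]["edges"]
    candidates = [
        e for e in edges
        if e.get("metrics") and e["metrics"]["frequency_hz"] < expected_hz
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e["metrics"]["frequency_hz"])

# Trimmed version of the example payload above.
payload = {
    "x-medkit-graph": {
        "edges": [
            {"edge_id": "edge_cam_to_rect",
             "metrics": {"frequency_hz": 29.8}},
            {"edge_id": "edge_rect_to_dnn",
             "metrics": {"frequency_hz": 14.5}},
        ]
    }
}

bottleneck = find_bottleneck(payload)  # the 14.5 Hz edge
```

Against the example payload this flags edge_rect_to_dnn, matching the 15-fps-node scenario from the Context section. Whether this logic lives server-side (pipeline_status / bottleneck_edge) or client-side is Open Question 3.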

Design decisions

Nodes reference existing entities by ID (SSOT)
Node health, fault state, and metadata are not duplicated here. Clients look those up via the existing entity endpoints. The graph only carries topology and the metrics overlay.

Direct Node→Node edges (not bipartite)
ROS 2's internal model is bipartite (Node → Topic → Node), which is also what rqt_graph "Nodes/Topics (all)" mode shows. x-medkit-graph instead projects to direct Node→Node edges — consistent with rqt_graph "Nodes only" mode — because the goal is pipeline readability, not a full DDS topology dump.
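The bipartite→direct projection can be sketched in a few lines. In a real implementation the publisher/subscriber maps would come from graph introspection (e.g. rclpy's `get_publisher_names_and_types_by_node`); here they are plain dicts with names borrowed from the schema example, purely for illustration.

```python
# Sketch of the Node -> Topic -> Node projection into direct Node -> Node
# edges. Inputs are plain dicts (node name -> list of topic names); node and
# topic names are illustrative.

def project_to_node_edges(publishers, subscribers):
    """Collapse the bipartite pub/sub graph into (source, target, topic) edges."""
    edges = []
    for pub_node, pub_topics in publishers.items():
        for topic in pub_topics:
            for sub_node, sub_topics in subscribers.items():
                if topic in sub_topics:
                    edges.append((pub_node, sub_node, topic))
    return edges

pubs = {"app_camera_node": ["/camera/image_raw"],
        "app_rectify_node": ["/camera/image_rect"]}
subs = {"app_rectify_node": ["/camera/image_raw"],
        "app_dnn_inference": ["/camera/image_rect"]}

edges = project_to_node_edges(pubs, subs)
# camera -> rectify and rectify -> dnn, as in the schema example
```

This is exactly the rqt_graph "Nodes only" collapse: topics become edge labels rather than vertices.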

metrics.source field
Makes the data provenance explicit. Clients can know whether a metric came from greenwave, NITROS diagnostics, or another source. Important for trust and for future multi-source scenarios.


Fan-out / Fan-in handling

One design point worth resolving: ROS 2 fan-out (one publisher, N subscribers on the same topic) produces N separate edges in this schema, all sharing the same topic name.

Currently greenwave_monitor measures per-topic, not per-subscriber-connection. This means fan-out edges on the same topic will carry identical metrics until a per-connection data source (e.g. the NITROS introspection plugin from #112) is available.

Proposal: add an optional shared_topic_id field on edges that share an underlying topic, so clients can group them visually and avoid presenting misleadingly "independent" metrics.

{
  "edge_id": "edge_cam_to_rectify",
  "source": "app_camera_node",
  "target": "app_rectify_node",
  "topic": "/camera/image_raw",
  "shared_topic_id": "topic_camera_image_raw",
  ...
},
{
  "edge_id": "edge_cam_to_logger",
  "source": "app_camera_node",
  "target": "app_logger_node",
  "topic": "/camera/image_raw",
  "shared_topic_id": "topic_camera_image_raw",
  ...
}
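A client consuming this could group fan-out edges on the proposed field before rendering, so the shared metric is shown once per topic instead of N times. A minimal sketch, assuming edges are plain dicts shaped like the fragments above:

```python
# Sketch: group fan-out edges by the proposed shared_topic_id so a client can
# render them as one measured topic rather than N seemingly independent
# metrics. Edge dicts mirror the JSON fragments above.
from collections import defaultdict

def group_by_shared_topic(edges):
    """Map shared_topic_id (or edge_id for ungrouped edges) -> list of edge_ids."""
    groups = defaultdict(list)
    for e in edges:
        # An edge without shared_topic_id forms its own singleton group.
        key = e.get("shared_topic_id", e["edge_id"])
        groups[key].append(e["edge_id"])
    return dict(groups)

edges = [
    {"edge_id": "edge_cam_to_rectify", "shared_topic_id": "topic_camera_image_raw"},
    {"edge_id": "edge_cam_to_logger", "shared_topic_id": "topic_camera_image_raw"},
    {"edge_id": "edge_rect_to_dnn"},
]
grouped = group_by_shared_topic(edges)
```

The two fan-out edges collapse into one visual group, while the ungrouped edge stays independent.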

Open Questions

  1. metrics when no live data yet — If a topic is declared in the manifest but greenwave hasn't seen it yet, should metrics be null or should there be a "metrics_available": false flag?

  2. transport_type enum — Proposed values below. Any additions or corrections, especially from the NITROS plugin side?

    • "intra_process_zero_copy" — NITROS zero-copy
    • "intra_process" — standard ROS 2 intra-process
    • "inter_process" — same machine, separate processes (DDS)
    • "network" — cross-machine
    • "unknown" — no plugin data available (default)
  3. pipeline-level aggregate health — Should the server compute pipeline_status (healthy / degraded / broken) and bottleneck_edge, or should clients derive this from per-edge metrics? Server-side computation makes it easier to surface in /health, but adds logic.

  4. topology source: manifest vs runtime auto-discovery — depends_on and function membership are manifest-only today. Auto-discovering the DAG from pub/sub connections at runtime (via rclcpp graph introspection APIs) would be more practical for Isaac ROS pipelines. Is this in scope after #246 (layered MergePipeline for multi-source entity discovery) lands, or a separate issue?

  5. fan-out: is shared_topic_id the right hook, or is there a cleaner approach?
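On question 2, the proposed enum could be modeled like this, with "unknown" as the fallback when no plugin data is available. A sketch only — the class and parser are hypothetical names, not existing code:

```python
# Sketch of the proposed transport_type enum (open question 2), with
# "unknown" as the default when no plugin data is available. Values mirror
# the list above; the class and helper are illustrative.
from enum import Enum

class TransportType(str, Enum):
    INTRA_PROCESS_ZERO_COPY = "intra_process_zero_copy"  # NITROS zero-copy
    INTRA_PROCESS = "intra_process"    # standard ROS 2 intra-process
    INTER_PROCESS = "inter_process"    # same machine, separate processes (DDS)
    NETWORK = "network"                # cross-machine
    UNKNOWN = "unknown"                # no plugin data available (default)

def parse_transport(value):
    """Parse a schema string, falling back to UNKNOWN for unrecognized values."""
    try:
        return TransportType(value)
    except ValueError:
        return TransportType.UNKNOWN
```

Falling back rather than rejecting keeps the graph renderable even when a future transport value appears before clients are updated.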


Out of Scope for this issue


Relationship to existing issues / PRs

Metadata

Labels: enhancement (New feature or request)