Hvui/cxo metrics and fleet resources #2424

Open
0pcom wants to merge 23 commits into skycoin:develop from 0pcom:hvui/cxo-metrics-and-fleet-resources

Conversation

@0pcom (Collaborator) commented May 4, 2026

No description provided.

0pcom added 23 commits May 3, 2026 17:10
Global .mat-mdc-button:not(:disabled) { color: $black } in
styles.scss was bleeding through onto the day-window selector
(1d/7d/30d), the view-mode toggle (Compact/Tree), and the
Refresh button on the new Transports home tab — black-on-dark,
invisible until clicked. Forced explicit white on the inactive
state, kept explicit white on .active, and let the mat-icon
inherit so the chevron / refresh glyph stays visible too.
- 'account_tree' was rendering as its literal string on the Tree
  view-mode button (the newer ligature is missing from the bundled
  Material Icons font); swapped it for 'device_hub', which is present
  in the font and still reads as a tree shape
- added an edges show/hide toggle to the controls row, visible only
  in the compact view. It hides the edge_a + edge_b columns
  (66-char PKs each) so operators who only care about
  id/type/bandwidth/latency get a readable single-screen table
Two issues on the Resources home tab:

1. Row order shuffled between polls because getNodes() doesn't
   guarantee a stable order. Sort the merged rows by label
   (case-insensitive), falling back to PK, after the merge step
   so the table doesn't reflow on each 5s tick.

2. The table didn't match the home /nodes/list/1 visual language —
   it lacked the rounded-elevated-box wrapper, used a non-link table
   row, and put the visor name + PK in plain cells. Switched to the
   same shell: rounded-elevated-box → responsive-table-translucid
   d-none d-md-table → <a class="selectable link-row"> rows that
   navigate to the per-visor Resources tab on click. Added an
   IP/Location column matching the home page (with city/region/
   country footer line) and a chevron-right action cell. Mobile
   fallback gets the same link wrapper.
Surfaces TPD's per-transport `live` flag in both view modes so an
operator can spot dead transports at a glance and optionally hide
them entirely.

Compact view:
- new "Offline: Show / Hide" toggle in the controls row
- type cell becomes a small pill (blue for live, red for offline)
- offline rows dim with a warm tint

Tree view:
- visor row gains a small red "N offline" pill summarizing the
  visor's offline-transport count (only shown when offline rows
  are visible — i.e. when the global toggle is set to Show)
- offline child rows render with reduced opacity + warm tint
- with the toggle set to Hide, visors with no live children drop
  out of the tree entirely
Read-only Runtime Configuration section becomes round-trippable.
The hvui pane gets Edit / Save / Cancel buttons; the textarea
parses JSON live (client-side error message, Save disabled while
invalid); on Save the body is PUT verbatim to a new endpoint that
re-validates server-side and writes to the on-disk config.

Backend (pkg/visor):
- new SetRuntimeConfig([]byte) on Visor + the API interface
- strict JSON decode into a fresh visorconfig.V1 with
  DisallowUnknownFields() so typos in field names get rejected
  instead of silently dropping data
- SK/PK consistency check (PK must derive from SK when both set;
  PK alone with empty SK is rejected since startup will fail)
- writes the user's bytes verbatim to v.conf.Path() (preserves
  whitespace + ordering) — no hot reload, response signals
  restart_required: true so the UI can surface that
- new visorconfig.Common.Path() exported getter so the runtime-
  config writer can target the on-disk path without poking at
  unexported fields
- new hypervisor route PUT /api/visors/{pk}/runtime-config; RPC
  method + rpcClient + mock implementations for cross-visor reach
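
A minimal sketch of the strict-decode and write-verbatim flow described above; the Config type, the keysConsistent helper, and the path are stand-ins, not skywire's actual visorconfig API:

```go
// Stand-in sketch: the real SetRuntimeConfig decodes into visorconfig.V1 and
// derives the PK from the SK via the cipher package; both are simplified here.
package sketch

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

type Config struct {
	PK string `json:"pk"`
	SK string `json:"sk"`
}

// keysConsistent mimics the consistency rule: a PK with no SK is rejected;
// real code would additionally check that the PK derives from the SK.
func keysConsistent(c Config) error {
	if c.PK != "" && c.SK == "" {
		return errors.New("pk set without sk; visor would fail to start")
	}
	return nil
}

func setRuntimeConfig(path string, raw []byte) error {
	var c Config
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // typo'd field names are rejected, not silently dropped
	if err := dec.Decode(&c); err != nil {
		return fmt.Errorf("invalid config: %w", err)
	}
	if err := keysConsistent(c); err != nil {
		return err
	}
	// Write the caller's bytes verbatim so whitespace and key order survive.
	return os.WriteFile(path, raw, 0o644)
}
```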

Frontend (api.service + node-info-content):
- ApiService gains a RawJson request type that sends the body
  string verbatim (no JSON.stringify wrapping). Default Json
  mode unchanged so other call sites are unaffected
- Runtime config section grows an Edit button; clicking it copies
  rawConfig into a textarea, attaches an input listener that runs
  JSON.parse on every keystroke, and shows the parser's error
  inline. Save is disabled while configError is non-empty
- on Save: PUTs the draft via RawJson, on success replaces the
  rendered rawConfig and surfaces the "restart required" hint;
  on failure shows the server's error message
red offline rows

Three fixes batched:

1. Runtime Configuration "Edit" / "Cancel" buttons rendered black-
   on-dark (invisible). Add explicit white color on non-primary
   buttons in .config-actions; lighten the stroked-button border
   so the outline reads on the dark surface.

2. Rewards / Routing / Transports tabs were using info-line /
   toggle-line / inline-edit-btn / inline-form / collapsible-link
   classes that only existed in node-info-content.component.scss
   (component-scoped) — so on the new tabs those classes were
   unstyled and elements crowded into a single un-flexed line.
   Moved the helpers into assets/scss/_forms.scss so all four tabs
   pick them up, and added flex-wrap: wrap so a long PK + edit
   button no longer overflow on narrow widths.

3. Network-Transports tab — offline rows used a "warm dim" pink
   that didn't read clearly as "this is broken." Switch to an
   explicit red row tint + a 3px red left-border on the first cell
   so offline transports stand out at a glance. Live rows are
   unchanged.
The visor's stats tracker (pkg/visor/stats) already keeps a bbolt-
backed rollup of every transport's sent/recv counters and latency
stats — current snapshot + per-day daily rollups, retained for 30
days. Surface that locally instead of routing through TPD.

Backend (pkg/visor):
- new Visor.LocalTransportStats() returns []*stats.TransportRecord
  sorted busiest-first (Current.SentBytes+RecvBytes summed with
  every Daily rollup so cold-but-historically-busy transports
  rank above genuinely-cold ones)
- LocalTransportStatsResponse wraps the list with FetchedAt for
  the hvui's "last sample" stamp
- API interface, RPC, rpcClient, mock all wired
- new GET /api/visors/{pk}/local-transport-stats handler
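
A small sketch of the busiest-first ordering described above; the record shapes are illustrative, not the exact pkg/visor/stats types:

```go
// Illustrative shapes only; the real stats.TransportRecord differs.
package sketch

import "sort"

type Counters struct{ SentBytes, RecvBytes uint64 }

type TransportRecord struct {
	Current Counters
	Daily   map[string]Counters // per-day rollups, ~30-day retention
}

// totalBytes folds the current snapshot together with every daily rollup so a
// transport that was busy last week still outranks one that was never used.
func totalBytes(r *TransportRecord) uint64 {
	t := r.Current.SentBytes + r.Current.RecvBytes
	for _, d := range r.Daily {
		t += d.SentBytes + d.RecvBytes
	}
	return t
}

// sortBusiestFirst orders records the way LocalTransportStats is described to.
func sortBusiestFirst(recs []*TransportRecord) {
	sort.Slice(recs, func(i, j int) bool {
		return totalBytes(recs[i]) > totalBytes(recs[j])
	})
}
```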

Frontend (per-visor pages):
- new "Bandwidth" tab between Transports and Apps (equalizer icon)
- BandwidthComponent at /nodes/<pk>/bandwidth polls the new
  endpoint every 30s
- window selector: Now (current snapshot only) / 7d / 30d (sums
  daily rollups in the chosen window)
- one expandable row per transport: type pill, transport id,
  remote PK, total/sent/recv bytes, current latency
- expand reveals current snapshot grid + daily-history table
  (date / sent / recv / total / lat min/avg/max / sample count)
- selectedTabIndex bookkeeping shifts apps/rewards/skynet/
  resources/chat/dmsg by one slot to make room
Three controls that lived behind the destructive-action menu next
to Shut Down are promoted to dedicated per-visor tabs:

- Terminal (terminal icon) — iframes the existing /pty/<pk> route
  the hypervisor already serves, so the dmsgpty UI lives inline
  in the tab. "Open in new window" button keeps the full-screen
  path the actions menu used to provide
- Web Proxy (language icon) — resolving proxy controls (.skynet
  + .dmsg domain resolution + upstream SOCKS5 address). Same
  feature set as the old ProxySettingsComponent dialog, laid out
  flat with a refresh button so the live state can be re-fetched
  without reopening
- Logs (description icon) — runtime log viewer with live tail,
  level filter (All / Debug+ / Info+ / Warn+ / Error+ / Fatal+ /
  Panic), pause toggle, and "Open raw logs" link to the existing
  /api/visors/<pk>/runtime-logs raw endpoint. Reuses the same
  diff-streaming endpoint NodeLogsComponent's dialog used

The actions menu (top-bar dot-dot-dot near Shut Down) now carries
only the destructive Shut Down action, since Terminal / Logs /
Proxy are first-class tabs. The dialog-mode handlers stay in
node-actions-helper.ts as dead code so the switch statement
compiles, but they're unreachable from the UI.

Tab order:
  Info · Routing · Transports · Bandwidth · Apps · Rewards ·
  Skynet · Web Proxy · Resources · Terminal · Logs · Skychat · DMSG
Tab strip order:
  Info · Routing · Transports · Bandwidth · Apps · Skychat ·
  Rewards · Skynet · Web Proxy · Resources · Terminal · Logs · DMSG

selectedTabIndex bookkeeping shifted to match.
skysocks-client: drop dead --passcode (apps no longer accept it),
add --addr / --http / --tries / --retry-time fields routed through
custom_setting (no typed putApp handler exists for them).

skychat: full rewrite of the settings dialog. Listener section keeps
the existing localhost-only + port toggle (typed --addr handler).
New persistence section exposes --persist + --persist-{max-size,
per-peer-rate, per-peer-cap, total-cap, ttl, seed}. Network listener
toggles for --skynet / --dmsg, plus --pair-enable for CXO pair feeds.
All numeric persist-* fields are optional — empty value omits the
flag so the binary default applies rather than writing `-flag ""`.

Filesystem-path flags (--persist-db, --persist-whitelist) are
intentionally omitted from the UI; they are box-local config.
`cli util got` becomes `cli got` and grows scheme awareness for
skynet:// and dmsg:// URLs (routed via the visor's SkynetHTTP /
DmsgHTTP RPCs). `cli skynet curl` is removed — its functionality
lives in `cli got` (incl. `dl`, `req`, `head`).

The point of the move is to leave `cli skynet` unambiguously about
port consumption (srv / start / stop / status). `got` is the
correct home for HTTP-style fetches regardless of which transport
they ride on, since the user picks the route via the URL scheme.

http(s) URLs keep the chunked-range concurrent-download path. The
skywire RPC path returns the body in a single response, so range
splitting doesn't apply on those — a single GET is issued and the
body written to file (default name from URL path; `-o` overrides).
`head` on skywire URLs fakes HEAD by issuing GET and discarding the
body, since the visor RPC has no HEAD method.
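
Roughly how the scheme dispatch described above can look; the two download helpers are placeholders for the existing chunked HTTP path and the single-GET visor-RPC path, not the actual CLI internals:

```go
package sketch

import (
	"fmt"
	"net/url"
)

// fetch dispatches on the URL scheme: plain http(s) keeps the concurrent
// range download, skywire schemes get one GET because the visor RPC returns
// the whole body in a single response.
func fetch(rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	switch u.Scheme {
	case "http", "https":
		return chunkedRangeDownload(u) // placeholder for the existing path
	case "skynet", "dmsg":
		return singleGETViaVisorRPC(u) // placeholder for the SkynetHTTP/DmsgHTTP route
	default:
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
}

func chunkedRangeDownload(*url.URL) error { return nil }
func singleGETViaVisorRPC(*url.URL) error { return nil }
```
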
Adds --min-hops uint16 (default 1, no minimum) to `cli proxy start`,
joining --mux / --mux-mode / --existing-tp / --local-route as
session config knobs that toggle visor-wide router state before
StartAppWithMode runs. Reuses the existing SetMinHops RPC; only
fires when the user explicitly set the flag (cmd.Flags().Changed),
so the visor's configured default isn't clobbered on every start.

Rejects --min-hops=0 with an error (the router treats 0 as
"routing disabled" and would refuse to dial).
Mounts the dmsgpty web terminal on the visor's port-80 dmsg/skynet
logserver alongside the existing /health, /node-info, /stats/*
endpoints. Access is gated by the same whitelist the dmsgpty Host
enforces on direct dmsg connections — configured Dmsgpty.Whitelist
plus the visor's hypervisor PKs plus its own PK — so any peer
already authorised to attach a pty over dmsg can also reach it
through a browser pointed at <pk>.dmsg/pty (or <pk>.skynet/pty when
the resolving proxy is in use).

The dialer prefers the local CLI socket when one is configured and
falls back to a self-loop through the visor's own dmsg client at
DmsgPtyPort otherwise — the same pattern the hypervisor uses for
its `/pty/{pk}` route. The /pty route is wired unconditionally; if
SetPtyHandler is never called or the whitelist is empty, the route
returns 404/403 (fail-closed for a high-power surface).

The landing page advertises /pty only to whitelisted peers — visible
to whoever already has the privilege, hidden from probing strangers.

dmsghttp_logserver gains a dependency on the pty module so /pty is
ready to dial the moment the listener accepts its first request.
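
A sketch of the fail-closed gating described above; the whitelist lookup and the way the remote PK is recovered from the request are placeholders:

```go
package sketch

import "net/http"

// ptyGate mimics the fail-closed behaviour: with no handler wired the route
// 404s, with a caller outside the whitelist it 403s, otherwise it hands off
// to the dmsgpty UI handler.
type ptyGate struct {
	handler   http.Handler        // set via the equivalent of SetPtyHandler; nil until then
	whitelist map[string]struct{} // Dmsgpty.Whitelist + hypervisor PKs + own PK
}

func (g *ptyGate) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if g.handler == nil || len(g.whitelist) == 0 {
		http.NotFound(w, r) // route is wired unconditionally but fails closed
		return
	}
	if _, ok := g.whitelist[remotePK(r)]; !ok {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	g.handler.ServeHTTP(w, r)
}

// remotePK stands in for recovering the dmsg/skynet peer key from the request.
func remotePK(*http.Request) string { return "" }
```
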
Adds Visor.LocalUptimeStats(args), an RPC mirror of the logserver's
/stats/uptime payload reachable through the hypervisor's per-visor
proxy chain. Same bbolt store, same 288-slot 5-minute bitmap shape
(. = online, ' ' = offline) — the wire form is just a {tier: {date:
ascii}} map plus the requested window so renderers don't have to
re-derive it.

Backbone for the per-visor and network-wide Uptime tabs in the
hvui — UI side comes next. The handler accepts ?since= / ?until=
RFC3339 query params and falls back to a seven-day-ending-now
default that matches the logserver's defaultHistoryWindow.
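
The query handling described above, sketched with hypothetical names:

```go
package sketch

import (
	"net/http"
	"time"
)

// uptimeWindow parses ?since= / ?until= as RFC3339 and falls back to a
// seven-day window ending now, matching the logserver's default.
func uptimeWindow(r *http.Request) (since, until time.Time) {
	until = time.Now().UTC()
	since = until.Add(-7 * 24 * time.Hour)
	if v := r.URL.Query().Get("since"); v != "" {
		if t, err := time.Parse(time.RFC3339, v); err == nil {
			since = t
		}
	}
	if v := r.URL.Query().Get("until"); v != "" {
		if t, err := time.Parse(time.RFC3339, v); err == nil {
			until = t
		}
	}
	return since, until
}
```
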
NodeComponent.currentNode emits on every polling tick (~few seconds),
and the terminal tab was rebuilding its SafeResourceUrl on each
emission. Even when the URL string was unchanged, the new object
ref made Angular's change detection re-bind the iframe src, which
made the iframe navigate again and tore down the websocket — so the
shell session reset every few seconds.

Track the last PK we bound to and only rebuild the iframe URL when
the PK actually changes.
Two new tabs render the visor's tier-uptime bitmaps (process / dmsg /
skynet, 5-minute slot resolution) sourced from each visor's local
bbolt stats store via the new LocalUptimeStats RPC. No round-trip
through the standalone uptime-tracker service or TPD aggregates —
the operator gets the same five-minute online/offline intervals the
visor recorded itself.

* Per-visor tab: stacked tier rows, each row showing 288 cells per
  day (one per slot, online green / offline red) plus a daily
  uptime percentage and a tier-window average. Window selector
  switches between today / 7d / 30d.

* Network tab: one row per visor known to the hypervisor (online or
  offline) with the per-tier window percentages and today's process
  bar inline. Filter toggle "connected only" / "all known" — offline
  visors render dimmed with no data, since the hypervisor can't
  reach the per-visor bbolt store while the RPC session is down.

The two tabs connect: each row on the network tab links through its
visor PK column into the per-visor tab, so spotting a sagging tier
on the fleet view drills straight into that visor's full timeline.
Reshaped the per-visor Uptime tab so the day is the primary axis,
not the tier:

* Newest day on top, older below.
* Each day block has a date header and three tier ribbons inside
  (process / dmsg / skynet) sitting one above the other so the
  reader can scan a single day's full picture in one glance instead
  of jumping between tier sections.
* A small "Window avg" strip up top keeps the per-tier roll-ups
  visible without making the operator do mental math down the page.

Future slots are no longer conflated with offline. Today's row gets
a third cell state — `future` — rendered as a hatched neutral band
instead of red, so an empty bar reads as "not yet" rather than
"down". Window-average percentages exclude future slots from the
denominator so today's incomplete day doesn't anchor the visor at
50% just because it's only noon UTC.
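
The denominator rule in miniature (the hvui does this client-side; this Go sketch just illustrates the arithmetic):

```go
package sketch

// windowAverage computes an uptime percentage over 5-minute slots, counting
// only slots that have already elapsed; future slots stay out of the
// denominator so a half-finished day doesn't drag the figure toward 50%.
func windowAverage(online []bool, nowSlot int) float64 {
	up, counted := 0, 0
	for i, isUp := range online {
		if i >= nowSlot { // slot hasn't happened yet: render hatched, don't count
			continue
		}
		counted++
		if isUp {
			up++
		}
	}
	if counted == 0 {
		return 0
	}
	return 100 * float64(up) / float64(counted)
}
```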

Also: include /uptime in NodeComponent's URL test for the per-visor
tab strip — without it the tabsData fell into the empty branch and
the rest of the strip vanished when the user navigated to /uptime.
Adds an outbound CXO feed on TPD that mirrors `/uptimes?v=v3` —
`[]VisorSummary` with per-day timeline strings — and a lazy
on-demand subscriber on the visor that drives the hvui Network
Uptime tab. Same shape as the existing metrics feed.

* TPD `pkg/transport-discovery/api/cxo_uptime_publisher.go`:
  recomputes the v3 list every 60s for windows {1, 7, 30} and
  publishes to `uptimes/days/<n>` on skyenv.DmsgTPDUptimeCXOPort.
  Trims Daily/Timeline to the requested window so each bucket
  carries only what the subscriber needs.

* Visor `pkg/visor/api_tpd_uptime_subscriber.go`: lazy subscriber
  spun up the first time `/api/network/visor-uptime` is hit.
  Sticks around for the visor's lifetime so subsequent reads are
  local memory instead of a fresh dial-and-tear-down per minute.
  Closed alongside tpdMetricsSub in the Visor close path.

* Hypervisor handler chain: CXO subscriber → DMSG-HTTP → plain
  HTTP. The strategy is reported in `X-Skywire-Uptime-Source` so
  the UI / debug tooling can surface which path served the
  response.

* hvui Network Uptime tab consumes `/network/visor-uptime` (one
  fetch instead of fan-out per connected visor) and renders today's
  timeline directly from the v3 bitmap. Filter toggle "all known"
  vs "connected only" — connected rows are detected by intersecting
  with the hypervisor's nodes list, so PKs the hypervisor doesn't
  manage still surface but only link out for the managed ones.
  Future cells use the hatched-neutral style so a partial today
  doesn't read as downtime.
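
A sketch of the strategy chain and the source header; uptimeSource and the handler shape are assumptions, not the hypervisor's actual types:

```go
package sketch

import "net/http"

// uptimeSource is one strategy in the chain: CXO subscriber, DMSG-HTTP,
// then plain HTTP. Each returns its payload or an error to fall through.
type uptimeSource struct {
	name  string
	fetch func() ([]byte, error)
}

// serveVisorUptime tries each source in order and stamps which one answered,
// mirroring the X-Skywire-Uptime-Source behaviour described above.
func serveVisorUptime(w http.ResponseWriter, _ *http.Request, chain []uptimeSource) {
	for _, src := range chain {
		body, err := src.fetch()
		if err != nil {
			continue // fall through to the next strategy
		}
		w.Header().Set("X-Skywire-Uptime-Source", src.name)
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write(body)
		return
	}
	http.Error(w, "no uptime source available", http.StatusBadGateway)
}
```
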
Restyles /nodes/uptime to one row per visor with a single
concatenated multi-day timeline, rendered as 24 hour blocks per day
shaded by online-slot density (1–3 / 4–6 / 7–9 / 10–12 → faint to
solid green; 0 → red). Mirrors the layout the CLI `ut tpd graph`
prints in its default mode, just on a web grid instead of unicode
block art so colour conveys density at a glance.

Hour blocks past "now" on today's row use the future-cell style
(hatched neutral) — explicitly distinct from offline so a partial
today doesn't read as downtime. Shared leading and trailing empty
columns are trimmed globally so all bars line up; tick markers
above the grid label day boundaries when the window spans more
than two days.

Other changes in this pass:

* Loading text was "Polling visors…" — wrong since the fetch is a
  single TPD-feed call now, not a per-visor fan-out. Updated to
  "Reading TPD uptime feed…".
* Added a 30s timeout on the network/visor-uptime fetch so a
  hung TPD (e.g. while it's panicking) surfaces an error instead
  of leaving the spinner spinning forever.
* PK column links into the per-visor /uptime tab when the row is
  hypervisor-managed; non-managed visors render plain text.
* Default the network Uptime filter to "connected" — the operator's
  own fleet is the primary lens; "all known" stays one click away
  for spotting visors TPD reports that this hypervisor doesn't
  manage.
* Bring back today's uptime percentage as a colour-coded badge in
  each row (green ≥99 / orange ≥80 / red below). Window-average
  stays on the right for the longer-term view; today gives the
  current-day signal at a glance.
* Larger online/offline dot + a soft glow on the online state so
  current-status reads from across the table without squinting.
Adds a rolling-24h uptime row per tier (process / dmsg / skynet) to
`skywire cli visor info` output, rendered with the same five-level
unicode density blocks (' ' ░ ▒ ▓ █) the `cli ut tpd graph` command
uses. Same source as the per-visor /uptime tab and /stats/uptime on
the logserver — pkg/visor/stats — so the bar shows what the
integrated tracker recorded for THIS visor, not the network-wide
TPD aggregate. Pcts on the right are computed from the same slots.

The window crosses UTC midnight cleanly: each block aggregates 12
five-minute slots, and slots earlier than the start of today's
bucket pull from yesterday's bitmap. Shading thresholds match
cliuptime.shadeForCount so an operator can eyeball the local view
next to a TPD-graph output and read both at once.

Best-effort: if LocalUptimeStats fails (rare; partial startup or
stats subsystem disabled) the section is skipped silently — the
rest of `info` is still useful.
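
In outline, folding 288 five-minute slots into a 24-block bar looks like this; shadeForCount here is an illustrative stand-in using the thresholds described above, not the CLI's cliuptime.shadeForCount itself:

```go
package sketch

// shadeForCount maps how many of a block's 12 five-minute slots were online
// to one of five density glyphs (stand-in for the real shading helper).
func shadeForCount(online int) rune {
	switch {
	case online == 0:
		return ' '
	case online <= 3:
		return '░'
	case online <= 6:
		return '▒'
	case online <= 9:
		return '▓'
	default:
		return '█'
	}
}

// rollingBar folds 288 five-minute slots (oldest first) into 24 hour blocks.
func rollingBar(slots [288]bool) string {
	bar := make([]rune, 0, 24)
	for h := 0; h < 24; h++ {
		online := 0
		for s := 0; s < 12; s++ {
			if slots[h*12+s] {
				online++
			}
		}
		bar = append(bar, shadeForCount(online))
	}
	return string(bar)
}
```
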
Adds --shuffle (and --shuffle-seed for repro) to `skywire cli ut
{tpd,sd,mdisc} graph`. Renders rows in random order so an operator
can sanity-check whether the banding patterns they see in the
PK-sorted graph are PK-correlated (real visor-uptime structure that
travels with the rows) or are the eye chunking long runs of a
high-density distribution (banding will disperse under shuffle).

Verbose mode prints the seed to stderr so suspicious shuffles can
be re-played. Default seed is time-based; pass --shuffle-seed=N to
get a deterministic order.
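
A minimal sketch of the seeded shuffle; the row type and the flag plumbing are simplified:

```go
package sketch

import (
	"fmt"
	"math/rand"
	"os"
	"time"
)

// shuffleRows randomises row order; with seed == 0 a time-based seed is chosen
// and, in verbose mode, printed to stderr so a suspicious shuffle can be
// replayed with --shuffle-seed=N.
func shuffleRows(rows []string, seed int64, verbose bool) {
	if seed == 0 {
		seed = time.Now().UnixNano()
	}
	if verbose {
		fmt.Fprintf(os.Stderr, "shuffle seed: %d\n", seed)
	}
	r := rand.New(rand.NewSource(seed))
	r.Shuffle(len(rows), func(i, j int) { rows[i], rows[j] = rows[j], rows[i] })
}
```
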
The probe already used countLiveTransports >= 2, but the docstring
and the hvui tooltip described it as a "skynet connectivity probe"
— misleading wording given the actual semantics. Update both so a
reader knows the local tier matches the same "skynet online"
definition TPD uses on its side: a visor counts as skynet-up only
when it has at least two transports, i.e. when it can actually be
routed through.