# perf(tpd): no-latency variant for mirrorEdges + concat redis-key builders (#2430)
Merged

0pcom merged 1 commit into skycoin:develop on May 4, 2026

## Conversation
After skycoin#2429 collapsed the GetBit fan-out, a fresh prod CPU profile still shows TPD at 167% with 75% in runtime GC. New top allocators:

- 16.35% flat `encoding/hex.EncodeToString` (99% from `PubKey.Hex`)
- 13.86% flat json-iterator `textMarshalerEncoder.Encode`
- 10.48% cum `(*redisStore).hydrateDurableLatency`
- 32.38% cum `GetTransportsByEdge`
- 4.62% cum `(*redisStore).latencyKey` (`fmt.Sprintf`)
- 4.43% cum `(*redisStore).transportKey` (`fmt.Sprintf`)

`pprof -peek` traced 97.7% of `GetTransportsByEdge` alloc to `mirrorEdges`, which fires on every transport register/delete to re-publish the edge's full transport list to the DHT. The DHT consumer doesn't read the Latency field, but the call always paid for `hydrateDurableLatency`'s per-call MGET on the `lat:<id>` keyspace plus N JSON decodes.

Two changes:

1. Split `GetTransportsByEdge` into a fetch-only core (`getTransportsByEdge`, unexported) and two public wrappers: `GetTransportsByEdge` (current behavior, calls hydrate + `edgeCache.Put`) and `GetTransportsByEdgeNoLatency` (cache-only read, no MGET on `lat:*`). `mirrorEdges` switches to the no-latency variant. The interface, the memory store, and three test/mock stubs (rpcgrpc, cli/route/calc, route-finder/store) gain the new method as a thin alias.
2. Replace `fmt.Sprintf` with string concat in five hot key builders (`transportKey`, `latencyKey`, `edgeKey`, `allTpsIndexKey`, and the inline format in `allTransportKeysFromIndex`). Sprintf was ~10% of TPD's total alloc_objects in pprof; concat compiles to a single buffer write and avoids the format-state machinery.

No behavior change: same key strings, same wire shape. Existing tests (redisStore + memoryStore) pass.
Force-pushed from c73a69a to 495d626.
## Symptom
Even after #2429 (the GetBit fan-out fix) was deployed, TPD stayed pinned at 167% CPU with 75% of that in runtime GC. RestartCount 0: busy CPU, no panic loop. `GetBit` is gone from the heap profile (the prior fix held), but a different allocator took over the top.
## Diagnosis
Fresh 30s CPU profile post-#2429: same shape — `runtime.spanClass.sizeclass`, `scanObject`, `tryDeferToSpanScan` dominate, application code below the rounding error. Heap profile by alloc_objects:
| flat % | cum % | function |
|---|---|---|
| 16.35% | 16.35% | `encoding/hex.EncodeToString` |
| 13.86% | 30.00% | `json-iterator/go.(*textMarshalerEncoder).Encode` |
| 7.71% | 23.28% | `pkg/cipher.PubKey.MarshalText` |
| 3.02% | 32.38% | `(*redisStore).GetTransportsByEdge` |
| 1.60% | 10.48% | `(*redisStore).hydrateDurableLatency` |
| 1.56% | 4.62% | `(*redisStore).latencyKey` (`fmt.Sprintf`) |
| 1.47% | 4.43% | `(*redisStore).transportKey` (`fmt.Sprintf`) |
`pprof -peek hex.EncodeToString`: 99.24% of allocations from `cipher.PubKey.Hex`.
`pprof -peek GetTransportsByEdge`: 97.68% of invocations from `mirrorEdges` — fires on every transport register / delete to re-publish the edge's full transport list to the DHT.
## Root cause
`mirrorEdges` walks every touched edge → `GetTransportsByEdge(edge)`, which:
- fetches the edge's full transport list,
- issues a per-call MGET across the `lat:<id>` keyspace via `hydrateDurableLatency` and JSON-decodes N latency entries,
- writes the hydrated result into `edgeCache`.
The DHT consumer doesn't read the Latency field, so that work was pure overhead on the dominant call path. On top of that, every step builds its redis keys with `fmt.Sprintf`; at a few hundred mirror cycles/sec, those format-state-machine allocations add up.
## Fix
### 1. No-latency variant for callers that don't need the overlay
Split `GetTransportsByEdge` into:
- `getTransportsByEdge` (unexported): the fetch-only core;
- `GetTransportsByEdge` (public): current behavior, calls `hydrateDurableLatency` and `edgeCache.Put` on the result;
- `GetTransportsByEdgeNoLatency` (public): cache-only read, no MGET on `lat:*`.
Switch `mirrorEdges` to `GetTransportsByEdgeNoLatency`. Saves ~10% of total alloc_objects on the hot path.
Added to the `Store` interface; memory store + three test/mock stubs (rpcgrpc, cli/route/calc, route-finder/store) gain the method as a thin alias. No production behavior change for callers that still want fresh latency in the response.
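A minimal sketch of the split, assuming a simplified in-memory stand-in for the real Redis-backed store (the `Transport` fields and map layout here are hypothetical; only the three method names match the PR):

```go
package main

import "fmt"

// Transport is a stand-in for the real transport entry; only the
// fields relevant to the sketch are shown.
type Transport struct {
	ID      string
	Latency int64 // filled in by hydration, zero otherwise
}

type redisStore struct {
	byEdge    map[string][]Transport // edge pubkey -> transports
	latencies map[string]int64       // stand-in for the lat:<id> keyspace
}

// getTransportsByEdge is the fetch-only core: no latency hydration,
// no cache write.
func (s *redisStore) getTransportsByEdge(edge string) []Transport {
	tps := make([]Transport, len(s.byEdge[edge]))
	copy(tps, s.byEdge[edge])
	return tps
}

// GetTransportsByEdge keeps the current behavior: fetch, then hydrate
// latency for each entry (the real store does one MGET + N JSON decodes).
func (s *redisStore) GetTransportsByEdge(edge string) []Transport {
	tps := s.getTransportsByEdge(edge)
	for i := range tps {
		tps[i].Latency = s.latencies["lat:"+tps[i].ID]
	}
	return tps
}

// GetTransportsByEdgeNoLatency skips hydration entirely; mirrorEdges
// only needs the transport list, not the Latency overlay.
func (s *redisStore) GetTransportsByEdgeNoLatency(edge string) []Transport {
	return s.getTransportsByEdge(edge)
}

func main() {
	s := &redisStore{
		byEdge:    map[string][]Transport{"pk1": {{ID: "tp1"}}},
		latencies: map[string]int64{"lat:tp1": 42},
	}
	fmt.Println(s.GetTransportsByEdge("pk1")[0].Latency)          // 42: hydrated
	fmt.Println(s.GetTransportsByEdgeNoLatency("pk1")[0].Latency) // 0: skipped
}
```

The thin-alias stubs in the mocks can simply forward `GetTransportsByEdgeNoLatency` to their existing `GetTransportsByEdge`, since no test depends on latency being absent.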
### 2. `fmt.Sprintf` → string concat in five hot redis-key builders
`transportKey`, `latencyKey`, `edgeKey`, `allTpsIndexKey`, and the inline format in `allTransportKeysFromIndex` all built keys with `fmt.Sprintf("%s:tp:%s", serviceName, id)`: clear, but expensive (~10% of TPD's total alloc_objects). Plain concat compiles to a single buffer write and skips the format-state-machine allocations:
```go
// before
return fmt.Sprintf("%s:tp:%s", serviceName, id.String())
// after
return serviceName + ":tp:" + id.String()
```
Same key strings, same wire shape, no behavior change.
## Test plan
Existing tests pass for both `redisStore` and `memoryStore`; no new wire behavior to cover beyond the thin-alias method.
## Sequencing
Layered conceptually on top of #2429 but doesn't depend on its branch: both touch `pkg/transport-discovery/store`, in different files (#2429 in `redis_uptime.go`; this one in `redis_transport.go`, `store.go`, `memory_store.go`, and `api/api.go`). Either can merge first.
## Diff stat
```
7 files changed, +76 / -14
```