
perf(tpd): no-latency variant for mirrorEdges + concat redis-key builders #2430

Merged
0pcom merged 1 commit into skycoin:develop from
0pcom:perf/tpd-edge-mirror-no-latency
May 4, 2026

Conversation

Collaborator

@0pcom 0pcom commented May 4, 2026

Symptom

Even after #2429 (the GetBit-fan-out fix) deployed, TPD stayed pinned at 167% CPU with 75% of CPU in runtime GC. RestartCount 0 — busy CPU, no panic loop. `GetBit` is gone from the heap profile (the prior fix held), but a different allocator took over the top.

Diagnosis

Fresh 30s CPU profile post-#2429: same shape — `runtime.spanClass.sizeclass`, `scanObject`, `tryDeferToSpanScan` dominate, application code below the rounding error. Heap profile by alloc_objects:

| flat % | cum % | function |
|---|---|---|
| 16.35% | 16.35% | `encoding/hex.EncodeToString` |
| 13.86% | 30.00% | `json-iterator/go.(*textMarshalerEncoder).Encode` |
| 7.71% | 23.28% | `pkg/cipher.PubKey.MarshalText` |
| 3.02% | 32.38% | `(*redisStore).GetTransportsByEdge` |
| 1.60% | 10.48% | `(*redisStore).hydrateDurableLatency` |
| 1.56% | 4.62% | `(*redisStore).latencyKey` (`fmt.Sprintf`) |
| 1.47% | 4.43% | `(*redisStore).transportKey` (`fmt.Sprintf`) |

`pprof -peek hex.EncodeToString`: 99.24% of allocations from `cipher.PubKey.Hex`.
`pprof -peek GetTransportsByEdge`: 97.68% of invocations from `mirrorEdges` — fires on every transport register / delete to re-publish the edge's full transport list to the DHT.

Root cause

`mirrorEdges` walks every touched edge and calls `GetTransportsByEdge(edge)`, which:

  1. Cache miss → MGET the `tp:` key for each id, JSON-decode each `TransportData`.
  2. Always runs `hydrateDurableLatency` — another per-call MGET on the `lat:` keyspace + N JSON decodes for the latency overlay (added by transport-discovery: persist latency in a dedicated key, decoupled from registration TTL #2418 for HTTP responses that need fresh latency).
  3. Caches the result.

The DHT consumer never reads the Latency field, so that hydration was pure overhead on the dominant call path. On top of that, every step builds its redis keys with `fmt.Sprintf`; at a few hundred mirror cycles per second, those format-state-machine allocations add up.

Fix

### 1. No-latency variant for callers that don't need the overlay

Split `GetTransportsByEdge` into:

  • `getTransportsByEdge` (unexported) — fetch + decode, no hydrate, no cache write.
  • `GetTransportsByEdge` — calls core + hydrate + cache. Existing HTTP behavior unchanged.
  • `GetTransportsByEdgeNoLatency` — cache-only read of the no-latency core. Skips the per-call MGET on `lat:*` plus N JSON decodes.

Switch `mirrorEdges` to `GetTransportsByEdgeNoLatency`. Saves ~10% of total alloc_objects on the hot path.

Added to the `Store` interface; memory store + three test/mock stubs (rpcgrpc, cli/route/calc, route-finder/store) gain the method as a thin alias. No production behavior change for callers that still want fresh latency in the response.
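A minimal sketch of the split, with hypothetical field names and stub closures standing in for the redis round-trips (the real store wraps a redis client and a typed edge cache; `fetch` and `hydrate` here stand in for the MGET + JSON-decode work):

```go
package main

// TransportData is a stand-in for the real record; only the fields
// relevant to the split are shown (hypothetical shape).
type TransportData struct {
	ID      string
	Latency int64 // overlaid from the lat:* keyspace by hydrate
}

type redisStore struct {
	fetch   func(edge string) []TransportData     // stands in for MGET tp:* + decode
	hydrate func([]TransportData) []TransportData // stands in for MGET lat:* + decode
	cache   map[string][]TransportData
}

// getTransportsByEdge is the fetch-only core: decode, no hydrate, no cache write.
func (s *redisStore) getTransportsByEdge(edge string) []TransportData {
	return s.fetch(edge)
}

// GetTransportsByEdge keeps the existing HTTP behavior:
// core + latency overlay + cache write.
func (s *redisStore) GetTransportsByEdge(edge string) []TransportData {
	tps := s.hydrate(s.getTransportsByEdge(edge))
	s.cache[edge] = tps
	return tps
}

// GetTransportsByEdgeNoLatency is what mirrorEdges now calls: the cached
// result if present, otherwise the bare core — never touches lat:*.
func (s *redisStore) GetTransportsByEdgeNoLatency(edge string) []TransportData {
	if tps, ok := s.cache[edge]; ok {
		return tps
	}
	return s.getTransportsByEdge(edge)
}

func main() {
	s := &redisStore{
		fetch: func(string) []TransportData { return []TransportData{{ID: "tp1"}} },
		hydrate: func(tps []TransportData) []TransportData {
			for i := range tps {
				tps[i].Latency = 42
			}
			return tps
		},
		cache: map[string][]TransportData{},
	}
	noLat := s.GetTransportsByEdgeNoLatency("edgePK")
	full := s.GetTransportsByEdge("edgePK")
	println(noLat[0].Latency, full[0].Latency) // no-latency path leaves Latency zero
}
```

Because only the hydrated wrapper writes the cache, a later no-latency read may serve a hydrated entry — same trade the PR describes: the DHT path never pays for latency it doesn't read.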

### 2. `fmt.Sprintf` → string concat in five hot redis-key builders

`transportKey`, `latencyKey`, `edgeKey`, `allTpsIndexKey`, and the inline format in `allTransportKeysFromIndex` all built keys with `fmt.Sprintf("%s:tp:%s", serviceName, id)` — clear but expensive (~10% of TPD's total alloc_objects). Plain concat compiles to a single buffer write and skips the format-state-machine allocations:

```go
// before
return fmt.Sprintf("%s:tp:%s", serviceName, id.String())
// after
return serviceName + ":tp:" + id.String()
```

Same key strings, same wire shape, no behavior change.
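The equivalence is mechanical to check — both forms must emit byte-identical keys (the service name and id below are illustrative values, not the real ones):

```go
package main

import "fmt"

// transportKeySprintf is the old form; transportKeyConcat is the new one.
func transportKeySprintf(serviceName, id string) string {
	return fmt.Sprintf("%s:tp:%s", serviceName, id)
}

func transportKeyConcat(serviceName, id string) string {
	return serviceName + ":tp:" + id
}

func main() {
	a := transportKeySprintf("tpd", "0123abcd")
	b := transportKeyConcat("tpd", "0123abcd")
	if a != b {
		panic("key mismatch: " + a + " vs " + b)
	}
	fmt.Println(b) // tpd:tp:0123abcd
}
```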

Test plan

  • `go build ./...` clean.
  • `go vet ./...` clean.
  • `go test ./pkg/transport-discovery/...` all pass (redisStore + memoryStore).
  • `gofmt` clean.
  • Post-deploy: TPD CPU drops below ~50% (or whatever the new baseline is); `runtime.spanClass.sizeclass` no longer dominates the CPU profile; `/transports/edge:` HTTP responses still carry latency in their JSON (HTTP path unchanged); `mirrorEdges` no longer shows `hydrateDurableLatency` in its alloc trace.

Sequencing

Layered on top of #2429 conceptually but doesn't depend on its branch — both touch `pkg/transport-discovery/store` but in different files (#2429 in `redis_uptime.go`, this one in `redis_transport.go` / `store.go` / `memory_store.go` / `api/api.go`). Either can merge first.

Diff stat

```
7 files changed, +76 / -14
```

…ders

After skycoin#2429 collapsed the GetBit fan-out, fresh prod CPU profile
still shows TPD at 167% with 75% in runtime GC. New top allocators:

  16.35% flat   encoding/hex.EncodeToString  (99% from PubKey.Hex)
  13.86% flat   json-iterator textMarshalerEncoder.Encode
  10.48% cum    (*redisStore).hydrateDurableLatency
  32.38% cum    GetTransportsByEdge
  4.62% cum     (*redisStore).latencyKey      (fmt.Sprintf)
  4.43% cum     (*redisStore).transportKey    (fmt.Sprintf)

pprof -peek traced 97.7% of GetTransportsByEdge alloc to
mirrorEdges, which fires on every transport register/delete to
re-publish the edge's full transport list to the DHT. The DHT
consumer doesn't read the Latency field — but the call always paid
for hydrateDurableLatency's per-call MGET on the lat:<id> keyspace
plus N JSON decodes.

Two changes:

1. Split GetTransportsByEdge into a fetch-only core
   (getTransportsByEdge, unexported) and two public wrappers:
   GetTransportsByEdge (current behavior, calls hydrate +
   edgeCache.Put) and GetTransportsByEdgeNoLatency (cache-only
   read, no MGET on lat:*). mirrorEdges switches to the no-latency
   variant. Interface, memory store, and three test/mock stubs
   (rpcgrpc, cli/route/calc, route-finder/store) gain the new
   method as a thin alias.

2. Replace fmt.Sprintf with string concat in five hot key
   builders (transportKey, latencyKey, edgeKey, allTpsIndexKey,
   the inline format in allTransportKeysFromIndex). Sprintf was
   ~10% of TPD's total alloc_objects in pprof; concat compiles to
   a single buffer write and avoids the format-state machinery.
   No behavior change — same key strings, same wire shape.

Existing tests (redisStore + memoryStore) pass.
@0pcom 0pcom force-pushed the perf/tpd-edge-mirror-no-latency branch from c73a69a to 495d626 on May 4, 2026 at 22:07
@0pcom 0pcom merged commit 47ed5b0 into skycoin:develop May 4, 2026
3 of 4 checks passed