# perf(tpd): no-latency variant for mirrorEdges + concat redis-key builders (#2430)
Merged

0pcom merged 1 commit into skycoin:develop on May 4, 2026

## Conversation
After skycoin#2429 collapsed the GetBit fan-out, a fresh prod CPU profile still shows TPD at 167% with 75% in runtime GC. New top allocators:

- 16.35% flat `encoding/hex.EncodeToString` (99% from `PubKey.Hex`)
- 13.86% flat json-iterator `textMarshalerEncoder.Encode`
- 10.48% cum `(*redisStore).hydrateDurableLatency`
- 32.38% cum `GetTransportsByEdge`
- 4.62% cum `(*redisStore).latencyKey` (`fmt.Sprintf`)
- 4.43% cum `(*redisStore).transportKey` (`fmt.Sprintf`)

`pprof -peek` traced 97.7% of `GetTransportsByEdge` alloc to `mirrorEdges`, which fires on every transport register/delete to re-publish the edge's full transport list to the DHT. The DHT consumer doesn't read the Latency field, but the call always paid for `hydrateDurableLatency`'s per-call MGET on the `lat:<id>` keyspace plus N JSON decodes.

Two changes:

1. Split `GetTransportsByEdge` into a fetch-only core (`getTransportsByEdge`, unexported) and two public wrappers: `GetTransportsByEdge` (current behavior, calls hydrate + `edgeCache.Put`) and `GetTransportsByEdgeNoLatency` (cache-only read, no MGET on `lat:*`). `mirrorEdges` switches to the no-latency variant. The interface, the memory store, and three test/mock stubs (rpcgrpc, cli/route/calc, route-finder/store) gain the new method as a thin alias.
2. Replace `fmt.Sprintf` with string concat in five hot key builders (`transportKey`, `latencyKey`, `edgeKey`, `allTpsIndexKey`, and the inline format in `allTransportKeysFromIndex`). Sprintf was ~10% of TPD's total alloc_objects in pprof; concat compiles to a single buffer write and avoids the format-state machinery.

No behavior change: same key strings, same wire shape. Existing tests (redisStore + memoryStore) pass.
Force-pushed from c73a69a to 495d626.
## Symptom
Even after #2429 (the GetBit fan-out fix) was deployed, TPD stayed pinned at 167% CPU with 75% of that in runtime GC. RestartCount 0: busy CPU, no panic loop. `GetBit` is gone from the heap profile (the prior fix held), but a different allocator took over the top.
## Diagnosis
Fresh 30s CPU profile post-#2429: same shape — `runtime.spanClass.sizeclass`, `scanObject`, `tryDeferToSpanScan` dominate, application code below the rounding error. Heap profile by alloc_objects:
| flat % | cum % | function |
|---|---|---|
| 16.35% | 16.35% | `encoding/hex.EncodeToString` |
| 13.86% | 30.00% | `json-iterator/go.(*textMarshalerEncoder).Encode` |
| 7.71% | 23.28% | `pkg/cipher.PubKey.MarshalText` |
| 3.02% | 32.38% | `(*redisStore).GetTransportsByEdge` |
| 1.60% | 10.48% | `(*redisStore).hydrateDurableLatency` |
| 1.56% | 4.62% | `(*redisStore).latencyKey` (`fmt.Sprintf`) |
| 1.47% | 4.43% | `(*redisStore).transportKey` (`fmt.Sprintf`) |
`pprof -peek hex.EncodeToString`: 99.24% of allocations from `cipher.PubKey.Hex`.
`pprof -peek GetTransportsByEdge`: 97.68% of invocations from `mirrorEdges` — fires on every transport register / delete to re-publish the edge's full transport list to the DHT.
## Root cause
`mirrorEdges` walks every touched edge → `GetTransportsByEdge(edge)`, which:
- fetches the edge's full transport list,
- issues a per-call MGET across the `lat:<id>` keyspace via `hydrateDurableLatency` and JSON-decodes N latency entries,
- writes the hydrated result into `edgeCache`.
The DHT consumer doesn't read the Latency field, so that work was pure overhead on the dominant call path. On top of that, every step builds its redis keys with `fmt.Sprintf`; at a few hundred mirror cycles/sec, those format-state-machine allocations add up.
## Fix
### 1. No-latency variant for callers that don't need the overlay
Split `GetTransportsByEdge` into:
- `getTransportsByEdge` (unexported): the fetch-only core;
- `GetTransportsByEdge` (public): current behavior, calls `hydrateDurableLatency` and `edgeCache.Put` on the result;
- `GetTransportsByEdgeNoLatency` (public): cache-only read, no MGET on `lat:*`.
Switch `mirrorEdges` to `GetTransportsByEdgeNoLatency`. Saves ~10% of total alloc_objects on the hot path.
Added to the `Store` interface; memory store + three test/mock stubs (rpcgrpc, cli/route/calc, route-finder/store) gain the method as a thin alias. No production behavior change for callers that still want fresh latency in the response.
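A minimal sketch of the split, assuming a simplified in-memory stand-in for the real Redis-backed store (the `Transport` fields and map layout here are hypothetical; only the three method names match the PR):

```go
package main

import "fmt"

// Transport is a stand-in for the real transport entry; only the
// fields relevant to the sketch are shown.
type Transport struct {
	ID      string
	Latency int64 // filled in by hydration, zero otherwise
}

type redisStore struct {
	byEdge    map[string][]Transport // edge pubkey -> transports
	latencies map[string]int64       // stand-in for the lat:<id> keyspace
}

// getTransportsByEdge is the fetch-only core: no latency hydration,
// no cache write.
func (s *redisStore) getTransportsByEdge(edge string) []Transport {
	tps := make([]Transport, len(s.byEdge[edge]))
	copy(tps, s.byEdge[edge])
	return tps
}

// GetTransportsByEdge keeps the current behavior: fetch, then hydrate
// latency for each entry (the real store does one MGET + N JSON decodes).
func (s *redisStore) GetTransportsByEdge(edge string) []Transport {
	tps := s.getTransportsByEdge(edge)
	for i := range tps {
		tps[i].Latency = s.latencies["lat:"+tps[i].ID]
	}
	return tps
}

// GetTransportsByEdgeNoLatency skips hydration entirely; mirrorEdges
// only needs the transport list, not the Latency overlay.
func (s *redisStore) GetTransportsByEdgeNoLatency(edge string) []Transport {
	return s.getTransportsByEdge(edge)
}

func main() {
	s := &redisStore{
		byEdge:    map[string][]Transport{"pk1": {{ID: "tp1"}}},
		latencies: map[string]int64{"lat:tp1": 42},
	}
	fmt.Println(s.GetTransportsByEdge("pk1")[0].Latency)          // 42: hydrated
	fmt.Println(s.GetTransportsByEdgeNoLatency("pk1")[0].Latency) // 0: skipped
}
```

The thin-alias stubs in the mocks can simply forward `GetTransportsByEdgeNoLatency` to their existing `GetTransportsByEdge`, since no test depends on latency being absent.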
### 2. `fmt.Sprintf` → string concat in five hot redis-key builders
`transportKey`, `latencyKey`, `edgeKey`, `allTpsIndexKey`, and the inline format in `allTransportKeysFromIndex` all built keys with `fmt.Sprintf("%s:tp:%s", serviceName, id)`: clear, but expensive (~10% of TPD's total alloc_objects). Plain concat compiles to a single buffer write and skips the format-state-machine allocations:
```go
// before
return fmt.Sprintf("%s:tp:%s", serviceName, id.String())
// after
return serviceName + ":tp:" + id.String()
```
Same key strings, same wire shape, no behavior change.
## Test plan
Existing tests pass for both `redisStore` and `memoryStore`; no new wire behavior to cover beyond the thin-alias method.
## Sequencing
Layered conceptually on top of #2429 but doesn't depend on its branch: both touch `pkg/transport-discovery/store`, in different files (#2429 in `redis_uptime.go`; this one in `redis_transport.go`, `store.go`, `memory_store.go`, and `api/api.go`). Either can merge first.
## Diff stat
```
7 files changed, +76 / -14
```