Follow-up to PRI-6. The current drain path allocates per tick:
- `RingBufReader::drain()` returns `Vec<Vec<u8>>` (one `Vec` per message plus the outer vec)
- `host::parse_batch` allocates a `Vec<UnifiedMessage>`
- `serde_json::from_slice` allocates `String`s for every owned field in the deserialized types
At the documented 10k–20k msg/sec peak targets from `docs/performance.md`, this will likely show up in profiles. The current implementation is correct and simple enough to ship for the first end-to-end milestone; optimization should be driven by actual benchmarks, not speculation.
Suggested direction (when benchmarks show it's needed)
- Replace `RingBufReader::drain() -> Vec<Vec<u8>>` with a callback-style API, `pub fn drain_with<F: FnMut(&[u8])>(&mut self, f: F)`, so the caller sees each message's bytes as a slice directly into the mapped region, with no per-message allocation.
- Keep a caller-owned `Vec<UnifiedMessage>` scratch buffer that is cleared and refilled each tick instead of allocated fresh. Already trivial to add to the drain loop in `lib.rs`.
- Investigate `simd-json` or `sonic-rs` for faster parsing. Their serde-compatible APIs are near drop-in replacements that still return owned types but with faster parsing and allocation paths.
- If profiling shows allocation of the `UnifiedMessage` fields is the hot path (not likely — these are small strings), consider `Cow<'a, str>` or `SmartString` fields.
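The first two directions above can be sketched as follows. This is a toy, not the real implementation: `RingBufReader` here wraps a plain `Vec` instead of the mapped ring-buffer region, and `String` stands in for `UnifiedMessage`.

```rust
// Sketch of the proposed callback-style drain plus a reusable scratch
// buffer. `RingBufReader` is a stand-in (a Vec instead of the real
// mapped region); names are illustrative, not the real API.
pub struct RingBufReader {
    pending: Vec<Vec<u8>>,
}

impl RingBufReader {
    /// Proposed replacement for `drain() -> Vec<Vec<u8>>`: each message
    /// is handed to the callback as a borrowed slice, so the caller sees
    /// the bytes without a per-message `Vec` allocation.
    pub fn drain_with<F: FnMut(&[u8])>(&mut self, mut f: F) {
        for msg in self.pending.drain(..) {
            f(&msg);
        }
    }
}

/// Per-tick drain loop: the scratch buffer is cleared and refilled, so
/// its capacity is reused across ticks instead of reallocated fresh.
pub fn drain_tick(reader: &mut RingBufReader, scratch: &mut Vec<String>) {
    scratch.clear();
    reader.drain_with(|bytes| {
        scratch.push(String::from_utf8_lossy(bytes).into_owned());
    });
}

fn main() {
    let mut reader = RingBufReader {
        pending: vec![b"tick-1".to_vec(), b"tick-2".to_vec()],
    };
    let mut scratch: Vec<String> = Vec::new();
    drain_tick(&mut reader, &mut scratch);
    println!("{:?}", scratch); // ["tick-1", "tick-2"]
}
```

Note the closure in `drain_tick` still copies bytes into an owned `String`; in the real version the parse step would consume the slice directly, which is where the per-message allocation actually disappears.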
Acceptance
- Benchmark the current drain path at 20k msg/sec synthetic load (needs a benchmark harness, which is its own small task).
- Only then optimize. Do not land this speculatively.
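A std-only starting point for the first acceptance item might look like the sketch below. Everything here is illustrative: the payload shape and the stand-in parse step are made up, and a real harness would drain through the actual `RingBufReader` and call `host::parse_batch` on real `UnifiedMessage` JSON.

```rust
use std::time::Instant;

// Synthetic-load sketch: build 20k small JSON payloads, then time one
// drain-and-parse style pass over them. A trivial stand-in check is
// used in place of real parsing so the sketch stays dependency-free.
fn main() {
    let messages: Vec<Vec<u8>> = (0..20_000)
        .map(|i| format!("{{\"seq\":{}}}", i).into_bytes())
        .collect();

    let start = Instant::now();
    let mut parsed = 0usize;
    for msg in &messages {
        // Stand-in for per-message parsing work.
        if msg.starts_with(b"{") && msg.ends_with(b"}") {
            parsed += 1;
        }
    }
    let elapsed = start.elapsed();
    assert_eq!(parsed, 20_000);
    println!("parsed {} messages in {:?}", parsed, elapsed);
}
```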
Source
Flagged by Copilot review on PR #33.