Summary
Design and run comprehensive benchmarks to compare the current pattern of heap allocations for byte[] (via new byte[n]) against alternatives using stackalloc and ArrayPool<byte> throughout the NLightning.Bolt11 assembly. Measure execution time, memory allocation, and CPU usage under realistic workloads (e.g., decoding BOLT11 invoices). If results show meaningful improvements, follow up with a PR to apply the preferred approach consistently across the project.
Motivation
- new byte[n] causes heap allocations and GC pressure for transient buffers used in parsing/encoding BOLT11 invoices.
- stackalloc can avoid heap allocations for small, short‑lived buffers but increases stack usage and requires Span<T>-based code, which is mostly fine in this project.
- ArrayPool<byte> can amortize buffer costs for larger or variable‑size buffers but adds rental/return complexity and potential for misuse.
- We need data to guide a project‑wide change to one of these strategies (or to stay as is).
Scope
- Benchmark the existing code paths that allocate temporary byte[] buffers in NLightning.Bolt11.
- Compare three strategies:
  - Baseline: new byte[n] (status quo)
  - stackalloc (where possible – small, fixed‑size buffers)
  - ArrayPool<byte>.Shared.Rent/Return (for larger/variable sizes)
- Workloads: realistic scenarios such as full invoice decode (and optionally encode) across small, typical, and large inputs.
- Metrics: execution time, total allocations (bytes/GC count), and CPU usage.
Out of Scope
- Immediate refactoring across the codebase. That will be proposed only if benchmarks show a clear benefit.
Proposed Work
- Create a dedicated benchmark project under benchmark/NLightning.Bolt11.Benchmarks using BenchmarkDotNet.
- Implement benchmarks that:
  - Drive end‑to‑end decode of BOLT11 invoices with datasets representing common and worst‑case sizes.
  - Include micro-benchmarks for the most allocation‑heavy routines (e.g., bit readers/writers, tagged field parsing, bech32 operations) to isolate buffer behavior.
- Provide three variants for each benchmarked routine:
  - Baseline (new byte[n])
  - stackalloc (guard with size thresholds and safe spans)
  - ArrayPool<byte>
- Collect metrics:
  - BenchmarkDotNet’s standard stats (Mean, Error, StdDev), plus P95 if a percentile column is added to the config
  - Allocated bytes, Gen0/1/2 counts
  - Optional CPU sampling/tracing corroboration using external tools
- Document results and provide a recommendation (strategy/thresholds). If beneficial, open a follow‑up PR to apply the chosen strategy consistently.
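The size-threshold guard mentioned for the stackalloc variant could look roughly like the sketch below. The HybridBuffer class, the 256-byte threshold, and the Checksum workload are illustrative assumptions, not existing NLightning code; the real threshold is what the benchmarks are meant to determine.

```csharp
using System;
using System.Buffers;

public static class HybridBuffer
{
    // Hypothetical threshold; the benchmark results should decide the real value.
    private const int StackThreshold = 256;

    // Stand-in workload: copy the input into a scratch buffer and XOR its bytes.
    public static int Checksum(ReadOnlySpan<byte> input)
    {
        byte[]? rented = null;

        // Small inputs use the stack; larger ones fall back to the shared pool.
        Span<byte> scratch = input.Length <= StackThreshold
            ? stackalloc byte[StackThreshold]
            : (rented = ArrayPool<byte>.Shared.Rent(input.Length));
        try
        {
            // Both sources may be larger than needed, so trim to the logical length.
            scratch = scratch.Slice(0, input.Length);
            input.CopyTo(scratch);

            int x = 0;
            foreach (byte b in scratch) x ^= b;
            return x;
        }
        finally
        {
            // Only pooled arrays need to be returned; stack memory is freed on exit.
            if (rented is not null) ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

Keeping the stackalloc size constant rather than input-dependent bounds the stack usage per call, which is one reason this shape is often preferred over stackalloc byte[n].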
Methodology & Metrics
- Use BenchmarkDotNet with Release builds, RunStrategy.Monitoring, and GcForce disabled to reflect realistic GC.
- Configure multiple input sizes:
  - Small invoices
  - Typical invoices
  - Large invoices (many tagged fields, long route info)
- Metrics from BDN:
  - Mean, Error, StdDev, Median
  - Allocated (bytes), Gen0/1/2
- External validation (optional but recommended):
  - CPU: dotnet-trace + Speedscope/PerfView
  - Counters: dotnet-counters for GC (alloc rate, GC count)
- Ensure warmup and multiple iterations; include environment info in the report (TFM, OS, CPU model, .NET version).
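The job settings described above could be collected in one config class, along the lines of this sketch. The class name Bolt11BenchmarkConfig is hypothetical, and the warmup and iteration counts are simply borrowed from the example template; it assumes the BenchmarkDotNet package is referenced by the benchmark project.

```csharp
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Jobs;

// Hypothetical config; attach it to a benchmark class with [Config(typeof(Bolt11BenchmarkConfig))].
public class Bolt11BenchmarkConfig : ManualConfig
{
    public Bolt11BenchmarkConfig()
    {
        AddJob(Job.Default
            .WithStrategy(RunStrategy.Monitoring) // no pilot/overhead stages; suited to end-to-end runs
            .WithGcForce(false)                   // do not force collections between iterations
            .WithWarmupCount(3)
            .WithIterationCount(15));

        // Adds the Allocated and Gen0/1/2 columns to the summary.
        AddDiagnoser(MemoryDiagnoser.Default);

        // Median and P95 are requested explicitly so they always appear in the summary.
        AddColumn(StatisticColumn.Median, StatisticColumn.P95);
    }
}
```

Whether RunStrategy.Monitoring is also the right choice for the microbenchmarks is worth checking during the work, since BenchmarkDotNet's default Throughput strategy is designed for very short operations.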
Benchmark Project Structure
- benchmark/NLightning.Bolt11.Benchmarks/ (new project)
  - References src/NLightning.Bolt11
  - Contains:
    - DecodeInvoiceBenchmarks.cs (end‑to‑end scenarios)
    - BufferStrategyBenchmarks.cs (microbenchmarks comparing allocation strategies)
    - TestData/ with representative invoice samples
    - README.md with instructions to run and interpret results
Example BenchmarkDotNet Template
using System;
using System.Buffers;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

[SimpleJob(RuntimeMoniker.Net80, warmupCount: 3, iterationCount: 15)]
[MemoryDiagnoser]
public class BufferStrategyBenchmarks
{
    // Buffer sizes under test, from well below to well above a plausible stackalloc threshold.
    [Params(16, 64, 256, 1024, 4096)]
    public int N;

    [Benchmark(Baseline = true)]
    public int Baseline_NewArray()
    {
        var buf = new byte[N];
        return Touch(buf);
    }

    [Benchmark]
    public int Stackalloc_WhenSmall()
    {
        // Guard the stack allocation with a size threshold; fall back to the heap above it.
        if (N <= 256)
        {
            Span<byte> span = stackalloc byte[N];
            return Touch(span);
        }
        else
        {
            var buf = new byte[N];
            return Touch(buf);
        }
    }

    [Benchmark]
    public int ArrayPool_RentReturn()
    {
        var pool = ArrayPool<byte>.Shared;
        var buf = pool.Rent(N);
        // Rent may return a larger array, so only the first N bytes are used.
        try { return Touch(buf.AsSpan(0, N)); }
        finally { pool.Return(buf, clearArray: false); }
    }

    // Writes and reads every byte so the buffer is actually used and the
    // returned value depends on it, which guards against dead-code elimination.
    private static int Touch(Span<byte> s)
    {
        int x = 0;
        for (int i = 0; i < s.Length; i++)
        {
            s[i] = (byte)i;
            x ^= s[i];
        }
        return x;
    }
}
Datasets
- Curate a set of real‑world and synthetic BOLT11 invoice samples:
  - Minimal invoices
  - Typical invoices (median size from your logs/fixtures)
  - Stress invoices (max fields, large route info, long descriptions)
- Reuse samples from existing tests under test/NLightning.Bolt11.Tests and test/NLightning.Integration.Tests where possible.
Tooling (suggested) — links
Risks / Considerations
- Use stackalloc only for small buffers; large stack allocations risk stack overflow.
- ArrayPool<byte> requires a careful zeroing policy and correct Return usage to avoid data leakage and correctness issues.
- Some APIs may need Span<T> overloads; refactoring effort should be considered in the follow‑up PR.
- Ensure benchmarks aren’t over‑optimized by the JIT; vary inputs and prevent dead‑code elimination.
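The zeroing and Return concerns for ArrayPool<byte> could be contained in one helper, as in the sketch below. PooledScratch and its BufferFunc delegate are hypothetical names, not part of NLightning or the BCL. The sketch assumes buffers may hold sensitive invoice data, so it clears them both before use and on return; whether that cost is acceptable is something the benchmarks should show.

```csharp
using System;
using System.Buffers;

public static class PooledScratch
{
    // Custom delegate: avoids relying on Span<byte> as a generic argument,
    // which older C# versions do not allow.
    public delegate int BufferFunc(Span<byte> buffer);

    public static int Use(int length, BufferFunc body)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(length);
        try
        {
            // Rent may return a larger array than requested; expose only the requested slice.
            Span<byte> slice = rented.AsSpan(0, length);

            // Pooled arrays can contain data from a previous renter, so start from zeros.
            slice.Clear();

            return body(slice);
        }
        finally
        {
            // Returned exactly once, even if body throws; clearArray: true zeroes the
            // array so a later renter cannot read this call's data.
            ArrayPool<byte>.Shared.Return(rented, clearArray: true);
        }
    }
}
```

A call such as PooledScratch.Use(64, buf => Parse(buf)), where Parse is a placeholder for any routine taking a Span<byte>, keeps the rent/return pairing out of the calling code.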
Acceptance Criteria
- A benchmark project under benchmark/ that can be run locally and in CI (optional) for reproducible results.
- A recommendation on whether to keep new, switch to stackalloc for ≤X bytes, or prefer ArrayPool<byte> beyond a threshold (or hybrid).
Deliverables
- Benchmark project + source.
- Benchmark results (Markdown/CSV) checked into benchmark/results/ with date and environment metadata.
- Recommendation summary and next steps.
How to Run
- From repo root:
  - dotnet build -c Release
  - dotnet run -c Release --project benchmark/NLightning.Bolt11.Benchmarks
- Optional: dotnet-counters monitor --counters System.Runtime -- dotnet run ...
- Optional: dotnet-trace collect -- dotnet run ... and analyze with PerfView/Speedscope.