Track candid performance series and benchmark deltas #710

@sasa-tomic

Summary

This issue records the performance work on sat-perf-improvements and the benchmark evidence for the full series relative to ba72cf4.

The series currently includes eight focused changes:

Original series (5 commits):

  • fast-path small Nat / Int encode-decode paths
  • avoid cloning record field queues during decode
  • fast-path exact primitive vector decode
  • remove duplicate whole-file type rebuilding in candid_parser typing
  • skip trivia-only tokenizer bookkeeping when trivia capture is disabled
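The small-value Nat fast path in the first item can be illustrated with a minimal sketch. Candid encodes `nat` as unsigned LEB128, so a value that fits in a `u64` can be encoded and decoded with plain integer shifts and no big-integer arithmetic. The function names `encode_nat_u64` and `decode_nat_u64` are hypothetical and are not taken from the candid crate; the real implementation may differ.

```rust
/// Encode `value` as unsigned LEB128, the wire encoding Candid uses for `nat`.
/// Values that fit in a u64 never need big-integer arithmetic.
fn encode_nat_u64(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last group: continuation bit stays clear.
            out.push(byte);
            return;
        }
        // More groups follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Decode unsigned LEB128 from the front of `bytes`.
/// Returns `Some((value, bytes_consumed))` when the value fits in a u64.
/// Returns `None` when the input is truncated or does not fit, in which
/// case a caller would fall back to the arbitrary-precision path.
fn decode_nat_u64(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        let shift = 7 * i as u32;
        let chunk = (byte & 0x7f) as u64;
        // Reject groups whose bits would be shifted out of a u64.
        if shift >= 64 || (chunk << shift) >> shift != chunk {
            return None;
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

Because the bytes produced are ordinary LEB128, a fast path of this shape leaves the wire format unchanged; only the arithmetic used to produce them differs.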

New additions (3 commits):

  • bulk memcpy for primitive vector encoding on LE platforms (14.7x encoding speedup)
  • batch cost tracking for primitive vector decode (41% decode speedup)
  • Int::decode small-value fast path (12% decode speedup + 50% heap reduction)
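The bulk-copy idea in the first of these items can be sketched as follows. Candid lays out fixed-width integers such as `nat32` as little-endian bytes, so on a little-endian target the in-memory representation of a `&[u32]` already matches the wire layout of the element payload. The function name `encode_u32_slice` is hypothetical, and the sketch covers only the element bytes, not the length prefix or type table that a full Candid message carries.

```rust
/// Append `values` to `out` as consecutive 4-byte little-endian words,
/// the fixed-width layout Candid uses for `nat32` elements.
fn encode_u32_slice(values: &[u32], out: &mut Vec<u8>) {
    if cfg!(target_endian = "little") {
        // In-memory layout already equals the wire layout, so the whole
        // slice can be appended with a single copy.
        // SAFETY: u32 has no padding bytes, u8 has alignment 1, and the
        // length is exactly the size of the slice in bytes.
        let bytes = unsafe {
            std::slice::from_raw_parts(
                values.as_ptr() as *const u8,
                std::mem::size_of_val(values),
            )
        };
        out.extend_from_slice(bytes);
    } else {
        // Portable fallback: convert each element explicitly.
        out.reserve(values.len() * 4);
        for v in values {
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
}
```

Both branches emit the same bytes, which is what allows this kind of change to keep serialization output byte-identical across platforms.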

Benchmark Comparison (Combined)

Baseline: ba72cf4 (master)
Compared branch state: sat-perf-improvements (all 8 commits)
Tool: canbench (wasm32)

Key improvements

| Benchmark | Metric | Baseline | Optimized | Change |
|---|---|---|---|---|
| vec_nat | Encoding | 1.10B | 66.54M | -93.9% (16.5x) |
| vec_nat | Decoding | 1.49B | 917.08M | -38.5% |
| vec_nat | Total | 2.59B | 983.62M | -62.0% |
| vec_nat32 | Encoding | 136.28M | 16.79M | -87.7% (8.1x) |
| vec_nat32 | Decoding | 1.06B | 406.87M | -61.6% |
| vec_nat32 | Total | 1.20B | 423.67M | -64.7% |
| vec_nat64 | Encoding | 161.45M | 33.57M | -79.2% (4.8x) |
| vec_nat64 | Decoding | 1.06B | 411.07M | -61.2% |
| vec_nat64 | Total | 1.22B | 444.64M | -63.6% |
| vec_int16 | Encoding | 123.70M | 8.40M | -93.2% (14.7x) |
| vec_int16 | Decoding | 1.07B | 411.07M | -61.5% |
| vec_int16 | Total | 1.19B | 419.48M | -64.8% |
| option_list | Decoding | 28.35M | 23.70M | -16.4% |
| option_list | Heap | 2 pages | 1 page | -50% |
| variant_list | Decoding | 27.05M | 22.07M | -18.4% |
| variant_list | Heap | 2 pages | 1 page | -50% |
| btreemap | Encoding | 4.70B | 526.00M | -88.8% (8.9x) |
| btreemap | Decoding | 15.74B | 13.41B | -14.8% |
| btreemap | Total | 20.44B | 13.94B | -31.8% |
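
For readers checking the Change column, the percentage and the speedup factor in parentheses both follow from the baseline and optimized instruction counts. A minimal sketch of that arithmetic, using a hypothetical helper named `change`:

```rust
/// Returns (percent change, speedup factor) for a baseline and an
/// optimized instruction count. A negative percent means fewer instructions.
fn change(baseline: f64, optimized: f64) -> (f64, f64) {
    let percent = (optimized - baseline) / baseline * 100.0;
    let speedup = baseline / optimized;
    (percent, speedup)
}
```

For the vec_nat32 encoding row, `change(136.28e6, 16.79e6)` gives roughly -87.7% and 8.1x, matching the table.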

Full benchmark table

| status | name | master | optimized | ins Δ% | HI master | HI opt | HI Δ% |
|---|---|---|---|---|---|---|---|
| - | blob | 6.33M | 6.33M | +0.0% | 66 | 66 | +0.0% |
| - | btreemap | 20.44B | 13.94B | -31.8% | 1179 | 1154 | -2.1% |
| - | btreemap::1. Encoding | 4.70B | 526.00M | -88.8% | 159 | 159 | +0.0% |
| - | btreemap::2. Decoding | 15.74B | 13.41B | -14.8% | 1020 | 995 | -2.5% |
| - | extra_args | 3.51M | 2.90M | -17.4% | 0 | 0 | +0.0% |
| - | nns | 26.64M | 25.56M | -4.1% | 2 | 2 | +0.0% |
| - | nns::0. Parsing | 17.86M | 16.84M | -5.7% | 2 | 2 | +0.0% |
| + | nns::1. Encoding | 2.09M | 2.09M | +0.0% | 0 | 0 | +0.0% |
| - | nns::2. Decoding | 5.74M | 5.69M | -0.9% | 0 | 0 | +0.0% |
| - | nns_list_proposal | 77.38M | 74.12M | -4.2% | 17 | 17 | +0.0% |
| + | nns_list_proposal::1. Encoding | 6.88M | 6.89M | +0.1% | 3 | 3 | +0.0% |
| - | nns_list_proposal::2. Decoding | 70.50M | 67.23M | -4.6% | 14 | 14 | +0.0% |
| - | option_list | 36.38M | 24.35M | -33.1% | 2 | 1 | -50.0% |
| - | option_list::1. Encoding | 8.03M | 641.58K | -92.0% | 0 | 0 | +0.0% |
| - | option_list::2. Decoding | 28.35M | 23.70M | -16.4% | 2 | 1 | -50.0% |
| + | text | 12.09M | 12.09M | +0.0% | 99 | 99 | +0.0% |
| - | variant_list | 35.28M | 22.71M | -35.6% | 2 | 1 | -50.0% |
| - | variant_list::1. Encoding | 8.22M | 636.51K | -92.3% | 0 | 0 | +0.0% |
| - | variant_list::2. Decoding | 27.05M | 22.07M | -18.4% | 2 | 1 | -50.0% |
| - | vec_int16 | 1.19B | 419.48M | -64.8% | 261 | 195 | -25.3% |
| - | vec_int16::1. Encoding | 123.70M | 8.40M | -93.2% | 261 | 130 | -50.2% |
| - | vec_int16::2. Decoding | 1.07B | 411.07M | -61.5% | 0 | 65 | — |
| - | vec_nat | 2.59B | 983.62M | -62.0% | 172 | 151 | -12.2% |
| - | vec_nat::1. Encoding | 1.10B | 66.54M | -93.9% | 33 | 33 | +0.0% |
| - | vec_nat::2. Decoding | 1.49B | 917.08M | -38.5% | 139 | 118 | -15.1% |
| - | vec_nat32 | 1.20B | 423.67M | -64.7% | 518 | 387 | -25.3% |
| - | vec_nat32::1. Encoding | 136.28M | 16.79M | -87.7% | 518 | 258 | -50.2% |
| - | vec_nat32::2. Decoding | 1.06B | 406.87M | -61.6% | 0 | 129 | — |
| - | vec_nat64 | 1.22B | 444.64M | -63.6% | 1031 | 771 | -25.2% |
| - | vec_nat64::1. Encoding | 161.45M | 33.57M | -79.2% | 1031 | 514 | -50.1% |
| - | vec_nat64::2. Decoding | 1.06B | 411.07M | -61.2% | 0 | 257 | — |

Notes

  • All optimizations preserve the wire format: serialization output is byte-identical.
  • Forward and backward compatibility verified via round-trip tests for all affected types.
  • Zero regressions across all 12 benchmarks (minor noise on nns_list_proposal::1. Encoding).
  • The btreemap improvements come from the Nat fast paths (earlier commits); the Int fast path further helps option_list and variant_list.
  • Heap reduction in option_list/variant_list comes from avoiding BigInt allocation for small Int values.
  • vec_nat encoding speedup (16.5x) comes from the Nat::encode u64 fast path avoiding BigUint arithmetic for small values.
  • vec_nat32/vec_nat64 encoding speedups come from bulk memcpy on little-endian platforms.
  • vec_nat decoding (38.5% improvement) benefits from the Nat::decode u64 fast path; further gains are possible by reducing per-element serde overhead.
  • Heap increase (HI) Δ% entries marked "—" indicate the baseline had 0 pages for that phase (all heap was used for encoding), so the optimized branch's decode-only heap is not comparable.
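
The heap reduction attributed above to the small-value Int path can be illustrated with a sketch. Candid encodes `int` as signed LEB128, so a short encoding can be decoded straight into an `i64` without allocating a big integer. The function name `decode_int_i64` is hypothetical, and the cut-off at nine bytes is a simplification chosen for this example rather than the crate's actual threshold.

```rust
/// Decode signed LEB128 (Candid's wire encoding for `int`) into an i64.
/// Returns `Some((value, bytes_consumed))` for encodings of at most 9 bytes.
/// Returns `None` for longer or truncated input, in which case a caller
/// would fall back to the arbitrary-precision path.
fn decode_int_i64(bytes: &[u8]) -> Option<(i64, usize)> {
    let mut value: i64 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        if shift >= 63 {
            // Keep the sketch simple: leave 10-byte encodings to the slow path.
            return None;
        }
        value |= ((byte & 0x7f) as i64) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Sign-extend when the sign bit (0x40) of the final group is set.
            if byte & 0x40 != 0 {
                value |= -1i64 << shift;
            }
            return Some((value, i + 1));
        }
    }
    None
}
```

Since the result lives in a machine word, no heap allocation happens on this path, which is consistent with the page-count drop reported for option_list and variant_list.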

Benchmarks added in #718

New benchmarks (vec_nat, vec_nat32, vec_nat64) were added to master separately to track these types going forward.
