Summary
This issue records the performance work on `sat-perf-improvements` and the benchmark evidence for the full series relative to `ba72cf4`.
The series currently includes eight focused changes:
Original series (5 commits):
- fast-path small `Nat` / `Int` encode-decode paths
- avoid cloning record field queues during decode
- fast-path exact primitive vector decode
- remove duplicate whole-file type rebuilding in `candid_parser` typing
- skip trivia-only tokenizer bookkeeping when trivia capture is disabled

New additions (3 commits):
- bulk memcpy for primitive vector encoding on LE platforms (14.7x encoding speedup)
- batch cost tracking for primitive vector decode (41% decode speedup)
- `Int::decode` small-value fast path (12% decode speedup + 50% heap reduction)
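
The bulk memcpy item above can be illustrated with a minimal sketch. It assumes the wire layout for fixed-width integers is little-endian, so that on a little-endian target the in-memory bytes of a `[u32]` slice already match the encoded bytes. The function names `encode_u32s_per_element` and `encode_u32s_bulk` are hypothetical and are not taken from the branch; the actual commit may be structured differently.

```rust
/// Per-element path: append each value in little-endian byte order.
/// (Hypothetical helper, used here as the reference behaviour.)
fn encode_u32s_per_element(values: &[u32], out: &mut Vec<u8>) {
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
}

/// Bulk path: on little-endian targets the slice's memory layout already
/// equals the little-endian wire layout, so one copy appends everything.
/// On other targets it falls back to the per-element path.
fn encode_u32s_bulk(values: &[u32], out: &mut Vec<u8>) {
    if cfg!(target_endian = "little") {
        // SAFETY: `u32` has no padding bytes, every byte pattern is a valid
        // `u8`, and the length is exactly the byte size of `values`.
        let bytes = unsafe {
            std::slice::from_raw_parts(
                values.as_ptr() as *const u8,
                std::mem::size_of_val(values),
            )
        };
        out.extend_from_slice(bytes);
    } else {
        encode_u32s_per_element(values, out);
    }
}
```

Both functions produce the same bytes for the same input, which is the property that keeps the serialized output unchanged while removing the per-element loop.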
Benchmark Comparison (Combined)
Baseline: `ba72cf4` (master)
Compared branch state: `sat-perf-improvements` (all 8 commits)
Tool: `canbench` (wasm32)
Key improvements
| Benchmark | Metric | Baseline | Optimized | Change |
| --- | --- | --- | --- | --- |
| vec_nat | Encoding | 1.10B | 66.54M | -93.9% (16.5x) |
| vec_nat | Decoding | 1.49B | 917.08M | -38.5% |
| vec_nat | Total | 2.59B | 983.62M | -62.0% |
| vec_nat32 | Encoding | 136.28M | 16.79M | -87.7% (8.1x) |
| vec_nat32 | Decoding | 1.06B | 406.87M | -61.6% |
| vec_nat32 | Total | 1.20B | 423.67M | -64.7% |
| vec_nat64 | Encoding | 161.45M | 33.57M | -79.2% (4.8x) |
| vec_nat64 | Decoding | 1.06B | 411.07M | -61.2% |
| vec_nat64 | Total | 1.22B | 444.64M | -63.6% |
| vec_int16 | Encoding | 123.7M | 8.4M | -93.2% (14.7x) |
| vec_int16 | Decoding | 1.07B | 411.07M | -61.5% |
| vec_int16 | Total | 1.19B | 419.48M | -64.8% |
| option_list | Decoding | 28.35M | 23.70M | -16.4% |
| option_list | Heap | 2 pages | 1 page | -50% |
| variant_list | Decoding | 27.05M | 22.07M | -18.4% |
| variant_list | Heap | 2 pages | 1 page | -50% |
| btreemap | Encoding | 4.70B | 526.00M | -88.8% (8.9x) |
| btreemap | Decoding | 15.74B | 13.41B | -14.8% |
| btreemap | Total | 20.44B | 13.94B | -31.8% |
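
For readers checking the Change column, the following sketch shows how a percentage change and a speedup factor can be derived from a baseline and an optimized instruction count. The helper names `percent_change` and `speedup` are illustrative only. Because the table values are already rounded, recomputed figures should agree with the table only to within rounding.

```rust
/// Relative change of `optimized` versus `baseline`, in percent.
/// Negative values mean fewer instructions than the baseline.
fn percent_change(baseline: f64, optimized: f64) -> f64 {
    (optimized - baseline) / baseline * 100.0
}

/// Speedup factor: how many times fewer instructions the optimized run uses.
fn speedup(baseline: f64, optimized: f64) -> f64 {
    baseline / optimized
}
```

For example, the vec_nat encoding row (1.10B to 66.54M) gives roughly -93.95% and a factor of about 16.5, which matches the reported -93.9% (16.5x) up to the rounding of the inputs.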
Full benchmark table
| status | name | master (instructions) | optimized (instructions) | ins Δ% | HI master (pages) | HI opt (pages) | HI Δ% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| - | blob | 6.33M | 6.33M | +0.0% | 66 | 66 | +0.0% |
| - | btreemap | 20.44B | 13.94B | -31.8% | 1179 | 1154 | -2.1% |
| - | btreemap::1. Encoding | 4.70B | 526.00M | -88.8% | 159 | 159 | +0.0% |
| - | btreemap::2. Decoding | 15.74B | 13.41B | -14.8% | 1020 | 995 | -2.5% |
| - | extra_args | 3.51M | 2.90M | -17.4% | 0 | 0 | +0.0% |
| - | nns | 26.64M | 25.56M | -4.1% | 2 | 2 | +0.0% |
| - | nns::0. Parsing | 17.86M | 16.84M | -5.7% | 2 | 2 | +0.0% |
| + | nns::1. Encoding | 2.09M | 2.09M | +0.0% | 0 | 0 | +0.0% |
| - | nns::2. Decoding | 5.74M | 5.69M | -0.9% | 0 | 0 | +0.0% |
| - | nns_list_proposal | 77.38M | 74.12M | -4.2% | 17 | 17 | +0.0% |
| + | nns_list_proposal::1. Encoding | 6.88M | 6.89M | +0.1% | 3 | 3 | +0.0% |
| - | nns_list_proposal::2. Decoding | 70.50M | 67.23M | -4.6% | 14 | 14 | +0.0% |
| - | option_list | 36.38M | 24.35M | -33.1% | 2 | 1 | -50.0% |
| - | option_list::1. Encoding | 8.03M | 641.58K | -92.0% | 0 | 0 | +0.0% |
| - | option_list::2. Decoding | 28.35M | 23.70M | -16.4% | 2 | 1 | -50.0% |
| + | text | 12.09M | 12.09M | +0.0% | 99 | 99 | +0.0% |
| - | variant_list | 35.28M | 22.71M | -35.6% | 2 | 1 | -50.0% |
| - | variant_list::1. Encoding | 8.22M | 636.51K | -92.3% | 0 | 0 | +0.0% |
| - | variant_list::2. Decoding | 27.05M | 22.07M | -18.4% | 2 | 1 | -50.0% |
| - | vec_int16 | 1.19B | 419.48M | -64.8% | 261 | 195 | -25.3% |
| - | vec_int16::1. Encoding | 123.70M | 8.40M | -93.2% | 261 | 130 | -50.2% |
| - | vec_int16::2. Decoding | 1.07B | 411.07M | -61.5% | 0 | 65 | — |
| - | vec_nat | 2.59B | 983.62M | -62.0% | 172 | 151 | -12.2% |
| - | vec_nat::1. Encoding | 1.10B | 66.54M | -93.9% | 33 | 33 | +0.0% |
| - | vec_nat::2. Decoding | 1.49B | 917.08M | -38.5% | 139 | 118 | -15.1% |
| - | vec_nat32 | 1.20B | 423.67M | -64.7% | 518 | 387 | -25.3% |
| - | vec_nat32::1. Encoding | 136.28M | 16.79M | -87.7% | 518 | 258 | -50.2% |
| - | vec_nat32::2. Decoding | 1.06B | 406.87M | -61.6% | 0 | 129 | — |
| - | vec_nat64 | 1.22B | 444.64M | -63.6% | 1031 | 771 | -25.2% |
| - | vec_nat64::1. Encoding | 161.45M | 33.57M | -79.2% | 1031 | 514 | -50.1% |
| - | vec_nat64::2. Decoding | 1.06B | 411.07M | -61.2% | 0 | 257 | — |
Notes
- All optimizations preserve the wire format: serialization output is byte-identical.
- Forward and backward compatibility verified via round-trip tests for all affected types.
- Zero regressions across all 12 benchmarks (minor noise on `nns_list_proposal::1. Encoding`).
- The `btreemap` improvements come from the `Nat` fast paths (earlier commits); the `Int` fast path further helps `option_list` and `variant_list`.
- Heap reduction in `option_list`/`variant_list` comes from avoiding `BigInt` allocation for small `Int` values.
- `vec_nat` encoding speedup (16.5x) comes from the `Nat::encode` u64 fast path avoiding `BigUint` arithmetic for small values.
- `vec_nat32`/`vec_nat64` encoding speedups come from bulk memcpy on little-endian platforms.
- `vec_nat` decoding (38.5% improvement) benefits from the `Nat::decode` u64 fast path; further gains are possible with reduced per-element serde overhead.
- Heap increase entries marked "—" indicate the baseline had 0 pages (all heap was used for encoding), so the optimized branch's decode-only heap is not comparable.
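
The u64 fast paths mentioned in the notes can be sketched as follows, assuming `Nat` values are serialized as unsigned LEB128. When a value fits in a `u64`, plain shifts and masks are enough and no arbitrary-precision arithmetic is needed. The names `encode_leb128_u64` and `decode_leb128_u64` are hypothetical, and the `None` return stands in for "fall back to the big-integer path"; the real implementation in the branch may differ.

```rust
/// Encode a value that fits in `u64` as unsigned LEB128 (7 payload bits per
/// byte, high bit set on every byte except the last).
fn encode_leb128_u64(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode unsigned LEB128 into a `u64`.
/// Returns the value and the number of bytes consumed, or `None` when the
/// input is truncated or does not fit in 64 bits (in this sketch, the point
/// where a caller would fall back to an arbitrary-precision decoder).
fn decode_leb128_u64(input: &[u8]) -> Option<(u64, usize)> {
    let mut result: u64 = 0;
    for (i, &byte) in input.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None;
        }
        let chunk = (byte & 0x7f) as u64;
        // At shift 63 only the lowest payload bit still fits in a u64.
        if shift == 63 && chunk > 1 {
            return None;
        }
        result |= chunk << shift;
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}
```

The fast path only changes how the bytes are computed for small values, not which bytes are produced, so it is compatible with the byte-identical output stated in the notes.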
Benchmarks added in #718
The new benchmarks (`vec_nat`, `vec_nat32`, `vec_nat64`) were added to master separately to track these types going forward.