Contributor

@sayrer sayrer commented Jan 25, 2026

Instead of allocating a new []ArrayPos slice for each array element, use a single growing buffer (arrayPosBuffer) and slice into it. This reduces allocations significantly for JSON with many array elements.

Before:

goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
BenchmarkCityLots-20    	 1345388	      4185 ns/op	     646 B/op	      37 allocs/op

After:

goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
BenchmarkCityLots-20    	 1430053	      3870 ns/op	     198 B/op	      18 allocs/op
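
The buffer-slicing idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: only the names `ArrayPos` and `arrayPosBuffer` come from the description, while the `ArrayPos` fields, the `tracker` type, and the `batchedCopy` and `reset` helpers are assumptions made up for the example.

```go
package main

import "fmt"

// ArrayPos is named in the PR description; the fields here are assumed for illustration.
type ArrayPos struct {
	Array int32
	Pos   int32
}

// tracker is a hypothetical holder for the shared buffer.
type tracker struct {
	arrayPosBuffer []ArrayPos
}

// batchedCopy appends trail to the shared buffer and returns a view of just
// the appended region, instead of allocating a fresh slice per array element.
func (tr *tracker) batchedCopy(trail []ArrayPos) []ArrayPos {
	start := len(tr.arrayPosBuffer)
	tr.arrayPosBuffer = append(tr.arrayPosBuffer, trail...)
	end := len(tr.arrayPosBuffer)
	// The three-index slice caps capacity at end, so an append on the
	// returned view reallocates rather than overwriting its neighbors.
	return tr.arrayPosBuffer[start:end:end]
}

// reset reuses the backing array for the next event (an assumed lifecycle).
// Views handed out earlier must not be used after this call.
func (tr *tracker) reset() {
	tr.arrayPosBuffer = tr.arrayPosBuffer[:0]
}

func main() {
	var tr tracker
	a := tr.batchedCopy([]ArrayPos{{Array: 1, Pos: 0}})
	b := tr.batchedCopy([]ArrayPos{{Array: 1, Pos: 1}, {Array: 2, Pos: 0}})
	fmt.Println(a, b, len(tr.arrayPosBuffer))
	tr.reset()
}
```

In this sketch, views returned before the buffer grows keep pointing at the old backing array, so they stay valid until `reset`. The allocation savings come from amortized growth of one buffer rather than one `make` per element.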

@sayrer sayrer mentioned this pull request Jan 26, 2026
Contributor Author

sayrer commented Jan 27, 2026

This does show up in Benchmark8259Example (see #470), but it's smaller than two other patches that should go first.

Owner

timbray commented Jan 27, 2026

Indeed, this speeds up BenchmarkCityLots vs. main as of #477; before/after below.

BenchmarkCityLots-12    	  305420	      6076 ns/op	    1078 B/op	      61 allocs/op
BenchmarkCityLots-12    	  325438	      5555 ns/op	     331 B/op	      29 allocs/op

But with the 8259 benchmark, it's a wash: just barely slower, with more memory allocation.

* main
113857/sec
Benchmark8259Example-12    	  147308	      8783 ns/op	     541 B/op	      17 allocs/op
115087/sec
Benchmark8259Example-12    	  149856	      8689 ns/op	     536 B/op	      17 allocs/op
114380/sec
Benchmark8259Example-12    	  147616	      8743 ns/op	     540 B/op	      17 allocs/op
117058/sec
Benchmark8259Example-12    	  150626	      8543 ns/op	     534 B/op	      17 allocs/op
115136/sec
Benchmark8259Example-12    	  146373	      8685 ns/op	     542 B/op	      17 allocs/op
M	citylots_bench_test.go
Switched to branch 'pr/batch-arraypos-allocation'
* pr/batch-arraypos-allocation
114115/sec
Benchmark8259Example-12    	  145053	      8763 ns/op	     605 B/op	      18 allocs/op
110725/sec
Benchmark8259Example-12    	  143186	      9031 ns/op	     609 B/op	      18 allocs/op
113459/sec
Benchmark8259Example-12    	  149449	      8814 ns/op	     596 B/op	      18 allocs/op
113601/sec
Benchmark8259Example-12    	  143335	      8803 ns/op	     608 B/op	      18 allocs/op
115121/sec
Benchmark8259Example-12    	  148314	      8687 ns/op	     599 B/op	      18 allocs/op

I'll look at the code now. (The epsilon closure one is much juicier.)

Contributor Author

sayrer commented Jan 27, 2026

It won't be visible in Benchmark8259Example without the epsilon closure one; the difference is just noise.

Owner

timbray commented Jan 27, 2026

Ehh, harmless. And I can imagine scenarios where you have wildly heterogeneous events and this could help a lot.

@timbray timbray merged commit e3d13cd into timbray:main Jan 27, 2026
7 checks passed