sayrer commented Jan 23, 2026

This change adds a resultBuf field to nfaBuffers and a matchesInto()
method to matchSet. Instead of allocating a new slice for each match
result, we reuse the pooled buffer by appending matches to it. The
buffer is reset to length 0 before each use while preserving capacity.
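
A minimal sketch of the reuse pattern described above, not the actual quamina code. The definition of `X`, the shape of `matchSet`, and the `collect` helper are assumptions made for illustration; only the names `resultBuf`, `nfaBuffers`, `matchSet`, and `matchesInto` come from the description.

```go
package main

import "fmt"

// X stands in for the opaque match identifier type (assumed here).
type X any

// matchSet is a simplified stand-in: a set of matches keyed by X.
type matchSet struct {
	set map[X]bool
}

// matches allocates a fresh slice on every call (the "before" shape).
func (m *matchSet) matches() []X {
	out := make([]X, 0, len(m.set))
	for x := range m.set {
		out = append(out, x)
	}
	return out
}

// matchesInto appends matches to buf and returns it, so the caller can
// reuse buf's backing array across calls.
func (m *matchSet) matchesInto(buf []X) []X {
	for x := range m.set {
		buf = append(buf, x)
	}
	return buf
}

// nfaBuffers is a hypothetical holder for reusable scratch space.
type nfaBuffers struct {
	resultBuf []X
}

// collect (hypothetical helper) resets resultBuf to length 0, which keeps
// its capacity, and refills it from the match set.
func (b *nfaBuffers) collect(m *matchSet) []X {
	b.resultBuf = m.matchesInto(b.resultBuf[:0])
	return b.resultBuf
}

func main() {
	m := &matchSet{set: map[X]bool{"a": true, "b": true}}
	bufs := &nfaBuffers{}

	first := bufs.collect(m)
	capAfterFirst := cap(bufs.resultBuf)
	second := bufs.collect(m)

	// The second call reuses the backing array grown by the first.
	fmt.Println(len(first), len(second), cap(bufs.resultBuf) == capAfterFirst)
}
```

One consequence of this shape: the returned slice aliases the buffer, so in this sketch a caller that keeps results past the next `collect` call would need to copy them first.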

See #470.

It saves only one alloc/op (23 to 22), but bytes/op drop from 904 to 680, about 25%.

Before:

% go test -bench=^Benchmark8259Example$ -run=^$
FA: Field matchers: 2 (avg size 2.500, max 4)
Value matchers: 5
SmallTables 20371 (splices 6, avg 4.033, max 66, epsilons avg 0.001, max 2) singletons 1
99842/sec
goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
Benchmark8259Example-20    	  134223	     10016 ns/op	     904 B/op	      23 allocs/op

After:

% go test -bench=^Benchmark8259Example$ -run=^$
FA: Field matchers: 2 (avg size 2.500, max 4)
Value matchers: 5
SmallTables 20371 (splices 6, avg 4.033, max 66, epsilons avg 0.001, max 2) singletons 1
100248/sec
goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
Benchmark8259Example-20    	  127650	      9975 ns/op	     680 B/op	      22 allocs/op
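
The allocs/op column counts heap allocations per benchmark iteration. As a rough way to see the same effect in isolation, the sketch below uses `testing.AllocsPerRun` from the standard library to compare a fresh-slice version with a buffer-reusing version. The `keysFresh`, `keysInto`, and `measure` functions are hypothetical stand-ins and do not reproduce the numbers above.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so results escape to the heap, as they would
// when handed back to a real caller.
var sink []int

// keysFresh builds a new slice on every call.
func keysFresh(set map[int]bool) []int {
	out := make([]int, 0, len(set))
	for k := range set {
		out = append(out, k)
	}
	return out
}

// keysInto appends into a caller-supplied buffer instead.
func keysInto(set map[int]bool, buf []int) []int {
	for k := range set {
		buf = append(buf, k)
	}
	return buf
}

// measure reports average allocations per call for both variants.
func measure() (fresh, reused float64) {
	set := map[int]bool{1: true, 2: true, 3: true, 4: true}

	fresh = testing.AllocsPerRun(1000, func() {
		sink = keysFresh(set)
	})

	// The buffer is allocated once, outside the measured function.
	buf := make([]int, 0, len(set))
	reused = testing.AllocsPerRun(1000, func() {
		buf = keysInto(set, buf[:0])
		sink = buf
	})
	return fresh, reused
}

func main() {
	fresh, reused := measure()
	fmt.Printf("fresh: %.0f allocs/run, reused: %.0f allocs/run\n", fresh, reused)
}
```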

timbray left a comment

Saves one allocation and 41 ns per op. We're getting into diminishing returns. One question in the review.

// matchesInto appends matches to the provided buffer and returns it.
// This avoids allocating a new slice for the result.
func (m *matchSet) matchesInto(buf []X) []X {
	for x := range m.set {

Might the builtin copy() work here? Might even be a tiny bit faster.

Oh sorry, of course not: m.set is a map, not a slice.
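
To illustrate the point with a small sketch (the element types and helper names here are hypothetical): the builtin `copy` takes a slice destination and a slice (or string) source, so it works slice-to-slice, while a map has to be walked with `range` and appended element by element.

```go
package main

import "fmt"

// copySlice duplicates a slice with the builtin copy.
func copySlice(src []int) []int {
	dst := make([]int, len(src))
	copy(dst, src) // both arguments are slices, so copy applies
	return dst
}

// keysInto gathers map keys into buf. copy(buf, set) would not compile,
// because a map is not a slice; ranging and appending is the alternative.
func keysInto(set map[int]bool, buf []int) []int {
	for k := range set {
		buf = append(buf, k)
	}
	return buf
}

func main() {
	fmt.Println(len(copySlice([]int{1, 2, 3})))

	set := map[int]bool{10: true, 20: true}
	buf := make([]int, 0, len(set))
	buf = keysInto(set, buf)
	fmt.Println(len(buf)) // map iteration order is unspecified, so only the length is printed
}
```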

@timbray timbray merged commit 07aa8cd into timbray:main Jan 24, 2026
7 checks passed

sayrer commented Jan 24, 2026

It's a big win in memory footprint, even if only modest in CPU:

=== Heap Profile Comparison: main vs PR3 (pool-result-buffer) ===

TOTAL MEMORY ALLOCATED:
Main: 642.27 MB
PR3: 465.26 MB
SAVED: 177.01 MB (-27.6%)

TOTAL ALLOCATION OBJECTS:
Main: 12,775,054 allocations
PR3: 11,116,748 allocations
SAVED: 1,658,306 allocations (-13.0%)

KEY CHANGES:

matchSet.matches() (eliminated by pooling):
Main: 149.03 MB (23.20%), 651,139 allocations (5.10%)
PR3: ELIMINATED (0 MB, 0 allocations)
IMPACT: -149.03 MB, -651,139 allocations

transmap.all:
Main: 115.50 MB (17.98%), 4,325,491 allocations (33.86%)
PR3: 101.50 MB (21.82%), 3,776,613 allocations (33.97%)
SAVED: -14 MB, -548,878 allocations (-12.7%)

traverseNFA:
Main: 348.64 MB cumulative
PR3: 335.16 MB cumulative
SAVED: -13.48 MB (-3.9%)
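
The totals above are cumulative bytes and objects allocated; the tooling that produced the report is not stated. As a coarse in-process analogue, the standard library's `runtime.MemStats` exposes cumulative `TotalAlloc` (bytes) and `Mallocs` (objects) counters. The sketch below, with a hypothetical `allocDelta` helper, measures both around a function call. It does not attribute allocations to individual functions the way a heap profile does.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the allocations reachable so they are made on the heap.
var sink [][]byte

// allocDelta runs f and reports how many heap bytes and objects were
// allocated while it ran, using the runtime's cumulative counters.
func allocDelta(f func()) (bytes, objects uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, after.Mallocs - before.Mallocs
}

func main() {
	bytes, objects := allocDelta(func() {
		sink = sink[:0]
		for i := 0; i < 1000; i++ {
			sink = append(sink, make([]byte, 1024))
		}
	})
	fmt.Printf("allocated %d bytes in %d objects\n", bytes, objects)
}
```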
