kaizen: Precomputed epsilon closures #482

sayrer · 2026-01-27T19:53:25Z

Epsilon closures are a property of the automaton structure, not the input data. Once a pattern is added and the NFA is built, the epsilon closure for any given state is fixed and never changes. (see #470)

Before: Every time we matched an event, traverseNFA called getClosure(state) which:

- Did a map lookup to check if we'd seen this state before
- If not, traversed epsilon transitions recursively
- Allocated a new []*faState slice for the result
- Cached it in a map for future lookups

This happened on every state, every byte, every match.
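To make the old hot-path cost concrete, here is a minimal Go sketch of the lazy, map-cached lookup described above. It is not Quamina's code: `faState` is simplified so that epsilon transitions hang directly off the state (in Quamina they live on its `*smallTable`), and `closureCache` is a hypothetical stand-in for the per-buffer map. Every call pays a map lookup, and every miss pays a recursive walk plus a new slice.

```go
package main

import "fmt"

// faState is a hypothetical simplification of Quamina's faState: epsilon
// transitions hang directly off the state instead of off its *smallTable.
type faState struct {
	name     string
	epsilons []*faState
}

// closureCache is a hypothetical stand-in for the old per-buffer cache.
type closureCache struct {
	closures map[*faState][]*faState
}

// getClosure models the pre-change hot path: a map lookup on every call,
// and on a miss a recursive epsilon walk that allocates a new slice.
func (c *closureCache) getClosure(state *faState) []*faState {
	if cl, ok := c.closures[state]; ok {
		return cl
	}
	seen := make(map[*faState]bool)
	var closure []*faState
	var walk func(s *faState)
	walk = func(s *faState) {
		if seen[s] {
			return
		}
		seen[s] = true
		closure = append(closure, s)
		for _, next := range s.epsilons {
			walk(next)
		}
	}
	walk(state)
	c.closures[state] = closure
	return closure
}

func main() {
	c := &faState{name: "c"}
	b := &faState{name: "b", epsilons: []*faState{c}}
	a := &faState{name: "a", epsilons: []*faState{b, c}}
	cache := &closureCache{closures: make(map[*faState][]*faState)}
	for _, s := range cache.getClosure(a) { // first call: miss, walk, allocate
		fmt.Println(s.name)
	}
	cache.getClosure(a) // second call: no walk, but still a map lookup
}
```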

After: When a pattern is added, we walk the automaton once and store each state's closure directly on the faState struct:

type faState struct {
    table            *smallTable
    fieldTransitions []*fieldMatcher
    isSpinner        bool
    epsilonClosure   []*faState  // precomputed
}

At match time, traverseNFA just reads state.epsilonClosure - a direct slice access instead of a map lookup.

Tradeoff: Small increase in build-time cost and memory per state, but eliminates map lookups and allocations from the hot path.
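A minimal sketch of the "after" picture, under simplifying assumptions: `faState` here carries `epsilons` and a `next` byte map instead of a `*smallTable`, and `precomputeClosures` and `matches` are illustrative names, not Quamina's API. Closures are computed once in a build-time walk, and the match loop only reads the stored `epsilonClosure` slices.

```go
package main

import "fmt"

// faState is a hypothetical simplification: epsilons and byte transitions
// hang directly off the state (Quamina keeps them on a *smallTable).
type faState struct {
	epsilons       []*faState
	next           map[byte]*faState
	epsilonClosure []*faState // filled in once, at build time
}

// closureOf returns s plus every state reachable from s via epsilons only.
func closureOf(s *faState) []*faState {
	seen := map[*faState]bool{s: true}
	closure := []*faState{s}
	for i := 0; i < len(closure); i++ {
		for _, e := range closure[i].epsilons {
			if !seen[e] {
				seen[e] = true
				closure = append(closure, e)
			}
		}
	}
	return closure
}

// precomputeClosures visits every reachable state once and stores its
// closure on the struct, so matching never has to compute or cache one.
func precomputeClosures(start *faState) {
	visited := make(map[*faState]bool)
	var visit func(s *faState)
	visit = func(s *faState) {
		if visited[s] {
			return
		}
		visited[s] = true
		s.epsilonClosure = closureOf(s)
		for _, e := range s.epsilons {
			visit(e)
		}
		for _, n := range s.next {
			visit(n)
		}
	}
	visit(start)
}

// matches runs the NFA over input. Each transition target contributes its
// precomputed epsilonClosure slice directly, so nothing is computed or looked
// up in a closure cache at match time. (Duplicates are not removed.)
func matches(start, final *faState, input []byte) bool {
	current := start.epsilonClosure
	for _, b := range input {
		var next []*faState
		for _, s := range current {
			if target, ok := s.next[b]; ok {
				next = append(next, target.epsilonClosure...)
			}
		}
		current = next
	}
	for _, s := range current {
		if s == final {
			return true
		}
	}
	return false
}

func main() {
	// Toy NFA for an optional "a" followed by "b": accepts "ab" and "b".
	s2 := &faState{}
	s1 := &faState{next: map[byte]*faState{'b': s2}}
	s0 := &faState{epsilons: []*faState{s1}, next: map[byte]*faState{'a': s1}}
	precomputeClosures(s0)
	for _, in := range []string{"ab", "b", "a", "abb"} {
		fmt.Println(in, matches(s0, s2, []byte(in)))
	}
}
```

The epsilon from `s0` to `s1` is what makes the leading `a` optional. A real traversal would also deduplicate the current state set; the sketch skips that for brevity.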

nfa.go:
- Added epsilonClosure []*faState field to faState struct
- Added precomputeEpsilonClosures(), precomputeClosuresRecursive(), computeClosureForState(), and traverseEpsilonsForClosure() functions
- Updated traverseNFA to use precomputed closures with fallback for tests
value_matcher.go:
- Call precomputeEpsilonClosures() after setting startTable, but only when isNondeterministic is true

~~The eClosure field in nfaBuffers was kept for backward compatibility with tests that call traverseNFA directly without going through the normal addPattern path.~~


sayrer · 2026-01-27T21:13:04Z

Before:

%  go test -bench=Benchmark8259Example -benchmem -benchtime=5s -run=^$
FA: Field matchers: 2 (avg size 2.500, max 4)
Value matchers: 5
SmallTables 20371 (splices 6, avg 4.033, max 66, epsilons avg 0.001, max 2) singletons 1
101793/sec
goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
Benchmark8259Example-20    	  693202	      9824 ns/op	     498 B/op	      17 allocs/op

After:

%  go test -bench=Benchmark8259Example -benchmem -benchtime=5s -run=^$
FA: Field matchers: 2 (avg size 2.500, max 4)
Value matchers: 5
SmallTables 20371 (splices 6, avg 4.033, max 66, epsilons avg 0.001, max 2) singletons 1
169022/sec
goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
Benchmark8259Example-20    	 1000000	      5916 ns/op	     288 B/op	      17 allocs/op

timbray · 2026-01-27T22:59:52Z

So, this is complicated, pardon me while I think out loud a bit. Note: Issue #481 is highly relevant here.

First of all, this is even better than you think it is. Remember, nfaBufs comes at the top level from the quamina.bufs field, which means it is stored per-thread. So if you have N goroutines pounding away at a Quamina instance, they'll each have cached not just the buffers (which is necessary) but also the epsilon closures which, as you point out, really want to be global and computed once and shared.

So why not just go ahead? What worries me is I think this is a ticking time bomb in the same way that nfa2Dfa is, i.e. that the horrors of O(2^N) could erupt when addPattern() is fed the wrong NFA as input. So, before we accept this, first of all we need to determine how likely and how bad this sort of explosion is. I think I know how to do that, and it's with our old friend TestShellStyleBuildTime.

Let's assume that indeed, a bad NFA can make this explode. That matters because NFAs now have two optimization paths: 1. persist the epsilon closures, and 2. morph them into DFAs. The former is cheaper (I think?), but if you can do the latter, you don't need epsilon closures. What's the best strategy for giving adopters of Quamina some dials to turn to get the best results and protect themselves from disasters?

I am definitely going to have to think about this for a bit. Any helpful thoughts on the conundrum are welcome.
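One data point for the "how bad" question, offered as a back-of-envelope sketch rather than a measurement: a state's epsilon closure is a subset of the NFA's N states, so storing one closure per state costs at most N² slice entries in total, and each closure takes O(N + E) time to compute by breadth-first search. That is polynomial, unlike the 2^N worst case of subset construction in nfa2Dfa. The hypothetical Go code below builds a chain in which each state has an epsilon to the next (closure sizes N, N-1, down to 1) and counts the total, N(N+1)/2.

```go
package main

import "fmt"

// faState is a hypothetical simplification holding only epsilon edges.
type faState struct {
	epsilons []*faState
}

// closureSize counts s plus all states reachable from s via epsilons.
func closureSize(s *faState) int {
	seen := map[*faState]bool{s: true}
	queue := []*faState{s}
	for i := 0; i < len(queue); i++ {
		for _, e := range queue[i].epsilons {
			if !seen[e] {
				seen[e] = true
				queue = append(queue, e)
			}
		}
	}
	return len(queue)
}

// epsilonChain builds n states where state i has an epsilon to state i+1.
func epsilonChain(n int) []*faState {
	states := make([]*faState, n)
	for i := range states {
		states[i] = &faState{}
	}
	for i := 0; i+1 < n; i++ {
		states[i].epsilons = []*faState{states[i+1]}
	}
	return states
}

// totalClosureEntries sums closure sizes over all states, i.e. the number of
// slice entries a per-state epsilonClosure field would hold.
func totalClosureEntries(states []*faState) int {
	total := 0
	for _, s := range states {
		total += closureSize(s)
	}
	return total
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("n=%d entries=%d ceiling=%d\n", n, totalClosureEntries(epsilonChain(n)), n*n)
	}
}
```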

sayrer · 2026-01-27T23:03:43Z

Sorry to cross threads, I just left a comment in #481 about this issue. I think the answer is a state budget there (no timers, etc).

timbray · 2026-01-27T23:07:30Z

I think time is easier for callers to understand? And the number of states is not a linear function of anything. But the idea isn't crazy. There's good news in that while Quamina is at work optimizing an NFA, this doesn't stall the matching (aside from probably burning 100% of a core), just blocks further calls to AddPattern(). Hmmm.

Stepping away from the keyboard for a bit.
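For reference, the state-budget idea from #481 can be sketched as subset construction that counts DFA states as it creates them and gives up once a limit is hit. Everything here is hypothetical: a minimal int-indexed NFA with no epsilons, and a function named `dfaStatesWithBudget`. `blowupNFA(k)` builds the textbook NFA for `(a|b)*a(a|b){k}`, which has k+2 states while its DFA needs 2^(k+1), which also illustrates why the state count isn't a linear function of anything a caller can see.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// nfa is a hypothetical, minimal NFA over bytes: trans[s][b] lists the
// successors of state s on byte b. There are no epsilons, for brevity.
type nfa struct {
	start int
	trans map[int]map[byte][]int
}

// blowupNFA builds the NFA for (a|b)*a(a|b){k}: k+2 NFA states.
func blowupNFA(k int) nfa {
	t := map[int]map[byte][]int{
		0: {'a': {0, 1}, 'b': {0}},
	}
	for i := 1; i <= k; i++ {
		t[i] = map[byte][]int{'a': {i + 1}, 'b': {i + 1}}
	}
	return nfa{start: 0, trans: t}
}

// key sorts a state set in place and renders it as a map key.
func key(set []int) string {
	sort.Ints(set)
	parts := make([]string, len(set))
	for i, s := range set {
		parts[i] = strconv.Itoa(s)
	}
	return strings.Join(parts, ",")
}

// dfaStatesWithBudget runs subset construction and reports ok == false as
// soon as creating another DFA state would exceed budget.
func dfaStatesWithBudget(n nfa, alphabet []byte, budget int) (count int, ok bool) {
	startSet := []int{n.start}
	seen := map[string]bool{key(startSet): true}
	queue := [][]int{startSet}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, b := range alphabet {
			nextSet := map[int]bool{}
			for _, s := range cur {
				for _, t := range n.trans[s][b] {
					nextSet[t] = true
				}
			}
			if len(nextSet) == 0 {
				continue // dead state; not counted
			}
			next := make([]int, 0, len(nextSet))
			for s := range nextSet {
				next = append(next, s)
			}
			if k := key(next); !seen[k] {
				if len(seen) >= budget {
					return len(seen), false
				}
				seen[k] = true
				queue = append(queue, next)
			}
		}
	}
	return len(seen), true
}

func main() {
	for _, k := range []int{2, 8, 16} {
		count, ok := dfaStatesWithBudget(blowupNFA(k), []byte("ab"), 10000)
		fmt.Printf("k=%d nfaStates=%d dfaStates=%d withinBudget=%v\n", k, k+2, count, ok)
	}
}
```

A budget like this bounds the work deterministically, which is the argument for it over a timer; the counterargument raised above is that callers can't easily reason about a state count.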

sayrer · 2026-01-28T01:15:49Z

> I think time is easier for callers to understand? And the number of states is not a linear function of anything. But the idea isn't crazy.

No need for a quick response. But this is right... No casual user will understand this setting. It would be like setting a JPEG encoder to "90%" or something similar.

timbray

OK, you're right, epsilon closures are the cost of doing business with NFAs, so they should be built at the same time. Putting it in faState is correct. Unless I'm missing something, there's a chance here to use the existing epsi-closure code, no?

BTW,

> The eClosure field in nfaBuffers was kept for backward compatibility with tests that call traverseNFA directly without going through the normal addPattern path.

Do we really have to do this? It's ugly.

nfa.go

sayrer · 2026-01-28T19:58:40Z

> > The eClosure field in nfaBuffers was kept for backward compatibility with tests that call traverseNFA directly without going through the normal addPattern path.

> Do we really have to do this? It's ugly.

No, we can change the tests if you're cool with it. I always err on the side of no test changes.


sayrer · 2026-01-29T16:51:37Z

This patch is now ugly in a new way, just making sure everything passes before I keep going.

Move precomputeEpsilonClosures calls from test files into the NFA creation functions (makeShellStyleFA, makeWildCardFA, makeRegexpNFA).

This ensures epsilon closures are always computed when an NFA is created, eliminating the need for callers to remember to call it. The production code in value_matcher.go still calls it after mergeFAs, which remains necessary because merging creates new states. Since precomputeEpsilonClosures is idempotent, this works correctly.

Also changed TestSkinnyRuneTree to use traverseDFA since nfaFromSkinnyRuneTree creates a deterministic FA.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
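The commit message relies on two properties: merging produces states whose closures were never computed, and recomputing closures is idempotent. A small hypothetical Go sketch of both follows, with a caveat: `mergeUnion` is a Thompson-style union (a new start state with epsilons to both inputs) swapped in for illustration. Quamina's `mergeFAs` merges tables instead, though per the commit message it too creates new states.

```go
package main

import "fmt"

// faState is a hypothetical simplification with only epsilon transitions.
type faState struct {
	name           string
	epsilons       []*faState
	epsilonClosure []*faState
}

// closureOf returns s plus every state reachable from s through epsilons.
func closureOf(s *faState) []*faState {
	seen := map[*faState]bool{s: true}
	out := []*faState{s}
	for i := 0; i < len(out); i++ {
		for _, e := range out[i].epsilons {
			if !seen[e] {
				seen[e] = true
				out = append(out, e)
			}
		}
	}
	return out
}

// epsilonClosures recomputes the closure of every state reachable from start.
// A closure depends only on the graph, so rerunning this on an unchanged
// graph yields identical closures: the pass is idempotent.
func epsilonClosures(start *faState) {
	for _, s := range closureOf(start) {
		s.epsilonClosure = closureOf(s)
	}
}

// mergeUnion is a hypothetical Thompson-style union: a brand-new start state
// with epsilons to both inputs. It has no closure until one is computed.
func mergeUnion(a, b *faState) *faState {
	return &faState{name: "merged", epsilons: []*faState{a, b}}
}

// names lists state names, for printing.
func names(states []*faState) []string {
	out := make([]string, len(states))
	for i, s := range states {
		out[i] = s.name
	}
	return out
}

func main() {
	x := &faState{name: "x"}
	y := &faState{name: "y"}
	epsilonClosures(x)
	epsilonClosures(y)

	m := mergeUnion(x, y)
	fmt.Println(len(m.epsilonClosure)) // 0: the merged start state is new

	epsilonClosures(m)                   // so closures must be computed after merging
	fmt.Println(names(m.epsilonClosure)) // [merged x y]

	epsilonClosures(m)                   // idempotent: a second pass changes nothing
	fmt.Println(names(m.epsilonClosure)) // [merged x y]
}
```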

sayrer · 2026-01-29T17:51:50Z

epsi_closure_test.go

-	//fmt.Println("B machine: " + pp.printNFA(bSsplice.table))

-	bEcShouldBeZero := []*faState{bSa, bSb, bSx, bSstar}
+	bEcShouldBeOne := []*faState{bSa, bSb, bSx, bSstar}


Did a double-take here. I think these names are what it's supposed to be.

sayrer · 2026-01-29T17:58:05Z

The previous numbers didn't include the JSON flattener optimization; that is what dropped the allocs from 17 to 13.

Before:

 % go test -bench=Benchmark8259Example -benchmem -benchtime=5s -run=^$
FA: Field matchers: 2 (avg size 2.500, max 4)
Value matchers: 5
SmallTables 20371 (splices 6, avg 4.033, max 66, epsilons avg 0.001, max 2) singletons 1
103804/sec
goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
Benchmark8259Example-20    	  687200	      9634 ns/op	     464 B/op	      13 allocs/op

After:

% go test -bench=Benchmark8259Example -benchmem -benchtime=5s -run=^$
FA: Field matchers: 2 (avg size 2.500, max 4)
Value matchers: 5
SmallTables 20371 (splices 6, avg 4.033, max 66, epsilons avg 0.001, max 2) singletons 1
169301/sec
goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
Benchmark8259Example-20    	 1000000	      5907 ns/op	     252 B/op	      13 allocs/op

timbray · 2026-01-29T20:10:29Z

I guess the assumption has to be that we're going to go ahead and compute the closure for every pattern that needs an NFA: shellstyle, wildcard, and regexp. But note that not all regexes need NFAs, for example `.`, `[a-z0-9]`, `[^z]`, `(a|b){3}`, `~p{}`, so the whole read_regexp and makeRegexpNFA are going to have to be refactored a bit.

Anyhow, obviously we'll take this given the measurements. Needs a bit of work; I apologize in advance for incoming pedantic grumbles about function naming and so on.

In parallel I'm working on code in a file named memory_cost.go, the contents of which are obvious.

sayrer · 2026-01-29T20:17:20Z

> I guess the assumption has to be that we're going to go ahead and compute the closure for every pattern that needs an NFA: shellstyle, wildcard, and regexp. But note that not all regexes need NFAs, for example `.`, `[a-z0-9]`, `[^z]`, `(a|b){3}`, `~p{}`, so the whole read_regexp and makeRegexpNFA are going to have to be refactored a bit.

> Anyhow, obviously we'll take this given the measurements. Needs a bit of work; I apologize in advance for incoming pedantic grumbles about function naming and so on.

Don't worry about that part, but this is an underlying flaw in the naming. nfa.go contains traverseDFA etc. I'd suggest getting this one and the one you have there in before changing lots of names.

timbray

Good stuff, the more I read this the more obvious it is that precomputing closures is the way to go.

epsi_closure.go

timbray · 2026-01-29T20:32:27Z

epsi_closure.go

+// and precomputes the epsilon closure for every reachable faState.
+func precomputeEpsilonClosures(table *smallTable) {
+	visited := make(map[*smallTable]bool)
+	computeClosureForNfa(table, visited)


Here's a refactoring idea to think about. I notice this recurring idiom:

computeClosureForState(state)
computeClosureForNfa(state.table, visited)

If you passed visited to computeClosureForState then you could have computeClosureForState call computeClosureForNfa, which leaves the problem of the top-level function coming in at the smallTable level, which could be fixed by making the top-level function a one-liner

computeClosureForState(&faState{table:table}, make(map[*smallTable]bool))

I think this one is necessary. I didn't write it that way with performance in mind, but I did do it with pprof in hand. Here's the summary:

The suggested refactoring doesn't work because closureForState is called in the hot path (traverseNFA) for every event match. There, we only need to compute the closure for a single wrapper state; walking the entire NFA would be catastrophic.

The two-call pattern in closureForNfa is necessary to keep closureForState simple for external callers:

- `closureForState(state)` computes one state's closure
- `closureForNfa(state.table, visited)` recurses into the NFA

This separation is intentional: closureForState must remain a simple O(epsilons) operation, not an O(NFA size) walk.
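A hypothetical sketch of that split, reusing the function names from this thread but with invented bodies and simplified types (`smallTable` here holds an `epsilons` slice and a `steps` map, which is not Quamina's real layout). `closureForNfa` is the O(NFA size) build-time walk; `closureForState` touches only the epsilon edges reachable from one state, so it stays cheap enough for a single wrapper state like `&faState{table: t0}`.

```go
package main

import "fmt"

// Hypothetical, simplified shapes: a state points at a table, and the table
// holds epsilon transitions and byte transitions.
type smallTable struct {
	epsilons []*faState
	steps    map[byte]*faState
}

type faState struct {
	table          *smallTable
	epsilonClosure []*faState
}

// closureForState computes one state's closure. Its cost is proportional to
// the epsilon edges it follows, not to the size of the whole NFA.
func closureForState(state *faState) {
	seen := map[*faState]bool{state: true}
	closure := []*faState{state}
	for i := 0; i < len(closure); i++ {
		for _, e := range closure[i].table.epsilons {
			if !seen[e] {
				seen[e] = true
				closure = append(closure, e)
			}
		}
	}
	state.epsilonClosure = closure
}

// closureForNfa visits each reachable table once and computes the closure of
// every state it finds there: an O(NFA size) walk, meant for build time.
func closureForNfa(table *smallTable, visited map[*smallTable]bool) {
	if visited[table] {
		return
	}
	visited[table] = true
	for _, e := range table.epsilons {
		closureForState(e)
		closureForNfa(e.table, visited)
	}
	for _, s := range table.steps {
		closureForState(s)
		closureForNfa(s.table, visited)
	}
}

// epsilonClosure is the build-time entry point.
func epsilonClosure(table *smallTable) {
	closureForNfa(table, make(map[*smallTable]bool))
}

func main() {
	s3 := &faState{table: &smallTable{}}
	s2 := &faState{table: &smallTable{}}
	s1 := &faState{table: &smallTable{steps: map[byte]*faState{'b': s3}}}
	t0 := &smallTable{epsilons: []*faState{s1}, steps: map[byte]*faState{'a': s2}}

	epsilonClosure(t0) // build time: every reachable state gets a closure

	start := &faState{table: t0} // match time: one wrapper state only
	closureForState(start)
	fmt.Println(len(start.epsilonClosure), len(s1.epsilonClosure), len(s3.epsilonClosure)) // 2 1 1
}
```

In this sketch a state reachable from several tables has its closure recomputed once per unvisited table that reaches it; that is redundant but harmless, since the result is the same each time.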

epsi_closure.go

timbray · 2026-01-29T20:49:46Z

value_matcher.go

 	// there's already a table, thus an out-degree > 1
 	if fields.startTable != nil {
 		fields.startTable = mergeFAs(fields.startTable, newFA, printer)
+		if fields.isNondeterministic {


Hmm, not sure about this. We now have calls to precomputeEpsilonClosure in all the FA-generating code, and I agree that that work should best be done at the lower level where the code knows whether it's needed. So maybe not necessary?

I guess we still need the fields.isNondeterministic value to guide the valueMatcher on whether to use traverseNFA or traverseDFA, but I don't think we need it for the closure computation?

epsi_closure.go

timbray · 2026-01-29T21:05:24Z

nfa.go

-	ec := newEpsilonClosure()
-	return n2dNode(startNfa, newStateLists(), ec)
+	startState := &faState{table: nfaTable}
+	computeClosureForState(startState)


My thinking is fuzzy on this one… we still haven't figured out when/how to call nfa2Dfa, but if/when we do, based on the rest of this commit, wouldn't the closure already have been computed, as the comment above says? So why call this again?

timbray · 2026-01-29T21:09:30Z

> Don't worry about that part, but this is an underlying flaw in the naming. nfa.go contains traverseDFA etc. I'd suggest getting this one and the one you have there in before changing lots of names.

Let's clean up incrementally and when we check in names, be satisfied with them. I agree that nfa.go should be automaton.go or some such.

timbray

Actually, there may be a larger problem here, so sticking a reminder pin in the map.

In valueMatcher and a couple of other places, when we want to have the start of an automaton, it's a *smallTable, because … partly because you can't have any transitions there and … uh, don't remember for sure.

But I think we might need to have a *faState there instead because we need its epsilonClosure field.

sayrer · 2026-01-29T21:16:58Z

> > Don't worry about that part, but this is an underlying flaw in the naming. nfa.go contains traverseDFA etc. I'd suggest getting this one and the one you have there in before changing lots of names.

> Let's clean up incrementally and when we check in names, be satisfied with them. I agree that nfa.go should be automaton.go or some such.

This one I disagree with, but I don't feel strongly about it. My thinking is that if nfa2dfa starts working (and I am pretty sure it will), the names will be different, so we're just gilding the lily.

This was referenced Jan 27, 2026

More allocation avoidance. #470

Closed

Pool transmap buffer #478

Merged

Merge origin/main into pr/optimize-epsilon-closure

6450d21

Merge branch 'main' into pr/optimize-epsilon-closure

2252458

sayrer mentioned this pull request Jan 27, 2026

NFA => DFA optimization #481

Open

Merge branch 'main' into pr/optimize-epsilon-closure

d9687e1

timbray reviewed Jan 28, 2026

timbray changed the title ~~Precomputed epsilon closures~~ kaizen: Precomputed epsilon closures Jan 29, 2026

sayrer added 2 commits January 29, 2026 08:50

Remove test path.

8880e9b

Merge branch 'pr/optimize-epsilon-closure' of https://github.com/sayr…

0347d00

…er/quamina into pr/optimize-epsilon-closure

sayrer and others added 3 commits January 29, 2026 09:13

Move new functions to epsi_closure.go, delete old ones.

b82a6bf

Use the name computeClosureForNfa.

1615df3

sayrer commented Jan 29, 2026

timbray reviewed Jan 29, 2026

Remove redundant 'compute' prefixes.

fc35cb4

sayrer added 2 commits January 29, 2026 13:48

Rename precomputeEpsilonClosure to epsilonClosure.

db66150

Adjust epsilonClosure.

213360f
