Skip to content

Conversation

@sayrer
Copy link
Contributor

@sayrer sayrer commented Jan 27, 2026

Epsilon closures are a property of the automaton structure, not the input data. Once a pattern is added and the NFA is built, the epsilon closure for any given state is fixed and never changes. (see #470)

Before: Every time we matched an event, traverseNFA called getClosure(state) which:

  1. Did a map lookup to check if we'd seen this state before
  2. If not, traversed epsilon transitions recursively
  3. Allocated a new []*faState slice for the result
  4. Cached it in a map for future lookups

This happened on every state, every byte, every match.

After: When a pattern is added, we walk the automaton once and store each state's closure directly on the faState struct:

type faState struct {
    table            *smallTable
    fieldTransitions []*fieldMatcher
    isSpinner        bool
    epsilonClosure   []*faState  // precomputed
}

At match time, traverseNFA just reads state.epsilonClosure - a direct slice access instead of a map lookup.

Tradeoff: Small increase in build-time cost and memory per state, but eliminates map lookups and allocations from the hot path.

  1. nfa.go:

    • Added epsilonClosure []*faState field to faState struct
    • Added precomputeEpsilonClosures(), precomputeClosuresRecursive(), computeClosureForState(), and traverseEpsilonsForClosure() functions
    • Updated traverseNFA to use precomputed closures with fallback for tests
  2. value_matcher.go:

    • Call precomputeEpsilonClosures() after setting startTable, but only when isNondeterministic is true

The eClosure field in nfaBuffers was kept for backward compatibility with tests that call traverseNFA directly without going through the normal addPattern path.

1. **nfa.go**:
   - Added `epsilonClosure []*faState` field to `faState` struct
   - Added `precomputeEpsilonClosures()`, `precomputeClosuresRecursive()`, `computeClosureForState()`, and `traverseEpsilonsForClosure()` functions
   - Updated `traverseNFA` to use precomputed closures with fallback for tests

2. **value_matcher.go**:
   - Call `precomputeEpsilonClosures()` after setting `startTable`, but only when `isNondeterministic` is true

The `eClosure` field in `nfaBuffers` was kept for backward compatibility with tests that call `traverseNFA` directly without going through the normal `addPattern` path.
This was referenced Jan 27, 2026
@sayrer
Copy link
Contributor Author

sayrer commented Jan 27, 2026

Before:

%  go test -bench=Benchmark8259Example -benchmem -benchtime=5s -run=^$
FA: Field matchers: 2 (avg size 2.500, max 4)
Value matchers: 5
SmallTables 20371 (splices 6, avg 4.033, max 66, epsilons avg 0.001, max 2) singletons 1
101793/sec
goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
Benchmark8259Example-20    	  693202	      9824 ns/op	     498 B/op	      17 allocs/op

After:

%  go test -bench=Benchmark8259Example -benchmem -benchtime=5s -run=^$
FA: Field matchers: 2 (avg size 2.500, max 4)
Value matchers: 5
SmallTables 20371 (splices 6, avg 4.033, max 66, epsilons avg 0.001, max 2) singletons 1
169022/sec
goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
Benchmark8259Example-20    	 1000000	      5916 ns/op	     288 B/op	      17 allocs/op

@timbray
Copy link
Owner

timbray commented Jan 27, 2026

So, this is complicated, pardon me while I think out loud a bit. Note: Issue #481 is highly relevent here.

First of all, this is even better than you think it is. Remember, nfaBufs comes at the top level from the quamina.bufs field, which means it is stored per-thread. So if you have N goroutines pounding away at a Quamina instance, they'll each have cached not just the buffers (which is necessary) but also the epsilon closures which, as you point out, really want to be global and computed once and shared.

So why not just go ahead? What worries me is I think this is a ticking time bomb in the same way that nfa2Dfa is, i.e. that the horrors of O(2^N) could erupt when addPattern() is fed the wrong NFA as input. So, before we accept this, first of all we need to determine how likely and how bad this sort of explosion is. I think I know how to do that, and it's with our old friend TestShellStyleBuildTime.

Let's assume that indeed, a bad NFA can make this explode. Because now NFAs have two optimization paths: 1. Persist the epsilon closures and 2. Morph them into DFAs. The former is cheaper (I think?) but if you can do the latter you don't need epsilon closures. What's the best strategy for giving adopters of Quamina some dials to turn to get the best results and protect themselves from disasters?

I am definitely going to have to think about this for a bit. Any helpful thoughts on the conundrum are welcome.

@sayrer
Copy link
Contributor Author

sayrer commented Jan 27, 2026

Sorry to cross threads, I just left a comment in #481 about this issue. I think the answer is a state budget there (no timers, etc).

@timbray
Copy link
Owner

timbray commented Jan 27, 2026

I think time is easier for callers to understand? And the number of states is not a linear function of anything. But the idea isn't crazy. There's good news in that while Quamina is at work optimizing an NFA, this doesn't stall the matching (aside from probably burning 100% of a core), just blocks further calls to AddPattern(). Hmmm.

Stepping away from the keyboard for a bit.

@sayrer
Copy link
Contributor Author

sayrer commented Jan 28, 2026

I think time is easier for callers to understand? And the number of states is not a linear function of anything. But the idea isn't crazy.

No need for a quick response. But this is right... No one will understand this setting as a casual user. It would be like setting a JPEG encoder to "90%" or something similar.

Copy link
Owner

@timbray timbray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you're right, epsilon closures are the cost of doing business with NFAs, so they should be built at the same time. Putting it in faState is correct. Unless I'm missing something, there's a chance here to use the existing epsi-closure code, no?

BTW,

The eClosure field in nfaBuffers was kept for backward compatibility with tests that call traverseNFA directly without going through the normal addPattern path.

Do we really have to do this? It's ugly.

@sayrer
Copy link
Contributor Author

sayrer commented Jan 28, 2026

The eClosure field in nfaBuffers was kept for backward compatibility with tests that call traverseNFA directly without going through the normal addPattern path.

Do we really have to do this? It's ugly.

No, we can change the tests if you're cool with it. I always err on the side of no test changes.

@timbray timbray changed the title Precomputed epsilon closures kaizen: Precomputed epsilon closures Jan 29, 2026
@sayrer
Copy link
Contributor Author

sayrer commented Jan 29, 2026

This patch is now ugly in a new way, just making sure everything passes before I keep going.

sayrer and others added 3 commits January 29, 2026 09:13
Move precomputeEpsilonClosures calls from test files into the NFA
creation functions (makeShellStyleFA, makeWildCardFA, makeRegexpNFA).
This ensures epsilon closures are always computed when an NFA is
created, eliminating the need for callers to remember to call it.

The production code in value_matcher.go still calls it after mergeFAs,
which remains necessary because merging creates new states. Since
precomputeEpsilonClosures is idempotent, this works correctly.

Also changed TestSkinnyRuneTree to use traverseDFA since
nfaFromSkinnyRuneTree creates a deterministic FA.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
//fmt.Println("B machine: " + pp.printNFA(bSsplice.table))

bEcShouldBeZero := []*faState{bSa, bSb, bSx, bSstar}
bEcShouldBeOne := []*faState{bSa, bSb, bSx, bSstar}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a double-take here. I think these names are what it's supposed to be.

@sayrer
Copy link
Contributor Author

sayrer commented Jan 29, 2026

The previous numbers didn't have the JSON flattener optimization, that's what dropped the allocs from 17 to 13.

Before:

 % go test -bench=Benchmark8259Example -benchmem -benchtime=5s -run=^$
FA: Field matchers: 2 (avg size 2.500, max 4)
Value matchers: 5
SmallTables 20371 (splices 6, avg 4.033, max 66, epsilons avg 0.001, max 2) singletons 1
103804/sec
goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
Benchmark8259Example-20    	  687200	      9634 ns/op	     464 B/op	      13 allocs/op

After:

% go test -bench=Benchmark8259Example -benchmem -benchtime=5s -run=^$
FA: Field matchers: 2 (avg size 2.500, max 4)
Value matchers: 5
SmallTables 20371 (splices 6, avg 4.033, max 66, epsilons avg 0.001, max 2) singletons 1
169301/sec
goos: darwin
goarch: arm64
pkg: quamina.net/go/quamina
cpu: Apple M1 Ultra
Benchmark8259Example-20    	 1000000	      5907 ns/op	     252 B/op	      13 allocs/op

@timbray
Copy link
Owner

timbray commented Jan 29, 2026

I guess the assumption has to be that we're going to go ahead and compute the closure for every pattern that needs an NFA: shellstyle, wildcard, and regexp. But note that not all regexes need NFAs, for example ., [a-z0-9], [^z], (a|b){3}, ~p{} so the whole read_regexp and makeRegexpNFA are going to have to be refactored a bit.

Anyhow, obviously we'll take this given the measurements. Needs a bit of work; I apologize in advance for incoming pedantic grumbles about function naming and so on.

In parallel I'm working on code in a file named memory_cost.go, the contents of which are obvious.

@sayrer
Copy link
Contributor Author

sayrer commented Jan 29, 2026

I guess the assumption has to be that we're going to go ahead and compute the closure for every pattern that needs an NFA: shellstyle, wildcard, and regexp. But note that not all regexes need NFAs, for example ., [a-z0-9], [^z], (a|b){3}, ~p{} so the whole read_regexp and makeRegexpNFA are going to have to be refactored a bit.

Anyhow, obviously we'll take this given the measurements. Needs a bit of work; I apologize in advance for incoming pedantic grumbles about function naming and so on.

Don't worry about that part, but this is an underlying flaw in the naming. nfa.go contains traverseDFA etc. I'd suggest getting this one and the one you have there in before changing lots of names.

Copy link
Owner

@timbray timbray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff, the more I read this the more obvious it is that precomputing closures is the way to go.

epsi_closure.go Outdated
// and precomputes the epsilon closure for every reachable faState.
func precomputeEpsilonClosures(table *smallTable) {
visited := make(map[*smallTable]bool)
computeClosureForNfa(table, visited)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a refactoring idea to think about. I notice this recurring idiom:

			computeClosureForState(state)
			computeClosureForNfa(state.table, visited)

If you passed visited to computeClosureForState then you could have computeClosureForState call computeClosureForNfa, which leaves the problem of the top-level function coming in at the smallTable level, which could be fixed by making the top-level function a one-liner

            computeClosureForState(&faState{table:table}, make(map[*smallTable]bool))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is necessary. I didn't write it that way with performance in mind, but I did do it with pprof in hand. Here's the summary:

  The suggested refactoring doesn't work because closureForState is called in the hot path (traverseNFA) for every event match. There, we only need to compute closure for a single wrapper
   state - walking the entire NFA would be catastrophic.                                                                                                                                   
                                                                                                                                                                                           
  The two-call pattern in closureForNfa is necessary to keep closureForState simple for external callers:                                                                                  
                                                                                                                                                                                           
  closureForState(state)           // compute one state's closure                                                                                                                          
  closureForNfa(state.table, visited)  // recurse into NFA                                                                                                                                 
                                                                                                                                                                                           
  This separation is intentional - closureForState must remain a simple O(epsilons) operation, not an O(NFA size) walk.

// there's already a table, thus an out-degree > 1
if fields.startTable != nil {
fields.startTable = mergeFAs(fields.startTable, newFA, printer)
if fields.isNondeterministic {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure about this. We now have calls to precomputeEpsilonClosure in all the FA-generating code, and I agree that that work should best be done at the lower level where the code knows whether it's needed. So maybe not necessary?

I guess we still need the fields.isNondeterministic value to guide the valueMatcher on whether to use traverseNfa or traverseDfa, but I don't think for the closure computation?

nfa.go Outdated
ec := newEpsilonClosure()
return n2dNode(startNfa, newStateLists(), ec)
startState := &faState{table: nfaTable}
computeClosureForState(startState)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is fuzzy on this one… we still haven't figured out when/how to call nfa2Dfa, but if/when we do, based on the rest of this commit, wouldn't the closure already have been computed, as the comment above says? So why call this again?

@timbray
Copy link
Owner

timbray commented Jan 29, 2026

Don't worry about that part, but this is an underlying flaw in the naming. nfa.go contains traverseDFA etc. I'd suggest getting this one and the one you have there in before changing lots of names.

Let's clean up incrementally and when we check in names, be satisfied with them. I agree that nfa.go should be automaton.go or some such.

Copy link
Owner

@timbray timbray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there may be a larger problem here, so sticking a reminder pin in the map.

In valueMatcher and a couple of other places, when we want to have the start of an automaton, it's a *smallTable, because … partly because you can't have any transitions there and … uh, don't remember for sure.

But I think we might need to have a *faState there instead because we need its epsilonClosure field.

@sayrer
Copy link
Contributor Author

sayrer commented Jan 29, 2026

Don't worry about that part, but this is an underlying flaw in the naming. nfa.go contains traverseDFA etc. I'd suggest getting this one and the one you have there in before changing lots of names.

Let's clean up incrementally and when we check in names, be satisfied with them. I agree that nfa.go should be automaton.go or some such.

This one I disagree with, but I don't feel strongly about. My thinking is that if nfa2dfa starts working (and I am pretty sure it will), the names will be different, so we're just gilding the lilly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants