-
Notifications
You must be signed in to change notification settings - Fork 26
kaizen: Precomputed epsilon closures #482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1. **nfa.go**: - Added `epsilonClosure []*faState` field to `faState` struct - Added `precomputeEpsilonClosures()`, `precomputeClosuresRecursive()`, `computeClosureForState()`, and `traverseEpsilonsForClosure()` functions - Updated `traverseNFA` to use precomputed closures with fallback for tests 2. **value_matcher.go**: - Call `precomputeEpsilonClosures()` after setting `startTable`, but only when `isNondeterministic` is true The `eClosure` field in `nfaBuffers` was kept for backward compatibility with tests that call `traverseNFA` directly without going through the normal `addPattern` path.
|
Before: After: |
|
So, this is complicated, pardon me while I think out loud a bit. Note: Issue #481 is highly relevent here. First of all, this is even better than you think it is. Remember, nfaBufs comes at the top level from the So why not just go ahead? What worries me is I think this is a ticking time bomb in the same way that Let's assume that indeed, a bad NFA can make this explode. Because now NFAs have two optimization paths: 1. Persist the epsilon closures and 2. Morph them into DFAs. The former is cheaper (I think?) but if you can do the latter you don't need epsilon closures. What's the best strategy for giving adopters of Quamina some dials to turn to get the best results and protect themselves from disasters? I am definitely going to have to think about this for a bit. Any helpful thoughts on the conundrum are welcome. |
|
Sorry to cross threads, I just left a comment in #481 about this issue. I think the answer is a state budget there (no timers, etc). |
|
I think time is easier for callers to understand? And the number of states is not a linear function of anything. But the idea isn't crazy. There's good news in that while Quamina is at work optimizing an NFA, this doesn't stall the matching (aside from probably burning 100% of a core), just blocks further calls to Stepping away from the keyboard for a bit. |
No need for a quick response. But this is right... No one will understand this setting as a casual user. It would be like setting a JPEG encoder to "90%" or something similar. |
timbray
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, you're right, epsilon closures are the cost of doing business with NFAs, so they should be built at the same time. Putting it in faState is correct. Unless I'm missing something, there's a chance here to use the existing epsi-closure code, no?
BTW,
The eClosure field in nfaBuffers was kept for backward compatibility with tests that call traverseNFA directly without going through the normal addPattern path.
Do we really have to do this? It's ugly.
No, we can change the tests if you're cool with it. I always err on the side of no test changes. |
…er/quamina into pr/optimize-epsilon-closure
|
This patch is now ugly in a new way, just making sure everything passes before I keep going. |
Move precomputeEpsilonClosures calls from test files into the NFA creation functions (makeShellStyleFA, makeWildCardFA, makeRegexpNFA). This ensures epsilon closures are always computed when an NFA is created, eliminating the need for callers to remember to call it. The production code in value_matcher.go still calls it after mergeFAs, which remains necessary because merging creates new states. Since precomputeEpsilonClosures is idempotent, this works correctly. Also changed TestSkinnyRuneTree to use traverseDFA since nfaFromSkinnyRuneTree creates a deterministic FA. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| //fmt.Println("B machine: " + pp.printNFA(bSsplice.table)) | ||
|
|
||
| bEcShouldBeZero := []*faState{bSa, bSb, bSx, bSstar} | ||
| bEcShouldBeOne := []*faState{bSa, bSb, bSx, bSstar} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a double-take here. I think these names are what it's supposed to be.
|
The previous numbers didn't have the JSON flattener optimization, that's what dropped the allocs from 17 to 13. Before: After: |
|
I guess the assumption has to be that we're going to go ahead and compute the closure for every pattern that needs an NFA: shellstyle, wildcard, and regexp. But note that not all regexes need NFAs, for example Anyhow, obviously we'll take this given the measurements. Needs a bit of work; I apologize in advance for incoming pedantic grumbles about function naming and so on. In parallel I'm working on code in a file named memory_cost.go, the contents of which are obvious. |
Don't worry about that part, but this is an underlying flaw in the naming. |
timbray
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, the more I read this the more obvious it is that precomputing closures is the way to go.
epsi_closure.go
Outdated
| // and precomputes the epsilon closure for every reachable faState. | ||
| func precomputeEpsilonClosures(table *smallTable) { | ||
| visited := make(map[*smallTable]bool) | ||
| computeClosureForNfa(table, visited) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a refactoring idea to think about. I notice this recurring idiom:
computeClosureForState(state)
computeClosureForNfa(state.table, visited)If you passed visited to computeClosureForState then you could have computeClosureForState call computeClosureForNfa, which leaves the problem of the top-level function coming in at the smallTable level, which could be fixed by making the top-level function a one-liner
computeClosureForState(&faState{table:table}, make(map[*smallTable]bool))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is necessary. I didn't write it that way with performance in mind, but I did do it with pprof in hand. Here's the summary:
The suggested refactoring doesn't work because closureForState is called in the hot path (traverseNFA) for every event match. There, we only need to compute closure for a single wrapper
state - walking the entire NFA would be catastrophic.
The two-call pattern in closureForNfa is necessary to keep closureForState simple for external callers:
closureForState(state) // compute one state's closure
closureForNfa(state.table, visited) // recurse into NFA
This separation is intentional - closureForState must remain a simple O(epsilons) operation, not an O(NFA size) walk.
| // there's already a table, thus an out-degree > 1 | ||
| if fields.startTable != nil { | ||
| fields.startTable = mergeFAs(fields.startTable, newFA, printer) | ||
| if fields.isNondeterministic { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not sure about this. We now have calls to precomputeEpsilonClosure in all the FA-generating code, and I agree that that work should best be done at the lower level where the code knows whether it's needed. So maybe not necessary?
I guess we still need the fields.isNondeterministic value to guide the valueMatcher on whether to use traverseNfa or traverseDfa, but I don't think for the closure computation?
nfa.go
Outdated
| ec := newEpsilonClosure() | ||
| return n2dNode(startNfa, newStateLists(), ec) | ||
| startState := &faState{table: nfaTable} | ||
| computeClosureForState(startState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is fuzzy on this one… we still haven't figured out when/how to call nfa2Dfa, but if/when we do, based on the rest of this commit, wouldn't the closure already have been computed, as the comment above says? So why call this again?
Let's clean up incrementally and when we check in names, be satisfied with them. I agree that nfa.go should be automaton.go or some such. |
timbray
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there may be a larger problem here, so sticking a reminder pin in the map.
In valueMatcher and a couple of other places, when we want to have the start of an automaton, it's a *smallTable, because … partly because you can't have any transitions there and … uh, don't remember for sure.
But I think we might need to have a *faState there instead because we need its epsilonClosure field.
This one I disagree with, but I don't feel strongly about. My thinking is that if |
Epsilon closures are a property of the automaton structure, not the input data. Once a pattern is added and the NFA is built, the epsilon closure for any given state is fixed and never changes. (see #470)
Before: Every time we matched an event,
traverseNFAcalledgetClosure(state)which:[]*faStateslice for the resultThis happened on every state, every byte, every match.
After: When a pattern is added, we walk the automaton once and store each state's closure directly on the
faStatestruct:At match time,
traverseNFAjust readsstate.epsilonClosure- a direct slice access instead of a map lookup.Tradeoff: Small increase in build-time cost and memory per state, but eliminates map lookups and allocations from the hot path.
nfa.go:
epsilonClosure []*faStatefield tofaStatestructprecomputeEpsilonClosures(),precomputeClosuresRecursive(),computeClosureForState(), andtraverseEpsilonsForClosure()functionstraverseNFAto use precomputed closures with fallback for testsvalue_matcher.go:
precomputeEpsilonClosures()after settingstartTable, but only whenisNondeterministicis trueTheeClosurefield innfaBufferswas kept for backward compatibility with tests that calltraverseNFAdirectly without going through the normaladdPatternpath.