Performance Optimisation for String Literal Matching #32
Open
AntonLydike wants to merge 4 commits into main from
I will benchmark this tonight against some real-world workloads and report numbers. Edit: Don't have time tonight :/
superlopuh
approved these changes
Jul 30, 2024
It's hard to tell if this actually speeds anything up in practice. MLIR's benchmark suite goes from a geomean of … Further digging needed here.
This PR adds a performance optimisation that skips compiling a regex for string literal matching.
Motivation:
Issue #25 points out that we are about 6x slower than filecheck 0.24 in the worst case, and about 3.3x slower on average.
We are also about 34x slower than LLVM's FileCheck, but we can't get that down too far due to Python's limitations: FileCheck is usually done before CPython has finished loading its runtime.
Approach:
After some digging in traces (thanks to viztracer), I found that we spend a lot of time compiling regexes, even when they are just for fancy string literals (most of them are of the form test\s+string\s..., which is regular enough to special-case). This time dominates everything else by a huge margin:
The regex compile is about 135us of the 156us total time spent, so about 85%. We then spend ~0.8us on average in the actual matching logic. I was wondering how "slow" a non-regex implementation would be in comparison.
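The compile-versus-match split described above can be reproduced roughly without a tracer. The following is a minimal sketch, not the measurement used in this PR (that came from viztracer traces): it uses time.perf_counter, and the helper name time_compile_and_search, the sample pattern and the sample input are made up for illustration. re.purge() is called before each compile because the re module caches compiled patterns, which would otherwise hide the compile cost.

```python
import re
import time


def time_compile_and_search(pattern, text, repeats=200):
    """Return (mean compile seconds, mean search seconds) for one pattern.

    Hypothetical helper for illustration; not part of filecheck.
    """
    compile_total = 0.0
    search_total = 0.0
    for _ in range(repeats):
        # Clear re's internal pattern cache so every compile is a real compile.
        re.purge()
        start = time.perf_counter()
        compiled = re.compile(pattern)
        compile_total += time.perf_counter() - start

        start = time.perf_counter()
        compiled.search(text)
        search_total += time.perf_counter() - start
    return compile_total / repeats, search_total / repeats


# A pattern of the "fancy string literal" shape mentioned above (sample values).
compile_s, search_s = time_compile_and_search(
    r"test\s+string\s+literal", "// a test   string literal here"
)
print(f"compile: {compile_s * 1e6:.1f}us, search: {search_s * 1e6:.1f}us")
```

Absolute numbers depend on the machine and Python version, so this only gives a feel for the ratio between compiling and matching.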
I added logic in the existing check compiler that detects if the check is only made up of string literals, and returns a new LiteralMatcher that duck-types re.Pattern for all cases that matter for our implementation. As it turns out, that is just find and match.
Sadly, we can't just replace re.search by string.find in all cases, as we need to handle white-space normalisation, which bloats the below code a bit. Otherwise it's quite readable though.
LiteralMatcher returns a special duck-typed version of re.Match called LiteralMatch that only has a single group. This is all that's needed for this little hack, and the other code can be left unmodified, thanks to the power of duck typing (and modifying some type hints).
Results:
The optimisation gets an average speedup of 1.6x in our benchmarks, making the new implementation only about 2.1x slower than filecheck 0.24 on average. This understates the effect though, as this optimisation cuts the time of the longest benchmark (4.7k lines of CHECK-NEXT statements) by more than 3x.
See the below chart for overall results:
The new trace shows us that we have indeed removed a bottleneck:
The new timing shows that compilation time is down to 5.5us on average, but the matching has grown to ~14.7us on average. Still, the average CHECK-DAG statement is now down to 21.6us, a reduction of about 7x.
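For readers who want a feel for the approach without opening the diff, here is a minimal sketch of what a LiteralMatcher/LiteralMatch pair could look like. It is not the code in this PR. It assumes the two required methods correspond to re.Pattern's search and match, that white-space normalisation means "any run of whitespace in the check matches one or more whitespace characters in the input", and that a check counts as a plain literal when it contains no {{...}} regex block and no [[...]] variable block. The helper is_plain_literal is a hypothetical name.

```python
import re
from typing import Optional


def is_plain_literal(check: str) -> bool:
    """Hypothetical detector: no regex ({{...}}) or variable ([[...]]) blocks."""
    return "{{" not in check and "[[" not in check


class LiteralMatch:
    """Duck-types the part of re.Match used here: a single group (group 0)."""

    def __init__(self, string: str, start: int, end: int) -> None:
        self.string = string
        self._start = start
        self._end = end

    def group(self, index: int = 0) -> str:
        if index != 0:
            raise IndexError("no such group")
        return self.string[self._start:self._end]

    def start(self, index: int = 0) -> int:
        return self._start

    def end(self, index: int = 0) -> int:
        return self._end

    def span(self, index: int = 0) -> tuple:
        return (self._start, self._end)


class LiteralMatcher:
    """Duck-types re.Pattern.search/match for whitespace-normalised literals."""

    def __init__(self, literal: str) -> None:
        # Each gap between words stands for "one or more whitespace characters".
        self.words = literal.split()
        if not self.words:
            raise ValueError("literal needs at least one non-whitespace character")

    def match(self, string: str, pos: int = 0) -> Optional[LiteralMatch]:
        """Match all words starting exactly at pos, like re.Pattern.match."""
        cursor = pos
        for i, word in enumerate(self.words):
            if i > 0:
                gap_start = cursor
                while cursor < len(string) and string[cursor].isspace():
                    cursor += 1
                if cursor == gap_start:
                    return None  # at least one whitespace character is required
            if not string.startswith(word, cursor):
                return None
            cursor += len(word)
        return LiteralMatch(string, pos, cursor)

    def search(self, string: str, pos: int = 0) -> Optional[LiteralMatch]:
        """Find the first match at or after pos, like re.Pattern.search."""
        start = string.find(self.words[0], pos)
        while start != -1:
            found = self.match(string, start)
            if found is not None:
                return found
            start = string.find(self.words[0], start + 1)
        return None


# Usage: the literal matcher agrees with the equivalent regex on this input.
text = "// a test \t string   literal here"
literal = LiteralMatcher("test string literal").search(text)
regex = re.compile(r"test\s+string\s+literal").search(text)
assert literal is not None and regex is not None
assert literal.span() == regex.span()
```

Construction only splits a string, which fits the much lower compile time reported above, while the per-match work is done in Python rather than inside the regex engine, which fits the higher matching time.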