Spicy regular expressions are far slower than binpac #1990
evantypanski
started this conversation in
General
Replies: 1 comment
-
I took a stab at this as it has bothered me for a while already that the implementation of CCL matching is so naiive: rsmmr/justrx#12. Need to test this a bit more, but it seems even matching a single code point gets slightly faster. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
This is a relatively vague issue. I'm putting it in Spicy issues (not justrx) for visibility.
I think the "complete" condition for closing this issue would be a significant speedup for given test cases, or justification for why that will not happen. With a strong bias towards the former.
Looking at Spicy performance compared to binpac, I found that regular expressions are far slower than binpac's. I haven't looked much into why - this issue is for the "why" and "what to do about it"
For example, this Spicy unit takes 10 seconds to parse on my machine given the programmatically generated input:
The equivalent on binpac with a similar host application for timing takes about 0.3 seconds (the following is the binpac code):
This is a pretty big difference and one that, as a user, I would want explanation for. Capturing accounts for some of this; changing the regular expression unit to this lowers it from 10 seconds to about 3 seconds:
My guess is most use cases would want to capture large output like that, as it's likely some sort of data and not just resyncing and whatnot.
Aside: Character classes
The following unit takes differing amount of time depending on the input - All Z's makes it take 13 seconds, all A's makes it take 10 seconds (roughly):
Optimizing this case could reduce that discrepancy, but I don't believe it's the top priority here compared to the simple cases. Looking through regexes in parsers I don't see this kind of thing happening a lot, where a character class can't be reduced to a simple range that will get this worst-case parse time.
Beta Was this translation helpful? Give feedback.
All reactions