_____ ____
/ \ | o |
| |/ ___\|
|_________/
|_|_| |_|_|
--====:::Turtle RegEx
Turtle part of the logo credit goes to: Https://www.asciiart.eu/animals/reptiles/turtles
This is an incomplete (and likely incorrect in some ways) implementation of a RegEx engine in pure Python.
It consists of 3 modules:
- The parser - which parses RegEx strings into match expressions represented as a tree of Match Node objects
- The matcher - which implements a match function to execute the match tree/pattern against a source string
- The regex - which provides two ways to search across a source string for substrings that match a pattern:
  - A search function that takes a RegEx string and a source string and performs the search
  - A compile function that takes a RegEx string and returns a Matcher object that has a pre-compiled match tree
    - The search method on the Matcher object takes a source string and performs the match against the match pattern stored on the instance
And main.py, which provides super basic CLI access to the underlying functionality.
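To make the parser, matcher and regex split above more concrete, here is a minimal, self-contained sketch of the same three layers. It is not the project's code: the MatchNode fields, the flat list used instead of a real tree, the (start, end) return value, and the exact signatures of parse, match, compile, search and Matcher.search are all assumptions for illustration. It only handles literal characters, . and the *, +, ? and {m,n} quantifiers (no groups, alternation or escapes), and its matcher backtracks so a greedy repetition can give characters back.

```python
from dataclasses import dataclass
from typing import Optional

ANY = None  # hypothetical marker: a node whose char is ANY matches any character (".")


@dataclass
class MatchNode:
    """One pattern element plus how many times it may repeat (hypothetical fields)."""
    char: Optional[str]           # literal character, or ANY for "."
    min_count: int = 1            # fewest repetitions allowed
    max_count: Optional[int] = 1  # most repetitions allowed (None = unbounded)


def parse(pattern: str) -> list[MatchNode]:
    """Parse a pattern into a flat sequence of MatchNodes (no groups, alternation or escapes)."""
    nodes: list[MatchNode] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in "*+?{":
            if not nodes:
                raise ValueError(f"quantifier {ch!r} has nothing to repeat")
            node = nodes[-1]  # quantifiers modify the preceding node
            if ch == "*":
                node.min_count, node.max_count = 0, None
            elif ch == "+":
                node.min_count, node.max_count = 1, None
            elif ch == "?":
                node.min_count, node.max_count = 0, 1
            else:  # {m}, {m,} or {m,n}
                end = pattern.index("}", i)
                low, sep, high = pattern[i + 1:end].partition(",")
                node.min_count = int(low)
                node.max_count = int(high) if high else (None if sep else int(low))
                i = end
        elif ch == ".":
            nodes.append(MatchNode(ANY))
        else:
            nodes.append(MatchNode(ch))
        i += 1
    return nodes


def match(nodes: list[MatchNode], source: str, pos: int = 0) -> Optional[int]:
    """Match nodes starting at source[pos]; return the end index, or None on failure."""
    if not nodes:
        return pos
    node, rest = nodes[0], nodes[1:]
    # Greedily count how many characters this node can consume, capped at max_count.
    limit = len(source) - pos if node.max_count is None else node.max_count
    count = 0
    while count < limit and pos + count < len(source) and (
        node.char is ANY or source[pos + count] == node.char
    ):
        count += 1
    # Backtrack: try the longest run first, giving characters back one at a time.
    for taken in range(count, node.min_count - 1, -1):
        end = match(rest, source, pos + taken)
        if end is not None:
            return end
    return None


class Matcher:
    """Holds a pre-parsed pattern so it can be reused across many source strings."""

    def __init__(self, nodes: list[MatchNode]) -> None:
        self.nodes = nodes

    def search(self, source: str) -> Optional[tuple[int, int]]:
        """Return the (start, end) span of the first match, or None."""
        for start in range(len(source) + 1):
            end = match(self.nodes, source, start)
            if end is not None:
                return start, end
        return None


def compile(pattern: str) -> Matcher:
    # Named after the README's compile function; shadows Python's built-in compile here.
    return Matcher(parse(pattern))


def search(pattern: str, source: str) -> Optional[tuple[int, int]]:
    # One-shot convenience: parses the pattern again on every call.
    return compile(pattern).search(source)
```

For example, search("a.c", "xxabcx") returns (2, 5), the span of "abc". Parsing once in compile and reusing the resulting Matcher avoids re-parsing the pattern for every source string, which is presumably the point of keeping a pre-compiled match tree on the instance.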
On occasion I find it fun to build random software projects that implement some silly idea or a fundamental software engineering concept. Hopefully this one can also serve as an illustration of how the basics of some of these core concepts work, in this case Regular Expressions as a way to search machine-readable text.
I built it more or less because it was fun to try without referencing other implementations or going back to the computer science/math theory behind it.
There is no pressing need in the world for another actually correct and performant implementation (much smarter folks have built those already), but building one in Python that shows the concepts in a simple way brought me joy! *shrug*
I call projects like these my "How it's made - Software edition" series; you can find more of them in my repos.
pip install uv
uv sync --dev
uv run main.py "some regex string" "some source string"
or
cat somefile.txt | uv run main.py "some regex string"
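The two main.py invocations above differ only in where the source string comes from. Below is a minimal sketch of how an entry point could choose between them, assuming the source is the second argument when given and piped stdin otherwise. The main and read_source functions, the read_stdin parameter and the return value are hypothetical, and Python's standard re.search stands in for the project's own search function so the snippet runs on its own.

```python
import re
import sys
from typing import Callable, Optional


def read_source(args: list[str], read_stdin: Optional[Callable[[], str]] = None) -> str:
    """Use the second CLI argument as the source string, else fall back to piped stdin."""
    if len(args) >= 2:
        return args[1]
    # Only touch stdin when no source argument was given.
    return (read_stdin or sys.stdin.read)()


def main(args: list[str], read_stdin: Optional[Callable[[], str]] = None) -> Optional[str]:
    """Hypothetical entry point: args are the CLI arguments after the script name."""
    if not args:
        raise SystemExit('usage: main.py "some regex string" ["some source string"]')
    pattern, source = args[0], read_source(args, read_stdin)
    # re.search is a stand-in here for the project's own search function.
    found = re.search(pattern, source)
    return found.group(0) if found else None
```

A real script would call main(sys.argv[1:]) and print the result; that call is left out so the sketch can be imported without waiting on stdin.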
uv run parser.py
uv run matcher.py
uv run regex.py
- TODO: (Hristo) Fix range matching for the upper limit (if the limit is exceeded, only stop consuming; the match is still true)
- TODO: (Hristo) Fix matching: . (any char) is not lazy, so it consumes everything
- Maybe add more functionality
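The two TODOs above are both about how a repeated node decides how many characters to consume. One possible approach is sketched below. It assumes each repeated node can hand off to a continuation that tries to match the rest of the pattern; match_repeat, Rest, literal and any_char are hypothetical names, not the project's API. The node first counts the characters it could take, capped at max_count so hitting the upper limit just stops consumption instead of failing, then tries counts longest-first (greedy, giving characters back) or shortest-first (lazy) until the rest of the pattern matches.

```python
from typing import Callable, Optional

# Continuation: try to match the rest of the pattern from a position; return end index or None.
Rest = Callable[[int], Optional[int]]


def match_repeat(
    matches_char: Callable[[str], bool],
    min_count: int,
    max_count: Optional[int],
    source: str,
    pos: int,
    rest: Rest,
    lazy: bool = False,
) -> Optional[int]:
    """Match one repeated single-character node at source[pos], then hand off to rest."""
    # Count how many characters the node could take, stopping at max_count.
    # Reaching the upper limit just ends consumption; it does not fail the match.
    available = 0
    while (
        (max_count is None or available < max_count)
        and pos + available < len(source)
        and matches_char(source[pos + available])
    ):
        available += 1
    if available < min_count:
        return None
    counts = range(min_count, available + 1)
    # Greedy tries the longest run first and gives characters back on failure;
    # lazy tries the shortest run first and takes more only when needed.
    for taken in (counts if lazy else reversed(counts)):
        end = rest(pos + taken)
        if end is not None:
            return end
    return None


def literal(char: str, source: str) -> Rest:
    """Continuation that matches one literal character and then ends the pattern."""
    def rest(pos: int) -> Optional[int]:
        return pos + 1 if pos < len(source) and source[pos] == char else None
    return rest


def any_char(c: str) -> bool:
    """'.' matches any single character in this sketch."""
    return True
```

With .*c on "abcabc" (match_repeat(any_char, 0, None, text, 0, literal("c", text))), the greedy order returns end index 6, the whole string, while lazy=True returns 3, just "abc". With a{2,3} on "aaaa", consumption stops after three characters and the match still succeeds.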