Implement stateful tokenization #59
Open
This PR adds stateful tokenization to the library. Depending on previously matched tokens, the lexer can be put into different states, each of which uses a different ruleset for matching the next tokens. This is useful, for instance, when tokenizing or parsing string literals that contain string interpolations (e.g. "foo${42}bar") or nested structures such as nested C-style block comments. The states are maintained on a stack held by the lexer. To fully understand these changes, I suggest taking a look at the Readme changes, where the new behavior is explained in detail.

Notes:
Tokenizer.md
Readme.
These changes only make sense if they don't conflict with any competing implementations (planned or pending), since the Readme states the following: