Improve parser and lexer #811

diegommm · 2025-06-28T08:10:19Z

Fixes #810

Benchmark results (90% faster, 96% fewer bytes allocated, 89% fewer allocations):

goos: darwin
goarch: arm64
pkg: github.com/expr-lang/expr/parser
cpu: Apple M4 Pro
          │   old.txt    │               new.txt               │
          │    sec/op    │   sec/op     vs base                │
Parser-12   3219.5n ± 2%   310.2n ± 1%  -90.36% (p=0.000 n=20)

          │   old.txt   │              new.txt               │
          │    B/op     │    B/op     vs base                │
Parser-12   4971.0 ± 0%   208.0 ± 0%  -95.82% (p=0.000 n=20)

          │   old.txt   │              new.txt               │
          │  allocs/op  │ allocs/op   vs base                │
Parser-12   56.000 ± 0%   6.000 ± 0%  -89.29% (p=0.000 n=20)

Benchmarks were run from the parser directory with the command:

go test -run=zzz-no-tests -bench=. -count=20 > file

The file was renamed to either old.txt or new.txt, and the comparison was made with:

benchstat old.txt new.txt

Raw results from old.txt

goos: darwin
goarch: arm64
pkg: github.com/expr-lang/expr/parser
cpu: Apple M4 Pro
BenchmarkParser-12        375018              3263 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        376640              3159 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        378164              3280 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        350582              3214 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        377919              3296 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        360694              3185 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        371702              3170 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        373135              3181 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        374607              3201 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        372498              3320 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        339115              3202 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        362331              3190 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        371295              3184 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        372907              3237 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        355515              3281 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        371984              3206 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        370429              3247 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        369814              3225 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        373510              3310 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        351752              3279 ns/op            4971 B/op         56 allocs/op
PASS
ok      github.com/expr-lang/expr/parser        26.066s

Raw results from new.txt

goos: darwin
goarch: arm64
pkg: github.com/expr-lang/expr/parser
cpu: Apple M4 Pro
BenchmarkParser-12       3644658               311.9 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3916356               305.8 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3960404               304.3 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3943411               308.3 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3724376               311.1 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3911833               304.8 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3846391               305.4 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3931287               312.6 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3720000               310.0 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3918597               308.1 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3835420               309.2 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3940215               315.4 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3789723               309.2 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3935113               308.1 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3914610               311.9 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3722302               343.2 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3902730               312.3 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3876708               310.4 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3889995               311.9 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3704998               325.5 ns/op           208 B/op          6 allocs/op
PASS
ok      github.com/expr-lang/expr/parser        31.117s

diegommm · 2025-06-28T08:11:02Z

file/source.go

 }

 func (s Source) String() string {
-	return string(s)
+	return s.raw
 }

 func (s Source) Snippet(line int) (string, bool) {


This method is no longer used but I'm keeping it in case someone was using it.

diegommm · 2025-06-28T08:17:09Z

parser/lexer/lexer.go


 	"github.com/expr-lang/expr/file"
 )

+const minTokens = 10


This allows tokens to be allocated optimistically. Debating the "right" value for this parameter can be a waste of time (everyone has an opinion on what it should be).
The only position I will hold is that zero (the default) is the worst of all values, because you will almost always need some tokens; it would be very rare not to. And the first few times you add a new token with append, the runtime has to copy the same underlying array over and over while it grows the slice.
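To make the cost of starting from zero capacity concrete, here is a minimal sketch. `countGrowths` is a hypothetical helper written for this comment, not code from the PR; only the `minTokens` value is taken from the diff. The exact number of growths from zero depends on the Go runtime version, so the example does not assume one.

```go
package main

import "fmt"

// minTokens mirrors the constant added in the diff above.
const minTokens = 10

// countGrowths appends n items to a slice that starts with capacity startCap
// and reports how many times append had to move to a larger backing array.
// Hypothetical helper, for illustration only.
func countGrowths(startCap, n int) int {
	s := make([]int, 0, startCap)
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			growths++ // the backing array was reallocated and copied
		}
	}
	return growths
}

func main() {
	fmt.Println("growths starting from zero capacity:", countGrowths(0, minTokens))
	fmt.Println("growths starting from minTokens:    ", countGrowths(minTokens, minTokens))
}
```

Starting at `minTokens`, the first ten appends fit in the initial backing array, so no copying happens until the input needs more tokens than that.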

diegommm · 2025-06-28T08:25:11Z

parser/lexer/lexer_test.go

@@ -335,6 +335,7 @@ literal not terminated (1:10)
 früh ♥︎
 unrecognized character: U+2665 '♥' (1:6)
 | früh ♥︎
+ | .....^


This appeared to be a small bug. In the original code, if the line contained a rune longer than 1 byte, the indicator line was skipped entirely. I imagine that wouldn't be good if we have something like:

let myStr = "Hello, 世界"; someError

In that case it wouldn't print the indicator line, because it would encounter the '世' rune.

Nice, I guess the problem is the rune width in the terminal.
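A rough sketch of how an indicator line can be built for input containing multi-byte runes, assuming the error position is known as a byte offset into the line. `indicator` is a hypothetical function, not the one in this PR; it pads by rune count rather than byte count, which reproduces the ` | .....^` line from the test above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// indicator builds a marker line for an error at byte offset byteOff in line.
// It pads by the number of runes before the offset, not the number of bytes,
// so a multi-byte character such as 'ü' takes one column.
// Hypothetical helper, for illustration only.
func indicator(line string, byteOff int) string {
	col := utf8.RuneCountInString(line[:byteOff])
	return strings.Repeat(".", col) + "^"
}

func main() {
	line := "fr\u00fch \u2665" // "früh ♥", written with escapes to pin the exact runes
	off := strings.IndexRune(line, '\u2665') // byte offset of the offending rune
	fmt.Println(" | " + line)
	fmt.Println(" | " + indicator(line, off))
}
```

Counting runes still does not account for characters that occupy two terminal cells, such as '世', so the caret can drift on such lines; handling display width would need extra logic that this sketch does not attempt.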

antonmedv · 2025-06-29T18:04:25Z

file/source.go


-type Source []rune
+type Source struct {


Maybe just abandon Source altogether? And simply use string?

Agreed. I can do that quickly here, or in a separate PR if you prefer, for easier reviewing.
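For context on why a plain string can stand in for `[]rune`: a `range` loop over a string decodes UTF-8 as it goes and yields byte offsets, while converting to `[]rune` copies the input into 4 bytes per rune. A small sketch, where `runeOffsets` is a hypothetical helper and not part of the `file` package:

```go
package main

import "fmt"

// runeOffsets walks a string directly and records the byte offset at which
// each rune starts. No []rune copy of the input is made.
// Hypothetical helper, for illustration only.
func runeOffsets(src string) []int {
	offsets := make([]int, 0, len(src))
	for i := range src {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	src := "Hello, 世界"
	fmt.Println(len(src), len([]rune(src))) // 13 bytes, 9 runes
	fmt.Println(runeOffsets(src))
}
```

Byte offsets like these are enough to slice tokens straight out of the original string.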

diegommm · 2025-06-29T18:44:22Z

@antonmedv I have just committed new changes that let the parser use the new iterator API, and added new benchmarks that cover the whole parsing process (with the iterator) instead of only the lexer.

Check out the new results! The improvement is large. I'm done with this PR.
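The iterator API itself is not shown in this conversation. As a rough idea of what a pull-style lexer can look like, here is a hypothetical sketch: the `lexer` type, its `next` method and the whitespace-only tokenization are all assumptions made for illustration, not the code in this PR.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// lexer is a hypothetical pull-style tokenizer. It holds only the input and
// a byte position, and produces one token per call to next.
type lexer struct {
	src string
	pos int
}

// next skips whitespace and returns the next run of non-space characters as
// a substring of the input. ok is false once the input is exhausted.
func (l *lexer) next() (tok string, ok bool) {
	for l.pos < len(l.src) {
		r, size := utf8.DecodeRuneInString(l.src[l.pos:])
		if !unicode.IsSpace(r) {
			break
		}
		l.pos += size
	}
	start := l.pos
	for l.pos < len(l.src) {
		r, size := utf8.DecodeRuneInString(l.src[l.pos:])
		if unicode.IsSpace(r) {
			break
		}
		l.pos += size
	}
	return l.src[start:l.pos], l.pos > start
}

// collect drains the lexer into a slice. A parser would instead call next
// whenever it needs the following token.
func collect(src string) []string {
	l := &lexer{src: src}
	var out []string
	for tok, ok := l.next(); ok; tok, ok = l.next() {
		out = append(out, tok)
	}
	return out
}

func main() {
	fmt.Println(collect("a + b"))
}
```

In a design like this the consumer asks for tokens on demand, so no full token slice has to be built before parsing starts, and each token is a slice of the original string rather than a copy.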

diegommm · 2025-06-29T18:46:54Z

parser/bench_test.go

+	p := new(Parser)
+	for i := 0; i < b.N; i++ {
+		p.Parse(source, nil)
+	}


As the previous code does not have a reusable parser, the code that I ran to benchmark the old code was:

for i := 0; i < b.N; i++ { Parse(source) }
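To show why constructing the parser once outside the loop matters for the allocation numbers, here is a minimal sketch. The `parser` type below is a hypothetical stand-in that only splits on spaces, not the real `Parser`; the point is just the pattern of keeping a buffer between calls. `testing.AllocsPerRun` is used to count allocations per call.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// parser is a hypothetical stand-in for a reusable parser: it keeps its
// token buffer between calls instead of allocating a new one each time.
type parser struct {
	tokens []string
}

// parse resets the buffer to length zero (keeping its capacity), refills it
// with the space-separated words of src, and returns the token count.
func (p *parser) parse(src string) int {
	p.tokens = p.tokens[:0]
	for src != "" {
		word, rest, _ := strings.Cut(src, " ")
		if word != "" {
			p.tokens = append(p.tokens, word)
		}
		src = rest
	}
	return len(p.tokens)
}

// sink forces freshly created parsers onto the heap so the comparison is fair.
var sink *parser

// measure returns allocations per call with a reused parser and with a
// fresh parser created on every call.
func measure() (withReuse, withoutReuse float64) {
	const src = "a + b * c"
	reused := new(parser)
	withReuse = testing.AllocsPerRun(100, func() { reused.parse(src) })
	withoutReuse = testing.AllocsPerRun(100, func() {
		sink = new(parser)
		sink.parse(src)
	})
	return withReuse, withoutReuse
}

func main() {
	withReuse, withoutReuse := measure()
	fmt.Println("allocs/op with reuse:   ", withReuse)
	fmt.Println("allocs/op without reuse:", withoutReuse)
}
```

After the first call has grown the buffer, the reused parser appends within existing capacity, while a fresh parser has to allocate again on every call.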

diegommm added 3 commits June 28, 2025 01:46

convert source to a struct and fix minor issue displaying error

f835edd

lexer: add byte and char pos

10e4040

improve parser to use string instead of []rune

89f608f

diegommm commented Jun 28, 2025

diegommm added 3 commits June 28, 2025 12:43

add lexer benchmarks

436b8ae

make lexer work as an iterator

152f67d

make parser use the new iterator API in lexer

2f4437c

antonmedv reviewed Jun 29, 2025

antonmedv approved these changes Jun 29, 2025

diegommm added 3 commits June 29, 2025 15:28

allow reusing the parser

a13ee0e

cleanup code

af94f4b

add parser benchmarks

e2111b8

diegommm changed the title ~~Reduce lexer memory allocs~~ Improve parser and lexer Jun 29, 2025

diegommm commented Jun 29, 2025

diegommm requested a review from antonmedv June 29, 2025 18:54
