Skip to content

Improve parser and lexer #811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

diegommm
Copy link

@diegommm diegommm commented Jun 28, 2025

Fixes #810

Benchmark results (90% faster, 96% less memory copying, 89% less allocations):

goos: darwin
goarch: arm64
pkg: github.com/expr-lang/expr/parser
cpu: Apple M4 Pro
          │   old.txt    │               new.txt               │
          │    sec/op    │   sec/op     vs base                │
Parser-12   3219.5n ± 2%   310.2n ± 1%  -90.36% (p=0.000 n=20)

          │   old.txt   │              new.txt               │
          │    B/op     │    B/op     vs base                │
Parser-12   4971.0 ± 0%   208.0 ± 0%  -95.82% (p=0.000 n=20)

          │   old.txt   │              new.txt               │
          │  allocs/op  │ allocs/op   vs base                │
Parser-12   56.000 ± 0%   6.000 ± 0%  -89.29% (p=0.000 n=20)

Benchmarks were performed in the parser/lexer directory executing the command:

go test -run=zzz-no-tests -bench=. -count=20 > file

The file was renamed to either old.txt or new.txt, and the comparison was made with:

benchstat old.txt new.txt
Raw results from old.txt

goos: darwin
goarch: arm64
pkg: github.com/expr-lang/expr/parser
cpu: Apple M4 Pro
BenchmarkParser-12        375018              3263 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        376640              3159 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        378164              3280 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        350582              3214 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        377919              3296 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        360694              3185 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        371702              3170 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        373135              3181 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        374607              3201 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        372498              3320 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        339115              3202 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        362331              3190 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        371295              3184 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        372907              3237 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        355515              3281 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        371984              3206 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        370429              3247 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        369814              3225 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        373510              3310 ns/op            4971 B/op         56 allocs/op
BenchmarkParser-12        351752              3279 ns/op            4971 B/op         56 allocs/op
PASS
ok      github.com/expr-lang/expr/parser        26.066s

Raw results from new.txt

goos: darwin
goarch: arm64
pkg: github.com/expr-lang/expr/parser
cpu: Apple M4 Pro
BenchmarkParser-12       3644658               311.9 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3916356               305.8 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3960404               304.3 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3943411               308.3 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3724376               311.1 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3911833               304.8 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3846391               305.4 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3931287               312.6 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3720000               310.0 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3918597               308.1 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3835420               309.2 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3940215               315.4 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3789723               309.2 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3935113               308.1 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3914610               311.9 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3722302               343.2 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3902730               312.3 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3876708               310.4 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3889995               311.9 ns/op           208 B/op          6 allocs/op
BenchmarkParser-12       3704998               325.5 ns/op           208 B/op          6 allocs/op
PASS
ok      github.com/expr-lang/expr/parser        31.117s

}

func (s Source) String() string {
return string(s)
return s.raw
}

func (s Source) Snippet(line int) (string, bool) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is no longer used but I'm keeping it in case someone was using it.


"github.com/expr-lang/expr/file"
)

const minTokens = 10
Copy link
Author

@diegommm diegommm Jun 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows for optimistically allocating tokens. Discussing the "right" value for this parameter can be a loss of time (everyone has an opinion on what should be the "right" value).
The only thing I will hold is that using zero (the default) is the worst of all values because you will likely always need some nodes. It would be very rare that you don't. And the first few times your add a new node with append then the runtime will just have to copy over and over the same underlying array while it grows the slice.

@@ -335,6 +335,7 @@ literal not terminated (1:10)
früh ♥︎
unrecognized character: U+2665 '♥' (1:6)
| früh ♥︎
| .....^
Copy link
Author

@diegommm diegommm Jun 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appeared to be a small bug. In the original code, if it found a rune longer than 1 byte it would skip the indicator line entirely. I imagine that wouldn't be good if we have something like:

let myStr = "Hello, 世界"; someError

In that case it wouldn't put the indicator line because it will find the '世' rune.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I guess the problem is this rune width in terminal.


type Source []rune
type Source struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just abandon Source altogether? And simply use string?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I can do that really quick or in a separate PR if you prefer for easier reviewing.

@diegommm diegommm changed the title Reduce lexer memory allocs Improve parser and lexer Jun 29, 2025
@diegommm
Copy link
Author

diegommm commented Jun 29, 2025

@antonmedv I have just committed new changes that allow the parser to use the new iterator API and made new benchmarks on the overall process of parsing (with iterator) instead of benchmarking only the lexer.

Checkout the new results! It's a lot. I'm done for this PR.

Comment on lines +16 to +19
p := new(Parser)
for i := 0; i < b.N; i++ {
p.Parse(source, nil)
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the previous code does not have a reusable parser, the code that I run to benchmark the old code was:

	for i := 0; i < b.N; i++ {
		Parse(source)
	}

@diegommm diegommm requested a review from antonmedv June 29, 2025 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce allocations during parsing
2 participants