src: improve StringBytes::Encode perf on UTF8 #61131
Draft
+36
−6
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Tracking: #61041
Depends on #61119, this is a PR chain, only the last commit belongs to this PR
Most data is valid utf-8, no need to wait for v8 optimizations or for simdutf implementing fast replacement.
We can just check + simdutf in fast case.
This is a 2x-10x speedup according to https://github.com/lemire/jstextdecoderbench bench (+ I added extra cases)
There is still room for improvement here (e.g. avoiding triple scans), but this change alone improves results significantly
We can improve further iteratively
This performs mallocs only for valid strings, instead of optimistically malloc-ing and decoding until error
Switching that behavior to optimistic would be a separate PR (perf needs to be checked against this not main or #61119)
Buffer#toString() - utf8
main:
#61119 :
PR:
TextDecoder, loose
main:
#61119 :
PR:
TextDecoder, fatal
main:
#61119 :
PR:
cc @nodejs/performance