src: improve StringBytes::Encode perf on UTF8 #61131

ChALkeR · 2025-12-20T00:03:59Z

Tracking: #61041

Depends on #61119. This is a PR chain; only the last commit belongs to this PR.

Most data is valid UTF-8, so there is no need to wait for V8 optimizations or for simdutf to implement fast replacement.
We can simply validate the input first and, in the fast case where it is valid, convert it with simdutf.
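A minimal sketch of the validate-first dispatch described above, in plain C++. `IsValidUtf8` and `ConvertValidUtf8` are hypothetical scalar stand-ins for simdutf's SIMD routines, and the loose-mode replacement decoder is passed in as a callback because it is not shown here; none of these names come from the PR. Fatal mode (as in `TextDecoder` with `fatal: true`) is modeled by returning `std::nullopt`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical scalar stand-in for a SIMD validator (e.g. simdutf's
// validate_utf8): true iff `in` is well-formed UTF-8, rejecting overlong
// forms, surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
bool IsValidUtf8(std::string_view in) {
  const size_t n = in.size();
  size_t i = 0;
  while (i < n) {
    const uint8_t b0 = static_cast<uint8_t>(in[i]);
    if (b0 < 0x80) { ++i; continue; }  // ASCII
    size_t len;
    uint8_t lo = 0x80, hi = 0xBF;  // allowed range of the second byte
    if (b0 >= 0xC2 && b0 <= 0xDF) len = 2;
    else if (b0 == 0xE0) { len = 3; lo = 0xA0; }  // reject overlongs
    else if (b0 == 0xED) { len = 3; hi = 0x9F; }  // reject surrogates
    else if (b0 >= 0xE1 && b0 <= 0xEF) len = 3;
    else if (b0 == 0xF0) { len = 4; lo = 0x90; }  // reject overlongs
    else if (b0 == 0xF4) { len = 4; hi = 0x8F; }  // cap at U+10FFFF
    else if (b0 >= 0xF1 && b0 <= 0xF3) len = 4;
    else return false;  // stray continuation byte, C0/C1, or F5..FF
    if (n - i < len) return false;  // truncated sequence
    const uint8_t b1 = static_cast<uint8_t>(in[i + 1]);
    if (b1 < lo || b1 > hi) return false;
    for (size_t k = 2; k < len; ++k) {
      if ((static_cast<uint8_t>(in[i + k]) & 0xC0) != 0x80) return false;
    }
    i += len;
  }
  return true;
}

// Fast path: converts input already known to be valid, so no error checks.
std::u16string ConvertValidUtf8(std::string_view in) {
  auto at = [&](size_t k) {
    return static_cast<char32_t>(static_cast<uint8_t>(in[k]));
  };
  std::u16string out;
  out.reserve(in.size());  // never needs more UTF-16 units than UTF-8 bytes
  for (size_t i = 0; i < in.size();) {
    const char32_t b0 = at(i);
    char32_t cp;
    if (b0 < 0x80) { cp = b0; i += 1; }
    else if (b0 < 0xE0) { cp = ((b0 & 0x1F) << 6) | (at(i + 1) & 0x3F); i += 2; }
    else if (b0 < 0xF0) {
      cp = ((b0 & 0x0F) << 12) | ((at(i + 1) & 0x3F) << 6) | (at(i + 2) & 0x3F);
      i += 3;
    } else {
      cp = ((b0 & 0x07) << 18) | ((at(i + 1) & 0x3F) << 12) |
           ((at(i + 2) & 0x3F) << 6) | (at(i + 3) & 0x3F);
      i += 4;
    }
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {  // supplementary plane: emit a surrogate pair
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}

// Validate first; only invalid input reaches the slow path.
// Returns std::nullopt for invalid input in fatal mode (TextDecoder would throw).
std::optional<std::u16string> DecodeUtf8(
    std::string_view in, bool fatal,
    const std::function<std::u16string(std::string_view)>& slow_replace) {
  if (IsValidUtf8(in)) return ConvertValidUtf8(in);  // common case
  if (fatal) return std::nullopt;
  return slow_replace(in);  // loose mode: U+FFFD replacement decoding
}
```

Because validation runs before any output is produced, the common all-valid case never touches the slower replacement logic; only invalid input pays for it.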

This is a 2x-10x speedup according to the https://github.com/lemire/jstextdecoderbench benchmark (plus some extra cases I added).

There is still room for improvement here (e.g. avoiding triple scans), but this change alone improves the results significantly, and we can improve further iteratively.
This performs mallocs only for valid strings, instead of optimistically allocating and decoding until an error is hit.
Switching to the optimistic behavior would be a separate PR (its performance would need to be measured against this PR, not against main or #61119).
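If the three scans are validation, UTF-16 length computation, and conversion (an assumption; the PR text does not spell them out), the middle one is what makes a single exact-size allocation possible for valid input. A sketch of that count, assuming the input has already been validated: simdutf exposes a SIMD version as `utf16_length_from_utf8`, and the function below is a hypothetical scalar equivalent, not Node's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical scalar equivalent of simdutf's utf16_length_from_utf8.
// Precondition (not checked): `in` is valid UTF-8.
size_t Utf16LengthFromValidUtf8(std::string_view in) {
  size_t units = 0;
  for (char c : in) {
    const uint8_t b = static_cast<uint8_t>(c);
    if ((b & 0xC0) != 0x80) ++units;  // ASCII or lead byte: starts a code point
    if (b >= 0xF0) ++units;           // 4-byte lead: needs a surrogate pair
  }
  return units;
}
```

With the length known, the output buffer can be allocated once at its final size, whereas the optimistic alternative mentioned above allocates up front and may have to discard work when it reaches an invalid byte.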

Buffer#toString() - utf8

main:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	18.21 GiB/s	0.005 ms
Arabic lipsum	79.771 KiB	0.29 GiB/s	0.266 ms
Chinese lipsum	68.203 KiB	0.34 GiB/s	0.192 ms
Arabic + 2 * ASCII	249.575 KiB	0.73 GiB/s	0.329 ms

#61119:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.75 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	0.28 GiB/s	0.273 ms
Chinese lipsum	68.203 KiB	0.33 GiB/s	0.197 ms
Arabic + 2 * ASCII	249.575 KiB	0.69 GiB/s	0.344 ms

PR:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.84 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	2.03 GiB/s	0.038 ms
Chinese lipsum	68.203 KiB	4.06 GiB/s	0.016 ms
Arabic + 2 * ASCII	249.577 KiB	3.42 GiB/s	0.072 ms
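As a rough cross-check of the quoted speedup, the ratio of PR to main throughput can be computed from the `Buffer#toString() - utf8` numbers above (values copied from the tables; nothing new is measured). By throughput this gives about 2x for ASCII (a gain already present with #61119), about 7x for Arabic, about 12x for Chinese, and about 4.7x for the mixed case. The names below are illustrative only.

```cpp
#include <cassert>
#include <cstdio>

// Speedup as a ratio of throughputs (GiB/s): after / before.
double Speedup(double before_gibs, double after_gibs) {
  return after_gibs / before_gibs;
}

struct BenchRow {
  const char* name;
  double main_gibs;  // from the "main" table
  double pr_gibs;    // from the "PR" table
};

// Throughputs copied from the Buffer#toString() - utf8 tables above.
const BenchRow kBufferToString[] = {
    {"Latin lipsum (ASCII)", 18.21, 36.84},
    {"Arabic lipsum", 0.29, 2.03},
    {"Chinese lipsum", 0.34, 4.06},
    {"Arabic + 2 * ASCII", 0.73, 3.42},
};

// Prints one "name  ratio" line per benchmark case.
void PrintSpeedups() {
  for (const BenchRow& row : kBufferToString) {
    std::printf("%-22s %5.1fx\n", row.name, Speedup(row.main_gibs, row.pr_gibs));
  }
}
```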

TextDecoder, loose

main:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	17.99 GiB/s	0.005 ms
Arabic lipsum	79.771 KiB	0.28 GiB/s	0.270 ms
Chinese lipsum	68.203 KiB	0.34 GiB/s	0.194 ms
Arabic + 2 * ASCII	249.577 KiB	0.71 GiB/s	0.333 ms

#61119:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.59 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	0.28 GiB/s	0.271 ms
Chinese lipsum	68.203 KiB	0.34 GiB/s	0.192 ms
Arabic + 2 * ASCII	249.577 KiB	0.70 GiB/s	0.340 ms

PR:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.78 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	2.03 GiB/s	0.038 ms
Chinese lipsum	68.203 KiB	4.01 GiB/s	0.016 ms
Arabic + 2 * ASCII	249.577 KiB	3.42 GiB/s	0.072 ms

TextDecoder, fatal

main:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	15.31 GiB/s	0.006 ms
Arabic lipsum	79.771 KiB	0.27 GiB/s	0.279 ms
Chinese lipsum	68.203 KiB	0.34 GiB/s	0.194 ms
Arabic + 2 * ASCII	249.577 KiB	0.71 GiB/s	0.338 ms

#61119:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.63 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	0.28 GiB/s	0.272 ms
Chinese lipsum	68.203 KiB	0.33 GiB/s	0.197 ms
Arabic + 2 * ASCII	249.577 KiB	0.68 GiB/s	0.351 ms

PR:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.71 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	1.70 GiB/s	0.046 ms
Chinese lipsum	68.203 KiB	2.97 GiB/s	0.022 ms
Arabic + 2 * ASCII	249.577 KiB	3.01 GiB/s	0.082 ms

cc @nodejs/performance

ChALkeR added 2 commits December 19, 2025 09:09

src: improve StringBytes::Encode perf on ASCII (866781f)
src: use validate_ascii_with_errors (42b4ff4)
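The `validate_ascii_with_errors` commit switches to a simdutf call that, per simdutf's documentation, reports the position of the first non-ASCII byte (in its result's `count`) rather than a plain yes/no. A scalar sketch of why a position is more useful than a boolean, with hypothetical helper names; whether Node uses the position exactly this way is not stated in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical scalar stand-in for simdutf::validate_ascii_with_errors:
// returns the index of the first non-ASCII byte, or in.size() if none.
size_t FirstNonAscii(std::string_view in) {
  size_t i = 0;
  while (i < in.size() && static_cast<unsigned char>(in[i]) < 0x80) ++i;
  return i;
}

struct AsciiSplit {
  std::string_view ascii_prefix;  // pure ASCII, safe to copy byte-for-byte
  std::string_view rest;          // needs the UTF-8 path (may be empty)
};

// Splits at the reported position so a caller can send only `rest`
// to the UTF-8 path instead of rescanning the already-checked prefix.
AsciiSplit SplitAtFirstNonAscii(std::string_view in) {
  const size_t pos = FirstNonAscii(in);
  return {in.substr(0, pos), in.substr(pos)};
}
```

A caller could, for example, copy the ASCII prefix directly and hand only the remainder to the UTF-8 decoder; when `rest` is empty, the whole input took the ASCII fast path.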

nodejs-github-bot added the buffer, c++, and needs-ci labels Dec 20, 2025

ChALkeR force-pushed the chalker/non-ascii/0 branch from 63c910f to 5b2b040 on December 20, 2025 05:38

src: improve StringBytes::Encode perf on UTF8 (aee5408)

ChALkeR force-pushed the chalker/non-ascii/0 branch from 5b2b040 to aee5408 on December 20, 2025 05:49
