@ChALkeR ChALkeR commented Dec 20, 2025

Tracking: #61041

Depends on #61119. This is a PR chain: only the last commit belongs to this PR.

Most data is valid UTF-8, so there is no need to wait for V8 optimizations or for simdutf to implement fast replacement.
We can just validate the input first and use simdutf in the fast (valid) case.
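
The control flow described here can be sketched at the JavaScript level. This is only an illustration under assumptions: the PR itself does this in C++ with simdutf, whereas the sketch uses `isUtf8` from `node:buffer` as the validity check and two `TextDecoder` instances as stand-ins for the fast path and the replacement path. `decodeUtf8` and the `path` field are hypothetical names, not part of the change.

```javascript
const { isUtf8 } = require('node:buffer');

// Stand-ins for the two native paths (assumption: the real change lives in C++).
const strictDecoder = new TextDecoder('utf-8', { fatal: true });
const replacingDecoder = new TextDecoder('utf-8');

// Hypothetical helper: validate once, then pick a path.
function decodeUtf8(bytes) {
  if (isUtf8(bytes)) {
    // Valid input never needs U+FFFD replacement, so a strict decode cannot fail.
    return { path: 'fast', text: strictDecoder.decode(bytes) };
  }
  // Invalid input falls back to the slower decode that inserts U+FFFD.
  return { path: 'replacement', text: replacingDecoder.decode(bytes) };
}

console.log(decodeUtf8(Buffer.from('مرحبا')).path);            // fast
console.log(decodeUtf8(Buffer.from([0x61, 0xff, 0x62])).path); // replacement
```

The cost of this shape is an extra scan over the input for validation, which is acceptable when most inputs are valid.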

This is a 2x-10x speedup according to the https://github.com/lemire/jstextdecoderbench benchmark (plus extra cases I added).
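
For readers who want to sanity-check numbers of this kind locally, here is a minimal sketch of how throughput relates to size and mean time. It is not the jstextdecoderbench harness; `throughputGiBps`, `measure`, the iteration count, and the sample input are all assumptions made for illustration.

```javascript
const { performance } = require('node:perf_hooks');

// Convert a size in KiB and a mean time in ms into GiB/s.
function throughputGiBps(sizeKiB, meanMs) {
  const gib = sizeKiB / (1024 * 1024);
  return gib / (meanMs / 1000);
}

// Hypothetical micro-benchmark loop: one warm-up call, then a timed batch.
function measure(decode, input, iterations = 100) {
  decode(input);
  const start = performance.now();
  for (let i = 0; i < iterations; i++) decode(input);
  const meanMs = (performance.now() - start) / iterations;
  return { meanMs, gibps: throughputGiBps(input.byteLength / 1024, meanMs) };
}

// Sample non-ASCII input (assumption: any mostly non-ASCII text works here).
const input = Buffer.from('سلام '.repeat(10000));
const decoder = new TextDecoder();

console.log(measure((b) => b.toString('utf8'), input));
console.log(measure((b) => decoder.decode(b), input));
```

As a consistency check, `throughputGiBps(79.771, 0.038)` gives roughly 2.0 GiB/s, which agrees with the Arabic lipsum row of the PR results once rounding of the mean time is taken into account.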

There is still room for improvement here (e.g. avoiding triple scans), but this change alone improves results significantly.
We can improve further iteratively.
This performs mallocs only for valid strings, instead of optimistically malloc-ing and decoding until an error is hit.
Switching to the optimistic behavior would be a separate PR (its perf needs to be checked against this PR, not against main or #61119).

Buffer#toString() - utf8

main:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 18.21 GiB/s | 0.005 ms |
| Arabic lipsum | 79.771 KiB | 0.29 GiB/s | 0.266 ms |
| Chinese lipsum | 68.203 KiB | 0.34 GiB/s | 0.192 ms |
| Arabic + 2 * ASCII | 249.575 KiB | 0.73 GiB/s | 0.329 ms |

#61119:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 36.75 GiB/s | 0.002 ms |
| Arabic lipsum | 79.771 KiB | 0.28 GiB/s | 0.273 ms |
| Chinese lipsum | 68.203 KiB | 0.33 GiB/s | 0.197 ms |
| Arabic + 2 * ASCII | 249.575 KiB | 0.69 GiB/s | 0.344 ms |

PR:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 36.84 GiB/s | 0.002 ms |
| Arabic lipsum | 79.771 KiB | 2.03 GiB/s | 0.038 ms |
| Chinese lipsum | 68.203 KiB | 4.06 GiB/s | 0.016 ms |
| Arabic + 2 * ASCII | 249.577 KiB | 3.42 GiB/s | 0.072 ms |

TextDecoder, loose

main:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 17.99 GiB/s | 0.005 ms |
| Arabic lipsum | 79.771 KiB | 0.28 GiB/s | 0.270 ms |
| Chinese lipsum | 68.203 KiB | 0.34 GiB/s | 0.194 ms |
| Arabic + 2 * ASCII | 249.577 KiB | 0.71 GiB/s | 0.333 ms |

#61119:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 36.59 GiB/s | 0.002 ms |
| Arabic lipsum | 79.771 KiB | 0.28 GiB/s | 0.271 ms |
| Chinese lipsum | 68.203 KiB | 0.34 GiB/s | 0.192 ms |
| Arabic + 2 * ASCII | 249.577 KiB | 0.70 GiB/s | 0.340 ms |

PR:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 36.78 GiB/s | 0.002 ms |
| Arabic lipsum | 79.771 KiB | 2.03 GiB/s | 0.038 ms |
| Chinese lipsum | 68.203 KiB | 4.01 GiB/s | 0.016 ms |
| Arabic + 2 * ASCII | 249.577 KiB | 3.42 GiB/s | 0.072 ms |

TextDecoder, fatal

main:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 15.31 GiB/s | 0.006 ms |
| Arabic lipsum | 79.771 KiB | 0.27 GiB/s | 0.279 ms |
| Chinese lipsum | 68.203 KiB | 0.34 GiB/s | 0.194 ms |
| Arabic + 2 * ASCII | 249.577 KiB | 0.71 GiB/s | 0.338 ms |

#61119:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 36.63 GiB/s | 0.002 ms |
| Arabic lipsum | 79.771 KiB | 0.28 GiB/s | 0.272 ms |
| Chinese lipsum | 68.203 KiB | 0.33 GiB/s | 0.197 ms |
| Arabic + 2 * ASCII | 249.577 KiB | 0.68 GiB/s | 0.351 ms |

PR:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 36.71 GiB/s | 0.002 ms |
| Arabic lipsum | 79.771 KiB | 1.70 GiB/s | 0.046 ms |
| Chinese lipsum | 68.203 KiB | 2.97 GiB/s | 0.022 ms |
| Arabic + 2 * ASCII | 249.577 KiB | 3.01 GiB/s | 0.082 ms |
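
For context on the two `TextDecoder` modes benchmarked above: they only differ on invalid input. A minimal sketch of that difference follows, using the standard `TextDecoder` API; the sample bytes and variable names are chosen for illustration.

```javascript
// 0xFF can never appear in well-formed UTF-8.
const invalid = new Uint8Array([0x61, 0xff, 0x62]);

const loose = new TextDecoder('utf-8');
const fatal = new TextDecoder('utf-8', { fatal: true });

// Loose mode replaces the invalid byte with U+FFFD and keeps going.
const looseResult = loose.decode(invalid);
console.log(looseResult === 'a\uFFFDb'); // true

// Fatal mode throws a TypeError instead of replacing.
let fatalError = null;
try {
  fatal.decode(invalid);
} catch (err) {
  fatalError = err;
}
console.log(fatalError instanceof TypeError); // true
```

On valid input both modes produce the same string, which is why a validate-first fast path can serve both.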

cc @nodejs/performance

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Dec 20, 2025
@ChALkeR ChALkeR force-pushed the chalker/non-ascii/0 branch from 63c910f to 5b2b040 Compare December 20, 2025 05:38
@ChALkeR ChALkeR force-pushed the chalker/non-ascii/0 branch from 5b2b040 to aee5408 Compare December 20, 2025 05:49