Use memchr to search for characters to escape #664

Dr-Emann · 2023-10-12T00:47:35Z

Use memchr iterators to search for characters to escape.

In some (most) cases, we need to combine multiple memchr searches, because memchr can search for at most three bytes at a time. To handle this, this PR introduces a `MergeIter` type, which takes two iterators and merges their results in order.
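The merge step can be sketched roughly as follows. This is not the PR's actual `MergeIter`. It is a minimal illustration that assumes each input iterator yields strictly increasing byte offsets, which is what memchr's iterators produce. In the PR, the inputs would presumably be iterators such as `memchr::memchr3_iter(b'<', b'>', b'&', bytes)` and `memchr::memchr2_iter(b'\'', b'"', bytes)`. To keep the sketch std-only, a hypothetical `positions` helper stands in for them.

```rust
use std::iter::Peekable;

/// Hypothetical sketch of a `MergeIter`: merges two iterators that each
/// yield strictly increasing byte offsets into one increasing sequence.
pub struct MergeIter<A, B>
where
    A: Iterator<Item = usize>,
    B: Iterator<Item = usize>,
{
    a: Peekable<A>,
    b: Peekable<B>,
}

impl<A, B> MergeIter<A, B>
where
    A: Iterator<Item = usize>,
    B: Iterator<Item = usize>,
{
    pub fn new(a: A, b: B) -> Self {
        MergeIter { a: a.peekable(), b: b.peekable() }
    }
}

impl<A, B> Iterator for MergeIter<A, B>
where
    A: Iterator<Item = usize>,
    B: Iterator<Item = usize>,
{
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // Copy the next offsets out so the peek borrows end before we advance.
        let next_a = self.a.peek().copied();
        let next_b = self.b.peek().copied();
        match (next_a, next_b) {
            // Both sides have a candidate: yield the smaller offset first.
            (Some(x), Some(y)) if x <= y => self.a.next(),
            (Some(_), Some(_)) => self.b.next(),
            // One side is exhausted: drain the other (None once both are done).
            (Some(_), None) => self.a.next(),
            (None, _) => self.b.next(),
        }
    }
}

/// Hypothetical std-only stand-in for a memchr iterator:
/// offsets of every byte in `haystack` that appears in `needles`.
fn positions<'a>(haystack: &'a [u8], needles: &'a [u8]) -> impl Iterator<Item = usize> + 'a {
    haystack
        .iter()
        .enumerate()
        .filter(move |&(_, &b)| needles.contains(&b))
        .map(|(i, _)| i)
}

fn main() {
    let text = b"a<b & 'c' \"d\"";
    // `<`, `>`, `&` in one search and the two quote characters in another.
    let merged: Vec<usize> =
        MergeIter::new(positions(text, b"<>&"), positions(text, b"'\"")).collect();
    println!("{:?}", merged);
}
```

Because the two searches look for disjoint sets of bytes, the merged stream never yields the same offset twice, so no deduplication is needed.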

This appears to be better for performance in almost all cases (~~except that it's very slightly slower for escaping small strings with escapes present~~; thanks to dcb2104, it's actually faster even in that case). In particular, it's MUCH faster when escaping long strings, especially ones with no characters to escape.

Only results with a >1% change are shown, as reported by critcmp.
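To read the tables below: critcmp lists each run's time divided by the fastest time in its group, so the fastest row always shows 1.00. For example, for `escape_text/escaped_chars_long` on the M1:

$$
\frac{t_{\text{before}}}{t_{\text{after}}} = \frac{962.1\,\text{ns}}{271.8\,\text{ns}} \approx 3.54
$$

That is, the old code takes about 3.54× as long as this PR on that benchmark.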

Results for M1 Pro MacBook

One event/Comment
-----------------
after      1.00      61.3±0.63ns       ? ?/sec
before     1.02      62.2±1.13ns       ? ?/sec

attributes/with_checks = false
------------------------------
after      1.00      32.3±0.42µs       ? ?/sec
before     1.02      32.9±0.35µs       ? ?/sec

decode_and_parse_document/linescore.xml
---------------------------------------
after      1.00      10.1±0.15µs  350.9 MB/sec
before     1.02      10.2±0.09µs  345.4 MB/sec

decode_and_parse_document/players.xml
-------------------------------------
after      1.00      62.4±0.37µs  232.4 MB/sec
before     1.02      63.4±0.44µs  228.5 MB/sec

decode_and_parse_document/rpm_primary.xml
-----------------------------------------
after      1.00      60.6±0.40µs  334.7 MB/sec
before     1.01      61.3±0.58µs  330.4 MB/sec

decode_and_parse_document/sample_ns.xml
---------------------------------------
after      1.00       2.4±0.02µs  298.7 MB/sec
before     1.01       2.5±0.02µs  294.6 MB/sec

decode_and_parse_document/sample_rss.xml
----------------------------------------
after      1.00     244.4±1.83µs  771.8 MB/sec
before     1.01     247.0±1.83µs  763.4 MB/sec

decode_and_parse_document_with_namespaces/document.xml
------------------------------------------------------
after      1.00      64.0±0.68µs  171.8 MB/sec
before     1.01      64.8±0.62µs  169.5 MB/sec

decode_and_parse_document_with_namespaces/libreoffice_document.fodt
-------------------------------------------------------------------
after      1.00     232.2±3.44µs  235.1 MB/sec
before     1.01     235.3±4.81µs  232.1 MB/sec

decode_and_parse_document_with_namespaces/linescore.xml
-------------------------------------------------------
after      1.00      12.8±0.09µs  276.0 MB/sec
before     1.02      13.0±0.27µs  271.9 MB/sec

decode_and_parse_document_with_namespaces/players.xml
-----------------------------------------------------
after      1.00      78.1±0.49µs  185.5 MB/sec
before     1.01      79.2±0.83µs  183.0 MB/sec

decode_and_parse_document_with_namespaces/rpm_filelists.xml
-----------------------------------------------------------
after      1.00      39.7±0.33µs  277.0 MB/sec
before     1.02      40.5±0.42µs  271.4 MB/sec

decode_and_parse_document_with_namespaces/rpm_other.xml
-------------------------------------------------------
after      1.00      63.5±0.69µs  348.7 MB/sec
before     1.02      64.7±1.13µs  342.1 MB/sec

decode_and_parse_document_with_namespaces/rpm_primary.xml
---------------------------------------------------------
after      1.00      90.4±1.10µs  224.1 MB/sec
before     1.02      92.1±1.28µs  220.1 MB/sec

decode_and_parse_document_with_namespaces/rpm_primary2.xml
----------------------------------------------------------
after      1.00      29.3±0.31µs  244.6 MB/sec
before     1.03      30.1±0.22µs  237.9 MB/sec

decode_and_parse_document_with_namespaces/sample_1.xml
------------------------------------------------------
after      1.00       4.7±0.05µs  232.6 MB/sec
before     1.01       4.8±0.04µs  230.1 MB/sec

decode_and_parse_document_with_namespaces/sample_ns.xml
-------------------------------------------------------
after      1.00       3.9±0.03µs  187.5 MB/sec
before     1.01       3.9±0.02µs  184.9 MB/sec

decode_and_parse_document_with_namespaces/sample_rss.xml
--------------------------------------------------------
after      1.00     374.5±3.32µs  503.6 MB/sec
before     1.02     380.4±3.39µs  495.8 MB/sec

escape_text/escaped_chars_long
------------------------------
after      1.00     271.8±4.24ns       ? ?/sec
before     3.54     962.1±7.71ns       ? ?/sec

escape_text/escaped_chars_short
-------------------------------
before     1.00     223.3±3.96ns       ? ?/sec
after      1.10     246.5±1.40ns       ? ?/sec

escape_text/no_chars_to_escape_long
-----------------------------------
after      1.00      71.1±3.15ns       ? ?/sec
before    10.58     752.6±6.29ns       ? ?/sec

escape_text/no_chars_to_escape_short
------------------------------------
after      1.00       7.9±0.11ns       ? ?/sec
before     1.24       9.8±0.08ns       ? ?/sec

parse_document_nocopy/document.xml
----------------------------------
after      1.00      40.9±0.29µs  268.4 MB/sec
before     1.01      41.4±0.75µs  265.2 MB/sec

parse_document_nocopy/linescore.xml
-----------------------------------
after      1.00      10.1±0.09µs  350.2 MB/sec
before     1.01      10.2±0.14µs  346.0 MB/sec

parse_document_nocopy/rpm_other.xml
-----------------------------------
after      1.00      48.4±0.31µs  457.5 MB/sec
before     1.02      49.5±1.26µs  447.7 MB/sec

parse_document_nocopy/sample_rss.xml
------------------------------------
after      1.00     260.5±1.47µs  724.0 MB/sec
before     1.01     264.1±2.39µs  714.2 MB/sec

parse_document_nocopy_with_namespaces/document.xml
--------------------------------------------------
after      1.00      64.1±0.39µs  171.4 MB/sec
before     1.02      65.4±0.68µs  168.0 MB/sec

parse_document_nocopy_with_namespaces/libreoffice_document.fodt
---------------------------------------------------------------
after      1.00     241.4±2.64µs  226.2 MB/sec
before     1.01     244.5±2.95µs  223.3 MB/sec

parse_document_nocopy_with_namespaces/linescore.xml
---------------------------------------------------
after      1.00      12.7±0.11µs  278.7 MB/sec
before     1.02      12.9±0.14µs  273.3 MB/sec

parse_document_nocopy_with_namespaces/players.xml
-------------------------------------------------
after      1.00      78.5±0.55µs  184.6 MB/sec
before     1.01      79.5±0.60µs  182.4 MB/sec

parse_document_nocopy_with_namespaces/rpm_filelists.xml
-------------------------------------------------------
after      1.00      42.5±0.42µs  258.4 MB/sec
before     1.01      43.1±0.39µs  254.8 MB/sec

parse_document_nocopy_with_namespaces/rpm_other.xml
---------------------------------------------------
after      1.00      65.9±0.42µs  335.9 MB/sec
before     1.01      66.6±0.48µs  332.2 MB/sec

parse_document_nocopy_with_namespaces/rpm_primary2.xml
------------------------------------------------------
after      1.00      29.8±0.31µs  240.8 MB/sec
before     1.01      30.2±0.27µs  237.4 MB/sec

parse_document_nocopy_with_namespaces/sample_1.xml
--------------------------------------------------
after      1.00       4.6±0.05µs  241.0 MB/sec
before     1.02       4.7±0.07µs  235.9 MB/sec

parse_document_nocopy_with_namespaces/sample_ns.xml
---------------------------------------------------
after      1.00       3.7±0.03µs  193.3 MB/sec
before     1.01       3.8±0.04µs  191.2 MB/sec

parse_document_nocopy_with_namespaces/sample_rss.xml
----------------------------------------------------
after      1.00     397.2±2.82µs  474.8 MB/sec
before     1.02     404.5±5.94µs  466.3 MB/sec

parse_document_nocopy_with_namespaces/test_writer_ident.xml
-----------------------------------------------------------
after      1.00      13.7±0.10µs  309.8 MB/sec
before     1.02      14.0±0.12µs  303.0 MB/sec

read_event/trim_text = false
----------------------------
before     1.00      93.1±0.64µs       ? ?/sec
after      1.01      94.3±0.90µs       ? ?/sec

read_event/trim_text = true
---------------------------
before     1.00      94.7±0.80µs       ? ?/sec
after      1.02      96.9±1.12µs       ? ?/sec

unescape_text/char_reference
----------------------------
after      1.00     105.8±0.75ns       ? ?/sec
before     1.03     108.6±0.80ns       ? ?/sec

unescape_text/entity_reference
------------------------------
after      1.00     114.8±2.80ns       ? ?/sec
before     1.02     116.9±1.70ns       ? ?/sec

unescape_text/mixed
-------------------
after      1.00     127.3±0.79ns       ? ?/sec
before     1.01     128.8±1.15ns       ? ?/sec

unescape_text/no_chars_to_unescape_long
---------------------------------------
after      1.00      34.3±0.77ns       ? ?/sec
before     1.02      35.1±0.41ns       ? ?/sec

unescape_text/no_chars_to_unescape_short
----------------------------------------
after      1.00       4.4±0.03ns       ? ?/sec
before     1.03       4.5±0.03ns       ? ?/sec

Results for x64 i5-6600K Windows

NsReader::read_resolved_event_into/trim_text = true
---------------------------------------------------
after      1.00  1042.6±125.28µs       ? ?/sec
before     1.41  1469.3±354.43µs       ? ?/sec

One event/CData
---------------
after      1.00     94.5±66.69ns       ? ?/sec
before     2.16    204.0±22.39ns       ? ?/sec

One event/Comment
-----------------
after      1.00     129.3±5.60ns       ? ?/sec
before     3.63    468.9±29.17ns       ? ?/sec

One event/Start
---------------
after      1.00    335.9±20.11ns       ? ?/sec
before     2.31   776.9±106.31ns       ? ?/sec

attributes/try_get_attribute
----------------------------
before     1.00    376.1±51.69µs       ? ?/sec
after      1.05    395.3±48.46µs       ? ?/sec

attributes/with_checks = false
------------------------------
before     1.00    198.9±40.57µs       ? ?/sec
after      1.10    219.5±28.25µs       ? ?/sec

attributes/with_checks = true
-----------------------------
after      1.00    232.6±63.64µs       ? ?/sec
before     1.55    359.7±51.88µs       ? ?/sec

decode_and_parse_document/document.xml
--------------------------------------
after      1.00    357.3±43.03µs   30.8 MB/sec
before     1.10    393.0±46.29µs   28.0 MB/sec

decode_and_parse_document/libreoffice_document.fodt
---------------------------------------------------
after      1.00   625.6±461.14µs   87.3 MB/sec
before     2.16  1348.4±186.66µs   40.5 MB/sec

decode_and_parse_document/linescore.xml
---------------------------------------
after      1.00     77.3±16.13µs   45.7 MB/sec
before     1.31    100.9±10.21µs   35.0 MB/sec

decode_and_parse_document/players.xml
-------------------------------------
after      1.00   489.7±113.93µs   29.6 MB/sec
before     1.22    598.4±74.78µs   24.2 MB/sec

decode_and_parse_document/rpm_filelists.xml
-------------------------------------------
after      1.00      55.9±1.42µs  196.6 MB/sec
before     4.97    277.5±26.25µs   39.6 MB/sec

decode_and_parse_document/rpm_other.xml
---------------------------------------
after      1.00      90.8±2.71µs  243.8 MB/sec
before     5.10    463.0±44.96µs   47.8 MB/sec

decode_and_parse_document/rpm_primary.xml
-----------------------------------------
after      1.00     130.3±9.67µs  155.5 MB/sec
before     4.64    605.1±92.95µs   33.5 MB/sec

decode_and_parse_document/rpm_primary2.xml
------------------------------------------
after      1.00      41.7±0.97µs  171.9 MB/sec
before     5.41    225.8±21.91µs   31.8 MB/sec

decode_and_parse_document/sample_1.xml
--------------------------------------
after      1.00      29.3±4.07µs   37.5 MB/sec
before     1.41      41.3±2.50µs   26.6 MB/sec

decode_and_parse_document/sample_ns.xml
---------------------------------------
after      1.00      21.9±5.67µs   33.0 MB/sec
before     1.43      31.4±2.94µs   23.0 MB/sec

decode_and_parse_document/sample_rss.xml
----------------------------------------
after      1.00  1946.7±378.24µs   96.9 MB/sec
before     1.20       2.3±0.35ms   80.6 MB/sec

decode_and_parse_document/test_writer_ident.xml
-----------------------------------------------
after      1.00     88.9±14.65µs   47.7 MB/sec
before     1.06     94.4±12.79µs   44.9 MB/sec

decode_and_parse_document_with_namespaces/document.xml
------------------------------------------------------
before     1.00    619.9±35.60µs   17.7 MB/sec
after      1.24   769.0±185.44µs   14.3 MB/sec

decode_and_parse_document_with_namespaces/libreoffice_document.fodt
-------------------------------------------------------------------
before     1.00       2.2±0.13ms   24.7 MB/sec
after      1.08       2.4±0.62ms   22.8 MB/sec

decode_and_parse_document_with_namespaces/linescore.xml
-------------------------------------------------------
after      1.00     95.6±29.48µs   36.9 MB/sec
before     1.38    132.4±15.72µs   26.7 MB/sec

decode_and_parse_document_with_namespaces/players.xml
-----------------------------------------------------
after      1.00    568.8±94.68µs   25.5 MB/sec
before     1.33    758.9±94.28µs   19.1 MB/sec

decode_and_parse_document_with_namespaces/rpm_other.xml
-------------------------------------------------------
after      1.00   444.7±137.30µs   49.8 MB/sec
before     1.20   532.8±104.49µs   41.5 MB/sec

decode_and_parse_document_with_namespaces/rpm_primary.xml
---------------------------------------------------------
after      1.00   785.6±170.18µs   25.8 MB/sec
before     1.10   864.3±108.22µs   23.4 MB/sec

decode_and_parse_document_with_namespaces/rpm_primary2.xml
----------------------------------------------------------
after      1.00    128.7±52.72µs   55.7 MB/sec
before     2.24    288.2±38.70µs   24.9 MB/sec

decode_and_parse_document_with_namespaces/sample_1.xml
------------------------------------------------------
after      1.00      35.1±7.37µs   31.3 MB/sec
before     1.19      41.7±9.23µs   26.3 MB/sec

decode_and_parse_document_with_namespaces/sample_ns.xml
-------------------------------------------------------
before     1.00      43.0±5.07µs   16.8 MB/sec
after      1.11     47.7±10.96µs   15.1 MB/sec

decode_and_parse_document_with_namespaces/sample_rss.xml
--------------------------------------------------------
after      1.00       2.9±0.54ms   64.2 MB/sec
before     1.03       3.0±0.57ms   62.5 MB/sec

decode_and_parse_document_with_namespaces/test_writer_ident.xml
---------------------------------------------------------------
after      1.00     50.9±26.75µs   83.4 MB/sec
before     2.88    146.6±12.70µs   28.9 MB/sec

escape_text/escaped_chars_long
------------------------------
after      1.00  1802.4±127.81ns       ? ?/sec
before     5.00       9.0±1.51µs       ? ?/sec

escape_text/escaped_chars_short
-------------------------------
before     1.00  1945.6±253.72ns       ? ?/sec
after      1.21       2.4±0.32µs       ? ?/sec

escape_text/no_chars_to_escape_long
-----------------------------------
after      1.00    282.7±38.03ns       ? ?/sec
before    25.28       7.1±1.21µs       ? ?/sec

escape_text/no_chars_to_escape_short
------------------------------------
before     1.00      65.0±4.89ns       ? ?/sec
after      1.73     112.0±1.80ns       ? ?/sec

parse_document_nocopy/document.xml
----------------------------------
after      1.00    359.9±87.72µs   30.5 MB/sec
before     1.06    381.5±43.36µs   28.8 MB/sec

parse_document_nocopy/libreoffice_document.fodt
-----------------------------------------------
after      1.00  1296.6±231.55µs   42.1 MB/sec
before     1.05  1360.3±141.50µs   40.1 MB/sec

parse_document_nocopy/linescore.xml
-----------------------------------
after      1.00      23.1±2.63µs  152.7 MB/sec
before     4.18     96.7±12.06µs   36.5 MB/sec

parse_document_nocopy/players.xml
---------------------------------
after      1.00    199.5±17.16µs   72.6 MB/sec
before     2.96    591.6±92.83µs   24.5 MB/sec

parse_document_nocopy/rpm_other.xml
-----------------------------------
after      1.00    398.5±49.44µs   55.6 MB/sec
before     1.06    422.6±61.23µs   52.4 MB/sec

parse_document_nocopy/rpm_primary.xml
-------------------------------------
after      1.00    413.2±97.02µs   49.1 MB/sec
before     1.21    499.3±87.07µs   40.6 MB/sec

parse_document_nocopy/rpm_primary2.xml
--------------------------------------
after      1.00    149.8±41.54µs   47.9 MB/sec
before     1.43    213.5±19.55µs   33.6 MB/sec

parse_document_nocopy/sample_1.xml
----------------------------------
after      1.00      21.7±8.18µs   50.6 MB/sec
before     1.48      32.1±4.09µs   34.2 MB/sec

parse_document_nocopy/sample_ns.xml
-----------------------------------
after      1.00       8.0±3.02µs   89.9 MB/sec
before     3.35      27.0±3.00µs   26.8 MB/sec

parse_document_nocopy/sample_rss.xml
------------------------------------
after      1.00  1444.7±755.53µs  130.5 MB/sec
before     1.55       2.2±0.37ms   84.4 MB/sec

parse_document_nocopy/test_writer_ident.xml
-------------------------------------------
after      1.00     82.8±12.54µs   51.2 MB/sec
before     1.12     92.9±10.90µs   45.7 MB/sec

parse_document_nocopy_with_namespaces/document.xml
--------------------------------------------------
after      1.00    539.0±70.22µs   20.4 MB/sec
before     1.09    585.1±58.45µs   18.8 MB/sec

parse_document_nocopy_with_namespaces/libreoffice_document.fodt
---------------------------------------------------------------
after      1.00  1847.3±458.23µs   29.6 MB/sec
before     1.12       2.1±0.22ms   26.4 MB/sec

parse_document_nocopy_with_namespaces/linescore.xml
---------------------------------------------------
after      1.00      23.3±5.62µs  151.7 MB/sec
before     5.29    123.1±17.55µs   28.7 MB/sec

parse_document_nocopy_with_namespaces/players.xml
-------------------------------------------------
after      1.00   518.6±114.42µs   28.0 MB/sec
before     1.31   678.7±105.20µs   21.4 MB/sec

parse_document_nocopy_with_namespaces/rpm_filelists.xml
-------------------------------------------------------
before     1.00    374.3±38.16µs   29.3 MB/sec
after      1.06    395.6±41.30µs   27.8 MB/sec

parse_document_nocopy_with_namespaces/rpm_other.xml
---------------------------------------------------
after      1.00   511.8±102.05µs   43.3 MB/sec
before     1.12    573.7±67.94µs   38.6 MB/sec

parse_document_nocopy_with_namespaces/rpm_primary.xml
-----------------------------------------------------
after      1.00   610.4±236.10µs   33.2 MB/sec
before     1.41   859.5±100.28µs   23.6 MB/sec

parse_document_nocopy_with_namespaces/rpm_primary2.xml
------------------------------------------------------
after      1.00    163.3±97.45µs   43.9 MB/sec
before     1.58    257.8±42.44µs   27.8 MB/sec

parse_document_nocopy_with_namespaces/sample_1.xml
--------------------------------------------------
before     1.00      40.6±7.39µs   27.1 MB/sec
after      1.12     45.4±21.66µs   24.2 MB/sec

parse_document_nocopy_with_namespaces/sample_ns.xml
---------------------------------------------------
after      1.00      30.0±7.26µs   24.1 MB/sec
before     1.36      40.9±4.09µs   17.7 MB/sec

parse_document_nocopy_with_namespaces/sample_rss.xml
----------------------------------------------------
after      1.00       2.7±0.45ms   70.8 MB/sec
before     1.03       2.8±0.52ms   68.5 MB/sec

parse_document_nocopy_with_namespaces/test_writer_ident.xml
-----------------------------------------------------------
after      1.00     70.3±23.15µs   60.4 MB/sec
before     1.81    127.0±19.06µs   33.4 MB/sec

read_event/trim_text = false
----------------------------
after      1.00   520.7±240.59µs       ? ?/sec
before     1.62    846.1±92.54µs       ? ?/sec

read_event/trim_text = true
---------------------------
after      1.00    430.2±94.41µs       ? ?/sec
before     1.84    791.1±97.12µs       ? ?/sec

unescape_text/entity_reference
------------------------------
after      1.00  1302.7±146.69ns       ? ?/sec
before     1.12  1462.2±131.84ns       ? ?/sec

unescape_text/no_chars_to_unescape_long
---------------------------------------
after      1.00    132.6±11.71ns       ? ?/sec
before     1.05    139.7±17.07ns       ? ?/sec

unescape_text/no_chars_to_unescape_short
----------------------------------------
before     1.00      44.5±4.97ns       ? ?/sec
after      1.06      47.2±4.74ns       ? ?/sec

dralley · 2023-10-12T01:01:07Z

I tried this a while back with jetscii and didn't have much luck. I'll take a closer look at this later this week; it looks interesting.

#405

#408

shepmaster/jetscii#54 (comment)

Dr-Emann · 2023-10-12T01:49:00Z

Yeah, I actually just posted a PR to jetscii and found that memchr seemed to be faster even in cases that needed multiple memchr calls. Since the benchmarks were XML-related, I came here to try it. 😄

dralley · 2023-10-12T01:58:36Z

Yes, `pcmpestrm` seems to be a neglected, minimum-effort instruction. I'm not sure whether that's because it never really got adopted, or whether nobody adopted it because it's "meh".

I've seen some interesting ideas about using lookup tables and/or 256-bit AVX2 to get really good performance, but I haven't investigated further.

shepmaster/jetscii#22

I believe this is quite a bit faster because Rust only has to verify that each string slicing operation starts and ends on a character boundary; none of the inner bytes need to be checked for UTF-8 validity, since they already come from a `&str`. Also, when initially creating the escaped string, preallocate a little extra room, since we know the string will grow.
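A rough sketch of the slicing idea follows. It is an illustration, not quick-xml's actual `escape` function. The name `escape_sketch` is hypothetical, a plain byte loop stands in for the memchr-based search, and the `raw.len() / 8` headroom is an arbitrary illustrative choice.

```rust
use std::borrow::Cow;

/// Hypothetical escape routine: copies unescaped runs as `&str` slices.
fn escape_sketch(raw: &str) -> Cow<'_, str> {
    let mut escaped: Option<String> = None;
    let mut last = 0;
    // Plain byte scan standing in for the memchr-based search.
    for (i, &b) in raw.as_bytes().iter().enumerate() {
        let replacement = match b {
            b'<' => "&lt;",
            b'>' => "&gt;",
            b'&' => "&amp;",
            b'\'' => "&apos;",
            b'"' => "&quot;",
            _ => continue,
        };
        // Allocate once, with some headroom, since escaping only grows the text.
        let out = escaped.get_or_insert_with(|| String::with_capacity(raw.len() + raw.len() / 8));
        // `last` and `i` sit next to ASCII bytes, so both are char boundaries.
        // Slicing checks only those two indices, not the bytes in between,
        // which are already known to be valid UTF-8.
        out.push_str(&raw[last..i]);
        out.push_str(replacement);
        last = i + 1;
    }
    match escaped {
        Some(mut out) => {
            out.push_str(&raw[last..]);
            Cow::Owned(out)
        }
        // Nothing to escape: hand back the input without allocating.
        None => Cow::Borrowed(raw),
    }
}

fn main() {
    println!("{}", escape_sketch("x & 'y'"));
}
```

Compared with pushing bytes into a `Vec<u8>` and validating the result afterwards, this approach never re-validates the copied runs.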

Dr-Emann · 2023-10-12T06:19:32Z

I was also playing with a bitset lookup table in master...Dr-Emann:quick-xml:bitset_escape
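For context, a bitset lookup table for this problem can look roughly like the following. This is a generic sketch, not the code on the `bitset_escape` branch. The `ByteSet` and `first_to_escape` names are hypothetical, and the sketch scans one byte at a time, with no SIMD.

```rust
/// Hypothetical 256-bit lookup table: bit `b` is set when byte `b` must be escaped.
struct ByteSet([u64; 4]);

impl ByteSet {
    fn new(bytes: &[u8]) -> Self {
        let mut words = [0u64; 4];
        for &b in bytes {
            // The high two bits pick the word; the low six bits pick the bit.
            words[(b >> 6) as usize] |= 1u64 << (b & 63);
        }
        ByteSet(words)
    }

    fn contains(&self, b: u8) -> bool {
        self.0[(b >> 6) as usize] & (1u64 << (b & 63)) != 0
    }
}

/// Offset of the first byte that needs escaping, if any.
fn first_to_escape(haystack: &[u8], set: &ByteSet) -> Option<usize> {
    haystack.iter().position(|&b| set.contains(b))
}

fn main() {
    let set = ByteSet::new(b"<>&'\"");
    println!("{:?}", first_to_escape(b"hello <world>", &set));
}
```

A table like this handles any number of target bytes in a single pass. memchr, by contrast, handles at most three per search but uses vectorized scanning, which is consistent with the long-string benchmark results below.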

codecov-commenter · 2023-10-12T07:08:27Z

Codecov Report

Merging #664 (c6ec23a) into master (ca1c09a) will increase coverage by 0.05%.
Report is 15 commits behind head on master.
The diff coverage is 99.55%.


@@            Coverage Diff             @@
##           master     #664      +/-   ##
==========================================
+ Coverage   64.63%   64.68%   +0.05%     
==========================================
  Files          36       37       +1     
  Lines       17289    17618     +329     
==========================================
+ Hits        11175    11397     +222     
- Misses       6114     6221     +107

Flag	Coverage Δ
unittests	`64.68% <99.55%> (+0.05%)`	⬆️

Flags with carried forward coverage won't be shown.

Files	Coverage Δ
src/escapei.rs	`13.55% <100.00%> (+0.30%)`	⬆️
src/se/simple_type.rs	`98.22% <100.00%> (+0.19%)`	⬆️
src/utils.rs	`86.98% <98.03%> (+4.77%)`	⬆️

... and 4 files with indirect coverage changes

dralley · 2023-10-12T14:20:44Z

> I was also playing with a bitset lookup table in master...Dr-Emann:quick-xml:bitset_escape

How did that go? Do you have results for that?

Dr-Emann · 2023-10-12T15:31:31Z

It looks like using a bitmap is better than master, but this PR beats both nearly across the board, at least on my M1 Mac.

Note: the results for this PR are slightly better than those in the original PR description, thanks to dcb2104.

master (`before`) vs. bitmap (`bitmask`), M1 Mac

group                                                              before                                 bitmask
-----                                                              ------                                 -------
One event/Comment                                                  1.00     66.2±0.42ns        ? ?/sec    1.01     67.0±0.64ns        ? ?/sec
One event/Start                                                    1.00     78.7±1.59ns        ? ?/sec    1.02     80.0±1.51ns        ? ?/sec
attributes/with_checks = true                                      1.01     47.5±0.47µs        ? ?/sec    1.00     47.0±0.37µs        ? ?/sec
decode_and_parse_document_with_namespaces/rpm_primary.xml          1.01     95.0±0.57µs   213.5 MB/sec    1.00     93.6±0.93µs   216.5 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary2.xml         1.01     30.9±0.24µs   231.8 MB/sec    1.00     30.6±0.29µs   234.4 MB/sec
escape_text/escaped_chars_long                                     1.26    933.6±6.53ns        ? ?/sec    1.00   743.6±12.38ns        ? ?/sec
escape_text/escaped_chars_short                                    1.70    217.0±1.40ns        ? ?/sec    1.00    127.3±3.28ns        ? ?/sec
escape_text/no_chars_to_escape_long                                1.25    750.3±4.73ns        ? ?/sec    1.00    599.5±5.07ns        ? ?/sec
escape_text/no_chars_to_escape_short                               1.10      9.7±0.07ns        ? ?/sec    1.00      8.8±0.10ns        ? ?/sec
parse_document_nocopy/rpm_filelists.xml                            1.02     29.1±0.77µs   377.9 MB/sec    1.00     28.4±0.22µs   386.0 MB/sec
parse_document_nocopy_with_namespaces/libreoffice_document.fodt    1.02    239.8±3.68µs   227.7 MB/sec    1.00    234.2±2.32µs   233.1 MB/sec
parse_document_nocopy_with_namespaces/rpm_filelists.xml            1.02     42.0±0.35µs   261.3 MB/sec    1.00     41.0±0.30µs   267.6 MB/sec
parse_document_nocopy_with_namespaces/rpm_other.xml                1.02     68.1±1.18µs   325.2 MB/sec    1.00     66.8±0.46µs   331.2 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary.xml              1.01     93.4±0.80µs   217.1 MB/sec    1.00     92.4±0.81µs   219.3 MB/sec
read_event/trim_text = false                                       1.00     94.3±1.00µs        ? ?/sec    1.01     95.4±0.85µs        ? ?/sec
read_event/trim_text = true                                        1.00     96.2±0.70µs        ? ?/sec    1.01     97.6±0.94µs        ? ?/sec
unescape_text/mixed                                                1.00    130.4±0.80ns        ? ?/sec    1.04    136.3±1.32ns        ? ?/sec

bitmap (`bitmask`) vs. this PR (`after`), M1 Mac

group                                                              after                                  bitmask
-----                                                              -----                                  -------
One event/Start                                                    1.00     79.1±0.78ns        ? ?/sec    1.01     80.0±1.51ns        ? ?/sec
decode_and_parse_document/libreoffice_document.fodt                1.00    136.4±3.07µs   400.2 MB/sec    1.02    139.1±2.38µs   392.5 MB/sec
decode_and_parse_document/linescore.xml                            1.00     10.5±0.07µs   334.8 MB/sec    1.02     10.8±0.10µs   327.5 MB/sec
decode_and_parse_document/players.xml                              1.00     63.6±0.65µs   227.9 MB/sec    1.02     64.8±0.52µs   223.8 MB/sec
decode_and_parse_document/rpm_filelists.xml                        1.00     27.4±0.48µs   400.4 MB/sec    1.01     27.8±0.26µs   395.6 MB/sec
decode_and_parse_document/rpm_other.xml                            1.00     46.8±0.58µs   472.8 MB/sec    1.02     47.7±0.35µs   463.8 MB/sec
decode_and_parse_document/rpm_primary.xml                          1.00     61.9±0.54µs   327.2 MB/sec    1.03     64.0±0.85µs   316.8 MB/sec
decode_and_parse_document/rpm_primary2.xml                         1.00     20.8±0.14µs   345.6 MB/sec    1.02     21.1±0.14µs   339.9 MB/sec
decode_and_parse_document/sample_1.xml                             1.00      3.5±0.02µs   311.7 MB/sec    1.02      3.6±0.03µs   304.9 MB/sec
decode_and_parse_document/sample_rss.xml                           1.00    249.6±3.90µs   755.7 MB/sec    1.01    252.2±2.48µs   747.8 MB/sec
decode_and_parse_document/test_writer_ident.xml                    1.00      9.0±0.07µs   472.4 MB/sec    1.01      9.1±0.08µs   465.8 MB/sec
decode_and_parse_document_with_namespaces/document.xml             1.01     64.9±0.56µs   169.3 MB/sec    1.00     64.0±0.70µs   171.8 MB/sec
decode_and_parse_document_with_namespaces/linescore.xml            1.00     13.3±0.12µs   266.3 MB/sec    1.03     13.6±0.19µs   259.8 MB/sec
decode_and_parse_document_with_namespaces/players.xml              1.00     79.4±0.68µs   182.5 MB/sec    1.02     80.6±0.95µs   179.8 MB/sec
decode_and_parse_document_with_namespaces/rpm_filelists.xml        1.00     40.4±0.28µs   272.1 MB/sec    1.02     41.4±0.36µs   265.6 MB/sec
decode_and_parse_document_with_namespaces/rpm_other.xml            1.00     64.4±0.67µs   343.8 MB/sec    1.02     65.5±0.66µs   338.1 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary.xml          1.00     90.9±0.83µs   222.8 MB/sec    1.03     93.6±0.93µs   216.5 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary2.xml         1.00     30.2±0.44µs   237.5 MB/sec    1.01     30.6±0.29µs   234.4 MB/sec
decode_and_parse_document_with_namespaces/sample_1.xml             1.00      4.9±0.06µs   223.3 MB/sec    1.02      5.0±0.04µs   219.7 MB/sec
decode_and_parse_document_with_namespaces/sample_ns.xml            1.00      4.0±0.04µs   180.0 MB/sec    1.03      4.1±0.05µs   174.7 MB/sec
decode_and_parse_document_with_namespaces/sample_rss.xml           1.00    385.7±5.79µs   489.0 MB/sec    1.03    395.4±5.26µs   477.0 MB/sec
escape_text/escaped_chars_long                                     1.00    157.4±1.90ns        ? ?/sec    4.72   743.6±12.38ns        ? ?/sec
escape_text/escaped_chars_short                                    1.24    157.7±1.43ns        ? ?/sec    1.00    127.3±3.28ns        ? ?/sec
escape_text/no_chars_to_escape_long                                1.00     67.5±0.73ns        ? ?/sec    8.88    599.5±5.07ns        ? ?/sec
escape_text/no_chars_to_escape_short                               1.00      8.5±0.05ns        ? ?/sec    1.03      8.8±0.10ns        ? ?/sec
parse_document_nocopy/document.xml                                 1.01     40.6±1.13µs   270.8 MB/sec    1.00     40.2±0.37µs   273.6 MB/sec
parse_document_nocopy/libreoffice_document.fodt                    1.00    138.7±1.38µs   393.7 MB/sec    1.02    141.7±1.40µs   385.2 MB/sec
parse_document_nocopy/linescore.xml                                1.00     10.4±0.07µs   340.7 MB/sec    1.02     10.5±0.08µs   334.9 MB/sec
parse_document_nocopy/players.xml                                  1.00     63.2±0.53µs   229.4 MB/sec    1.02     64.2±0.48µs   225.9 MB/sec
parse_document_nocopy/rpm_filelists.xml                            1.00     28.1±0.17µs   391.1 MB/sec    1.01     28.4±0.22µs   386.0 MB/sec
parse_document_nocopy/rpm_other.xml                                1.00     48.5±0.27µs   456.7 MB/sec    1.03     49.8±0.65µs   444.5 MB/sec
parse_document_nocopy/rpm_primary.xml                              1.00     61.9±0.36µs   327.6 MB/sec    1.03     63.9±0.78µs   317.2 MB/sec
parse_document_nocopy/rpm_primary2.xml                             1.00     20.6±0.13µs   348.3 MB/sec    1.02     21.0±0.16µs   341.6 MB/sec
parse_document_nocopy/sample_1.xml                                 1.00      3.4±0.04µs   325.7 MB/sec    1.03      3.5±0.04µs   316.4 MB/sec
parse_document_nocopy/sample_ns.xml                                1.00      2.5±0.02µs   292.8 MB/sec    1.02      2.5±0.03µs   287.7 MB/sec
parse_document_nocopy/sample_rss.xml                               1.00    257.4±2.05µs   732.7 MB/sec    1.01    260.8±2.06µs   723.3 MB/sec
parse_document_nocopy_with_namespaces/libreoffice_document.fodt    1.00    230.6±3.25µs   236.8 MB/sec    1.02    234.2±2.32µs   233.1 MB/sec
parse_document_nocopy_with_namespaces/linescore.xml                1.00     13.0±0.10µs   270.8 MB/sec    1.02     13.3±0.10µs   266.3 MB/sec
parse_document_nocopy_with_namespaces/players.xml                  1.00     78.5±0.52µs   184.7 MB/sec    1.02     79.9±0.91µs   181.5 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary.xml              1.00     91.1±0.77µs   222.5 MB/sec    1.01     92.4±0.81µs   219.3 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary2.xml             1.00     29.7±0.20µs   241.3 MB/sec    1.01     30.1±0.37µs   238.0 MB/sec
parse_document_nocopy_with_namespaces/sample_1.xml                 1.00      4.7±0.03µs   233.6 MB/sec    1.02      4.8±0.03µs   227.9 MB/sec
parse_document_nocopy_with_namespaces/sample_ns.xml                1.00      3.9±0.04µs   187.8 MB/sec    1.02      3.9±0.07µs   184.3 MB/sec
parse_document_nocopy_with_namespaces/sample_rss.xml               1.00    386.9±3.33µs   487.5 MB/sec    1.03    398.9±5.25µs   472.8 MB/sec
parse_document_nocopy_with_namespaces/test_writer_ident.xml        1.00     13.8±0.12µs   307.1 MB/sec    1.01     14.0±0.15µs   302.6 MB/sec
read_event/trim_text = false                                       1.02     97.6±1.36µs        ? ?/sec    1.00     95.4±0.85µs        ? ?/sec
unescape_text/char_reference                                       1.00    101.3±0.87ns        ? ?/sec    1.06    107.0±1.24ns        ? ?/sec
unescape_text/entity_reference                                     1.00    134.3±1.35ns        ? ?/sec    1.02    137.5±4.09ns        ? ?/sec
unescape_text/mixed                                                1.00    133.5±1.02ns        ? ?/sec    1.02    136.3±1.32ns        ? ?/sec

Combined results

group                                                                  after                                  before                                 bitmask
-----                                                                  -----                                  ------                                 -------
NsReader::read_resolved_event_into/trim_text = true                    1.01    197.0±2.01µs        ? ?/sec    1.00    194.5±1.38µs        ? ?/sec    1.01    196.1±2.68µs        ? ?/sec
One event/CData                                                        1.01     26.6±0.11ns        ? ?/sec    1.00     26.4±0.22ns        ? ?/sec    1.01     26.5±0.23ns        ? ?/sec
One event/Comment                                                      1.00     66.5±0.64ns        ? ?/sec    1.00     66.2±0.42ns        ? ?/sec    1.01     67.0±0.64ns        ? ?/sec
One event/Start                                                        1.01     79.1±0.78ns        ? ?/sec    1.00     78.7±1.59ns        ? ?/sec    1.02     80.0±1.51ns        ? ?/sec
attributes/with_checks = false                                         1.00     32.2±0.19µs        ? ?/sec    1.01     32.6±0.21µs        ? ?/sec    1.00     32.3±0.47µs        ? ?/sec
attributes/with_checks = true                                          1.01     47.4±1.02µs        ? ?/sec    1.01     47.5±0.47µs        ? ?/sec    1.00     47.0±0.37µs        ? ?/sec
decode_and_parse_document/libreoffice_document.fodt                    1.00    136.4±3.07µs   400.2 MB/sec    1.02    138.8±2.24µs   393.5 MB/sec    1.02    139.1±2.38µs   392.5 MB/sec
decode_and_parse_document/linescore.xml                                1.00     10.5±0.07µs   334.8 MB/sec    1.01     10.7±0.09µs   330.3 MB/sec    1.02     10.8±0.10µs   327.5 MB/sec
decode_and_parse_document/players.xml                                  1.00     63.6±0.65µs   227.9 MB/sec    1.02     64.7±0.47µs   224.1 MB/sec    1.02     64.8±0.52µs   223.8 MB/sec
decode_and_parse_document/rpm_filelists.xml                            1.00     27.4±0.48µs   400.4 MB/sec    1.02     27.9±0.30µs   393.4 MB/sec    1.01     27.8±0.26µs   395.6 MB/sec
decode_and_parse_document/rpm_other.xml                                1.00     46.8±0.58µs   472.8 MB/sec    1.02     47.8±0.83µs   463.2 MB/sec    1.02     47.7±0.35µs   463.8 MB/sec
decode_and_parse_document/rpm_primary.xml                              1.00     61.9±0.54µs   327.2 MB/sec    1.03     63.7±0.47µs   318.1 MB/sec    1.03     64.0±0.85µs   316.8 MB/sec
decode_and_parse_document/rpm_primary2.xml                             1.00     20.8±0.14µs   345.6 MB/sec    1.02     21.1±0.21µs   340.0 MB/sec    1.02     21.1±0.14µs   339.9 MB/sec
decode_and_parse_document/sample_1.xml                                 1.00      3.5±0.02µs   311.7 MB/sec    1.02      3.6±0.03µs   305.3 MB/sec    1.02      3.6±0.03µs   304.9 MB/sec
decode_and_parse_document/sample_ns.xml                                1.00      2.6±0.03µs   280.4 MB/sec    1.01      2.6±0.03µs   277.2 MB/sec    1.01      2.6±0.02µs   278.1 MB/sec
decode_and_parse_document/sample_rss.xml                               1.00    249.6±3.90µs   755.7 MB/sec    1.02    253.7±2.65µs   743.5 MB/sec    1.01    252.2±2.48µs   747.8 MB/sec
decode_and_parse_document/test_writer_ident.xml                        1.00      9.0±0.07µs   472.4 MB/sec    1.01      9.1±0.07µs   468.7 MB/sec    1.01      9.1±0.08µs   465.8 MB/sec
decode_and_parse_document_with_namespaces/document.xml                 1.01     64.9±0.56µs   169.3 MB/sec    1.00     64.3±0.54µs   171.0 MB/sec    1.00     64.0±0.70µs   171.8 MB/sec
decode_and_parse_document_with_namespaces/libreoffice_document.fodt    1.00    231.9±3.90µs   235.5 MB/sec    1.02    235.9±2.77µs   231.4 MB/sec    1.01    233.7±2.63µs   233.6 MB/sec
decode_and_parse_document_with_namespaces/linescore.xml                1.00     13.3±0.12µs   266.3 MB/sec    1.02     13.5±0.12µs   262.0 MB/sec    1.03     13.6±0.19µs   259.8 MB/sec
decode_and_parse_document_with_namespaces/players.xml                  1.00     79.4±0.68µs   182.5 MB/sec    1.02     80.6±0.65µs   179.8 MB/sec    1.02     80.6±0.95µs   179.8 MB/sec
decode_and_parse_document_with_namespaces/rpm_filelists.xml            1.00     40.4±0.28µs   272.1 MB/sec    1.03     41.5±0.33µs   264.9 MB/sec    1.02     41.4±0.36µs   265.6 MB/sec
decode_and_parse_document_with_namespaces/rpm_other.xml                1.00     64.4±0.67µs   343.8 MB/sec    1.03     66.1±1.34µs   334.8 MB/sec    1.02     65.5±0.66µs   338.1 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary.xml              1.00     90.9±0.83µs   222.8 MB/sec    1.04     95.0±0.57µs   213.5 MB/sec    1.03     93.6±0.93µs   216.5 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary2.xml             1.00     30.2±0.44µs   237.5 MB/sec    1.02     30.9±0.24µs   231.8 MB/sec    1.01     30.6±0.29µs   234.4 MB/sec
decode_and_parse_document_with_namespaces/sample_1.xml                 1.00      4.9±0.06µs   223.3 MB/sec    1.02      5.0±0.04µs   218.5 MB/sec    1.02      5.0±0.04µs   219.7 MB/sec
decode_and_parse_document_with_namespaces/sample_ns.xml                1.00      4.0±0.04µs   180.0 MB/sec    1.02      4.1±0.06µs   175.8 MB/sec    1.03      4.1±0.05µs   174.7 MB/sec
decode_and_parse_document_with_namespaces/sample_rss.xml               1.00    385.7±5.79µs   489.0 MB/sec    1.03    396.9±3.83µs   475.1 MB/sec    1.03    395.4±5.26µs   477.0 MB/sec
escape_text/escaped_chars_long                                         1.00    157.4±1.90ns        ? ?/sec    5.93    933.6±6.53ns        ? ?/sec    4.72   743.6±12.38ns        ? ?/sec
escape_text/escaped_chars_short                                        1.24    157.7±1.43ns        ? ?/sec    1.70    217.0±1.40ns        ? ?/sec    1.00    127.3±3.28ns        ? ?/sec
escape_text/no_chars_to_escape_long                                    1.00     67.5±0.73ns        ? ?/sec    11.12   750.3±4.73ns        ? ?/sec    8.88    599.5±5.07ns        ? ?/sec
escape_text/no_chars_to_escape_short                                   1.00      8.5±0.05ns        ? ?/sec    1.14      9.7±0.07ns        ? ?/sec    1.03      8.8±0.10ns        ? ?/sec
parse_document_nocopy/document.xml                                     1.01     40.6±1.13µs   270.8 MB/sec    1.00     40.2±0.54µs   273.6 MB/sec    1.00     40.2±0.37µs   273.6 MB/sec
parse_document_nocopy/libreoffice_document.fodt                        1.00    138.7±1.38µs   393.7 MB/sec    1.02    141.3±1.18µs   386.4 MB/sec    1.02    141.7±1.40µs   385.2 MB/sec
parse_document_nocopy/linescore.xml                                    1.00     10.4±0.07µs   340.7 MB/sec    1.02     10.5±0.09µs   335.5 MB/sec    1.02     10.5±0.08µs   334.9 MB/sec
parse_document_nocopy/players.xml                                      1.00     63.2±0.53µs   229.4 MB/sec    1.02     64.2±0.47µs   225.8 MB/sec    1.02     64.2±0.48µs   225.9 MB/sec
parse_document_nocopy/rpm_filelists.xml                                1.00     28.1±0.17µs   391.1 MB/sec    1.03     29.1±0.77µs   377.9 MB/sec    1.01     28.4±0.22µs   386.0 MB/sec
parse_document_nocopy/rpm_other.xml                                    1.00     48.5±0.27µs   456.7 MB/sec    1.02     49.6±0.33µs   446.8 MB/sec    1.03     49.8±0.65µs   444.5 MB/sec
parse_document_nocopy/rpm_primary.xml                                  1.00     61.9±0.36µs   327.6 MB/sec    1.03     64.0±0.85µs   316.8 MB/sec    1.03     63.9±0.78µs   317.2 MB/sec
parse_document_nocopy/rpm_primary2.xml                                 1.00     20.6±0.13µs   348.3 MB/sec    1.02     21.0±0.16µs   342.2 MB/sec    1.02     21.0±0.16µs   341.6 MB/sec
parse_document_nocopy/sample_1.xml                                     1.00      3.4±0.04µs   325.7 MB/sec    1.03      3.5±0.04µs   317.7 MB/sec    1.03      3.5±0.04µs   316.4 MB/sec
parse_document_nocopy/sample_ns.xml                                    1.00      2.5±0.02µs   292.8 MB/sec    1.02      2.5±0.03µs   287.3 MB/sec    1.02      2.5±0.03µs   287.7 MB/sec
parse_document_nocopy/sample_rss.xml                                   1.00    257.4±2.05µs   732.7 MB/sec    1.02    262.1±2.42µs   719.6 MB/sec    1.01    260.8±2.06µs   723.3 MB/sec
parse_document_nocopy_with_namespaces/libreoffice_document.fodt        1.00    230.6±3.25µs   236.8 MB/sec    1.04    239.8±3.68µs   227.7 MB/sec    1.02    234.2±2.32µs   233.1 MB/sec
parse_document_nocopy_with_namespaces/linescore.xml                    1.00     13.0±0.10µs   270.8 MB/sec    1.02     13.3±0.18µs   265.6 MB/sec    1.02     13.3±0.10µs   266.3 MB/sec
parse_document_nocopy_with_namespaces/players.xml                      1.00     78.5±0.52µs   184.7 MB/sec    1.02     80.0±0.67µs   181.2 MB/sec    1.02     79.9±0.91µs   181.5 MB/sec
parse_document_nocopy_with_namespaces/rpm_filelists.xml                1.00     40.8±0.40µs   269.5 MB/sec    1.03     42.0±0.35µs   261.3 MB/sec    1.01     41.0±0.30µs   267.6 MB/sec
parse_document_nocopy_with_namespaces/rpm_other.xml                    1.00     66.2±0.64µs   334.4 MB/sec    1.03     68.1±1.18µs   325.2 MB/sec    1.01     66.8±0.46µs   331.2 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary.xml                  1.00     91.1±0.77µs   222.5 MB/sec    1.02     93.4±0.80µs   217.1 MB/sec    1.01     92.4±0.81µs   219.3 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary2.xml                 1.00     29.7±0.20µs   241.3 MB/sec    1.02     30.2±0.26µs   237.2 MB/sec    1.01     30.1±0.37µs   238.0 MB/sec
parse_document_nocopy_with_namespaces/sample_1.xml                     1.00      4.7±0.03µs   233.6 MB/sec    1.03      4.8±0.04µs   226.9 MB/sec    1.02      4.8±0.03µs   227.9 MB/sec
parse_document_nocopy_with_namespaces/sample_ns.xml                    1.00      3.9±0.04µs   187.8 MB/sec    1.02      3.9±0.04µs   183.4 MB/sec    1.02      3.9±0.07µs   184.3 MB/sec
parse_document_nocopy_with_namespaces/sample_rss.xml                   1.00    386.9±3.33µs   487.5 MB/sec    1.04    401.8±3.33µs   469.4 MB/sec    1.03    398.9±5.25µs   472.8 MB/sec
parse_document_nocopy_with_namespaces/test_writer_ident.xml            1.00     13.8±0.12µs   307.1 MB/sec    1.02     14.1±0.15µs   300.9 MB/sec    1.01     14.0±0.15µs   302.6 MB/sec
read_event/trim_text = false                                           1.03     97.6±1.36µs        ? ?/sec    1.00     94.3±1.00µs        ? ?/sec    1.01     95.4±0.85µs        ? ?/sec
read_event/trim_text = true                                            1.02     97.7±0.95µs        ? ?/sec    1.00     96.2±0.70µs        ? ?/sec    1.01     97.6±0.94µs        ? ?/sec
unescape_text/char_reference                                           1.00    101.3±0.87ns        ? ?/sec    1.06    107.5±0.73ns        ? ?/sec    1.06    107.0±1.24ns        ? ?/sec
unescape_text/entity_reference                                         1.00    134.3±1.35ns        ? ?/sec    1.02    137.5±1.74ns        ? ?/sec    1.02    137.5±4.09ns        ? ?/sec
unescape_text/mixed                                                    1.02    133.5±1.02ns        ? ?/sec    1.00    130.4±0.80ns        ? ?/sec    1.04    136.3±1.32ns        ? ?/sec
unescape_text/no_chars_to_unescape_short                               1.01      4.4±0.02ns        ? ?/sec    1.00      4.4±0.04ns        ? ?/sec    1.01      4.4±0.03ns        ? ?/sec

Mingun

Some benchmarks show a regression of about 2% on two of my machines (Ubuntu 22.04 and Windows 7), mostly the `*_nocopy` benchmarks, but others show improvements of about 2-7%. Overall there are more improvements than regressions, so I think we can merge this.

Mingun · 2023-10-12T20:50:49Z

src/utils.rs

+pub(crate) struct MergeIter<It1, It2>
+where
+    It1: Iterator,
+    It2: Iterator<Item = It1::Item>,
+{
+    it1: Peekable<It1>,
+    it2: Peekable<It2>,
+}


It would be great if you added several tests to check that the iterator works as expected.

Added some unit tests

Nice! @dralley, feel free to merge when you're ready.
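For readers following along, here is a minimal sketch of how a `MergeIter` with the fields shown above could be completed. The struct definition mirrors the review snippet; the `new` constructor, the `Ord` bound and the `next` body are assumptions, not the PR's actual code. It assumes each input yields ascending values (as byte-position iterators do) and, on ties, yields from `it1` first.

```rust
use std::iter::Peekable;

/// Merges two iterators that each yield values in ascending order
/// into a single ascending stream.
pub(crate) struct MergeIter<It1, It2>
where
    It1: Iterator,
    It2: Iterator<Item = It1::Item>,
{
    it1: Peekable<It1>,
    it2: Peekable<It2>,
}

impl<It1, It2> MergeIter<It1, It2>
where
    It1: Iterator,
    It2: Iterator<Item = It1::Item>,
{
    // Hypothetical constructor: wraps both inputs so their heads can be inspected.
    pub(crate) fn new(it1: It1, it2: It2) -> Self {
        Self {
            it1: it1.peekable(),
            it2: it2.peekable(),
        }
    }
}

impl<It1, It2> Iterator for MergeIter<It1, It2>
where
    It1: Iterator,
    It2: Iterator<Item = It1::Item>,
    It1::Item: Ord,
{
    type Item = It1::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Peek at both heads and advance whichever side holds the smaller value.
        let take_first = match (self.it1.peek(), self.it2.peek()) {
            (Some(a), Some(b)) => a <= b,
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (None, None) => return None,
        };
        if take_first {
            self.it1.next()
        } else {
            self.it2.next()
        }
    }
}
```

In the PR's setting, the two inputs would be something like `memchr::memchr3_iter(b'<', b'>', b'&', bytes)` and `memchr::memchr2_iter(b'\'', b'"', bytes)`, since memchr searches for at most three bytes at a time. Because those two byte sets are disjoint, ties never occur in that use.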

dralley · 2023-10-14T02:48:46Z

My results on an R5 3600 CPU are quite a bit more mixed. I'm actually seeing a few more whole-document regressions than improvements (but mostly it's just a wash). Generally speaking, I'm seeing more of a penalty than you are on short strings and less of a benefit on long strings.

I'm going to keep looking at this tomorrow, do a bit more testing, and paste my results.

Dr-Emann · 2023-10-14T04:33:45Z

Note, things might get even muddier:

Improvements to jetscii are likely coming that enable some pretty significant speedups in the cases where it can use SIMD. That will probably make jetscii the definite winner over memchr on x64 (though it will still need to be compared with the bitset impl). In that case, I'm also hoping to speed up jetscii's fallback implementation so that it becomes the better choice on the M1 Mac as well.
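The bitset implementation mentioned here is not shown in this thread. Purely as a hypothetical illustration, a 256-bit set lookup over the bytes that need escaping might look like the sketch below; the names are invented, and the assumed byte set (`<`, `>`, `&`, `'`, `"`) is XML's five predefined entities, not something this thread confirms.

```rust
/// Builds a 256-bit set (four u64 words) with one bit per byte value.
const fn build_set(bytes: &[u8]) -> [u64; 4] {
    let mut set = [0u64; 4];
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i] as usize;
        // Word index is the top two bits of the byte; bit index is the low six.
        set[b >> 6] |= 1u64 << (b & 63);
        i += 1;
    }
    set
}

// Assumed escape set: the five XML predefined-entity characters.
const ESCAPE_SET: [u64; 4] = build_set(b"<>&'\"");

/// Branch-free membership test: one shift and mask per byte.
fn needs_escape(b: u8) -> bool {
    let b = b as usize;
    (ESCAPE_SET[b >> 6] >> (b & 63)) & 1 == 1
}

/// Returns the offset of the first byte that needs escaping, if any.
fn find_first_to_escape(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| needs_escape(b))
}
```

This scans one byte at a time, so it avoids the per-call setup of a vectorized search but cannot skip ahead in long inputs the way memchr or jetscii can.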

dralley · 2023-10-14T04:55:58Z

Great to hear! Thanks for investigating this

I'll try out my PR with your jetscii changes tomorrow and see if it changes things

dralley · 2023-10-15T03:13:35Z

R5 3600

[dalley@localhost quick-xml]$ critcmp baseline jetscii-fixed mergeiter-memchr --filter escape
group                                       baseline                               jetscii-fixed                          mergeiter-memchr
-----                                       --------                               -------------                          ----------------
escape_text/escaped_chars_long              4.96    971.0±2.34ns        ? ?/sec    1.53    299.7±1.00ns        ? ?/sec    1.00    196.0±3.03ns        ? ?/sec
escape_text/escaped_chars_short             1.46    276.2±3.34ns        ? ?/sec    1.50    284.1±3.72ns        ? ?/sec    1.00    189.2±2.50ns        ? ?/sec
escape_text/no_chars_to_escape_long         17.76   781.5±5.12ns        ? ?/sec    2.15     94.8±0.02ns        ? ?/sec    1.00     44.0±0.13ns        ? ?/sec
escape_text/no_chars_to_escape_short        1.71     14.3±0.18ns        ? ?/sec    1.00      8.4±0.07ns        ? ?/sec    3.35     28.1±0.17ns        ? ?/sec

i7-8665U

[dalley@thinkpad quick-xml]$ critcmp baseline jetscii-fixed mergeiter-memchr --filter escape
group                                       baseline                               jetscii-fixed                          mergeiter-memchr
-----                                       --------                               -------------                          ----------------
escape_text/escaped_chars_long              3.93   825.1±28.48ns        ? ?/sec    1.86   389.5±20.32ns        ? ?/sec    1.00    209.9±7.55ns        ? ?/sec
escape_text/escaped_chars_short             1.47    276.2±6.51ns        ? ?/sec    2.02    379.2±8.76ns        ? ?/sec    1.00    187.8±6.94ns        ? ?/sec
escape_text/no_chars_to_escape_long         11.11  652.4±20.71ns        ? ?/sec    2.18    127.8±1.74ns        ? ?/sec    1.00     58.7±1.22ns        ? ?/sec
escape_text/no_chars_to_escape_short        1.39     13.4±0.21ns        ? ?/sec    1.00      9.6±0.73ns        ? ?/sec    3.74     35.9±1.27ns        ? ?/sec

This is a bit of a complex decision. I don't think jetscii does anything on ARM, so the mergeiter approach is probably still a winner there.

And in many cases mergeiter beats jetscii on x86 too, but the outcome may depend entirely on the document. Short strings with nothing to escape are a very, very common case in a lot of documents, maybe common enough that jetscii would come out ahead. On x86, mergeiter actually regresses from baseline in that case (28.1 ns vs. 14.3 ns on the R5 3600, 35.9 ns vs. 13.4 ns on the i7-8665U), possibly by enough to end up worse overall?

We should probably test this somehow, but the existing macrobenchmarks don't do any "escaping" at all, only unescaping. I assume that is why those benchmarks show practically zero difference.

We should probably create a benchmark that parses those documents into an event stream (outside the benchmark) and then writes them back to a buffer, in such a way that the escaping / construction cost is captured.
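To make the "escaping cost" concrete, here is a std-only sketch of the search-then-copy loop that such a write-back benchmark would exercise. A position iterator (memchr iterators in this PR; a plain byte scan here) drives bulk copies of the untouched runs, and when it yields nothing the input is returned borrowed without allocating. The function names are hypothetical, the entity table is XML's five predefined entities, and this is not quick-xml's actual `escape` code.

```rust
use std::borrow::Cow;

/// Maps a byte to its XML predefined entity, if it has one.
fn entity_for(b: u8) -> Option<&'static str> {
    match b {
        b'<' => Some("&lt;"),
        b'>' => Some("&gt;"),
        b'&' => Some("&amp;"),
        b'\'' => Some("&apos;"),
        b'"' => Some("&quot;"),
        _ => None,
    }
}

/// `positions` must yield, in ascending order, the byte offset of every
/// character to escape. All such characters are ASCII, so every slice
/// boundary below falls on a UTF-8 char boundary.
fn escape_with<I>(raw: &str, positions: I) -> Cow<'_, str>
where
    I: Iterator<Item = usize>,
{
    let mut out = String::new();
    let mut last = 0;
    for pos in positions {
        // Copy the run before the match in one go, then the entity.
        out.push_str(&raw[last..pos]);
        out.push_str(entity_for(raw.as_bytes()[pos]).expect("position must point at an escapable byte"));
        last = pos + 1;
    }
    if last == 0 {
        // No positions were yielded: nothing to escape, no allocation.
        return Cow::Borrowed(raw);
    }
    out.push_str(&raw[last..]);
    Cow::Owned(out)
}

/// Stand-in for the memchr-based search: a plain byte-by-byte scan.
fn scalar_positions(raw: &str) -> impl Iterator<Item = usize> + '_ {
    raw.bytes()
        .enumerate()
        .filter(|&(_, b)| entity_for(b).is_some())
        .map(|(i, _)| i)
}
```

Swapping `scalar_positions` for a faster search changes only how quickly positions are found, which is presumably where the `no_chars_to_escape_*` numbers above come from.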

BurntSushi · 2023-10-15T12:17:26Z

Some ideas:

1. It looks like you need to search for one of 5 possible bytes? If so, one could write a memchr5. I don't know if it makes sense to include it in the memchr crate, but if you did it and found a real-world use case where memchr5 was the best option, then I'd be open to adding it to the memchr crate (along with memchr4). I believe I investigated this many moons ago and found memchr3 to be the point at which things really started to go downhill, but maybe that's changed.
2. Much easier: have you tried Teddy from the aho-corasick crate? Single-byte search isn't its strong suit, but it is definitely worth a try.
3. A last-ditch effort is looking at the specific bytes you need to escape and devising a SIMD algorithm specifically tailored to them. For example, if all 5 bytes share a common nybble or some other bit pattern, then your SIMD code can look for that as a way to quickly rule out bytes that can't match. Whether it works depends on whether the bit pattern has a high false positive rate.

I think that's probably all I've got. Daniel Lemire's blog might be worth checking out.
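The last idea above can be made concrete. Assuming the five bytes are XML's `"` (0x22), `&` (0x26), `'` (0x27), `<` (0x3C) and `>` (0x3E), all of them have 0b001 as their top three bits, i.e. they lie in 0x20..=0x3F. The sketch below uses that shared pattern as a prefilter, with SWAR on a `u64` standing in for real SIMD registers; it is an illustration, not code from this PR or from the memchr crate.

```rust
// Assumed set of bytes to escape.
const SPECIAL: [u8; 5] = [b'<', b'>', b'&', b'\'', b'"'];

const LO: u64 = 0x0101_0101_0101_0101;
const HI: u64 = 0x8080_8080_8080_8080;

/// True if any byte of `word` lies in 0x20..=0x3F. Every special byte does,
/// so `false` proves the 8-byte chunk contains nothing to escape.
fn has_candidate(word: u64) -> bool {
    // Bytes in 0x20..=0x3F become zero after masking the top three bits and XOR.
    let t = (word & 0xE0E0_E0E0_E0E0_E0E0) ^ 0x2020_2020_2020_2020;
    // Classic "does this word contain a zero byte?" test.
    (t.wrapping_sub(LO) & !t & HI) != 0
}

/// Returns the offset of the first special byte, if any.
fn find_special(haystack: &[u8]) -> Option<usize> {
    let mut chunks = haystack.chunks_exact(8);
    let mut offset = 0;
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        if has_candidate(u64::from_le_bytes(buf)) {
            // The prefilter only says "maybe"; confirm byte by byte.
            if let Some(i) = chunk.iter().position(|b| SPECIAL.contains(b)) {
                return Some(offset + i);
            }
        }
        offset += 8;
    }
    // Check the tail that did not fill a whole 8-byte chunk.
    chunks
        .remainder()
        .iter()
        .position(|b| SPECIAL.contains(b))
        .map(|i| offset + i)
}
```

The 0x20..=0x3F range also covers spaces, digits and most ASCII punctuation, so ordinary prose trips the prefilter often. That is exactly the false-positive concern raised above, and a tighter pattern would need to be found for this to pay off.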

@BurntSushi

Per suggestion from @BurntSushi [here](tafia/quick-xml#664 (comment)). On my M1, it appears to be slower than, but competitive with, memchr up to memchr3, and then starts being the fastest from 5-16.

Dr-Emann · 2023-10-18T02:40:27Z

The Teddy results look pretty promising in the jetscii benchmarks for me (faster than merging memchrs), although it does seem a fair bit slower for small haystacks. So we might need to use one algorithm for small haystacks (or for CPUs Teddy doesn't support) and Teddy for longer ones.

Dr-Emann added 2 commits October 12, 2023 01:55

Use memchr to search for characters to escape

041113c

Dr-Emann force-pushed the memchr_escape branch from 56841c9 to dcb2104 October 12, 2023 05:55

Mingun approved these changes Oct 12, 2023

Add unit tests for MergeIter

c6ec23a

Dr-Emann force-pushed the memchr_escape branch from e6889ff to c6ec23a October 13, 2023 01:48

Dr-Emann mentioned this pull request Oct 13, 2023 in Misc Updates shepmaster/jetscii#57 (Open)

Dr-Emann closed this Apr 27, 2025
