
feat(experiment): support sve2 #209

Draft
hsqStephenZhang wants to merge 7 commits into cloudwego:main from hsqStephenZhang:feat/sve2

Conversation

@hsqStephenZhang (Collaborator) commented Feb 15, 2026

What type of PR is this?

feat: sve2

TODO

  • support get_nonspace_bits on SVE2
  • support skip_space
  • support skip_container
  • support skip_string
  • support skip_number
  • benchmark performance
  • update documentation
  • clean up code
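For readers less familiar with the mask-based approach, here is a minimal scalar sketch of what get_nonspace_bits and skip_space compute, assuming 64-byte blocks and RFC 8259 whitespace (space, tab, LF, CR). It is not the SVE2 code in this PR: a SIMD backend would produce the same u64 mask with vector compares instead of a byte loop.

```rust
/// JSON insignificant whitespace per RFC 8259: space, tab, LF, CR.
fn is_json_space(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r')
}

/// Scalar stand-in for a SIMD get_nonspace_bits: bit i of the result is set
/// when block[i] is NOT JSON whitespace. Blocks shorter than 64 bytes leave
/// the high bits clear.
fn get_nonspace_bits(block: &[u8]) -> u64 {
    debug_assert!(block.len() <= 64);
    let mut bits = 0u64;
    for (i, &b) in block.iter().enumerate() {
        if !is_json_space(b) {
            bits |= 1u64 << i;
        }
    }
    bits
}

/// Returns the index of the first non-whitespace byte at or after `pos`,
/// or `input.len()` if only whitespace remains.
fn skip_space(input: &[u8], mut pos: usize) -> usize {
    while pos < input.len() {
        let end = (pos + 64).min(input.len());
        let bits = get_nonspace_bits(&input[pos..end]);
        if bits != 0 {
            // The lowest set bit is the first non-space byte in this block.
            return pos + bits.trailing_zeros() as usize;
        }
        pos = end;
    }
    input.len()
}
```

The key property is that one mask answers "where is the next token" for a whole block with a single trailing_zeros, which is what makes the SIMD versions fast.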

The current SVE2 approach is fairly naive when handling the skip_xxx functions: it does not provide a uniform abstraction, so there is room for optimization.
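One possible shape for a uniform abstraction, sketched under the assumption that every skip_xxx routine reduces to "find the first byte of some class": a single hypothetical class_bits primitive that each backend (scalar, NEON, SVE2) would implement once, with skip_space and the scanning steps of skip_string and skip_container built on top of it. The names class_bits, find_first, next_string_special and next_structural are illustrative only and do not exist in sonic-rs.

```rust
/// Hypothetical uniform primitive: build a 64-bit mask for any byte class.
/// A NEON or SVE2 backend would specialise this; here it is a scalar loop.
fn class_bits<F: Fn(u8) -> bool>(block: &[u8], in_class: &F) -> u64 {
    let mut bits = 0u64;
    for (i, &b) in block.iter().take(64).enumerate() {
        if in_class(b) {
            bits |= 1u64 << i;
        }
    }
    bits
}

/// Shared scanning loop: index of the first byte at or after `pos` in the class.
fn find_first<F: Fn(u8) -> bool>(input: &[u8], mut pos: usize, in_class: F) -> Option<usize> {
    while pos < input.len() {
        let end = (pos + 64).min(input.len());
        let bits = class_bits(&input[pos..end], &in_class);
        if bits != 0 {
            return Some(pos + bits.trailing_zeros() as usize);
        }
        pos = end;
    }
    None
}

/// skip_space: first byte that is not JSON whitespace.
fn skip_space(input: &[u8], pos: usize) -> Option<usize> {
    find_first(input, pos, |b| !matches!(b, b' ' | b'\t' | b'\n' | b'\r'))
}

/// Building block for skip_string: next quote or backslash in a string body.
fn next_string_special(input: &[u8], pos: usize) -> Option<usize> {
    find_first(input, pos, |b| b == b'"' || b == b'\\')
}

/// Building block for skip_container: next bracket, brace or quote.
fn next_structural(input: &[u8], pos: usize) -> Option<usize> {
    find_first(input, pos, |b| matches!(b, b'{' | b'}' | b'[' | b']' | b'"'))
}
```

With this split, adding SVE2 support would mean providing one vectorised class_bits rather than rewriting each skip_xxx separately; whether that costs performance compared with hand-specialised routines is an open question.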

@hsqStephenZhang marked this pull request as draft on February 15, 2026 14:44
@hsqStephenZhang (Collaborator, Author) commented Feb 16, 2026

First-round performance results.

Env

AWS Graviton4 CPU @ 2.8 GHz, 2 cores, 8 GB memory; supports both NEON and SVE2.
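To check that a host such as this Graviton4 instance exposes both extensions, Rust's standard runtime detection macro `std::arch::is_aarch64_feature_detected!` can be used. The Backend enum and detect_backend function are hypothetical names for this sketch; the PR may select its implementation at compile time via target features instead.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Backend {
    Sve2,
    Neon,
    Scalar,
}

/// Picks the widest available backend at runtime.
/// On non-aarch64 targets the detection block is compiled out and this
/// always returns Backend::Scalar.
fn detect_backend() -> Backend {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("sve2") {
            return Backend::Sve2;
        }
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    Backend::Scalar
}
```

On the machine described above this would be expected to report Sve2.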

Result

| Dataset | Method | NEON time | NEON-no-reuse time | SVE time | SVE vs NEON | SVE vs NEON-no-reuse |
|---|---|---|---|---|---|---|
| canada | dom::from_slice | 3.2093 ms | 3.1694 ms | 3.1925 ms | 🚀 0.52% faster | 📉 0.73% slower |
| | dom::from_slice_use_rawnum | 2.0917 ms | 1.9499 ms | 1.9067 ms | 🚀 8.84% faster | 🚀 2.22% faster |
| | dom::from_slice_unchecked | 3.1483 ms | 3.1147 ms | 3.1395 ms | 🚀 0.28% faster | 📉 0.80% slower |
| | to_serde_json_value... | 7.9253 ms | 8.0397 ms | 7.7627 ms | 🚀 2.05% faster | 🚀 3.45% faster |
| | to_simd_json_value... | 5.8398 ms | 5.8148 ms | 5.6302 ms | 🚀 3.59% faster | 🚀 3.17% faster |
| citm_catalog | dom::from_slice | 955.62 µs | 1.2900 ms | 984.04 µs | 📉 2.97% slower | 🚀 23.72% faster |
| | dom::from_slice_use_rawnum | 954.32 µs | 1.3087 ms | 1.0123 ms | 📉 6.08% slower | 🚀 22.65% faster |
| | dom::from_slice_unchecked | 913.30 µs | 1.2417 ms | 944.36 µs | 📉 3.40% slower | 🚀 23.95% faster |
| | to_serde_json_value... | 2.9651 ms | 3.1686 ms | 2.8944 ms | 🚀 2.38% faster | 🚀 8.65% faster |
| | to_simd_json_value... | 2.4182 ms | 2.6329 ms | 2.4045 ms | 🚀 0.57% faster | 🚀 8.67% faster |
| twitter | dom::from_slice | 431.21 µs | 496.89 µs | 399.92 µs | 🚀 7.26% faster | 🚀 19.52% faster |
| | dom::from_slice_use_rawnum | 430.05 µs | 499.49 µs | 404.47 µs | 🚀 5.95% faster | 🚀 19.02% faster |
| | dom::from_slice_unchecked | 404.89 µs | 470.50 µs | 373.06 µs | 🚀 7.86% faster | 🚀 20.71% faster |
| | to_serde_json_value... | 1.3289 ms | 1.3923 ms | 1.3140 ms | 🚀 1.12% faster | 🚀 5.62% faster |
| | to_simd_json_value... | 1.1346 ms | 1.1916 ms | 1.1292 ms | 🚀 0.48% faster | 🚀 5.24% faster |
| github_events | dom::from_slice | 40.167 µs | 46.907 µs | 39.226 µs | 🚀 2.34% faster | 🚀 16.37% faster |
| | dom::from_slice_use_rawnum | 40.317 µs | 47.112 µs | 39.343 µs | 🚀 2.42% faster | 🚀 16.49% faster |
| | dom::from_slice_unchecked | 38.745 µs | 45.386 µs | 37.850 µs | 🚀 2.31% faster | 🚀 16.60% faster |
| | to_serde_json_value... | 103.10 µs | 108.33 µs | 103.52 µs | 📉 0.41% slower | 🚀 4.44% faster |
| | to_simd_json_value... | 85.798 µs | 91.738 µs | 87.726 µs | 📉 2.25% slower | 🚀 4.37% faster |
| book | dom::from_slice | 661.91 ns | 619.99 ns | 631.25 ns | 🚀 4.63% faster | 📉 1.82% slower |
| | dom::from_slice_use_rawnum | 683.71 ns | 629.37 ns | 639.21 ns | 🚀 6.51% faster | 📉 1.56% slower |
| | dom::from_slice_unchecked | 640.34 ns | 600.24 ns | 612.07 ns | 🚀 4.41% faster | 📉 1.97% slower |
| | to_serde_json_value... | 1.7753 µs | 1.7690 µs | 1.8290 µs | 📉 3.02% slower | 📉 3.39% slower |
| | to_simd_json_value... | 1.2165 µs | 1.2083 µs | 1.2566 µs | 📉 3.30% slower | 📉 4.00% slower |
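The percentage columns match the relative time change (baseline - SVE) / baseline, where the baseline is the NEON or NEON-no-reuse column. This formula is inferred by recomputing the numbers rather than stated in the original comment. The small hypothetical helper below reproduces a few entries; both times must be in the same unit (e.g. 1.2900 ms becomes 1290.0 µs).

```rust
/// Relative change of `sve` against `baseline`, in percent.
/// Positive means SVE took less time (faster); negative means slower.
fn relative_change_pct(baseline: f64, sve: f64) -> f64 {
    (baseline - sve) / baseline * 100.0
}
```

For example, citm_catalog dom::from_slice against NEON-no-reuse is relative_change_pct(1290.0, 984.04), roughly 23.72.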

Explanation

  1. There are three configurations: NEON-no-reuse stands for the approach of the C++ version, where skip_space does not reuse the bitmask; NEON stands for the current approach; and SVE is what this PR implements.
  2. So far, the performance difference comes only from implementing skip_space on SVE2.
  3. The results are very similar to those reported in "arm: optimize decoder on Arm SVE2 platform" (bytedance/sonic-cpp#92).
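A minimal sketch of what "reusing the bitmask" could mean, assuming the non-space mask is cached per aligned 64-byte block so that consecutive skip_space calls landing in the same block skip the recomputation. SpaceSkipper, build_mask and mask_computations are hypothetical names, not the sonic-rs or sonic-cpp implementation, and the cache assumes the same input buffer is passed on every call.

```rust
/// Hypothetical skipper that caches the non-space mask of the last 64-byte block.
struct SpaceSkipper {
    block_start: usize,       // start offset of the cached block (usize::MAX = none)
    nonspace_bits: u64,       // bit i set if input[block_start + i] is not whitespace
    mask_computations: usize, // how many times a mask was (re)built
}

impl SpaceSkipper {
    fn new() -> Self {
        Self { block_start: usize::MAX, nonspace_bits: 0, mask_computations: 0 }
    }

    /// Builds and caches the mask for input[start..start + 64] (clipped to len).
    fn build_mask(&mut self, input: &[u8], start: usize) {
        let end = (start + 64).min(input.len());
        let mut bits = 0u64;
        for (i, &b) in input[start..end].iter().enumerate() {
            if !matches!(b, b' ' | b'\t' | b'\n' | b'\r') {
                bits |= 1u64 << i;
            }
        }
        self.block_start = start;
        self.nonspace_bits = bits;
        self.mask_computations += 1;
    }

    /// Index of the first non-whitespace byte at or after `pos`, or input.len().
    fn skip_space(&mut self, input: &[u8], mut pos: usize) -> usize {
        while pos < input.len() {
            // Align blocks to 64-byte offsets so later calls can hit the cache.
            let start = pos & !63;
            if self.block_start != start {
                self.build_mask(input, start);
            }
            // Ignore bytes before `pos` inside the cached block.
            let bits = self.nonspace_bits & (u64::MAX << (pos - start));
            if bits != 0 {
                return start + bits.trailing_zeros() as usize;
            }
            pos = start + 64;
        }
        input.len()
    }
}
```

On `[1, 2,   3]`, calls at offsets 3 and 6 share a single mask build; a no-reuse variant would rebuild the mask on every call, which is one plausible source of the gap between the NEON and NEON-no-reuse columns.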
