
Conversation

@sywangyi
Contributor

No description provided.

@sywangyi sywangyi requested a review from MekkCyber as a code owner November 14, 2025 03:02
@sywangyi
Contributor Author

5x perf improvement for a 1024x1024 BF16 input

@sywangyi
Contributor Author

@danieldk @MekkCyber please help review

@sywangyi sywangyi marked this pull request as draft November 14, 2025 05:06
@sywangyi sywangyi marked this pull request as ready for review November 14, 2025 05:18
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Comment on lines +21 to +22
ref_out = x * rmsnorm_layer.weight
torch.testing.assert_close(output, ref_out.to(torch.bfloat16))

@IlyasMoutawwakil
Member
Nov 19, 2025

Might be a dumb question, but what if BF16 is not supported? (hasAVX2 is true but hasAVX512BF16 is false)

@sywangyi
Contributor Author

The AVX2 path also supports BF16; hasAVX512BF16 just makes sure we can use instructions like _mm256_cvtneps_pbh, which is an AVX512BF16 instruction.
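
For illustration only, a minimal sketch of that distinction (not this PR's code; the helper names are made up): an AVX2-only machine can still widen and narrow BF16 with plain integer shift/pack ops, whereas _mm256_cvtneps_pbh gives a single, properly rounded narrowing instruction when AVX512BF16 and AVX512VL are available.

#include <immintrin.h>

// AVX2-only widen: a bf16 value is the upper 16 bits of an fp32, so
// zero-extending and shifting left by 16 reproduces the float exactly.
static inline __m256 bf16_to_fp32_avx2(__m128i bf16x8) {
    __m256i widened = _mm256_cvtepu16_epi32(bf16x8);                  // 8 x u16 -> 8 x u32
    return _mm256_castsi256_ps(_mm256_slli_epi32(widened, 16));
}

// AVX2-only narrow by truncation (a production kernel would add round-to-nearest-even).
static inline __m128i fp32_to_bf16_avx2(__m256 fp32x8) {
    __m256i hi = _mm256_srli_epi32(_mm256_castps_si256(fp32x8), 16);  // keep the top 16 bits
    __m256i packed = _mm256_packus_epi32(hi, hi);                     // 32-bit -> 16-bit, per 128-bit lane
    return _mm256_castsi256_si128(_mm256_permute4x64_epi64(packed, 0xD8));
}

#if defined(__AVX512BF16__) && defined(__AVX512VL__)
// AVX512BF16 narrow: one instruction, round-to-nearest-even.
static inline __m128bh fp32_to_bf16_avx512(__m256 fp32x8) {
    return _mm256_cvtneps_pbh(fp32x8);
}
#endif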

@sywangyi
Contributor Author

The compile cxx-flags are cxx-flags = ["-mavx2", "-mfma", "-fopenmp", "-mf16c", "-mavx512f", "-mavx512bf16", "-mavx512vl"], but since the compile and runtime environments may differ, we need to check at runtime whether AVX512BF16 is supported; otherwise we fall back to AVX2.
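
A rough sketch of that runtime check, assuming a GCC/Clang-style __builtin_cpu_supports (the kernel names and signatures are placeholders, not this PR's symbols):

#include <cstddef>
#include <cstdint>

// Hypothetical per-ISA kernels, each compiled in its own translation unit so it
// only sees the instruction-set flags it actually needs.
void rmsnorm_bf16_avx512(uint16_t* out, const uint16_t* x, const uint16_t* w,
                         size_t rows, size_t cols, float eps);
void rmsnorm_bf16_avx2(uint16_t* out, const uint16_t* x, const uint16_t* w,
                       size_t rows, size_t cols, float eps);

void rmsnorm_bf16(uint16_t* out, const uint16_t* x, const uint16_t* w,
                  size_t rows, size_t cols, float eps) {
    // CPUID check at runtime, independent of which -m flags the binary was built with.
    if (__builtin_cpu_supports("avx512bf16") && __builtin_cpu_supports("avx512vl")) {
        rmsnorm_bf16_avx512(out, x, w, rows, cols, eps);
    } else {
        rmsnorm_bf16_avx2(out, x, w, rows, cols, eps);             // AVX2 fallback, still BF16 in/out
    }
}

The check itself is a cheap cached branch per call, so the dispatch cost is negligible next to the normalization work.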

@IlyasMoutawwakil
Member

@sywangyi please add more details in the PR description 🤗

@IlyasMoutawwakil (Member) left a comment

LGTM!

"rmsnorm_cpu/cpu_types_avx512.hpp",
]
include = ["rmsnorm_cpu"]
cxx-flags = ["-mavx2", "-mfma", "-fopenmp", "-mf16c", "-mavx512f", "-mavx512bf16", "-mavx512vl"]

Member

I think using the -mavx512.* flags also licenses the compiler to use AVX512 instructions for rmsnorm_cpu/rmsnorm_avx2.cpp? Same for -mavx2 and -mavx512.* for rmsnorm_cpu/rmsnorm_cpu.cpp and rmsnorm_cpu/rmsnorm_cpu_torch.cpp.

I think both rmsnorm_cpu/rmsnorm_avx2.cpp and rmsnorm_cpu/rmsnorm_avx512.cpp should be compiled separately (something like [kernel.rmsnorm_cpu_avx512]).
