
Conversation

@sywangyi
Contributor

No description provided.

@sywangyi sywangyi requested a review from MekkCyber as a code owner November 14, 2025 03:02
@sywangyi
Contributor Author

5x perf improvement for a 1024x1024 BF16 input

@sywangyi
Contributor Author

@danieldk @MekkCyber please help review

@sywangyi sywangyi marked this pull request as draft November 14, 2025 05:06
@sywangyi sywangyi marked this pull request as ready for review November 14, 2025 05:18
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Comment on lines +21 to +22
ref_out = x * rmsnorm_layer.weight
torch.testing.assert_close(output, ref_out.to(torch.bfloat16))

@IlyasMoutawwakil
Member
Nov 19, 2025

Might be a dumb question, but what if BF16 is not supported? (hasAVX2 is true but hasAVX512BF16 is false)

@sywangyi
Contributor Author

The AVX2 path also supports BF16; hasAVX512BF16 just makes sure we can use instructions like _mm256_cvtneps_pbh, which is an AVX512BF16 instruction.
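
For illustration only, a minimal sketch of that distinction (not this PR's code; the helper names are made up): an AVX2-only machine can still widen and narrow BF16 with plain integer shift/pack ops, whereas _mm256_cvtneps_pbh gives a single, properly rounded narrowing instruction when AVX512BF16 and AVX512VL are available.

#include <immintrin.h>

// AVX2-only widen: a bf16 value is the upper 16 bits of an fp32, so
// zero-extending and shifting left by 16 reproduces the float exactly.
static inline __m256 bf16_to_fp32_avx2(__m128i bf16x8) {
    __m256i widened = _mm256_cvtepu16_epi32(bf16x8);                  // 8 x u16 -> 8 x u32
    return _mm256_castsi256_ps(_mm256_slli_epi32(widened, 16));
}

// AVX2-only narrow by truncation (a production kernel would add round-to-nearest-even).
static inline __m128i fp32_to_bf16_avx2(__m256 fp32x8) {
    __m256i hi = _mm256_srli_epi32(_mm256_castps_si256(fp32x8), 16);  // keep the top 16 bits
    __m256i packed = _mm256_packus_epi32(hi, hi);                     // 32-bit -> 16-bit, per 128-bit lane
    return _mm256_castsi256_si128(_mm256_permute4x64_epi64(packed, 0xD8));
}

#if defined(__AVX512BF16__) && defined(__AVX512VL__)
// AVX512BF16 narrow: one instruction, round-to-nearest-even.
static inline __m128bh fp32_to_bf16_avx512(__m256 fp32x8) {
    return _mm256_cvtneps_pbh(fp32x8);
}
#endif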

@sywangyi
Contributor Author

The compile cxx-flags are cxx-flags = ["-mavx2", "-mfma", "-fopenmp", "-mf16c", "-mavx512f", "-mavx512bf16", "-mavx512vl"], but since the compile and runtime environments may differ, we need to check at runtime whether AVX512BF16 is supported; otherwise we fall back to AVX2.
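
A rough sketch of that runtime check, assuming a GCC/Clang-style __builtin_cpu_supports (the kernel names and signatures are placeholders, not this PR's symbols):

#include <cstddef>
#include <cstdint>

// Hypothetical per-ISA kernels, each compiled in its own translation unit so it
// only sees the instruction-set flags it actually needs.
void rmsnorm_bf16_avx512(uint16_t* out, const uint16_t* x, const uint16_t* w,
                         size_t rows, size_t cols, float eps);
void rmsnorm_bf16_avx2(uint16_t* out, const uint16_t* x, const uint16_t* w,
                       size_t rows, size_t cols, float eps);

void rmsnorm_bf16(uint16_t* out, const uint16_t* x, const uint16_t* w,
                  size_t rows, size_t cols, float eps) {
    // CPUID check at runtime, independent of which -m flags the binary was built with.
    if (__builtin_cpu_supports("avx512bf16") && __builtin_cpu_supports("avx512vl")) {
        rmsnorm_bf16_avx512(out, x, w, rows, cols, eps);
    } else {
        rmsnorm_bf16_avx2(out, x, w, rows, cols, eps);             // AVX2 fallback, still BF16 in/out
    }
}

The check itself is a cheap cached branch per call, so the dispatch cost is negligible next to the normalization work.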

@IlyasMoutawwakil
Member

@sywangyi please add more details in the PR description 🤗

@IlyasMoutawwakil (Member) left a comment

LGTM!

"rmsnorm_cpu/cpu_types_avx512.hpp",
]
include = ["rmsnorm_cpu"]
cxx-flags = ["-mavx2", "-mfma", "-fopenmp", "-mf16c", "-mavx512f", "-mavx512bf16", "-mavx512vl"]

Member

I think using the -mavx512.* flags also licenses the compiler to use AVX512 instructions for rmsnorm_cpu/rmsnorm_avx2.cpp? Same for -mavx2 and -mavx512.* for rmsnorm_cpu/rmsnorm_cpu.cpp and rmsnorm_cpu/rmsnorm_cpu_torch.cpp.

I think both rmsnorm_cpu/rmsnorm_avx2.cpp and rmsnorm_cpu/rmsnorm_avx512.cpp should be compiled separately (something like [kernel.rmsnorm_cpu_avx512]).
