Rmsnorm cpu #69
base: main
Perf: about 5x faster for a 1024x1024 BF16 input.
@danieldk @MekkCyber please help review
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
ref_out = x * rmsnorm_layer.weight
torch.testing.assert_close(output, ref_out.to(torch.bfloat16))
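For readers who want to see what the test's reference value stands for, here is a minimal scalar sketch of RMSNorm over one row, using the standard definition y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]. The function name `rmsnorm_row_reference` and its signature are hypothetical, and this is not the PR's test or kernel code; the PR's reference is computed in Python with torch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference, for illustration only (not from the PR).
// Computes y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i] for a single row.
inline std::vector<float> rmsnorm_row_reference(const std::vector<float>& x,
                                                const std::vector<float>& weight,
                                                float eps) {
    assert(!x.empty() && x.size() == weight.size());

    // Accumulate the sum of squares in double to limit rounding error.
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += static_cast<double>(v) * static_cast<double>(v);
    }

    const float mean_sq = static_cast<float>(sum_sq / static_cast<double>(x.size()));
    const float inv_rms = 1.0f / std::sqrt(mean_sq + eps);

    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = x[i] * inv_rms * weight[i];
    }
    return out;
}
```

A vectorized kernel can be compared against a reference like this after casting both sides to the same output dtype, which is what the `assert_close` line in the test does on the Python side.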
Might be a dumb question, but what if BF16 is not supported (hasAVX2 is true but hasAVX512BF16 is false)?
The AVX2 path also supports BF16. hasAVX512BF16 only makes sure we can use instructions like _mm256_cvtneps_pbh, which is an AVX512BF16 instruction.
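To illustrate why a path without AVX512BF16 can still handle BF16: the conversions can be done with plain integer operations, since BF16 is the upper 16 bits of an IEEE float32. Below is a scalar sketch of that idea using round-to-nearest-even. The helper names `float_to_bf16_rne` and `bf16_to_float` are hypothetical, and this is an assumption about how such a fallback could look, not the PR's actual AVX2 code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers, for illustration only (not from the PR).

// float32 -> bfloat16 with round-to-nearest-even, using integer ops only.
inline std::uint16_t float_to_bf16_rne(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));

    // NaN: keep it a NaN by forcing the quiet bit instead of rounding,
    // because rounding could carry into the exponent and produce infinity.
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }

    // Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    const std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// bfloat16 -> float32 is exact: place the 16 bits in the upper half.
inline float bf16_to_float(std::uint16_t value) {
    const std::uint32_t bits = static_cast<std::uint32_t>(value) << 16;
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}
```

A vectorized AVX2 version would apply the same shift, add, and mask steps across lanes, whereas _mm256_cvtneps_pbh does the narrowing conversion in a single instruction.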
The compile flags are cxx-flags = ["-mavx2", "-mfma", "-fopenmp", "-mf16c", "-mavx512f", "-mavx512bf16", "-mavx512vl"]. Since the compile and runtime environments may differ, we need to check at runtime whether AVX512BF16 is supported; if it is not, we fall back to AVX2.
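The runtime check described here can be sketched roughly as follows. This assumes GCC or Clang on x86, where `__builtin_cpu_supports` queries the CPU the process is actually running on. The names `RmsnormPath`, `CpuFeatures`, `query_cpu_features`, and `select_path` are hypothetical and are not the PR's identifiers; the extra scalar fallback is also an assumption.

```cpp
// Hypothetical dispatch sketch, for illustration only (not from the PR).

enum class RmsnormPath { Scalar, Avx2, Avx512Bf16 };

struct CpuFeatures {
    bool avx2;          // AVX2 and FMA available at runtime
    bool avx512_bf16;   // AVX512F, AVX512VL and AVX512BF16 available at runtime
};

// Query the machine we are running on, which may differ from the build machine.
inline CpuFeatures query_cpu_features() {
    CpuFeatures features{false, false};
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    features.avx2 = __builtin_cpu_supports("avx2") &&
                    __builtin_cpu_supports("fma");
    features.avx512_bf16 = __builtin_cpu_supports("avx512f") &&
                           __builtin_cpu_supports("avx512vl") &&
                           __builtin_cpu_supports("avx512bf16");
#endif
    return features;
}

// Pure selection logic, kept separate from detection so it is easy to test.
inline RmsnormPath select_path(bool has_avx2, bool has_avx512_bf16) {
    if (has_avx512_bf16) {
        return RmsnormPath::Avx512Bf16;
    }
    if (has_avx2) {
        return RmsnormPath::Avx2;   // the fallback discussed above
    }
    return RmsnormPath::Scalar;
}

inline RmsnormPath select_path_for_this_cpu() {
    const CpuFeatures features = query_cpu_features();
    return select_path(features.avx2, features.avx512_bf16);
}
```

With this shape, the case raised in the question (hasAVX2 true, hasAVX512BF16 false) resolves to the AVX2 path. Note that a runtime check only protects the dispatch decision; it does not stop the compiler from emitting AVX512 instructions in any file built with the -mavx512* flags.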
@sywangyi please add more details in the PR description 🤗
IlyasMoutawwakil left a comment
LGTM !
"rmsnorm_cpu/cpu_types_avx512.hpp",
]
include = ["rmsnorm_cpu"]
cxx-flags = ["-mavx2", "-mfma", "-fopenmp", "-mf16c", "-mavx512f", "-mavx512bf16", "-mavx512vl"]
I think using the -mavx512.* flags also licenses the compiler to use AVX512 instructions in rmsnorm_cpu/rmsnorm_avx2.cpp? The same goes for -mavx2 and -mavx512.* in rmsnorm_cpu/rmsnorm_cpu.cpp and rmsnorm_cpu/rmsnorm_cpu_torch.cpp, so the runtime check alone would not prevent those files from hitting unsupported instructions on older CPUs.
I think rmsnorm_cpu/rmsnorm_avx2.cpp and rmsnorm_cpu/rmsnorm_avx512.cpp should each be compiled separately, with only the flags that file needs (something like [kernel.rmsnorm_cpu_avx512]).