Open
Labels: feature request (New feature or request)
Description
FI has fp8 and fp4 GEMM implementations, but there is no bf16 one.
Original issue was found in vllm and described in vllm-project/vllm#27173.
In short, torch.nn.functional.linear is not optimal for small batch sizes; the Torch team said it simply calls cuBLAS.
It makes sense to support bf16 GEMM and tune across cuBLAS, CUTLASS, cuDNN, and the internal FI implementation, as is done for the fp8 and fp4 cases.
Performance result and script for measurement are in vllm-project/vllm#27173.