System Info
device: A100-PCIE-40GB
torch: 2.7.1
kernels: main
transformers: main
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- First, the `_KERNEL_MAPPING` in `hub_kernels.py` must be cleared to ensure kernels are specified entirely through `kernel_mapping`.
- Then run the following script:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, KernelConfig
import torch

model_id = "meta-llama/Llama-3.2-1B"

kernel_mapping = {
    "RMSNorm": {
        "cuda": "kernels-community/layer_norm:LlamaRMSNorm",
        "rocm": "kernels-community/layer_norm:LlamaRMSNorm",
    }
}
kernel_config = KernelConfig(kernel_mapping)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    use_kernels=True,
    kernel_config=kernel_config,
)
```
When multiple devices exist in `kernel_mapping`, each call to `add_to_mapping()` overwrites the previously stored `repo_name`. Since ROCm is configured as the second device and I am currently working in an A100 (CUDA) environment, the RMSNorm kernel remains inactive.
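For illustration, here is a minimal, self-contained sketch of the suspected overwrite (simplified and hypothetical, not the actual transformers implementation; the `add_to_mapping` stand-in below assumes the mapping is keyed by layer name only, and the `mode` argument is omitted):

```python
# Hypothetical, simplified stand-in for the real helper: the mapping is
# keyed by layer_name alone, so each device iteration replaces the entry
# written for the previous device.
compatible_mapping = {}

def add_to_mapping(layer_name, device, repo_name, mapping):
    mapping[layer_name] = {"device": device, "repo_name": repo_name}

kernel = {
    "cuda": "kernels-community/layer_norm:LlamaRMSNorm",
    "rocm": "kernels-community/layer_norm:LlamaRMSNorm",
}
for device, repo_name in kernel.items():
    add_to_mapping("RMSNorm", device, repo_name, compatible_mapping)

print(compatible_mapping)
# {'RMSNorm': {'device': 'rocm', ...}}: only the last device survives,
# so on an A100 (CUDA) machine no kernel is registered for "cuda".
```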
Expected behavior
Does the repeated invocation of `add_to_mapping()`, which causes overwrites in `kernel_mapping`, represent an issue that needs to be addressed?
```python
# In transformers/src/transformers/utils/kernel_config.py
elif isinstance(kernel, dict):
    for device, repo_name in kernel.items():
        add_to_mapping(layer_name, device, repo_name, mode, compatible_mapping)
```
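If the overwrite is indeed unintended, one possible direction (a hedged sketch under the same simplifying assumptions as above, not the real transformers API; the `mode` argument is again omitted) would be to key the compatible mapping by device as well as layer, so entries for different devices can coexist:

```python
# Hypothetical sketch of a possible fix: nest per-device entries under
# the layer instead of storing a single repo per layer.
compatible_mapping = {}

def add_to_mapping(layer_name, device, repo_name, mapping):
    mapping.setdefault(layer_name, {})[device] = repo_name

kernel = {
    "cuda": "kernels-community/layer_norm:LlamaRMSNorm",
    "rocm": "kernels-community/layer_norm:LlamaRMSNorm",
}
for device, repo_name in kernel.items():
    add_to_mapping("RMSNorm", device, repo_name, compatible_mapping)

print(compatible_mapping)
# {'RMSNorm': {'cuda': '...', 'rocm': '...'}}: lookup can now select the
# entry for the device actually in use (e.g. "cuda" on an A100).
```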