Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
I use the Docker image from Aliyun. Serving an LLM such as Qwen2.5-VL-7B-Instruct works fine, but when I switch to the AWQ variant of the model, it fails.
Error information:
2025-05-07 07:55:31,127 - lmdeploy - ERROR - model_agent.py:391 - Task failed
Traceback (most recent call last):
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 386, in _on_finish_callback
task.result()
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 374, in _async_loop_background
await self._async_step_background(
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 322, in _async_step_background
output = await self._async_model_forward(inputs,
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 243, in _async_model_forward
ret = await __forward(inputs)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 220, in __forward
return await self.async_forward(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 538, in async_forward
output = self._forward_impl(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 521, in _forward_impl
output = model_forward(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 75, in model_forward
output = model(**input_dict)
File "/opt/lmdeploy/lmdeploy/pytorch/backends/graph_runner.py", line 24, in call
return self.model(**kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_5_vl.py", line 439, in forward
hidden_states = self.model(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_vl.py", line 295, in forward
hidden_states, residual = decoder_layer(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_vl.py", line 214, in forward
hidden_states = self.self_attn(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_vl.py", line 96, in forward
qkv_states = self.qkv_proj(hidden_states)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/nn/linear.py", line 512, in forward
out = self.impl.forward(x, self.qweight, self.scales, self.qzeros, self.bias, False)
File "/opt/lmdeploy/lmdeploy/pytorch/backends/dlinfer/awq_modules.py", line 28, in forward
out = awq_linear(x, qweight, scales, qzeros, bias, all_reduce, self.group_size)
File "/opt/lmdeploy/lmdeploy/pytorch/kernels/dlinfer/awq_kernels.py", line 15, in awq_linear
return ext_ops.weight_quant_matmul(x.squeeze(0),
File "/usr/local/python3.10.5/lib/python3.10/site-packages/dlinfer/ops/llm.py", line 542, in weight_quant_matmul
return vendor_ops_registry["weight_quant_matmul"](
File "/usr/local/python3.10.5/lib/python3.10/site-packages/dlinfer/vendor/ascend/torch_npu_ops.py", line 404, in weight_quant_matmul
return torch.ops.npu.npu_weight_quant_batchmatmul(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/ops.py", line 854, in call
return self._op(*args, **(kwargs or {}))
RuntimeError: call aclnnWeightQuantBatchMatmulV2 failed, detail:EZ1001: [PID: 38901] 2025-05-07-07:55:31.121.354 antiquantScale's dtype must be DT_UINT64 or DT_INT64 when antiquantOffset's dtype is DT_INT32, actual antiquantScale's dtype is [DT_FLOAT16].
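For context, the failure comes down to the dtype pair handed to the NPU kernel: the AWQ path passes float16 scales together with int32 zero points, and aclnnWeightQuantBatchMatmulV2 only accepts an int64/uint64 scale in that combination. Below is a rough sketch that isolates just this call outside lmdeploy (assuming torch_npu and a visible NPU device; the shapes, group size, and int4-in-int32 packing are illustrative guesses, so it may trip a different check before reaching the dtype one):

# Rough isolation of the failing kernel call, outside lmdeploy.
# Assumptions (not taken from the logs above): torch_npu is installed and an
# NPU device is visible; shapes, group size, and the packing are illustrative.
import torch
import torch_npu  # noqa: F401  (registers torch.ops.npu.*)

m, k, n, group_size = 4, 3584, 4608, 128

x = torch.randn(m, k, dtype=torch.float16, device="npu")
qweight = torch.zeros(k, n // 8, dtype=torch.int32, device="npu")               # packed int4 weights
scales = torch.ones(k // group_size, n, dtype=torch.float16, device="npu")      # float16, as AWQ stores them
qzeros = torch.zeros(k // group_size, n // 8, dtype=torch.int32, device="npu")  # int32 zero points

# float16 antiquant scale + int32 antiquant offset is the combination the
# error message above rejects.
out = torch.ops.npu.npu_weight_quant_batchmatmul(
    x, qweight, scales, qzeros, antiquant_group_size=group_size)
print(out.shape)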
Reproduction
lmdeploy serve api_server \
  --backend turbomind \
  --device ascend \
  --eager-mode \
  --server-port 12000 \
  --tp 4 \
  --max-batch-size 32 \
  --cache-max-entry-count 0.6 \
  --cache-block-seq-len 64 \
  --model-format awq \
  /root/Qwen2.5-VL-7B-Instruct
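To make the report easier to reproduce, here is a small helper for confirming what the served AWQ checkpoint actually stores (the path is taken from the command above; the qweight/qzeros/scales suffixes follow the usual AutoAWQ layout and may differ for other exports):

# Print the dtypes/shapes of the quantized tensors in the first shard.
# Path and tensor-name suffixes are assumptions; adjust to the checkpoint
# that was actually served.
import glob
from safetensors import safe_open

shards = sorted(glob.glob("/root/Qwen2.5-VL-7B-Instruct/*.safetensors"))
with safe_open(shards[0], framework="pt") as f:
    for name in f.keys():
        if name.endswith(("qweight", "qzeros", "scales")):
            t = f.get_tensor(name)
            print(name, t.dtype, tuple(t.shape))

On an AutoAWQ export the scales typically come back as torch.float16 and the qzeros as torch.int32, i.e. exactly the pair the kernel rejects.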
Environment
Error traceback