Changes from all commits (31 commits)
2 changes: 2 additions & 0 deletions README.md
@@ -143,6 +143,8 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
<li>DeepSeek-V2.5 (236B)</li>
+<li>DeepSeek-V3 (685B)</li>
+<li>DeepSeek-V3.2 (685B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>StarCoder2 (3B - 15B)</li>
2 changes: 2 additions & 0 deletions README_ja.md
@@ -130,6 +130,8 @@ The LMDeploy TurboMind engine has outstanding inference capability, and across a variety of
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
<li>DeepSeek-V2.5 (236B)</li>
+<li>DeepSeek-V3 (685B)</li>
+<li>DeepSeek-V3.2 (685B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>StarCoder2 (3B - 15B)</li>
2 changes: 2 additions & 0 deletions README_zh-CN.md
@@ -144,6 +144,8 @@ The LMDeploy TurboMind engine has outstanding inference capability; on models of all sizes
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
<li>DeepSeek-V2.5 (236B)</li>
+<li>DeepSeek-V3 (685B)</li>
+<li>DeepSeek-V3.2 (685B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>StarCoder2 (3B - 15B)</li>
2 changes: 2 additions & 0 deletions docs/en/supported_models/supported_models.md
@@ -92,6 +92,8 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
+| DeepSeek-V3 | 685B | LLM | Yes | No | No | No | No |
+| DeepSeek-V3.2 | 685B | LLM | Yes | No | No | No | No |
| DeepSeek-VL2 | 3B - 27B | MLLM | Yes | No | No | No | No |
| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
2 changes: 2 additions & 0 deletions docs/zh_cn/supported_models/supported_models.md
@@ -92,6 +92,8 @@
| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
+| DeepSeek-V3 | 685B | LLM | Yes | No | No | No | No |
+| DeepSeek-V3.2 | 685B | LLM | Yes | No | No | No | No |
| DeepSeek-VL2 | 3B - 27B | MLLM | Yes | No | No | No | No |
| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
3 changes: 3 additions & 0 deletions lmdeploy/pytorch/backends/attention.py
@@ -15,6 +15,8 @@ class AttentionMetadata:
q_seqlens: torch.Tensor = None
kv_seqlens: torch.Tensor = None
fill_seqlens: torch.Tensor = None
+cu_seqlens_q: torch.Tensor = None
+cu_seqlens_k: torch.Tensor = None
quant_policy: Literal[0, 4, 8] = 0


@@ -70,6 +72,7 @@ def forward(
k_scales_zeros: torch.Tensor = None,
v_scales_zeros: torch.Tensor = None,
learnable_sink: torch.Tensor = None,
+nsa_indices: torch.Tensor = None
inplace: bool = False,
) -> torch.Tensor:
"""forward."""
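The new `cu_seqlens_q` / `cu_seqlens_k` fields follow the usual varlen-attention layout: cumulative sequence lengths with a leading zero, so request `i` owns the packed rows `[cu[i], cu[i+1])`. A minimal sketch of that convention in plain Python (illustrative only, not LMDeploy's actual code):

```python
from itertools import accumulate

def cu_seqlens(seqlens):
    """Cumulative sequence lengths with a leading zero -- the layout
    packed (varlen) attention kernels index into."""
    return [0] + list(accumulate(seqlens))

# Three requests of lengths 4, 1 and 7 packed into one batch:
cu = cu_seqlens([4, 1, 7])
print(cu)  # -> [0, 4, 5, 12]
# Request 1 (length 1) occupies packed rows [cu[1], cu[2]) == [4, 5).
```

In the diff, the same prefix sums would be held as `torch.Tensor`s so kernels can consume them directly.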
1 change: 1 addition & 0 deletions lmdeploy/pytorch/backends/base.py
@@ -31,6 +31,7 @@ class OpType(Enum):
FusedMoEW8A8 = auto()
LinearBlockedF8 = auto()
FusedMoEBlockedF8 = auto()
+NSAIndexFP8 = auto()


class OpsBackend(ABC):
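`NSAIndexFP8` adds an op kind for DeepSeek-V3.2's FP8 sparse-attention index computation. How LMDeploy's backends map enum members to concrete implementations is internal to the project, but the generic enum-to-builder dispatch pattern such an `OpType` typically feeds looks roughly like this (hypothetical names, sketch only):

```python
from enum import Enum, auto

class OpType(Enum):
    # Subset of the registry shown in the diff.
    FusedMoEBlockedF8 = auto()
    NSAIndexFP8 = auto()

# Hypothetical builder table; the real dispatch lives in LMDeploy's backends.
_BUILDERS = {}

def register_op(op_type):
    """Decorator binding a builder class to an OpType member."""
    def deco(cls):
        _BUILDERS[op_type] = cls
        return cls
    return deco

@register_op(OpType.NSAIndexFP8)
class NSAIndexFP8Builder:
    """Placeholder builder for the FP8 NSA index op."""

def get_builder(op_type):
    return _BUILDERS[op_type]

print(get_builder(OpType.NSAIndexFP8).__name__)  # -> NSAIndexFP8Builder
```

Adding a member with `auto()` keeps existing enum values stable, so downstream backends only need a new builder registration.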