modelscope · Jintao-Huang · Apr 6, 2026 · Apr 6, 2026
diff --git a/docs/source/BestPractices/Qwen3_5-Best-Practice.md b/docs/source/BestPractices/Qwen3_5-Best-Practice.md
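@@ -311,6 +311,7 @@ Tips for training Qwen3.5 with Megatron-SWIFT:
 - Full parameter training: Refer to [this example](https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_5/mcore_full.sh).
 - Regarding MTP training: "mcore-bridge>=1.1.0" supports multimodal MTP training (currently requires installing the [main branch](https://github.com/modelscope/mcore-bridge/pull/14)). Please install the corresponding version.
 - TP limitation removed: Using "megatron-core>=0.16" removes the `num_query_groups` limitation on TP.
+- CP support: "mcore-bridge>=1.1.0" supports CP training for GDN (currently requires installing the [main branch](https://github.com/modelscope/mcore-bridge/pull/16)). The megatron-core dev branch must also be installed.
 - By default, `GatedDeltaNet` uses the Megatron implementation, which requires "megatron-core>=0.16" (ms-swift>=4.1.0; earlier versions defaulted to the transformers implementation). Set the environment variable `USE_MCORE_GDN=0` to switch to the transformers implementation; the transformers implementation does not support packing or TP for GDN.
 - Support for padding_free/packing: Packing can improve training speed. Refer to [this example](https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_5/packing.sh).
 - apply_wd_to_qk_layernorm: Apply weight decay to qk layernorm. Default is False.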

diff --git a/docs/source_en/BestPractices/Qwen3_5-Best-Practice.md b/docs/source_en/BestPractices/Qwen3_5-Best-Practice.md
@@ -309,6 +309,7 @@ Tips for training Qwen3.5 with Megatron-SWIFT:
 - Full parameter training: Refer to [this example](https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_5/mcore_full.sh).
 - Regarding MTP training: `mcore-bridge>=1.1.0` supports multimodal MTP training (currently requires installing the [main branch](https://github.com/modelscope/mcore-bridge/pull/14)). Please install the corresponding version.
 - TP Limitation Removed: Using `megatron-core>=0.16` removes the `num_query_groups` limitation on TP.
+- CP support: `mcore-bridge>=1.1.0` supports CP training for GDN (currently requires installing the [main branch](https://github.com/modelscope/mcore-bridge/pull/16)). The megatron-core dev branch must also be installed.
 - By default, `GatedDeltaNet` uses the Megatron implementation, which requires "megatron-core>=0.16" (ms-swift>=4.1.0; previous versions defaulted to the transformers implementation). Set the environment variable `USE_MCORE_GDN=0` to switch to the transformers implementation. Note that the transformers implementation does not support packing and GDN's TP.
 - Support for padding_free/packing: Packing can improve training speed. Refer to [this example](https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_5/packing.sh).
 - apply_wd_to_qk_layernorm: Apply weight decay to qk layernorm. Default is False.
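
The tips in this diff depend on minimum package versions (`mcore-bridge>=1.1.0`, `megatron-core>=0.16`). The following is a minimal sketch of how one might check an environment against those minimums before launching training. It is not part of ms-swift: the helper names (`release_tuple`, `meets_minimum`, `check_environment`) are hypothetical, the distribution names are assumed to match the names used in the documentation, and the comparison only looks at the leading numeric components, so a main-branch or dev build that reports something like `1.1.0.dev0` is treated as `1.1.0`.

```python
import re
from importlib import metadata

# Minimum versions taken from the tips above. The distribution names are an
# assumption: they are copied from the documentation text as written.
MINIMUMS = {"mcore-bridge": "1.1.0", "megatron-core": "0.16"}


def release_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '0.16.0rc1' -> (0, 16, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric segment such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Rough check that `installed` is at least `minimum` (numeric parts only)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '0.16' compares equal to '0.16.0'
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS) -> dict:
    """Map each distribution name to 'ok', 'not installed', or a mismatch note."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"{installed} < {minimum}"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Because the check ignores pre-release and dev suffixes, it cannot tell whether an installed build actually contains the main-branch or dev-branch changes mentioned in the tips; it only catches clearly outdated installs.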