[model] Support GLM-4.7-Flash and remove redundant param `--sglang-enable-draft-weights-cpu-backup` by guapisolo · Pull Request #566 · radixark/miles

guapisolo · 2026-02-05T23:56:20Z

Support GLM 4.7 flash with fix in mbridge.
Remove outdated --sglang-enable-draft-weights-cpu-backup param.

gemini-code-assist · 2026-02-05T23:56:40Z

Summary of Changes

Hello @guapisolo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on extending the system's capabilities to support the GLM-4.7-Flash model. It involves integrating the new model into the existing conversion and bridging infrastructure, along with refining the speculative decoding configurations for better efficiency. The changes also include the removal of an obsolete configuration flag, contributing to a cleaner and more up-to-date codebase.

Highlights

GLM-4.7-Flash Model Support: Introduced comprehensive support for the GLM-4.7-Flash model, including its specific model configuration and a dedicated run script.
Model Bridging Integration: Added a new GLM4MoELiteBridge, which inherits from DeepseekV3Bridge, to handle GLM-4.7-Flash, and routed both glm4moelite and deepseekv3 model names through the existing DeepSeek-V3 conversion path.
Speculative Decoding Parameter Updates: Adjusted speculative decoding parameters in various run scripts, increasing num-steps and num-draft-tokens for improved performance.
Removed CPU Backup Flag: The --sglang-enable-draft-weights-cpu-backup flag has been removed from documentation and run scripts, indicating a streamlining or deprecation of this feature.

Changelog

docs/en/advanced/speculative-decoding.md
- Removed the --sglang-enable-draft-weights-cpu-backup flag from the speculative decoding documentation.
miles/backends/megatron_utils/megatron_to_hf/__init__.py
- Modified the _convert_to_hf_core function to include glm4moelite and deepseekv3 model names, mapping them to the convert_deepseekv3_to_hf function.
- Reordered the conditional checks for model names in the conversion logic.
miles_plugins/mbridge/__init__.py
- Imported GLM4MoELiteBridge.
- Added GLM4MoELiteBridge to the __all__ export list.
miles_plugins/mbridge/glm4moe_lite.py
- Added a new file defining GLM4MoELiteBridge which inherits from DeepseekV3Bridge and is registered as glm4_moe_lite.
scripts/models/glm4.7-30B-A3B.sh
- Added a new script defining model arguments for GLM-4.7-Flash, including MoE parameters, hidden sizes, attention heads, and other configurations.
scripts/run-glm4.5-355B-A32B.sh
- Updated --sglang-speculative-num-steps from 1 to 2.
- Updated --sglang-speculative-num-draft-tokens from 2 to 3.
- Removed the --sglang-enable-draft-weights-cpu-backup flag.
- Changed the save-debug-rollout-data path from an absolute path to a relative path.
scripts/run-glm4.7-30B-A3B.sh
- Added a new comprehensive run script for GLM-4.7-Flash, including Ray setup, checkpoint, rollout, evaluation, performance, optimizer, SGLang, and miscellaneous arguments.
scripts/run-qwen3-next-80B-A3B.sh
- Removed the --sglang-enable-draft-weights-cpu-backup flag.

Activity

The pull request was opened by guapisolo with the title "[feat] Support GLM-4.7-Flash".


gemini-code-assist

Code Review

This pull request adds support for the GLM-4.7-Flash model. This includes adding a new model bridge that reuses the DeepseekV3 logic, a new model configuration script, and a run script for training. The changes also involve cleaning up an obsolete speculative decoding flag (--sglang-enable-draft-weights-cpu-backup) from various scripts. My review focuses on the new files. I've found a potential bug in the model name check and several areas for improvement in the new run script to reduce hardcoded values and redundant commands, enhancing its maintainability and portability. Overall, the changes look good and logically implement support for the new model.

gemini-code-assist · 2026-02-05T23:59:04Z

miles/backends/megatron_utils/megatron_to_hf/__init__.py

 # TODO optimize code details
 def _convert_to_hf_core(args, model_name, name, param):
-    if "glm4moe" in model_name:
+    if "glm4moelite" in model_name or "deepseekv3" in model_name:


There appears to be a typo in the model name check. The model is registered as "glm4_moe_lite" in miles_plugins/mbridge/glm4moe_lite.py, but the check here uses "glm4moelite". This mismatch will likely cause the condition to fail and prevent the correct model conversion from being used. Please correct the name to match the registered one.

Suggested change

-    if "glm4moelite" in model_name or "deepseekv3" in model_name:
+    if "glm4_moe_lite" in model_name or "deepseekv3" in model_name:
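
Two properties of Python substring checks are relevant to this hunk. First, `"glm4moelite" in "glm4_moe_lite"` is false, because the underscores break the match. Second, `"glm4moe"` is a substring of `"glm4moelite"`, which would explain why the more specific check has to come before the generic one. The sketch below illustrates both. Whether `model_name` actually carries the registered name or something else (such as a lowercased config class name) is not visible in this excerpt, and `normalize`, `pick_converter`, and the label `convert_glm4moe_to_hf` are hypothetical.

```python
def normalize(model_name: str) -> str:
    """Lowercase and drop separators so both spellings compare equal."""
    return model_name.lower().replace("_", "").replace("-", "")


def pick_converter(model_name: str) -> str:
    """Return the name of the converter to use (string labels only, for illustration)."""
    key = normalize(model_name)
    # Specific names first: "glm4moe" also matches "glm4moelite".
    if "glm4moelite" in key or "deepseekv3" in key:
        return "convert_deepseekv3_to_hf"
    if "glm4moe" in key:
        return "convert_glm4moe_to_hf"  # hypothetical label
    raise ValueError(f"unsupported model: {model_name}")


print("glm4moelite" in "glm4_moe_lite")  # False: underscores break the match
print("glm4moe" in "glm4moelite")        # True: a generic check placed first would shadow the specific one
print(pick_converter("glm4_moe_lite"))   # convert_deepseekv3_to_hf
```

Normalizing before comparing makes the check indifferent to which spelling reaches it, at the cost of one extra string pass per call.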

gemini-code-assist · 2026-02-05T23:59:04Z

scripts/run-glm4.7-flash.sh

+sleep 3
+pkill -9 ray
+pkill -9 python


This block of commands is redundant. The pkill commands on lines 10-11 are repeats of lines 7-8. A single set of pkill commands after ray stop --force should be sufficient to clean up processes. The extra sleep and pkills add clutter and can be removed.

gemini-code-assist · 2026-02-05T23:59:04Z

scripts/run-glm4.7-30B-A3B.sh

+export no_proxy="127.0.0.1,${MASTER_ADDR}"
+ray start --head --node-ip-address ${MASTER_ADDR} --num-gpus 8 --disable-usage-stats
+
+for WORKER_IP in $(awk '{print $1}' /root/mpi_rack_hostfile); do


The path to the host file /root/mpi_rack_hostfile is hardcoded. This makes the script less portable. It's better to use an environment variable with a default value to allow for easier configuration in different environments.

Suggested change

-for WORKER_IP in $(awk '{print $1}' /root/mpi_rack_hostfile); do
+for WORKER_IP in $(awk '{print $1}' "${MPI_RACK_HOSTFILE:-/root/mpi_rack_hostfile}"); do
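
The `${MPI_RACK_HOSTFILE:-/root/mpi_rack_hostfile}` expansion falls back to the default when the variable is unset or empty. As a rough illustration of the same behavior outside the shell, here is a Python sketch. `env_or_default` and `first_fields` are hypothetical helpers, not part of the script, and `first_fields` only approximates `awk '{print $1}'`: it skips blank lines, whereas awk would print an empty line for them. The host entries are made-up sample data.

```python
import os
from typing import Mapping, Optional


def env_or_default(name: str, default: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Mimic ${NAME:-default}: use the default when NAME is unset or empty."""
    env = os.environ if env is None else env
    value = env.get(name)
    return value if value else default


def first_fields(hostfile_text: str) -> list:
    """Return the first whitespace-separated field of each non-blank line."""
    return [line.split()[0] for line in hostfile_text.splitlines() if line.split()]


# Unset variable: the default path is used.
print(env_or_default("MPI_RACK_HOSTFILE", "/root/mpi_rack_hostfile", env={}))

# Made-up hostfile content, for illustration only.
sample = "10.0.0.1 slots=8\n10.0.0.2 slots=8\n"
print(first_fields(sample))
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real process environment.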

gemini-code-assist · 2026-02-05T23:59:04Z

scripts/run-glm4.7-30B-A3B.sh

+    continue
+  fi
+  echo "Starting Ray worker on ${WORKER_IP}"
+  ssh root@"${WORKER_IP}" \


The root user is hardcoded for the SSH connection. This is not ideal for security and flexibility. Consider parameterizing the username with an environment variable.

Suggested change

-  ssh root@"${WORKER_IP}" \
+  ssh "${REMOTE_USER:-root}"@"${WORKER_IP}" \

gemini-code-assist · 2026-02-05T23:59:04Z

scripts/run-glm4.7-30B-A3B.sh

+        "GLOO_SOCKET_IFNAME": "${MLP_SOCKET_IFNAME}",
+        "TP_SOCKET_IFNAME": "${MLP_SOCKET_IFNAME}",
+        "MASTER_ADDR": "${MLP_WORKER_0_HOST}",
+        "PYTHONPATH": "/root/Megatron-LM/",


The PYTHONPATH is hardcoded to /root/Megatron-LM/. This limits the script's portability. Please consider making this configurable via an environment variable to allow the script to run in different setups.

Suggested change

-        "PYTHONPATH": "/root/Megatron-LM/",
+        "PYTHONPATH": "${MEGATRON_LM_PATH:-/root/Megatron-LM/}",

Replace the old env-var-driven RoutingReplay with a cleaner BaseReplayManager / RoutingReplayManager pattern. This removes os.environ usage for replay stage control in favor of direct manager state, generalizes fill_routing_replay into _fill_replay_data, and extracts layer registration logic into replay_utils. Co-authored-by: fzyzcjy <ch271828n@outlook.com> Co-authored-by: Yueming Yuan <yy28@illinois.edu>

guapisolo requested a review from maocheng23 as a code owner February 5, 2026 23:56

gemini-code-assist bot reviewed Feb 5, 2026


guapisolo changed the title from "[feat] Support GLM-4.7-Flash" to "[feat] Support GLM-4.7-Flash and remove redundant param --sglang-enable-draft-weights-cpu-backup" on Feb 6, 2026

guapisolo added run-ci-fsdp run-ci-megatron labels Feb 6, 2026

guapisolo requested review from fzyzcjy and yueming-yuan as code owners February 7, 2026 03:37

guapisolo changed the title from "[feat] Support GLM-4.7-Flash and remove redundant param --sglang-enable-draft-weights-cpu-backup" to "[model] Support GLM-4.7-Flash and remove redundant param --sglang-enable-draft-weights-cpu-backup" on Feb 12, 2026

guapisolo mentioned this pull request Feb 13, 2026

[CI] Reorg test file and fix moonlight oom #593

Merged

yueming-yuan and others added 19 commits February 13, 2026 19:47

Merge branch 'main' into feature/r3-upstream

48bdf04

update megatron.patch to use new api

8cda072

[CI] reapply patch and install megatron for each run

f966048

simplify code & add replay check when use arg.ci_test

c3a547e

use smaller threshold

783227c

fmt

c4a33c8

[CI] fix megatron patch apply in ci

578bf09

[CI] try fix megatron install in ci

1cfc17d

fix attribute not exist issue in generate_endpoint

ca9a066

release r3 check to overlap != 0

d540e8d

add bshd process

c9930ae

fix padding

96ef9ec

fix mtp + r3

a8d598f

[model] Add support for GLM4.7 Flash

f6272b7

[model] Add support for GLM4.7 Flash

4a0f268

fix save debug rollout data

33a562d

remove mtp cpu backup

63c6e8c

fix hf config loading bug

3085c9f

format

8a9ded6

guapisolo added 9 commits February 15, 2026 20:46

add single node job submission

73ae7c3

upd script

cb2fd6f

rename glm 4.7 30b a3b to glm 4.7 flash to avoid confusion

75f0c68

upd scritp

2411b44

upd bridge

cefb2e5

add comments

c4c46ed

tiny fix

aa0edb5

fix mcore ckpt convertion for mtp layer

30f90a9

tmp

2b40a7a

guapisolo force-pushed the auto/20260205235101 branch from ac05af4 to 2b40a7a February 15, 2026 21:02

guapisolo requested a review from yushengsu-thu as a code owner February 15, 2026 21:02

add mtp

2fbdffe

guapisolo wants to merge 30 commits into radixark:main from guapisolo:auto/20260205235101
