Add initial Qwen3.5-4B support and SGLang text-runtime patch #39
Open
ShenAC-SAC wants to merge 6 commits into Gen-Verse:main from
Summary
This PR adds initial Qwen3.5 support for the personal OpenClaw training paths. The goal is to wire Qwen3.5 into the current OpenClaw/SLIME/Megatron stack and verify that the model can reach a real startup smoke path in the existing training/runtime flow.
Main Changes
Added Qwen3.5 integration across the current stack:
- slime/slime_plugins/models/qwen3_5.py
- slime/slime_plugins/mbridge/qwen3_5.py
- slime/slime/backends/megatron_utils/megatron_to_hf/qwen3_5.py

Added or updated Qwen3.5 launch scripts:
- slime/scripts/models/qwen3.5-4B.sh
- openclaw-combine/run_qwen35_4b_openclaw_combine.sh
- openclaw-rl/run_qwen35_4b_openclaw_rl.sh
- openclaw-opd/run_qwen35_4b_openclaw_opd.sh

Added repo-side SGLang compatibility work for Qwen3.5 rollout serving:
- slime/slime/backends/sglang_utils/qwen3_5.py
- slime/slime/backends/sglang_utils/sglang_engine.py

Added a low-resource debug-rollout startup fix for PRM-based methods:
- slime/slime/ray/placement_group.py

Dependency / Runtime Notes
Qwen3.5 support does not work with the current repo pin transformers==4.57.1. During isolated validation, Qwen3.5 required a newer Transformers build that provides transformers.models.qwen3_5. The validation environment used:
- transformers==5.2.0

For runtime validation, I also used a newer SGLang-related stack in Docker:
- sglang==0.5.9
- flashinfer-python==0.6.6
- sgl-kernel==0.3.21

For the reduced-GPU combine validation path, the environment also needed:
- numpy<2 (Megatron currently rejects NumPy 2.x)
- pylatexenc
- wandb

I have not updated the repo-wide dependency pin in this PR yet, because I wanted to first confirm the Qwen3.5 path can actually run through the current training/runtime stack before proposing a broader dependency change.
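For reference, the validation environment above can be captured as a pip requirements fragment. This reflects my local setup only, not a proposed repo-wide pin:

```
transformers==5.2.0
sglang==0.5.9
flashinfer-python==0.6.6
sgl-kernel==0.3.21
numpy<2
pylatexenc
wandb
```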
Validation Completed
Completed checks:
- transformers supports qwen3_5
- Qwen3.5-4B openclaw-combine startup smoke with PRM enabled

Single-GPU serving smoke result:
- /health_generate returned 200
- /model_info returned 200
- /generate returned 200

Reduced-GPU combine smoke result:
- --debug-rollout-only startup path
- debug_rollout_only can still reserve PRM GPUs
- language_only -> encoder_urls validation
- /health_generate, /server_info, and /model_info checks
- "policy server ready" / "model is ready" banner

This smoke run was intentionally bounded with a timeout after the stack became ready; it did not stop because of a Qwen3.5 startup crash.
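The single-GPU serving smoke can be scripted. Below is a minimal sketch (the helper name `smoke_check` and the port are my own examples, not part of this PR) that GETs the read-only endpoints and collects status codes; the `/generate` check additionally needs a POST with a prompt payload, which is omitted here:

```python
import urllib.request

def smoke_check(base_url, paths=("/health_generate", "/model_info")):
    """GET each endpoint under base_url and return {path: HTTP status}.

    Hypothetical helper for a serving smoke test; raises on connection
    failure or non-2xx/3xx responses.
    """
    results = {}
    for path in paths:
        req = urllib.request.Request(base_url + path, method="GET")
        with urllib.request.urlopen(req, timeout=10) as resp:
            results[path] = resp.status
    return results
```

Pointing it at an already-running rollout server, e.g. `smoke_check("http://127.0.0.1:30000")`, should report 200 for both paths when the stack is healthy.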
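The debug_rollout_only fix in slime/slime/ray/placement_group.py is only described at a high level above. A hedged sketch of the intended behavior (function and argument names are illustrative, not the actual code) is:

```python
def build_bundles(num_rollout_gpus, num_prm_gpus, debug_rollout_only=False):
    """Sketch: build Ray placement-group bundles for rollout and PRM GPUs.

    Illustrative only. The point of the fix: PRM bundles are reserved even
    when --debug-rollout-only is set, so PRM-based methods can still start
    up on a low-resource debug run.
    """
    bundles = [{"GPU": 1} for _ in range(num_rollout_gpus)]
    if num_prm_gpus > 0:
        # Previously a debug_rollout_only guard skipped these bundles,
        # which broke PRM startup in debug-rollout mode.
        bundles += [{"GPU": 1} for _ in range(num_prm_gpus)]
    return bundles
```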
Remaining Notes
Still not completed in this PR validation:
- openclaw-test was not run here. It depends on a separately running OpenClaw gateway plus an external user-model endpoint/token, which was outside the repo-local Qwen3.5 training validation setup used for this PR.

Why This PR Is Ready For Review
At this point the main repo-side goal is met:
Follow-up work such as broader dependency pinning or more exhaustive training validation can still be done in subsequent PRs if preferred.