Add initial Qwen3.5-4B support and SGLang text-runtime patch #39

Open
ShenAC-SAC wants to merge 6 commits into Gen-Verse:main from ShenAC-SAC:sac/qwen35-support

ShenAC-SAC commented Mar 19, 2026

Summary

This PR adds initial Qwen3.5 support for the personal OpenClaw training paths:

  • OpenClaw Combine
  • OpenClaw RL
  • OpenClaw OPD

The goal of this PR is to wire Qwen3.5 into the current OpenClaw/SLIME/Megatron stack and verify that the model can reach a real startup smoke path in the existing training/runtime flow.

Main Changes

Added Qwen3.5 integration across the current stack:

  • slime/slime_plugins/models/qwen3_5.py
  • slime/slime_plugins/mbridge/qwen3_5.py
  • slime/slime/backends/megatron_utils/megatron_to_hf/qwen3_5.py

Added or updated Qwen3.5 launch scripts:

  • slime/scripts/models/qwen3.5-4B.sh
  • openclaw-combine/run_qwen35_4b_openclaw_combine.sh
  • openclaw-rl/run_qwen35_4b_openclaw_rl.sh
  • openclaw-opd/run_qwen35_4b_openclaw_opd.sh

Added repo-side SGLang compatibility work for Qwen3.5 rollout serving:

  • slime/slime/backends/sglang_utils/qwen3_5.py
  • slime/slime/backends/sglang_utils/sglang_engine.py

Added a low-resource debug-rollout startup fix for PRM-based methods:

  • slime/slime/ray/placement_group.py

Dependency / Runtime Notes

Qwen3.5 support does not work with the current repo pin transformers==4.57.1.

During isolated validation, Qwen3.5 required a newer Transformers build that provides transformers.models.qwen3_5.
The validation environment used:

  • transformers==5.2.0
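
One way to confirm that an environment meets this requirement is to look for the module path named above before launching anything. This is a minimal sketch: `has_module` is a hypothetical helper, not something this PR adds.

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located in this environment."""
    try:
        # find_spec imports the parent packages, so a missing parent raises
        # ModuleNotFoundError (an ImportError) instead of returning None.
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    name = "transformers.models.qwen3_5"
    print(f"{name}: {'available' if has_module(name) else 'missing'}")
```

With the repo pin the check is expected to report the module as missing, since the pinned build is described above as not providing it.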

For runtime validation, I also used a newer SGLang-related stack in Docker:

  • sglang==0.5.9
  • flashinfer-python==0.6.6
  • sgl-kernel==0.3.21

For the reduced-GPU combine validation path, the environment also needed:

  • numpy<2 (Megatron currently rejects NumPy 2.x)
  • pylatexenc
  • wandb

I have not updated the repo-wide dependency pin in this PR yet, because I wanted to first confirm the Qwen3.5 path can actually run through the current training/runtime stack before proposing a broader dependency change.
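
Until the pins are decided, a reviewer reproducing the validation environment could compare their installed packages against the versions listed above. The following is a rough sketch under that assumption: the helper names are hypothetical, the comparison is an exact string match (so local version suffixes would be reported as mismatches), and only the `numpy<2` constraint is checked by major version.

```python
from importlib import metadata

# Exact versions reported for the validation environment in this PR.
VALIDATED_PINS = {
    "transformers": "5.2.0",
    "sglang": "0.5.9",
    "flashinfer-python": "0.6.6",
    "sgl-kernel": "0.3.21",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, pins=VALIDATED_PINS):
    """Return {name: (expected, actual)} for every package that differs."""
    problems = {
        name: (expected, installed.get(name))
        for name, expected in pins.items()
        if installed.get(name) != expected
    }
    # numpy is constrained by major version only (numpy<2).
    numpy_version = installed.get("numpy")
    if numpy_version is None or int(numpy_version.split(".")[0]) >= 2:
        problems["numpy"] = ("<2", numpy_version)
    return problems


if __name__ == "__main__":
    current = installed_versions([*VALIDATED_PINS, "numpy"])
    for name, (expected, actual) in find_mismatches(current).items():
        print(f"{name}: expected {expected}, found {actual}")
```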

Validation Completed

Completed checks:

  • Python syntax check for Qwen3.5 integration files
  • Shell syntax check for Qwen3.5 launch scripts
  • Qwen3.5 model spec consistency check
  • isolated environment verification that transformers supports qwen3_5
  • local checkpoint load validation for Qwen3.5-4B
  • Dockerized single-GPU SGLang smoke test for the Qwen3.5 text rollout path
  • reduced-GPU openclaw-combine startup smoke with PRM enabled
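
For reference, the Python syntax check in the first item can be reproduced with the standard library alone. This sketch parses each file without executing it; `syntax_errors` is a hypothetical helper, not a script shipped in this PR, and the tooling actually used for the check is not stated here.

```python
import ast
from pathlib import Path


def syntax_errors(paths):
    """Parse each file with ast.parse and return {path: message} for failures."""
    errors = {}
    for path in map(Path, paths):
        try:
            # Parsing only: nothing in the file is imported or executed.
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (OSError, SyntaxError) as exc:
            errors[str(path)] = f"{type(exc).__name__}: {exc}"
    return errors
```

Passing the integration files listed under Main Changes (for example `slime/slime_plugins/models/qwen3_5.py`) should return an empty dict when they all parse. The launch scripts can be checked the same way with `bash -n <script>`, which parses a script without running it.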

Single-GPU serving smoke result:

  • server startup succeeded
  • /health_generate returned 200
  • /model_info returned 200
  • /generate returned 200
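
The three endpoint checks above can be scripted roughly as follows. The endpoint paths come from this PR; the base URL, the HTTP methods, and the `/generate` request body are assumptions based on SGLang's native HTTP API and may need adjusting for a given deployment. `smoke_test` and `http_status` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

# (method, path, JSON payload). Methods and payload are assumptions.
CHECKS = [
    ("GET", "/health_generate", None),
    ("GET", "/model_info", None),
    ("POST", "/generate", {"text": "Hello", "sampling_params": {"max_new_tokens": 8}}),
]


def http_status(method, url, payload=None, timeout=30.0):
    """Return the HTTP status code; connection failures raise URLError."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses arrive as exceptions; report their code instead.
        return err.code


def smoke_test(base_url, status_fn=http_status):
    """Probe every endpoint in CHECKS and return {path: status_code}."""
    root = base_url.rstrip("/")
    return {
        path: status_fn(method, root + path, payload)
        for method, path, payload in CHECKS
    }
```

Calling `smoke_test("http://127.0.0.1:30000")` against a running server (the address is an assumed example) should return 200 for all three paths if the run matches the result reported above.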

Reduced-GPU combine smoke result:

  • validated with a reduced-GPU --debug-rollout-only startup path
  • fixed placement-group allocation so debug_rollout_only can still reserve PRM GPUs
  • fixed SGLang server-arg handling so a Qwen3.5 shadow text-only checkpoint is not incorrectly forced back through language_only -> encoder_urls validation
  • verified startup now progresses through:
    • Ray placement-group creation
    • RolloutManager creation
    • rollout router startup
    • PRM router startup
    • Qwen3.5 shadow text-only checkpoint preparation
    • rollout SGLang engine launch and weight loading
    • PRM SGLang engine launch and weight loading
    • /health_generate, /server_info, and /model_info
    • OpenClaw OPD proxy startup
    • policy server ready
    • final "model is ready" banner
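
The placement-group fix listed above can be pictured as a GPU-count calculation. The sketch below is illustrative only: `plan_gpu_reservation`, `GpuReservation`, and the parameters are hypothetical names, and the idea that a rollout-only debug run reserves no actor GPUs is an assumption. The actual change is in slime/slime/ray/placement_group.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuReservation:
    actor: int
    rollout: int
    prm: int

    @property
    def total(self) -> int:
        return self.actor + self.rollout + self.prm


def plan_gpu_reservation(actor_gpus, rollout_gpus, prm_gpus, *,
                         debug_rollout_only, prm_enabled):
    """Hypothetical GPU count for a placement group (not the repo's code)."""
    # Assumption: a rollout-only debug run starts no trainer, so it needs
    # no actor GPUs.
    actor = 0 if debug_rollout_only else actor_gpus
    # The PRM term does not depend on debug_rollout_only, so PRM engines
    # still get their GPUs in a rollout-only debug run.
    prm = prm_gpus if prm_enabled else 0
    return GpuReservation(actor=actor, rollout=rollout_gpus, prm=prm)
```

For example, with 4 actor GPUs, 2 rollout GPUs, and 1 PRM GPU configured, a rollout-only debug run with PRM enabled would reserve 3 GPUs under this sketch rather than 2.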

This smoke run was intentionally bounded with a timeout after the stack became ready; it did not stop because of a Qwen3.5 startup crash.
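
A bounded run of this kind can separate "stopped by the timeout" from "exited on its own" explicitly. This is a minimal sketch, assuming the launch script is started as a single child process; `run_bounded` is a hypothetical helper, not the mechanism used for the validation above.

```python
import subprocess


def run_bounded(command, timeout_s):
    """Run `command` and report a timeout separately from an early exit."""
    try:
        completed = subprocess.run(command, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising. Grandchildren
        # (for example Ray workers) may need separate cleanup.
        return "timeout"
    return f"exited:{completed.returncode}"
```

A result of `"timeout"` means the stack was still running when the bound was reached, while `"exited:<code>"` with a non-zero code would point to a startup failure.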

Remaining Notes

Not yet completed in this PR's validation:

  • full multi-step training verification beyond startup smoke
  • broader multi-GPU validation
  • repo-wide dependency pin update decision

openclaw-test was not run here. It depends on a separately running OpenClaw gateway plus an external user-model endpoint/token, which was outside the repo-local Qwen3.5 training validation setup used for this PR.

Why This PR Is Ready For Review

At this point the main repo-side goal is met:

  • Qwen3.5 is wired into the existing stack
  • the runtime compatibility blockers in this branch have been addressed
  • the model now reaches a real reduced-resource OpenClaw Combine startup smoke path instead of failing immediately during import, conversion, or SGLang initialization

Follow-up work such as broader dependency pinning or more exhaustive training validation can still be done in subsequent PRs if preferred.

ShenAC-SAC changed the title from "[Draft] Add initial Qwen3.5-4B support and SGLang text-runtime patch" to "Add initial Qwen3.5-4B support and SGLang text-runtime patch" on Mar 19, 2026
ShenAC-SAC marked this pull request as ready for review March 19, 2026 07:13