
[contrib] Add Qwen3.5 4B and 9B hybrid DeltaNet contrib models #152

Open
m-deepankar-singh wants to merge 4 commits into aws-neuron:main from m-deepankar-singh:codex/qwen35-4b-9b-contrib-main

m-deepankar-singh commented Apr 30, 2026

Description

Adds Qwen3.5-4B and Qwen3.5-9B contrib model implementations for the dense hybrid DeltaNet + GQA architecture. This builds on Jim Burtoft's Qwen3.5/Qwen3.6 hybrid DeltaNet contrib work and keeps the proven dummy-KV plus side-channel DeltaNet state pattern.

This PR is intentionally scoped to Qwen3.5-4B and Qwen3.5-9B. Fixes for Qwen3.5-2B and Qwen3.6-27B will be handled in separate follow-up PRs based on Jim's work in PRs #141 and #140.

Model Information

Model Name: Qwen3.5-4B and Qwen3.5-9B

Model Architecture: Dense decoder-only hybrid architecture: repeating 3 DeltaNet layers + 1 GQA softmax attention layer

Purpose: Text generation
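
The repeating layout named under Model Architecture can be sketched as a per-layer type list. This is a minimal illustration and not the code in this PR: the function name `hybrid_layer_types`, the type labels, the position of the GQA layer at the end of each group, and the 8-layer count in the usage line are all assumptions made for the example. Only the "3 DeltaNet + 1 GQA" repeat comes from the description.

```python
from typing import List


def hybrid_layer_types(num_layers: int, period: int = 4) -> List[str]:
    """Return a per-layer type list for a repeating hybrid stack.

    Assumption (hypothetical helper): every `period`-th layer, counting
    from 1, is GQA softmax attention and all other layers are DeltaNet.
    With period == 4 this gives 3 DeltaNet layers + 1 GQA layer per group.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    if period < 2:
        raise ValueError("period must be at least 2")
    return [
        "gqa_attention" if (i + 1) % period == 0 else "deltanet"
        for i in range(num_layers)
    ]


# Illustrative layer count only; the real models define their own depth.
layout = hybrid_layer_types(8)
print(layout)
```

A list like this is one way to decide, per layer, whether a KV cache entry or a recurrent DeltaNet state is needed.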

Checklist

Required Components

  • Accuracy Test (e.g. test/integration/test_model.py)
    • Integration tests compile and run the models on Neuron.
    • Includes factual/coherence generation checks and an Olympics invalid-token regression for the NaN-logit failure mode.
  • README.md with the following sections:
    • Usage Example
    • Compatibility Matrix
    • Example Checkpoints
    • Testing Instructions
  • Source Code (src/)
    • Modeling code follows the contrib folder hierarchy and NxDI patterns.
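
The NaN-logit failure mode mentioned in the accuracy test item can be guarded against with a check along these lines. This is a hedged sketch using plain Python lists. The helper names `greedy_token` and `check_decode_step` are hypothetical and do not come from the PR's test suite, which is likely to operate on tensors instead.

```python
import math
from typing import Sequence


def greedy_token(logits: Sequence[float]) -> int:
    """Pick the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def check_decode_step(logits: Sequence[float], token_id: int, vocab_size: int) -> None:
    """Raise AssertionError if a decode step looks numerically broken.

    Two checks: every logit must be finite (no NaN or inf), and the
    chosen token id must fall inside the vocabulary range.
    """
    bad = [i for i, x in enumerate(logits) if not math.isfinite(x)]
    assert not bad, f"non-finite logits at indices {bad[:5]}"
    assert 0 <= token_id < vocab_size, (
        f"token id {token_id} outside [0, {vocab_size})"
    )


# Toy example with a 3-entry vocabulary.
step_logits = [0.1, 2.5, -1.0]
token = greedy_token(step_logits)
check_decode_step(step_logits, token, vocab_size=len(step_logits))
```

Checking the logits as well as the token id matters because a comparison against NaN is always false, so an argmax over NaN-contaminated logits can still return an in-range but meaningless token.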

Optional Components

  • Unit Tests (CPU or Neuron-based)
    • CPU unit tests under test/unit/, including DeltaNet decay bounding coverage and hybrid cache manager coverage.

Folder Structure

/contrib/models/Qwen3.5-4B/
  README.md
  /src
  /test
    /unit
    /integration/test_model.py

/contrib/models/Qwen3.5-9B/
  README.md
  /src
  /test
    /unit
    /integration/test_model.py

Testing

Tested on trn2.48xlarge with the Neuron SDK PyTorch 2.9 / NxDI inference environment, using TP=4, BF16, and seq_len=160.

Qwen3.5-4B

  • Unit: 45 passed
  • Integration: 9 passed
  • TTFT: 83.2 ms
  • Throughput: 68.1 tok/s

Qwen3.5-9B

  • Unit: 44 passed
  • Integration: 9 passed
  • TTFT: 88.1 ms
  • Throughput: 49.6 tok/s
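
For a rough sense of what the reported numbers imply, the sketch below converts throughput into per-token latency and estimates end-to-end time for a short generation. It assumes the throughput figures are steady-state decode rates for a single sequence, which the PR does not state. The helper names and the 64-token request length are illustrative only.

```python
def per_token_ms(tokens_per_second: float) -> float:
    """Convert a decode rate in tokens/s to milliseconds per token."""
    return 1000.0 / tokens_per_second


def estimated_latency_ms(ttft_ms: float, tokens_per_second: float, new_tokens: int) -> float:
    """Estimate total latency: TTFT covers the first token, and the
    remaining tokens are assumed to arrive at the steady decode rate."""
    remaining = max(new_tokens - 1, 0)
    return ttft_ms + remaining * per_token_ms(tokens_per_second)


# (TTFT in ms, throughput in tok/s) as reported in the Testing section.
REPORTED = {
    "Qwen3.5-4B": (83.2, 68.1),
    "Qwen3.5-9B": (88.1, 49.6),
}

for name, (ttft, tps) in REPORTED.items():
    total = estimated_latency_ms(ttft, tps, new_tokens=64)
    print(f"{name}: {per_token_ms(tps):.1f} ms/token, about {total:.0f} ms for 64 tokens")
```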

Compatibility

Tested with:

  • Neuron SDK Version(s): PyTorch 2.9 / NxDI inference environment, NKI 0.3-compatible SDK
  • Instance Type(s): trn2.48xlarge
  • PyTorch Version: 2.9.0
  • Python Version: 3.12.3

Additional Information

Known limitations:

  • DeltaNet weights are replicated across TP ranks in this first contrib version.
  • DeltaNet layers still support the original dummy-KV path; the hybrid cache manager is opt-in for validation.
  • TP=2, Trn1, long-context HBM limits, quantization, speculative decoding, and MoE are out of scope for this PR.

Follow-up work may cover Qwen3.5-2B, Qwen3.6-27B, full stable-solve propagation, hybrid cache hardening, and quantization support.

Related Issues

This PR builds on Jim Burtoft's Qwen3.5/Qwen3.6 hybrid DeltaNet contrib work.

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

Adds Qwen3.5 dense hybrid DeltaNet/GQA contrib model variants for 4B and 9B, including NKI DeltaNet kernels, weight conversion tests, and Trainium integration tests.

This builds on Jim Burtoft's Qwen3.5/Qwen3.6 contrib work in PR aws-neuron#141 and PR aws-neuron#140; his dummy-KV plus side-channel DeltaNet state pattern is the baseline used here.

m-deepankar-singh marked this pull request as ready for review May 4, 2026 12:11
m-deepankar-singh changed the title from "[codex] Add Qwen3.5 4B and 9B hybrid DeltaNet contrib models" to "[contrib] Add Qwen3.5 4B and 9B hybrid DeltaNet contrib models" May 4, 2026