[WIP][models][deepseek][qwen2.5] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models #40

orionpapadakis · 2025-08-01T18:16:30Z

No description provided.

Copilot

Pull Request Overview

This PR adds support for Qwen2.5 and Deepseek-Distilled-Qwen models by implementing the Qwen2 architecture. The implementation reuses the Qwen3 tokenizer and includes special handling for Deepseek-R1-Distill-Qwen models.

Implements Qwen2 model architecture with configuration, state management, and inference
Adds model type detection and loading for QWEN_2 and DEEPSEEK_R1_DISTILL_QWEN
Introduces model-specific behavior flags for system prompts, reasoning tokens, and begin-of-text handling
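
As a rough illustration of the behavior flags mentioned above, the sketch below models them as interface default methods that individual models override. Only `shouldAddBeginOfText` is named in this PR; `ModelBehavior`, `QwenLikeModel`, `PromptSegments`, the other two flag names, and the `<BOS>` placeholder are assumptions made for the example and may not match the actual `Model` interface.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only. shouldAddBeginOfText appears in the PR summary;
// the other flag names and every class here are assumptions.
interface ModelBehavior {
    /** Whether a begin-of-text token is prepended to the prompt. */
    default boolean shouldAddBeginOfText() { return true; }

    /** Whether a system prompt is encoded (assumed flag name). */
    default boolean shouldAddSystemPrompt() { return true; }

    /** Whether reasoning tokens are shown in the output (assumed flag name). */
    default boolean shouldIncludeReasoning() { return false; }
}

// A Qwen-style model opts out of the begin-of-text token.
final class QwenLikeModel implements ModelBehavior {
    @Override
    public boolean shouldAddBeginOfText() { return false; }
}

final class PromptSegments {
    private PromptSegments() {}

    // Builds the ordered prompt segments, consulting the model's flags.
    static List<String> build(ModelBehavior model, String systemPrompt, String userPrompt) {
        List<String> segments = new ArrayList<>();
        if (model.shouldAddBeginOfText()) {
            segments.add("<BOS>"); // placeholder marker, not a real token
        }
        if (model.shouldAddSystemPrompt() && systemPrompt != null) {
            segments.add(systemPrompt);
        }
        segments.add(userPrompt);
        return segments;
    }
}
```

With defaults, callers need no per-model branching: a new model type only overrides the flags where it differs.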

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

File	Description
TornadoVMMasterPlan.java	Adds unsupported operation handling for new model types
Qwen3Tokenizer.java	Updates token display logic to include reasoning tokens
Qwen3.java	Adds shouldAddBeginOfText override
Qwen2Configuration.java	Defines configuration structure for Qwen2 models
Qwen2.java	Implements main Qwen2 model class with inference logic
Qwen2ModelLoader.java	Handles loading and weight parsing for Qwen2 models
ModelLoader.java	Adds model type detection for new models
ModelType.java	Defines new model types and loading logic
Model.java	Adds behavior flags and updates interactive/instruction methods
Qwen2StandardWeights.java	Extends StandardWeights with Qwen2-specific bias tensors
Qwen2State.java	Implements state management for Qwen2 models
InferenceCore.java	Adds forwardJavaQwen2 inference implementation

Comments suppressed due to low confidence (1)

src/main/java/com/example/model/Model.java:226

This line appears to be in the wrong location. It seems to belong in the InferenceCore implementation rather than in the Model interface default method.

                if (tokenizer().shouldDisplayToken(token)) {

Copilot · 2025-08-06T13:03:48Z

src/main/java/com/example/model/qwen2/Qwen2Configuration.java

+
+    @Override
+    public int kvMul() {
+        throw new UnsupportedOperationException("Not supported for Qwen2.");


The error message should be more specific about why kvMul() is not supported for Qwen2 models or what alternative should be used.

Suggested change

throw new UnsupportedOperationException("Not supported for Qwen2.");

throw new UnsupportedOperationException("kvMul() is not supported for Qwen2 models because they do not use a key-value head multiplier. If you need key-value head information, use numberOfKeyValueHeads instead.");
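
A minimal sketch of the suggested pattern, assuming a simplified configuration interface: the unsupported accessor throws with a message that names the alternative. `HeadConfiguration` and `Qwen2LikeConfiguration` are hypothetical stand-ins, not the PR's `Configuration` or `Qwen2Configuration` types.

```java
// Illustrative sketch only: both types below are hypothetical stand-ins.
interface HeadConfiguration {
    int numberOfKeyValueHeads();

    int kvMul();
}

final class Qwen2LikeConfiguration implements HeadConfiguration {
    private final int numberOfKeyValueHeads;

    Qwen2LikeConfiguration(int numberOfKeyValueHeads) {
        this.numberOfKeyValueHeads = numberOfKeyValueHeads;
    }

    @Override
    public int numberOfKeyValueHeads() {
        return numberOfKeyValueHeads;
    }

    @Override
    public int kvMul() {
        // State which method is unsupported and point to the alternative.
        throw new UnsupportedOperationException(
                "kvMul() is not supported for Qwen2 models; use numberOfKeyValueHeads() instead.");
    }
}
```

A caller that hits the exception then sees the replacement accessor in the stack trace message rather than a bare "Not supported".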

Copilot · 2025-08-06T13:03:48Z

src/main/java/com/example/model/qwen2/Qwen2.java

+    }
+
+    /**
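+     * No <|beginoftext|> needed for Qwen models.


Suggested change

     * No <|beginoftext|> needed for Qwen models.

     * No <|begin_of_text|> needed for Qwen models.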
Copilot · 2025-08-06T13:03:48Z

src/main/java/com/example/inference/state/Qwen2State.java

+    //Qwen2 specific fields TODO
+


The TODO comment indicates incomplete implementation. Either implement the Qwen2-specific fields or remove the comment if no additional fields are needed.

Suggested change

//Qwen2 specific fields TODO

Copilot · 2025-08-06T13:03:49Z

src/main/java/com/example/inference/state/Qwen2State.java

+
+    public Qwen2State(Configuration config, int batchsize) {
+        super(config, batchsize);
+        // Initialize Qwen2-specific fields TODO


Another TODO comment indicating incomplete initialization. Either implement the initialization logic or remove the comment if no additional initialization is required.

Copilot · 2025-08-06T13:03:49Z

src/main/java/com/example/model/loader/Qwen2ModelLoader.java

+                    new ChatTokens( "<｜begin▁of▁sentence｜>", "", "", "<｜end▁of▁sentence｜>", "") :
+                    new ChatTokens( "<|im_start|>", "<|im_end|>", "", "<|end_of_text|>", "<|endoftext|>");


The chat tokens configuration uses different Unicode characters for DeepSeek ('｜' vs '|'). This should be documented or extracted to constants to avoid confusion about the character differences.

Suggested change

new ChatTokens( "<｜begin▁of▁sentence｜>", "", "", "<｜end▁of▁sentence｜>", "") :

new ChatTokens( "<|im_start|>", "<|im_end|>", "", "<|end_of_text|>", "<|endoftext|>");

new ChatTokens(DEEPSEEK_BEGIN_OF_SENTENCE, "", "", DEEPSEEK_END_OF_SENTENCE, "") :

new ChatTokens(QWEN2_IM_START, QWEN2_IM_END, "", QWEN2_END_OF_TEXT, QWEN2_ENDOFTEXT);
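
One possible shape for the extracted constants is sketched below. The constant names come from the suggestion above, while the holder class `ChatTokenConstants` and the helper `usesFullwidthBar` are assumptions. Unicode escapes are used so that the difference between the DeepSeek and Qwen delimiters stays visible in source.

```java
// Illustrative sketch only: the holder class and helper are hypothetical.
final class ChatTokenConstants {
    private ChatTokenConstants() {}

    // DeepSeek markers use U+FF5C (FULLWIDTH VERTICAL LINE) and
    // U+2581 (LOWER ONE EIGHTH BLOCK), not ASCII '|' and '_'.
    static final String DEEPSEEK_BEGIN_OF_SENTENCE = "<\uFF5Cbegin\u2581of\u2581sentence\uFF5C>";
    static final String DEEPSEEK_END_OF_SENTENCE = "<\uFF5Cend\u2581of\u2581sentence\uFF5C>";

    // Qwen markers use the ASCII vertical bar (U+007C) and underscore.
    static final String QWEN2_IM_START = "<|im_start|>";
    static final String QWEN2_IM_END = "<|im_end|>";
    static final String QWEN2_END_OF_TEXT = "<|end_of_text|>";
    static final String QWEN2_ENDOFTEXT = "<|endoftext|>";

    // Returns true when the token contains the fullwidth bar, which helps
    // catch a marker that was retyped with the wrong character.
    static boolean usesFullwidthBar(String token) {
        return token.indexOf('\uFF5C') >= 0;
    }
}
```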

orionpapadakis requested a review from mikepapadim August 1, 2025 18:16

orionpapadakis added the models label Aug 1, 2025

mikepapadim assigned orionpapadakis Aug 1, 2025

mikepapadim changed the title ~~[WIP] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models~~ [WIP][models][deepseek][qwen2.5] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models Aug 1, 2025

mikepapadim requested a review from Copilot August 4, 2025 10:50

mikepapadim requested a review from Copilot August 6, 2025 13:02

Copilot AI reviewed Aug 6, 2025

orionpapadakis added 10 commits August 6, 2025 16:25

Add model loader for qwen2

c878541

Add weights for qwen2

177476a

Add state for qwen2

5a3ab76

Add class for qwen2

6adc02c

Add configuration class for qwen2

f96ddc4

Add forward method for Qwen2

c4562ad

Extend logic for Qwen2

35b993e

Distinct Deepseek-R1-Distill-Qwen from Qwen2

d1239eb

Fix reasoning management in Deepseek-R1-Distill-Qwen and Qwen models

1fba5bf

Fix refactor issues

3dd3474

orionpapadakis force-pushed the feat/qwen2 branch from 36ddcd6 to 3dd3474 Compare August 6, 2025 13:53

orionpapadakis added 5 commits August 7, 2025 14:31

Remove redundant if conditions

4ef777c

Add loadWeights method for Qwen2

4029373

Fix Load XXX model message in model loader

1e1ec8a

Introduce TornadoWeights for Qwen2

09f1b4d

Initial commit for Qwen2TornadoVMLayerPlanner

abc1b2b
