Skip to content

[WIP][models][deepseek][qwen2.5] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models #40

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

orionpapadakis
Copy link
Collaborator

No description provided.

@mikepapadim mikepapadim changed the title [WIP] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models [WIP][models][deepseek][qwen2.5] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models Aug 1, 2025
@mikepapadim mikepapadim requested a review from Copilot August 4, 2025 10:50
Copilot

This comment was marked as outdated.

@mikepapadim mikepapadim requested a review from Copilot August 6, 2025 13:02
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for Qwen2.5 and Deepseek-Distilled-Qwen models by implementing the Qwen2 architecture. The implementation reuses the Qwen3 tokenizer and includes special handling for Deepseek-R1-Distill-Qwen models.

  • Implements Qwen2 model architecture with configuration, state management, and inference
  • Adds model type detection and loading for QWEN_2 and DEEPSEEK_R1_DISTILL_QWEN
  • Introduces model-specific behavior flags for system prompts, reasoning tokens, and begin-of-text handling

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
TornadoVMMasterPlan.java Adds unsupported operation handling for new model types
Qwen3Tokenizer.java Updates token display logic to include reasoning tokens
Qwen3.java Adds shouldAddBeginOfText override
Qwen2Configuration.java Defines configuration structure for Qwen2 models
Qwen2.java Implements main Qwen2 model class with inference logic
Qwen2ModelLoader.java Handles loading and weight parsing for Qwen2 models
ModelLoader.java Adds model type detection for new models
ModelType.java Defines new model types and loading logic
Model.java Adds behavior flags and updates interactive/instruction methods
Qwen2StandardWeights.java Extends StandardWeights with Qwen2-specific bias tensors
Qwen2State.java Implements state management for Qwen2 models
InferenceCore.java Adds forwardJavaQwen2 inference implementation
Comments suppressed due to low confidence (1)

src/main/java/com/example/model/Model.java:226

  • This line appears to be in the wrong location. It seems to belong in the InferenceCore implementation rather than in the Model interface default method.
                if (tokenizer().shouldDisplayToken(token)) {


@Override
public int kvMul() {
throw new UnsupportedOperationException("Not supported for Qwen2.");
Copy link
Preview

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message should be more specific about why kvMul() is not supported for Qwen2 models or what alternative should be used.

Suggested change
throw new UnsupportedOperationException("Not supported for Qwen2.");
throw new UnsupportedOperationException("kvMul() is not supported for Qwen2 models because they do not use a key-value head multiplier. If you need key-value head information, use numberOfKeyValueHeads instead.");

Copilot uses AI. Check for mistakes.

}

/**
* No <|beginoftext|> needed for Qwen models.
Copy link
Preview

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment contains inconsistent formatting. It should be either '<|beginoftext|>' or '<|begin_of_text|>' to match the actual token format used elsewhere in the codebase.

Suggested change
* No <|beginoftext|> needed for Qwen models.
* No <|begin_of_text|> needed for Qwen models.

Copilot uses AI. Check for mistakes.

Comment on lines 12 to 13
//Qwen2 specific fields TODO

Copy link
Preview

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment indicates incomplete implementation. Either implement the Qwen2-specific fields or remove the comment if no additional fields are needed.

Suggested change
//Qwen2 specific fields TODO

Copilot uses AI. Check for mistakes.


public Qwen2State(Configuration config, int batchsize) {
super(config, batchsize);
// Initialize Qwen2-specific fields TODO
Copy link
Preview

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another TODO comment indicating incomplete initialization. Either implement the initialization logic or remove the comment if no additional initialization is required.

Copilot uses AI. Check for mistakes.

Comment on lines 74 to 75
new ChatTokens( "<|begin▁of▁sentence|>", "", "", "<|end▁of▁sentence|>", "") :
new ChatTokens( "<|im_start|>", "<|im_end|>", "", "<|end_of_text|>", "<|endoftext|>");
Copy link
Preview

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chat tokens configuration uses different Unicode characters for DeepSeek ('|' vs '|'). This should be documented or extracted to constants to avoid confusion about the character differences.

Suggested change
new ChatTokens( "<|begin▁of▁sentence|>", "", "", "<|end▁of▁sentence|>", "") :
new ChatTokens( "<|im_start|>", "<|im_end|>", "", "<|end_of_text|>", "<|endoftext|>");
new ChatTokens(DEEPSEEK_BEGIN_OF_SENTENCE, "", "", DEEPSEEK_END_OF_SENTENCE, "") :
new ChatTokens(QWEN2_IM_START, QWEN2_IM_END, "", QWEN2_END_OF_TEXT, QWEN2_ENDOFTEXT);

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant