-
Notifications
You must be signed in to change notification settings - Fork 16
[WIP][models][deepseek][qwen2.5] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for Qwen2.5 and Deepseek-Distilled-Qwen models by implementing the Qwen2 architecture. The implementation reuses the Qwen3 tokenizer and includes special handling for Deepseek-R1-Distill-Qwen models.
- Implements Qwen2 model architecture with configuration, state management, and inference
- Adds model type detection and loading for QWEN_2 and DEEPSEEK_R1_DISTILL_QWEN
- Introduces model-specific behavior flags for system prompts, reasoning tokens, and begin-of-text handling
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
File | Description |
---|---|
TornadoVMMasterPlan.java | Adds unsupported operation handling for new model types |
Qwen3Tokenizer.java | Updates token display logic to include reasoning tokens |
Qwen3.java | Adds shouldAddBeginOfText override |
Qwen2Configuration.java | Defines configuration structure for Qwen2 models |
Qwen2.java | Implements main Qwen2 model class with inference logic |
Qwen2ModelLoader.java | Handles loading and weight parsing for Qwen2 models |
ModelLoader.java | Adds model type detection for new models |
ModelType.java | Defines new model types and loading logic |
Model.java | Adds behavior flags and updates interactive/instruction methods |
Qwen2StandardWeights.java | Extends StandardWeights with Qwen2-specific bias tensors |
Qwen2State.java | Implements state management for Qwen2 models |
InferenceCore.java | Adds forwardJavaQwen2 inference implementation |
Comments suppressed due to low confidence (1)
src/main/java/com/example/model/Model.java:226
- This line appears to be in the wrong location. It seems to belong in the InferenceCore implementation rather than in the Model interface default method.
if (tokenizer().shouldDisplayToken(token)) {
|
||
@Override | ||
public int kvMul() { | ||
throw new UnsupportedOperationException("Not supported for Qwen2."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message should be more specific about why kvMul() is not supported for Qwen2 models or what alternative should be used.
throw new UnsupportedOperationException("Not supported for Qwen2."); | |
throw new UnsupportedOperationException("kvMul() is not supported for Qwen2 models because they do not use a key-value head multiplier. If you need key-value head information, use numberOfKeyValueHeads instead."); |
Copilot uses AI. Check for mistakes.
} | ||
|
||
/** | ||
* No <|beginoftext|> needed for Qwen models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment contains inconsistent formatting. It should be either '<|beginoftext|>' or '<|begin_of_text|>' to match the actual token format used elsewhere in the codebase.
* No <|beginoftext|> needed for Qwen models. | |
* No <|begin_of_text|> needed for Qwen models. |
Copilot uses AI. Check for mistakes.
//Qwen2 specific fields TODO | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment indicates incomplete implementation. Either implement the Qwen2-specific fields or remove the comment if no additional fields are needed.
//Qwen2 specific fields TODO |
Copilot uses AI. Check for mistakes.
|
||
public Qwen2State(Configuration config, int batchsize) { | ||
super(config, batchsize); | ||
// Initialize Qwen2-specific fields TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another TODO comment indicating incomplete initialization. Either implement the initialization logic or remove the comment if no additional initialization is required.
Copilot uses AI. Check for mistakes.
new ChatTokens( "<|begin▁of▁sentence|>", "", "", "<|end▁of▁sentence|>", "") : | ||
new ChatTokens( "<|im_start|>", "<|im_end|>", "", "<|end_of_text|>", "<|endoftext|>"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chat tokens configuration uses different Unicode characters for DeepSeek ('|' vs '|'). This should be documented or extracted to constants to avoid confusion about the character differences.
new ChatTokens( "<|begin▁of▁sentence|>", "", "", "<|end▁of▁sentence|>", "") : | |
new ChatTokens( "<|im_start|>", "<|im_end|>", "", "<|end_of_text|>", "<|endoftext|>"); | |
new ChatTokens(DEEPSEEK_BEGIN_OF_SENTENCE, "", "", DEEPSEEK_END_OF_SENTENCE, "") : | |
new ChatTokens(QWEN2_IM_START, QWEN2_IM_END, "", QWEN2_END_OF_TEXT, QWEN2_ENDOFTEXT); |
Copilot uses AI. Check for mistakes.
36ddcd6
to
3dd3474
Compare
No description provided.