Support Kimi-K2.5 PTQ #820
base: main
Conversation
Signed-off-by: Zhiyu <zhiyuc@nvidia.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##              main     #820      +/-  ##
==========================================
- Coverage    74.13%   73.38%    -0.75%
==========================================
  Files          192      193        +1
  Lines        19263    19893      +630
==========================================
+ Hits         14280    14598      +318
- Misses        4983     5295      +312
==========================================

View full report in Codecov by Sentry.
Signed-off-by: Zhiyu <zhiyuc@nvidia.com>
Signed-off-by: Zhiyu <zhiyuc@nvidia.com>
Signed-off-by: Zhiyu <zhiyuc@nvidia.com>
cjluo-nv
left a comment
qq: if you just load Kimi K2.5 using HF and do a generation call (not using ModelOpt), were you able to do it?
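For reference, a minimal sketch of that plain-HF sanity check (the repo id and generation settings are assumptions, not taken from the PR):

```python
# Hypothetical sanity check: load the checkpoint with plain transformers
# (no ModelOpt involved) and run a single generation call.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "moonshotai/Kimi-K2.5"  # assumed HF repo id; substitute the actual checkpoint path
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ckpt,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```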
    return dtype


def _patch_compressed_linear_init():
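A rough sketch of what a patch like this typically does; the actual implementation in the PR may differ, and the CompressedLinear import path and placeholder-weight trick below are assumptions:

```python
# Sketch only: wrap CompressedLinear's __init__ so a placeholder .weight
# attribute exists before transformers touches the module. The import path
# is assumed; the real patch in this PR may look different.
import torch
from compressed_tensors.linear.compressed_linear import CompressedLinear

_original_init = CompressedLinear.__init__


def _patched_init(self, *args, **kwargs):
    _original_init(self, *args, **kwargs)
    # Some transformers code paths expect a .weight attribute even when the
    # module only holds packed/compressed parameters; register a placeholder.
    if not hasattr(self, "weight"):
        self.register_parameter(
            "weight", torch.nn.Parameter(torch.empty(0), requires_grad=False)
        )


def _patch_compressed_linear_init():
    CompressedLinear.__init__ = _patched_init
    print("Patched CompressedLinear for transformers compatibility")
```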
Can it be a transformers version issue? I was able to load Kimi K2 Thinking INT4 without an issue. Is this specific to Kimi K2.5?
    print("Patched CompressedLinear for transformers compatibility")


def _unpack_compressed_linear_weights(model, ckpt_path=None):
We do not need this. We should be able to unpack on the fly with the logic in the quantization plugins.
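A hedged illustration of what "unpack on the fly" could look like here: only `decompress_module` comes from this diff; the surrounding helper, its name, and where the plugin would call it are assumptions.

```python
# Sketch: decompress CompressedLinear modules lazily inside the quantization
# flow instead of pre-unpacking the whole checkpoint up front.
import torch


def _maybe_unpack(module: torch.nn.Module) -> None:
    compressor = getattr(module, "compressor", None)
    if compressor is None:
        return
    # decompress_module is the call shown in this diff.
    weight_data = compressor.decompress_module(module)
    module.weight = torch.nn.Parameter(weight_data, requires_grad=False)


def unpack_on_the_fly(model: torch.nn.Module) -> None:
    # Hypothetical hook a quantization plugin could run right before calibration.
    for submodule in model.modules():
        if type(submodule).__name__ == "CompressedLinear":
            _maybe_unpack(submodule)
```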
):
    torch_dtype = getattr(hf_config, "torch_dtype", torch.bfloat16)
elif has_pack_quantized_config(hf_config):
    # Patch CompressedLinear before loading to handle missing weight attribute
I don't think you need this
if self.quantization_status == QuantizationStatus.COMPRESSED:
    weight_data = self.compressor.decompress_module(self)
    # Check if we should use decompress_module or manual decompress_weight
Is this specific to Kimi K2.5?
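For context, a hedged sketch of the fallback that comment describes; only `decompress_module` appears in this diff, and the manual per-weight path, its arguments, and the module attributes used below are assumptions about the compressed-tensors API that may vary by version.

```python
# Sketch of choosing between decompress_module and a manual decompress_weight
# fallback. Exact signatures and attribute names are assumptions.
def _decompress_linear(self):
    if hasattr(self.compressor, "decompress_module"):
        # Decompress the whole module in one call when available.
        return self.compressor.decompress_module(self)
    # Fallback: manually decompress from the packed tensors on the module.
    compressed_data = {name: param for name, param in self.named_parameters()}
    return self.compressor.decompress_weight(
        compressed_data, quantization_args=self.quantization_scheme.weights
    )
```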
Signed-off-by: Zhiyu <zhiyuc@nvidia.com>
What does this PR do?
Type of change: ?
Overview: ?
Usage
# Add a code snippet demonstrating how to use this
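(The author left this placeholder unfilled; below is a hypothetical sketch of typical ModelOpt PTQ usage on an HF-loaded checkpoint. The checkpoint path, config choice, and calibration loop are assumptions, not taken from the PR.)

```python
# Hypothetical usage sketch: apply ModelOpt PTQ to the HF-loaded Kimi-K2.5 checkpoint.
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "moonshotai/Kimi-K2.5", torch_dtype="auto", trust_remote_code=True
)


def forward_loop(model):
    # Run a small calibration dataset through the model here.
    ...


# Config choice (FP8 here) is illustrative only.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```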
Testing
Before your PR is "Ready for review"
Additional Information