Support Kimi-K2.5 PTQ #820
base: main
Conversation
Signed-off-by: Zhiyu <zhiyuc@nvidia.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##              main     #820      +/-  ##
==========================================
- Coverage    74.13%   73.38%    -0.75%
==========================================
  Files          192      193        +1
  Lines        19263    19893      +630
==========================================
+ Hits         14280    14598      +318
- Misses        4983     5295      +312
==========================================

View full report in Codecov by Sentry.
Signed-off-by: Zhiyu <zhiyuc@nvidia.com>
Signed-off-by: Zhiyu <zhiyuc@nvidia.com>
Signed-off-by: Zhiyu <zhiyuc@nvidia.com>
cjluo-nv
left a comment
qq: if you just load Kimi K2.5 using HF and do a generation call (not using ModelOpt), were you able to do it?
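For reference, a minimal sketch of that plain-HF sanity check (the repo id and generation settings are assumptions, not taken from the PR):

```python
# Hypothetical sanity check: load the checkpoint with plain transformers
# (no ModelOpt involved) and run a single generation call.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "moonshotai/Kimi-K2.5"  # assumed HF repo id; substitute the actual checkpoint path
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ckpt,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```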
    return dtype


def _patch_compressed_linear_init():
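A rough sketch of what a patch like this typically does; the actual implementation in the PR may differ, and the CompressedLinear import path and placeholder-weight trick below are assumptions:

```python
# Sketch only: wrap CompressedLinear's __init__ so a placeholder .weight
# attribute exists before transformers touches the module. The import path
# is assumed; the real patch in this PR may look different.
import torch
from compressed_tensors.linear.compressed_linear import CompressedLinear

_original_init = CompressedLinear.__init__


def _patched_init(self, *args, **kwargs):
    _original_init(self, *args, **kwargs)
    # Some transformers code paths expect a .weight attribute even when the
    # module only holds packed/compressed parameters; register a placeholder.
    if not hasattr(self, "weight"):
        self.register_parameter(
            "weight", torch.nn.Parameter(torch.empty(0), requires_grad=False)
        )


def _patch_compressed_linear_init():
    CompressedLinear.__init__ = _patched_init
    print("Patched CompressedLinear for transformers compatibility")
```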
Can it be a transformers version issue? I was able to load Kimi K2 Thinking INT4 without an issue. Is this specific to Kimi K2.5?
    print("Patched CompressedLinear for transformers compatibility")


def _unpack_compressed_linear_weights(model, ckpt_path=None):
We do not need this. We should be able to unpack on the fly with the logic in the quantization plugins.
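A hedged illustration of what "unpack on the fly" could look like here: only `decompress_module` comes from this diff; the surrounding helper, its name, and where the plugin would call it are assumptions.

```python
# Sketch: decompress CompressedLinear modules lazily inside the quantization
# flow instead of pre-unpacking the whole checkpoint up front.
import torch


def _maybe_unpack(module: torch.nn.Module) -> None:
    compressor = getattr(module, "compressor", None)
    if compressor is None:
        return
    # decompress_module is the call shown in this diff.
    weight_data = compressor.decompress_module(module)
    module.weight = torch.nn.Parameter(weight_data, requires_grad=False)


def unpack_on_the_fly(model: torch.nn.Module) -> None:
    # Hypothetical hook a quantization plugin could run right before calibration.
    for submodule in model.modules():
        if type(submodule).__name__ == "CompressedLinear":
            _maybe_unpack(submodule)
```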
):
    torch_dtype = getattr(hf_config, "torch_dtype", torch.bfloat16)
elif has_pack_quantized_config(hf_config):
    # Patch CompressedLinear before loading to handle missing weight attribute
I don't think you need this
if self.quantization_status == QuantizationStatus.COMPRESSED:
    weight_data = self.compressor.decompress_module(self)
    # Check if we should use decompress_module or manual decompress_weight
Is this specific to Kimi K2.5?
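For context, a hedged sketch of the fallback that comment describes; only `decompress_module` appears in this diff, and the manual per-weight path, its arguments, and the module attributes used below are assumptions about the compressed-tensors API that may vary by version.

```python
# Sketch of choosing between decompress_module and a manual decompress_weight
# fallback. Exact signatures and attribute names are assumptions.
def _decompress_linear(self):
    if hasattr(self.compressor, "decompress_module"):
        # Decompress the whole module in one call when available.
        return self.compressor.decompress_module(self)
    # Fallback: manually decompress from the packed tensors on the module.
    compressed_data = {name: param for name, param in self.named_parameters()}
    return self.compressor.decompress_weight(
        compressed_data, quantization_args=self.quantization_scheme.weights
    )
```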
Signed-off-by: Zhiyu <zhiyuc@nvidia.com>
What does this PR do?
Type of change: ?
Overview: ?
Usage
# Add a code snippet demonstrating how to use this
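(The author left this placeholder unfilled; below is a hypothetical sketch of typical ModelOpt PTQ usage on an HF-loaded checkpoint. The checkpoint path, config choice, and calibration loop are assumptions, not taken from the PR.)

```python
# Hypothetical usage sketch: apply ModelOpt PTQ to the HF-loaded Kimi-K2.5 checkpoint.
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "moonshotai/Kimi-K2.5", torch_dtype="auto", trust_remote_code=True
)


def forward_loop(model):
    # Run a small calibration dataset through the model here.
    ...


# Config choice (FP8 here) is illustrative only.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```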
Testing
Before your PR is "Ready for review"
Additional Information