@dsikka commented Nov 4, 2025

Summary

  • Add an option to generate mxfp4 scales when calculating qparams, depending on the quantization args (a sketch of the scale computation follows this list)
  • Add preset schemes for MXFP4 and MXFP4A16
  • Update mxfp4_packed_compressor to additionally compress the generated scales at compression time. Decompression is not yet supported, since general decompression of qparams does not currently work; it is blocked on [WIP] fix qparams decompression #514
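
For context, here is a minimal sketch of the scale computation, assuming the OCP MX convention of one power-of-two (E8M0) scale per group of 32 FP4 (E2M1) elements, stored as a biased exponent in a uint8; names and rounding details are illustrative, not this PR's actual implementation:

```python
import torch

E2M1_EMAX = 2    # FP4 E2M1 max exponent (max magnitude 6.0 = 1.5 * 2**2)
E8M0_BIAS = 127  # bias for the shared 8-bit exponent scale
GROUP_SIZE = 32

def mxfp4_scales(weight: torch.Tensor) -> torch.Tensor:
    """One uint8 E8M0 scale per group of 32 along the last dim
    (assumes the last dim is divisible by GROUP_SIZE)."""
    groups = weight.reshape(*weight.shape[:-1], -1, GROUP_SIZE)
    amax = groups.abs().amax(dim=-1)
    # shared power-of-two exponent; elements are later divided by
    # 2**exp and clamped into the FP4 range [-6, 6]
    exp = torch.floor(torch.log2(amax)) - E2M1_EMAX
    exp = exp.clamp(-E8M0_BIAS, 255 - E8M0_BIAS)
    return (exp + E8M0_BIAS).to(torch.uint8)
```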

Testing:

Base automatically changed from quant_args_dtype to main November 10, 2025 16:05
@dsikka marked this pull request as ready for review November 17, 2025 21:06
dsikka added a commit to vllm-project/llm-compressor that referenced this pull request Nov 17, 2025
# Summary
- Requires: vllm-project/compressed-tensors#509
- Add a script to generate an mxfp4-quantized model (a usage sketch follows this list)
- This feature is currently experimental, as support has not yet landed or
been tested in vLLM
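
For reference, a rough sketch of what the script could look like with the `oneshot` flow; the scheme name "MXFP4" is assumed from the presets added in vllm-project/compressed-tensors#509, and the exact arguments may differ from the script added here:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
SAVE_DIR = "Meta-Llama-3-8B-Instruct-MXFP4"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# data-free (RTN-style) quantization of all Linear weights to MXFP4;
# the scheme name is an assumption based on this PR's presets
recipe = QuantizationModifier(targets="Linear", scheme="MXFP4", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```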

# Testing:
Sample Model: 
- nm-testing/Meta-Llama-3-8B-Instruct-MXFP4

Sample Generation (Transformers):

```
========== SAMPLE GENERATION ==============
<|begin_of_text|>Hello my name is Sophia and I am a 3rd year student at the University of California, Berkeley. I am a double major in Linguistics and Psychology, with a minor in Education. I am very interested in the way that language and culture interact, and I believe that education is the key to creating a more just and equitable society.
I am a native speaker of English, and I have also studied Spanish, French, and Mandarin Chinese. I am very interested in the way that language can be used to bring
==========================================

```
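
The output above can be reproduced with a standard transformers generation loop along these lines (the prompt and token count are assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nm-testing/Meta-Llama-3-8B-Instruct-MXFP4"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

inputs = tokenizer("Hello my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```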


Sample Config:
```json
"quantization_config": {
    "config_groups": {
        "group_0": {
            "format": "mxfp4-pack-quantized",
            "input_activations": {
                "actorder": null,
                "block_structure": null,
                "dynamic": true,
                "group_size": 32,
                "num_bits": 4,
                "observer": null,
                "observer_kwargs": {},
                "scale_dtype": "torch.uint8",
                "strategy": "group",
                "symmetric": true,
                "type": "float",
                "zp_dtype": null
            },
            "output_activations": null,
            "targets": [
                "Linear"
            ],
            "weights": {
                "actorder": null,
                "block_structure": null,
                "dynamic": false,
                "group_size": 32,
                "num_bits": 4,
                "observer": "minmax",
                "observer_kwargs": {},
                "scale_dtype": "torch.uint8",
                "strategy": "group",
                "symmetric": true,
                "type": "float",
                "zp_dtype": null
            }
        }
    },
    "format": "mxfp4-pack-quantized"
}
```
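
Note that `"scale_dtype": "torch.uint8"` means the group scales are stored as E8M0 biased exponents rather than floats. A sketch of decoding them back to float scales, assuming the standard OCP MX bias of 127:

```python
import torch

E8M0_BIAS = 127  # standard OCP MX bias for E8M0 scales

def decode_e8m0(scales_u8: torch.Tensor) -> torch.Tensor:
    """Expand stored uint8 exponents back to power-of-two float scales."""
    return torch.exp2(scales_u8.to(torch.float32) - E8M0_BIAS)

print(decode_e8m0(torch.tensor([125, 127, 130], dtype=torch.uint8)))
# tensor([0.2500, 1.0000, 8.0000])
```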

---------

Signed-off-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>