Commit 148efda
[MXFp4] Add E2E Example Script for Llama3 (#2042)
# Summary
- Requires: vllm-project/compressed-tensors#509
- Add script to generate an mxfp4 quantized model
- This feature is currently experimental, as support has not yet landed in or been tested with vLLM
# Testing:
Sample Model:
- nm-testing/Meta-Llama-3-8B-Instruct-MXFP4
Sample Generation (Transformers):
```text
========== SAMPLE GENERATION ==============
<|begin_of_text|>Hello my name is Sophia and I am a 3rd year student at the University of California, Berkeley. I am a double major in Linguistics and Psychology, with a minor in Education. I am very interested in the way that language and culture interact, and I believe that education is the key to creating a more just and equitable society.
I am a native speaker of English, and I have also studied Spanish, French, and Mandarin Chinese. I am very interested in the way that language can be used to bring
==========================================
```
Sample Config:
```json
{
  "quantization_config": {
    "config_groups": {
      "group_0": {
        "format": "mxfp4-pack-quantized",
        "input_activations": {
          "actorder": null,
          "block_structure": null,
          "dynamic": true,
          "group_size": 32,
          "num_bits": 4,
          "observer": null,
          "observer_kwargs": {},
          "scale_dtype": "torch.uint8",
          "strategy": "group",
          "symmetric": true,
          "type": "float",
          "zp_dtype": null
        },
        "output_activations": null,
        "targets": [
          "Linear"
        ],
        "weights": {
          "actorder": null,
          "block_structure": null,
          "dynamic": false,
          "group_size": 32,
          "num_bits": 4,
          "observer": "minmax",
          "observer_kwargs": {},
          "scale_dtype": "torch.uint8",
          "strategy": "group",
          "symmetric": true,
          "type": "float",
          "zp_dtype": null
        }
      }
    },
    "format": "mxfp4-pack-quantized"
  }
}
```
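The config above stores 4-bit `float` weights and activations in groups of 32 with a `torch.uint8` scale. Assuming this follows the OCP Microscaling (MX) v1.0 definition of MXFP4 (FP4 E2M1 elements sharing one E8M0 power-of-two scale per 32-element block), one group's round trip can be sketched in pure Python as below. This is an illustration, not the compressed-tensors implementation: the helper names are hypothetical, and the scale rule (`floor(log2(amax)) - 2`), ties-to-even rounding, and low-nibble-first packing order are assumptions that may differ from what `mxfp4-pack-quantized` actually does.

```python
import math

# FP4 E2M1 magnitudes, indexed by their 3-bit code (2 exponent bits, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2     # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
E8M0_BIAS = 127   # the shared scale is stored as a biased power-of-two exponent (uint8)
GROUP_SIZE = 32   # matches "group_size": 32 in the config


def encode_fp4(x: float) -> int:
    """Round x to the nearest E2M1 value (ties to even code), saturating at +/-6."""
    sign = 1 if x < 0 else 0
    mag = min(abs(x), E2M1_VALUES[-1])
    code = min(range(8), key=lambda c: (abs(mag - E2M1_VALUES[c]), c & 1))
    return (sign << 3) | code


def decode_fp4(nibble: int) -> float:
    """Map a 4-bit sign/magnitude code back to its float value."""
    value = E2M1_VALUES[nibble & 0b111]
    return -value if nibble & 0b1000 else value


def quantize_group(group: list[float]) -> tuple[int, list[int]]:
    """Quantize one group to (uint8 E8M0 scale bits, list of 4-bit codes)."""
    amax = max(abs(v) for v in group)
    # OCP MX rule (assumed here): shared exponent = floor(log2(amax)) - emax of E2M1.
    shared_exp = -E8M0_BIAS if amax == 0.0 else math.floor(math.log2(amax)) - E2M1_EMAX
    # Clamp to the finite E8M0 range; 255 is reserved for NaN.
    scale_bits = max(0, min(254, shared_exp + E8M0_BIAS))
    scale = 2.0 ** (scale_bits - E8M0_BIAS)
    return scale_bits, [encode_fp4(v / scale) for v in group]


def dequantize_group(scale_bits: int, codes: list[int]) -> list[float]:
    """Multiply each decoded FP4 value by the shared power-of-two scale."""
    scale = 2.0 ** (scale_bits - E8M0_BIAS)
    return [decode_fp4(c) * scale for c in codes]


def pack_nibbles(codes: list[int]) -> bytes:
    """Pack two 4-bit codes per byte, first code in the low nibble (assumed order)."""
    assert len(codes) % 2 == 0
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


# Usage: quantize one synthetic group of 32 weights.
weights = [0.01 * (i - 16) for i in range(GROUP_SIZE)]
scale_bits, codes = quantize_group(weights)
packed = pack_nibbles(codes)
restored = dequantize_group(scale_bits, codes)
print(scale_bits, len(packed))  # prints: 122 16
```

Here `amax` is 0.16, so the shared exponent is -5 (stored as 122) and the 32 codes pack into 16 bytes. Because the scale is a pure power of two, dequantization amounts to an exponent shift of the decoded FP4 values.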
---------
Signed-off-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Parent commit: 560bb9c
2 files changed, +38 -0 lines (3 lines added in one file, 35 in the other)