Feat: Pre-quantized LLM model support #3740

keehyuna · 2025-08-01T00:02:11Z

Description

Add support for pre-quantized Hugging Face (HF) models and a post-training quantization (PTQ) option to run_llm.py.

Fixes # (issue)

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

narendasan · 2025-08-29T18:50:27Z

tools/llm/quantize_utils.py

+    return model
+
+
+class TensorRTQuantizedLinear(torch.nn.Module):


@peri044 Is this something we might want to upstream to ModelOpt in the future?

Or pull into main torch-tensorrt as a pass?

I guess it's somewhat HF-specific, so keeping it in this tool would make sense, but are there parts we could make generic for any sort of quantization workflow (e.g. torchao)?

Thanks. I think quantize_model() can be moved into a function such as torch_tensorrt.dynamo.quantize(). I am currently investigating how to separate the calibration data path from the quantization logic.

narendasan · 2025-08-29T18:51:12Z

tools/llm/quantize_utils.py

+
+        hf_quant_algo = hf_quant_config.pop("quant_algo", None)
+        if hf_quant_algo != "FP8" and hf_quant_algo != "NVFP4":
+            raise RuntimeError("Only FP8 or NVFP4 quantization is supported")


How would it be different for MXFP4?

I looked at the quantization configs in modelopt:

NVFP4_DEFAULT_CFG: NVFP4 uses E4M3 scales and a block size of 16.

MXFP4_DEFAULT_CFG: MXFP4 uses E8M0 scales and a block size of 32.
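To make the difference concrete, here is a minimal pure-Python sketch of per-block scale computation for the two formats. It is an illustration only, not ModelOpt's implementation: the `FORMATS` table and `block_scales` helper are hypothetical names, the NVFP4 scale is left unrounded (a real implementation would round it to E4M3 and may also apply a per-tensor scale), and the MXFP4 power-of-two rule is assumed to follow the OCP Microscaling convention (shared exponent = floor(log2(amax)) minus the element format's maximum exponent, which is 2 for E2M1).

```python
import math

FP4_E2M1_MAX = 6.0   # largest magnitude representable in FP4 (E2M1)
FP4_E2M1_EMAX = 2    # exponent of that largest value (6 = 1.5 * 2**2)

# Block size and scale format as quoted in the comment above.
# The dictionary layout itself is a hypothetical convenience.
FORMATS = {
    "NVFP4": {"block_size": 16, "scale_format": "E4M3"},
    "MXFP4": {"block_size": 32, "scale_format": "E8M0"},
}


def block_scales(values, fmt):
    """Return one scale per block of `values` for the named format."""
    cfg = FORMATS[fmt]
    size = cfg["block_size"]
    scales = []
    for start in range(0, len(values), size):
        block = values[start:start + size]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            # An all-zero block needs no scaling.
            scales.append(1.0)
        elif cfg["scale_format"] == "E8M0":
            # E8M0 has no mantissa bits, so the scale must be a power of two.
            scales.append(2.0 ** (math.floor(math.log2(amax)) - FP4_E2M1_EMAX))
        else:
            # An E4M3 scale can carry a fractional part; it is not rounded
            # to E4M3 precision in this sketch.
            scales.append(amax / FP4_E2M1_MAX)
    return scales


values = [float(i) for i in range(1, 65)]  # 64 example values
nvfp4_scales = block_scales(values, "NVFP4")
mxfp4_scales = block_scales(values, "MXFP4")
```

For the same 64 values, the NVFP4 path produces four scales (one per 16 elements) and the MXFP4 path produces two (one per 32 elements), so supporting MXFP4 would mainly mean handling a different block size and a power-of-two scale type.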

tools/llm/run_llm.py

tools/llm/README.md

lanluo-nvidia · 2025-09-19T16:07:26Z

modelopt changed its code structure in 0.35.0.
Please make the same changes as in 9c520f8.

lanluo-nvidia · 2025-09-19T16:42:12Z

tools/llm/quantize_utils.py

+                input_amax = tensors.pop(input_scale_name) * 448.0
+
+                # Dequantize the weight using the scale factor
+                dequantized_weight_data = module.weight.to(torch.float32) * weight_scale


Should we check whether the precision is fp16 and, if so, use .to(torch.float16), otherwise float32?

Thanks, that makes sense. I've updated it to use the same model precision.
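For context on the `448.0` factor in the diff above: 448 is the largest finite value of FP8 E4M3, so a stored scale of `amax / 448` can be turned back into an amax by multiplying by 448. The sketch below shows that relationship and the dequantization step with plain Python floats. The helper names are hypothetical and the scale convention is an assumption inferred from the diff; the actual code operates on torch tensors in the model's precision.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3


def scale_to_amax(scale):
    """Recover the calibration amax from a stored FP8 scale (assumed amax / 448)."""
    return scale * FP8_E4M3_MAX


def amax_to_scale(amax):
    """Inverse mapping: the scale that maps `amax` onto the FP8 maximum."""
    return amax / FP8_E4M3_MAX


def dequantize(quantized_values, scale):
    """Multiply FP8-range values by the scale to get back real-valued weights."""
    return [q * scale for q in quantized_values]


input_amax = scale_to_amax(0.5)
weights = dequantize([448.0, -224.0, 0.0], 0.5)
```

In the real module the multiplication would be carried out in whichever dtype the model uses (fp16 or fp32), which is the point of the review comment above.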

peri044

Functionality looks good to me. Posted some comments on code restructuring

py/torch_tensorrt/dynamo/_quantization.py

tools/llm/quantize_utils.py

tools/llm/README.md

peri044 · 2025-09-19T16:59:15Z

tools/llm/run_llm.py

+    hf_quant_config = load_quantization_config(args.model)
+    if hf_quant_config:
+        model = convert_linear_to_tensorrt_quantized(model, hf_quant_config).cuda()
+        print(f"Model converted to TensorRT quantized")


Consider changing this to a more informative message
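One possible shape for a more informative message is sketched below. The `conversion_message` helper, its parameters, and the example model name are hypothetical; note that the quantization algorithm is passed in explicitly because the conversion code shown earlier pops `quant_algo` out of the config dictionary.

```python
def conversion_message(model_name: str, quant_algo: str, num_layers: int) -> str:
    """Build a log line describing what was converted and how."""
    return (
        f"Converted {num_layers} linear layers of {model_name} to "
        f"TensorRT quantized modules (quant_algo={quant_algo})"
    )


# Hypothetical values for illustration only.
message = conversion_message("example-org/example-model-fp8", "FP8", 224)
print(message)
```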

peri044

LGTM pending CI failures

meta-cla bot added the cla signed label Aug 1, 2025

keehyuna self-assigned this Aug 6, 2025

keehyuna changed the title ~~fp8 pre-quantized model support~~ Pre-quantized model support Aug 7, 2025

keehyuna changed the title ~~Pre-quantized model support~~ Feat: Pre-quantized LLM model support Aug 7, 2025

keehyuna marked this pull request as ready for review August 7, 2025 12:39

keehyuna requested review from narendasan and peri044 and removed request for narendasan August 8, 2025 06:44

keehyuna force-pushed the quant_llm_merged branch from 62cb0f2 to 1f1cf7f Compare August 29, 2025 08:26

narendasan reviewed Aug 29, 2025


keehyuna force-pushed the quant_llm_merged branch from 19a4070 to 303775e Compare September 2, 2025 13:13

github-actions bot added the component: api [Python] and component: dynamo labels Sep 4, 2025

lanluo-nvidia reviewed Sep 19, 2025


tools/llm/README.md (outdated, resolved)

lanluo-nvidia reviewed Sep 19, 2025


peri044 requested changes Sep 19, 2025


github-actions bot removed the component: api [Python] and component: dynamo labels Sep 20, 2025

keehyuna force-pushed the quant_llm_merged branch 2 times, most recently from efa63db to 3670829 Compare September 22, 2025 03:15

peri044 approved these changes Sep 23, 2025


keehyuna force-pushed the quant_llm_merged branch from 1b5f7c0 to eeef1c4 Compare October 1, 2025 23:08

github-actions bot added the component: tests label Oct 2, 2025

keehyuna added 5 commits October 10, 2025 12:14

fp8/nvfp4 quantization support

ad54e80

chore: Detect pre-quantized hf model

57e37ca

feat: Expose quantization API in torch_tensorrt.dynamo

6b3f7f9

chore: address reviews

1dd7761

chore: api change in modelopt 0.35

83c5df7

chore: precision -> model_precision option

95f8e5b

keehyuna force-pushed the quant_llm_merged branch from b319eae to 95f8e5b Compare October 10, 2025 03:15


4 participants
