
Conversation

@willg-nv commented on Dec 17, 2025:

What does this PR do?

Type of change: new feature

Overview: This PR integrates the automated QDQ placement tool into ModelOpt. It is part 3 of 4 and contains the following changes:

  1. Implements QDQAutotuner and the autotuner CLI interface.
  2. Implements benchmark classes to measure the end-to-end (E2E) inference time of QDQ models.
  3. Adds unit tests for the QDQ autotuner and its configuration.

Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704

Usage

python -m modelopt.onnx.quantization.autotune --model model.onnx

Testing

Implemented unit tests for QDQAutotuner and Config classes.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No, documentation will be added in part 4.
  • Did you update Changelog?: No, the changelog will be updated in part 4.

Additional Information

@willg-nv requested a review from a team as a code owner on December 17, 2025 06:56.
@copy-pr-bot (bot) commented on Dec 17, 2025:

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@willg-nv (Author) commented:

@vishalpandya1990 could you help me review this PR? thanks!

@vishalpandya1990 (Contributor) commented:

> @vishalpandya1990 could you help me review this PR? thanks!

Sorry for the delay. Added Ajinkya for review.

@willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3 branch from 3454bba to 4b9d789 on December 31, 2025 02:09.
"--output",
"-o",
type=str,
default=DEFAULT_OUTPUT_DIR,
@gcunhase (Contributor) commented on Jan 8, 2026:

Can we update this behavior to match the ONNX quantization and Autocast workflows?

In those, if output_path is not given, the resulting model is saved next to the input model with a name extension. For example, the quantized model.onnx is saved as model.quant.onnx and the converted model is saved as model.fp16.onnx.

See more details in:
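
For illustration, a minimal sketch of that default behavior; the helper name and the ".autotuned" suffix are assumptions, not the actual ModelOpt utility:

    import os

    def default_output_path(onnx_path: str, suffix: str = "autotuned") -> str:
        # Derive an output path next to the input model,
        # e.g. "model.onnx" -> "model.autotuned.onnx" (suffix is illustrative).
        root, ext = os.path.splitext(onnx_path)
        return f"{root}.{suffix}{ext}"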

Contributor:

Suggestion: rename this as output_path to match other ONNX workflows.

Author:

FYI: I plan to create another PR to integrate the autotuner as a sub-command of modelopt.onnx.quantization. Users could then either 1) run the autotuner directly, or 2) autotune based on a PTQ model. After that, I think cli.py could be removed.

Contributor:

In the meantime, could we move cli.py to __main__.py and remove cli.py? Then the workflow would match quantization and offer autotune as a standalone feature, separate from quantization, for debugging purposes.
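
For illustration, a minimal sketch of such a __main__.py; it assumes the current CLI logic is relocated into this module (names are illustrative, not the actual layout):

    # modelopt/onnx/quantization/autotune/__main__.py (sketch)
    # `python -m modelopt.onnx.quantization.autotune` executes this module directly,
    # so the argument parsing currently in cli.py could live here unchanged.

    def main() -> None:
        ...  # existing cli.py argument parsing and autotune invocation

    if __name__ == "__main__":
        main()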

from modelopt.onnx.quantization.autotune.common import PatternCache, RegionType


def create_simple_conv_model():
Contributor:

Can we move this to tests/_test_utils/onnx/lib_test_models.py? Alternatively, NonSimplifiedModel or build_resnet_block could be used here instead?

@ajrasane WDYT?
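
If it helps, a rough sketch of what a shared Conv->Relu helper in lib_test_models.py could look like (shapes, names, and opset are illustrative and may differ from the helper in this PR):

    import numpy as np
    import onnx
    from onnx import TensorProto, helper, numpy_helper

    def create_simple_conv_model() -> onnx.ModelProto:
        # Tiny Conv -> Relu graph with a single weight initializer.
        x = helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 3, 32, 32])
        y = helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 8, 32, 32])
        w = numpy_helper.from_array(
            np.random.randn(8, 3, 3, 3).astype(np.float32), name="conv_weight"
        )
        conv = helper.make_node(
            "Conv", ["input", "conv_weight"], ["conv_out"], name="conv", pads=[1, 1, 1, 1]
        )
        relu = helper.make_node("Relu", ["conv_out"], ["output"], name="relu")
        graph = helper.make_graph([conv, relu], "simple_conv", [x], [y], initializer=[w])
        return helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])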

Contributor:

I agree, let's keep this in a central place.

@willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3 branch 2 times, most recently from 20ae533 to 99d3c0d on January 12, 2026 03:11.
@willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3 branch 4 times, most recently from 202b3e2 to 8674964 on January 21, 2026 02:26.
Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3 branch from 8674964 to b348350 on January 23, 2026 07:50.
ResolvedInsertionPoint,
)
from .region_pattern import RegionPattern
from .region_search import CombinedRegionSearch
@gcunhase (Contributor) commented on Jan 23, 2026:

Missing:

from .autotuner import QDQAutotuner, QDQAutotunerBase

"PatternCache",
"PatternSchemes",
"Region",
"RegionError",
Contributor:

Missing implementation of RegionError
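
A minimal definition would suffice here; this sketch assumes RegionError is meant to be the package's region-related exception type:

    class RegionError(Exception):
        """Raised when a region is malformed or a region operation cannot be completed."""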


for region_idx, region in enumerate(regions):
logger.info(
f"Region {region_idx + 1}/{len(regions)} (ID={region.id}, level={region.get_level()})"
Contributor:

Error:

modelopt/onnx/quantization/autotune/autotuner.py:193: error: "Region" has no attribute "get_level"  [attr-defined]
modelopt/onnx/quantization/autotune/autotuner.py:194: error: "Region" has no attribute "get_size"  [attr-defined]
modelopt/onnx/quantization/autotune/workflows.py:335: error: "Region" has no attribute "get_level"  [attr-defined]
modelopt/onnx/quantization/autotune/workflows.py:384: error: "Region" has no attribute "get_level"  [attr-defined]
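
One possible fix, assuming Region already stores its level and size as attributes (the attribute names are assumptions); alternatively, the call sites could read those attributes directly:

    class Region:
        ...  # existing fields and methods

        def get_level(self) -> int:
            # Assumes a `level` attribute set during region construction.
            return self.level

        def get_size(self) -> int:
            # Assumes a `size` attribute, e.g. the number of nodes in the region.
            return self.size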

else:
print("✓ QDQAutotuner (no schemes to test tracking)")
else:
self.skipTest("No regions discovered")
Contributor:

Can you also add a function to test the full autotuner workflow? ONNX -> InsertionPoints -> Quantized ONNX. Thanks.
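
For reference, a rough sketch of such an end-to-end test, reusing only calls already exercised in this file (create_simple_conv_model, initialize, generate, export_onnx); the final assertion may need a model with applicable schemes, per the export discussion below:

    def test_full_autotune_workflow(self):
        # ONNX -> insertion points -> quantized ONNX, end to end.
        import os
        import tempfile

        import onnx

        model = create_simple_conv_model()
        autotuner = QDQAutotuner(model)
        autotuner.initialize(self._create_test_config())

        # Propose at least one Q/DQ placement scheme before exporting.
        scheme_idx = autotuner.generate()
        assert isinstance(scheme_idx, int)

        with tempfile.TemporaryDirectory() as tmpdir:
            output_path = os.path.join(tmpdir, "model.quant.onnx")
            autotuner.export_onnx(output_path, insert_qdq=True)

            exported = onnx.load(output_path)
            qdq_nodes = [
                n
                for n in exported.graph.node
                if n.op_type in ("QuantizeLinear", "DequantizeLinear")
            ]
            assert qdq_nodes, "expected Q/DQ nodes in the exported model"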


if needs_fp8_conversion:
logger.debug("Converting INT8 to FP8")
model = int8_to_fp8(model)
Contributor:

Is this conversion function needed or can we insert Q/DQ nodes already at the correct precision?

Contributor:

Tried the following test code:

    def test_export_quantized_model(self):
        """Test exporting quantized model with Q/DQ."""
        model = create_simple_conv_model()
        autotuner = QDQAutotuner(model)
        config = self._create_test_config()
        autotuner.initialize(config)

        with open("/tmp/autotuner_model.quant.onnx", "w") as f:  # tempfile.NamedTemporaryFile(suffix=".onnx", delete=False) as f:
            output_path = f.name

        try:
            # Export baseline without Q/DQ insertion
            autotuner.export_onnx(output_path, insert_qdq=True)

            # Verify file was created
            assert os.path.exists(output_path)

            # Verify it's a valid ONNX model
            exported_model = onnx.load(output_path)
            assert exported_model is not None

            # Verify that it contains Q/DQ nodes
            qdq_nodes = [n for n in exported_model.graph.node if n.op_type in ["QuantizeLinear", "DequantizeLinear"]]
            assert qdq_nodes, "Q/DQ nodes not found in quantized model"

            print("✓ QDQAutotuner export quantized model")
        finally:
            print()
            # if os.path.exists(output_path):
            #     os.unlink(output_path)

But the simple Conv->Relu model didn't get quantized. Is this expected?

[modelopt][onnx] - DEBUG - Region 0 (level 0)
[modelopt][onnx] - DEBUG -   → Pattern signature: Conv->Relu
[modelopt][onnx] - DEBUG -   → No scheme available, skipping
[modelopt][onnx] - DEBUG - Matched 0/1 regions, total 0 unique insertion points
[modelopt][onnx] - DEBUG - Inserting 0 Q/DQ pairs into graph
[modelopt][onnx] - DEBUG - Serializing to ONNX format
[modelopt][onnx] - INFO - Exported INT8 model with 0 Q/DQ pairs  → /tmp/autotuner_model.quant.onnx
✓ QDQAutotuner export quantized model

scheme_idx = autotuner.generate()

# Should return a valid index (>= 0) or -1 if no more unique schemes
assert isinstance(scheme_idx, int)
Contributor:

What's the expected scheme_idx for create_simple_conv_model()? Please update this assert accordingly. Thanks.
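
For example, if the simple Conv->Relu model ends up with no applicable schemes (as the export log in the earlier comment suggests), the check could be tightened along these lines; the exact expected value is an assumption that needs confirming:

    scheme_idx = autotuner.generate()

    assert isinstance(scheme_idx, int)
    # Per the comment above: a valid index (>= 0) or -1 when no more unique schemes exist.
    # Replace with the confirmed expected value for create_simple_conv_model().
    assert scheme_idx >= -1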

@gcunhase (Contributor) commented:

Can we add a test file for workflows.py and potentially benchmark.py?

cli.py can be moved to __main__.py if the plan is for it to be available as a standalone feature and/or for debugging purposes. Unit tests might also need to be added for that.

# TensorRT Benchmark
trt_group = parser.add_argument_group("TensorRT Benchmark")
trt_group.add_argument(
"--use_trtexec",
@gcunhase (Contributor) commented on Jan 27, 2026:

The following CLI call fails to benchmark/quantize the model (it uses TensorRTPyBenchmark):

$ python -m modelopt.onnx.quantization.autotune --onnx_path=conv_relu.onnx

Error:

[modelopt][onnx] - ERROR - Benchmark instance not initialized
[modelopt][onnx] - INFO - Results: 3.73 ms → failed (invalid measurement)

This failure happens because pycuda was not installed. After installing that dependency, no error is thrown but the model is not quantized.

  • @ajrasane should we create another optional_dep in setup.py with autotune's dependencies?
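
Whichever way the dependency is packaged, a guarded import with an actionable message would make this failure clearer than "Benchmark instance not initialized"; a rough sketch:

    try:
        import pycuda.autoinit  # noqa: F401
        import pycuda.driver as cuda  # noqa: F401
    except ImportError as err:
        raise RuntimeError(
            "TensorRTPyBenchmark requires pycuda. Install it (e.g. `pip install pycuda`) "
            "or rerun with --use_trtexec."
        ) from err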

@gcunhase (Contributor) commented on Jan 27, 2026:

If --use_trtexec is used, autotune does not fail but also doesn't generate a quantized model.

This is due to Latency being used as a measurement instead of GPU Compute Time.

Contributor:

If it is just pycuda, we can probably just include this in the modelopt onnx dependencies. But if we have more dependencies, then it would be better to create a new section in setup.py with autotune dependencies.
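
If a dedicated section is preferred, a sketch of the setup.py entry (the extra's name and contents are illustrative):

    # setup.py (sketch): optional dependencies for the autotune feature
    extras_require = {
        # ... existing extras ...
        "onnx-autotune": [
            "pycuda",  # required by TensorRTPyBenchmark
        ],
    }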

Contributor:

@willg-nv how should we approach the tensorrt / trtexec requirements for autotune? Are we just adding a disclaimer for the user in the README or adding that in setup.py?

Comment on lines +298 to +302
# "[I] Latency: min = X ms, max = Y ms, mean = Z ms, median = W ms, ..."
output = result.stdout

# Look for median latency in the main "[I] Latency:" line
pattern = r"\[I\]\s+Latency:.*?median\s*=\s*([\d.]+)\s*ms"
@gcunhase (Contributor) commented on Jan 27, 2026:

For measurements equivalent to TensorRTPyBenchmark in TrtExecBenchmark:

Suggested change:
-    # "[I] Latency: min = X ms, max = Y ms, mean = Z ms, median = W ms, ..."
-    output = result.stdout
-    # Look for median latency in the main "[I] Latency:" line
-    pattern = r"\[I\]\s+Latency:.*?median\s*=\s*([\d.]+)\s*ms"
+    # "[I] GPU Compute Time: min = X ms, max = Y ms, mean = Z ms, median = W ms, ..."
+    output = result.stdout
+    # Look for median GPU Compute Time in the main "[I] GPU Compute Time:" line
+    pattern = r"\[I\]\s+GPU Compute Time:.*?median\s*=\s*([\d.]+)\s*ms"
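
For reference, a quick sanity check of the suggested pattern against a representative trtexec log line (the sample numbers are made up):

    import re

    pattern = r"\[I\]\s+GPU Compute Time:.*?median\s*=\s*([\d.]+)\s*ms"
    sample = "[I] GPU Compute Time: min = 1.10 ms, max = 2.05 ms, mean = 1.32 ms, median = 1.28 ms"

    match = re.search(pattern, sample)
    assert match is not None
    median_ms = float(match.group(1))  # 1.28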

@gcunhase (Contributor) commented:

Can we add a test file for workflows.py and potentially benchmark.py?

cli.py can be moved to __main__.py if the plan is for it to be available as a standalone feature and/or for debugging purposes. Unit tests might also need to be added for that.

Suggestion for test_workflows.py: test_workflows.py

output_shape: tuple | None,
output_dtype: np.dtype,
quant_dtype: np.dtype,
quant_type: str,
@gcunhase (Contributor) commented on Jan 28, 2026:

Is this being used somewhere?
