Integrate Automated QDQ placement tool - Part 2 #702

willg-nv · 2025-12-17T06:29:51Z

What does this PR do?

Type of change: new feature

Overview: This PR integrates the automated Q/DQ placement tool into ModelOpt. It is part 2 of 4 of the changes.

Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704

This PR contains the following changes:

Implement RegionPattern to represent the topological structure of Regions. InsertionPoints are also defined on RegionPattern. Regions with the same pattern are optimized at the same time.
Implement the RegionSearch class to divide an ONNX graph into small regions.
The RegionSearch Python file also provides an entry point that prints the region structures.
Unit tests for the new classes.
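
As a rough illustration of "regions with the same pattern are optimized at the same time", the sketch below groups regions by a structural signature string. The signature strings and the `group_by_signature` helper are hypothetical stand-ins for illustration, not the RegionPattern API added in this PR.

```python
from collections import defaultdict


def group_by_signature(regions: dict[int, str]) -> dict[str, list[int]]:
    """Group region ids by their structural signature string."""
    groups: dict[str, list[int]] = defaultdict(list)
    for region_id, signature in sorted(regions.items()):
        groups[signature].append(region_id)
    return dict(groups)


# Hypothetical region ids mapped to made-up signatures.
regions = {209: "Conv->Relu", 210: "MatMul->Add", 211: "Conv->Relu"}
groups = group_by_signature(regions)
```

Under this assumption, a Q/DQ scheme tuned for one member of a group could be applied to every region in that group.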

Usage

python -m modelopt.onnx.quantization.autotune.region_search --model model.onnx --verbose

Example output:

    ├─ Region 212 (Level 0, Type: COMPOSITE)
    │  ├─ Direct nodes: 0
    │  ├─ Total nodes (recursive): 9
    │  ├─ Children: 1
    │  ├─ Inputs: 3 tensors
    │  │    - xxx
    │  │    - xxx
    │  │    - xxx
    │  └─ Outputs: 1 tensors
    │       - xxx
    │
    │  Child regions:
    │
      ├─ Region 209 (Level 2, Type: LEAF) 
      │  ├─ Direct nodes: 9
      │  ├─ Total nodes (recursive): 9
      │  ├─ Children: 0
      │  ├─ Inputs: 11 tensors
      │  │    - xxx

Testing

Implemented unit tests for the new classes. All unit tests pass locally.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed.
Is this change backward compatible?: Yes
Did you write any new necessary tests?: Yes
Did you add or update any necessary documentation?: No, documentation changes will be in Part 4.
Did you update Changelog?: No, the changelog will be updated in Part 4.

Additional Information

Summary by CodeRabbit

New Features
- Added comprehensive ONNX quantization autotuning framework with pattern-based Quantize/Dequantize insertion optimization for improved TensorRT latency.
- Added region inspection tool to analyze graph structure and quantizable patterns.
- Added pattern caching and matching to identify and reuse structurally similar regions.
Chores
- Added utility functions for graph analysis and symmetric operation detection.
- Added extensive unit test coverage for autotuning components.

copy-pr-bot · 2025-12-17T06:29:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

willg-nv · 2025-12-22T01:47:10Z

Hi @ajrasane, could you help review this PR? Thanks!

modelopt/onnx/quantization/autotune/common.py

modelopt/onnx/quantization/autotune/region_pattern.py

modelopt/onnx/quantization/autotune/__init__.py

modelopt/onnx/quantization/autotune/region_pattern.py

modelopt/onnx/quantization/autotune/region_search.py

tests/unit/onnx/quantization/autotune/test_pattern_cache.py

modelopt/onnx/quantization/qdq_utils.py

modelopt/onnx/quantization/autotune/region_search.py

tests/unit/onnx/quantization/autotune/test_region_search.py

Signed-off-by: Will Guo <willg@nvidia.com>

coderabbitai · 2026-01-26T05:57:22Z

Walkthrough

This PR introduces a comprehensive Q/DQ autotuning framework for ONNX quantization. It adds hierarchical region discovery, pattern-based region grouping, insertion point scheme generation, caching mechanisms, and supporting utilities for analyzing ONNX graphs and identifying quantized tensors. The framework enables systematic exploration of different Q/DQ placement strategies optimized for TensorRT latency.

Changes

| Cohort / File(s) | Summary |
|---|---|
| ONNX Autotuning Core Framework: `modelopt/onnx/quantization/autotune/common.py`, `modelopt/onnx/quantization/autotune/region_pattern.py`, `modelopt/onnx/quantization/autotune/region_search.py` | Foundational data structures (Region, InsertionScheme, PatternSchemes, PatternCache, Config, exception hierarchy); region pattern signature generation with matching and insertion scheme utilities; hierarchical region discovery via two-phase search (bottom-up partitioning + top-down refinement). Dense logic with graph traversal, signature computation, and pattern matching. |
| Autotuning Package Interface: `modelopt/onnx/quantization/autotune/__init__.py` | Package-level re-exports of 16+ public symbols (error types, region classes, insertion points, pattern utilities, config). Establishes public API surface for the autotuning framework. |
| Region Analysis & Inspection: `modelopt/onnx/quantization/autotune/region_inspect.py` | CLI and programmatic API for inspecting region search results; supports model loading, region discovery, filtering, and detailed logging with coverage statistics. |
| Quantization Utilities: `modelopt/onnx/quantization/graph_utils.py`, `modelopt/onnx/quantization/qdq_utils.py` | New utilities: `get_tensor_consumer_node_indices()` for building consumer mappings; `get_quantized_tensors()` for identifying Q/DQ-affected tensors in models. |
| ONNX Operations: `modelopt/onnx/op_types.py` | New `get_symmetric_ops()` function returning set of commutative/symmetric ONNX operations. |
| Unit Tests: `tests/unit/onnx/quantization/autotune/test_pattern_cache.py`, `tests/unit/onnx/quantization/autotune/test_region_pattern.py`, `tests/unit/onnx/quantization/autotune/test_region_search.py` | Comprehensive test suites for PatternCache (serialization, merging, scheme selection), RegionPattern (matching, tree formatting, symmetry handling), and region search classes (partitioning, composite building, print tree utilities). |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client Code
    participant CRS as CombinedRegionSearch
    participant Partitioner as RegionPartitioner
    participant Builder as TopDownRegionBuilder
    participant Pattern as RegionPattern
    participant Cache as PatternCache
    participant Config as Config

    Client->>Config: create Config(max_seq_size, thresholds)
    Config-->>Client: config instance

    Client->>CRS: create CombinedRegionSearch(graph, config)
    activate CRS
    
    Client->>CRS: search_regions()
    activate Partitioner
    Partitioner->>Partitioner: partition_graph() - Phase 1
    Note over Partitioner: Bottom-up: identify leaf regions<br/>from divergence/convergence
    Partitioner-->>CRS: initial regions
    deactivate Partitioner

    activate Builder
    Builder->>Builder: build_composite_region() - Phase 2
    Note over Builder: Top-down: create hierarchy,<br/>split sequences, merge converged
    Builder-->>CRS: refined region tree
    deactivate Builder
    
    CRS-->>Client: list[Region]
    deactivate CRS

    Client->>Pattern: RegionPattern.from_region(region, graph)
    activate Pattern
    Pattern->>Pattern: compute_structural_signature(graph)
    Pattern-->>Client: RegionPattern
    deactivate Pattern

    Client->>Cache: PatternCache()
    activate Cache
    Cache-->>Client: cache instance
    deactivate Cache

    Client->>Cache: add_pattern_from_region(region, graph)
    Cache->>Pattern: (implicitly uses from_region)
    Note over Cache: Store pattern with<br/>insertion schemes
    Cache-->>Client: void

    Client->>Cache: get_pattern_schemes(pattern_signature)
    Cache-->>Client: PatternSchemes | None

    Note over Client,Cache: Workflow: Search → Pattern → Cache → Config

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly identifies the primary change: a new automated QDQ placement tool being integrated. It is part 2 of a multi-part feature, directly aligned with the substantial additions to the codebase (Region pattern matching, region search, insertion points, and comprehensive configuration framework). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 97.40% which is sufficient. The required threshold is 80.00%. |

coderabbitai

Actionable comments posted: 8

🤖 Fix all issues with AI agents

In `@modelopt/onnx/quantization/autotune/__init__.py`:
- Around line 78-95: The __all__ list in modelopt.onnx.quantization.autotune
includes "RegionError" which does not exist; remove "RegionError" from the
__all__ declaration so the public export list only references actual symbols
(e.g., keep entries like "Region", "RegionType", "RegionPattern" but delete
"RegionError"). Verify the __all__ variable (named __all__) is updated
accordingly and no other code expects a RegionError symbol.
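
A stale `__all__` entry like this can be caught mechanically. The following is a minimal sketch, assuming a hypothetical `missing_exports` helper and an in-memory stand-in module; it does not import the real autotune package.

```python
import types


def missing_exports(module: types.ModuleType) -> list[str]:
    """Return names listed in module.__all__ that the module does not define."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]


# Stand-in module built in memory; it is not the real autotune package.
fake = types.ModuleType("fake_autotune")
fake.Region = type("Region", (), {})
fake.RegionPattern = type("RegionPattern", (), {})
fake.__all__ = ["Region", "RegionPattern", "RegionError"]

stale = missing_exports(fake)
```

A unit test could assert that the returned list is empty for the package under test.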

In `@modelopt/onnx/quantization/autotune/common.py`:
- Around line 556-575: The current similarity-pruning loop around
self.minimum_distance can leave multiple near-duplicates because it breaks out
after removing only the first similar existing_scheme; update the logic in the
block handling filtered_schemes so that when a new scheme is compared you
continue scanning all existing entries (removing any worse existing schemes
whose distance < self.minimum_distance) before deciding to append or skip the
new scheme. Concretely, in the loop that iterates sorted_schemes, replace the
single-break behavior with logic that either (a) iterates over a copy of
filtered_schemes and removes all existing_scheme where
scheme.distance(existing_scheme) < self.minimum_distance and scheme.latency_ms <
existing_scheme.latency_ms, or (b) after removing an existing_scheme, continue
checking the remaining filtered_schemes (do not break) and set a flag to skip
appending only if any remaining existing_scheme is better; use the existing
symbols filtered_schemes, scheme.distance(...), and latency_ms to implement
this.
- Around line 314-320: The repository is missing the
modelopt.onnx.quantization.autotune.insertion_points module which defines
NodeInputInsertionPoint, ChildRegionInputInsertionPoint, and
RegionOutputInsertionPoint, causing imports in common.py, region_pattern.py,
region_inspect.py and the package __init__.py to fail; implement a new
insertion_points module that declares these three classes (with the expected
attributes and equality/hash behavior used by InsertionScheme.distance and the
rest of the autotune code), export them from the autotune package __init__, and
ensure the classes provide the iterable/identifier properties used in set()
operations so existing code (e.g., InsertionScheme.distance) can compute
symmetric differences without further changes.
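
For the similarity-pruning item above, one way to sidestep the early-break problem is a greedy pass over schemes sorted by latency. This is a different technique from options (a) and (b): because candidates arrive fastest first, every scheme already kept is at least as fast, so nothing ever needs to be removed. The `Scheme` class here is a hypothetical stand-in, assuming distance is the size of the symmetric difference of insertion points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Hypothetical stand-in for an insertion scheme."""

    points: frozenset
    latency_ms: float

    def distance(self, other: "Scheme") -> int:
        # Number of insertion points present in exactly one of the two schemes.
        return len(self.points ^ other.points)


def prune_similar(schemes: list[Scheme], minimum_distance: int) -> list[Scheme]:
    """Keep the fastest scheme from each cluster of near-duplicates."""
    kept: list[Scheme] = []
    for scheme in sorted(schemes, key=lambda s: s.latency_ms):
        # The candidate survives only if it is far enough from every kept scheme.
        if all(scheme.distance(k) >= minimum_distance for k in kept):
            kept.append(scheme)
    return kept


schemes = [
    Scheme(frozenset({"a", "b"}), 1.0),
    Scheme(frozenset({"a", "b", "c"}), 1.2),  # distance 1 from the first scheme
    Scheme(frozenset({"x", "y", "z"}), 1.5),  # distance 5 from the first scheme
]
result = prune_similar(schemes, minimum_distance=2)
```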

In `@modelopt/onnx/quantization/autotune/region_inspect.py`:
- Around line 140-142: The function currently computes a filtered list named
all_regions but returns the original regions variable, leading callers to
receive unfiltered data; modify the function to return all_regions instead of
regions (replace the final "return regions" with "return all_regions") so
callers get the filtered/child-trimmed set, and update any surrounding logging
(e.g., the logger.debug line) or docstring if you intentionally want to keep the
original behavior.
- Around line 91-94: The loop in region_inspect.py iterates
region.get_children() and calls region.remove_child(child), which mutates the
children collection; change the iteration to traverse a stable copy (e.g.,
iterate over list(region.get_children()) or similar) so removals don't affect
the ongoing iteration, leaving the logic using has_quantizable_operations(child,
graph) and region.remove_child(child) unchanged.
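
The iteration issue can be reproduced with a small sketch. The `Region` class below is a hypothetical minimal stand-in that models only `get_children()` and `remove_child()`; a boolean flag replaces the real `has_quantizable_operations(child, graph)` check.

```python
class Region:
    """Hypothetical minimal region; only the methods named in the review are modeled."""

    def __init__(self, name, quantizable=True):
        self.name = name
        self.quantizable = quantizable
        self._children = []

    def add_child(self, child):
        self._children.append(child)

    def get_children(self):
        return self._children  # returns the live list on purpose

    def remove_child(self, child):
        self._children.remove(child)


def drop_unquantizable(region):
    # list(...) snapshots the children so removals cannot skip elements.
    for child in list(region.get_children()):
        if not child.quantizable:
            region.remove_child(child)


root = Region("root")
for name, quantizable in [("a", False), ("b", False), ("c", True)]:
    root.add_child(Region(name, quantizable))
drop_unquantizable(root)
remaining = [child.name for child in root.get_children()]
```

Iterating the live list instead would skip the element that slides into the slot of each removed child.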

In `@modelopt/onnx/quantization/autotune/region_search.py`:
- Around line 451-458: The code removes start_node_idx from visited_nodes but
never adds it back, leaving the divergent start node unassigned; change the
logic to explicitly include start_node_idx into the region (call
_append_node_to_region(start_node_idx) and
self.visited_nodes.add(start_node_idx)) before processing the rest, and replace
visited_nodes.remove(start_node_idx) with visited_nodes.discard(start_node_idx)
to avoid KeyError; keep the existing checks for
_is_node_divergent(converge_node_idx) and the subsequent
_append_node_to_region(converge_node_idx) and
_build_sequence_from_node(converge_node_idx,
max_nodes=MAX_PROBE_STEPS_AFTER_CONVERGE).
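
The difference between `set.remove` and `set.discard` that motivates this suggestion is shown in the short sketch below; the node indices are made up.

```python
visited_nodes = {1, 2, 3}

visited_nodes.discard(7)  # absent element: silently ignored

try:
    visited_nodes.remove(7)  # absent element: raises KeyError
    raised = False
except KeyError:
    raised = True
```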

In `@modelopt/onnx/quantization/graph_utils.py`:
- Around line 305-319: The function get_tensor_consumer_node_indices mishandles
inputs when graph is an onnx.GraphProto because node.input yields string names,
not tensor objects; update the implementation to handle both cases by extracting
the tensor name with something like name = tensor if isinstance(tensor, str)
else tensor.name (or use getattr(tensor, "name", tensor)), and use that name as
the key when populating tensor_consumer_map; also update the docstring to
mention that the function accepts both onnx.GraphProto and onnx-graphsurgeon
gs.Graph.
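
The name-extraction idea can be sketched without either library. The `consumer_indices` helper and the `inputs` attribute on the stand-in nodes are assumptions for illustration; the real function reads whichever attribute the graph type exposes.

```python
from collections import defaultdict
from types import SimpleNamespace


def consumer_indices(nodes):
    """Map each input tensor name to the indices of the nodes that consume it."""
    consumers = defaultdict(list)
    for idx, node in enumerate(nodes):
        for tensor in node.inputs:
            # Plain strings have no .name attribute, so they are used as-is.
            name = getattr(tensor, "name", tensor)
            consumers[name].append(idx)
    return dict(consumers)


# Stand-in nodes whose inputs are plain strings (GraphProto-like).
proto_like = [SimpleNamespace(inputs=["x"]), SimpleNamespace(inputs=["x", "w"])]

# Stand-in nodes whose inputs are objects with a .name (graphsurgeon-like).
x, w = SimpleNamespace(name="x"), SimpleNamespace(name="w")
gs_like = [SimpleNamespace(inputs=[x]), SimpleNamespace(inputs=[x, w])]

from_strings = consumer_indices(proto_like)
from_objects = consumer_indices(gs_like)
```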

In `@modelopt/onnx/quantization/qdq_utils.py`:
- Around line 1041-1078: The docstring for get_quantized_tensors incorrectly
says it identifies QuantizeLinear nodes while the implementation scans
DequantizeLinear; update the docstring text (description, Args/Returns and any
examples) to state that it identifies DequantizeLinear nodes and that the
returned set contains tensor names that are inputs to DequantizeLinear nodes
(i.e., tensors being dequantized), and adjust any example wording that mentions
QuantizeLinear accordingly so docstring matches the actual behavior of
get_quantized_tensors.
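
The behavior the corrected docstring should describe can be sketched as follows. This is not the function from `qdq_utils.py`: the helper name is hypothetical, the nodes are plain stand-ins, and it assumes only the first input of each DequantizeLinear node (the quantized data) is of interest.

```python
from types import SimpleNamespace


def dequantized_input_names(nodes):
    """Collect the first input name of every DequantizeLinear node."""
    return {
        node.input[0]
        for node in nodes
        if node.op_type == "DequantizeLinear" and node.input
    }


# Stand-in nodes with made-up tensor names.
nodes = [
    SimpleNamespace(op_type="QuantizeLinear", input=["x", "scale", "zp"]),
    SimpleNamespace(op_type="DequantizeLinear", input=["x_q", "scale", "zp"]),
    SimpleNamespace(op_type="Conv", input=["x_dq", "w"]),
]
names = dequantized_input_names(nodes)
```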

🧹 Nitpick comments (8)

tests/unit/onnx/quantization/autotune/test_pattern_cache.py (1)

27-28: Consider removing sys.path manipulation.

The sys.path.insert is typically unnecessary when the package is properly installed (e.g., via pip install -e .). This pattern can cause import issues in different environments and makes tests less portable. If tests run via pytest with proper package installation, this line can be removed.

tests/unit/onnx/quantization/autotune/test_region_pattern.py (2)

29-30: Consider removing sys.path manipulation.

Same as in other test files - this pattern is unnecessary with proper package installation.

310-314: Consider removing or guarding debug print statements.

These print statements produce output during test runs which can clutter test output. Consider either removing them or guarding with a verbosity flag if the visual output is needed for debugging.
Suggested change
-        print("\n" + "=" * 60)
-        print("Region Tree Structure:")
-        print("=" * 60)
-        print(tree_output)
-        print("=" * 60)
+        # Uncomment for debugging:
+        # print("\n" + "=" * 60)
+        # print("Region Tree Structure:")
+        # print("=" * 60)
+        # print(tree_output)
+        # print("=" * 60)

tests/unit/onnx/quantization/autotune/test_region_search.py (2)

28-29: Consider removing sys.path manipulation.

Same pattern as other test files - unnecessary with proper package installation.

413-417: Consider removing debug print statements.

These print statements produce output during test runs. Consider removing or commenting them out to keep test output clean.
Suggested change
         result = output.getvalue()
-        print("\n" + "=" * 60)
-        print("Region Tree Structure:")
-        print("=" * 60)
-        print(result)
-        print("=" * 60)
 
         assert "Region" in result

modelopt/onnx/quantization/autotune/region_pattern.py (3)

204-214: Repeated list(graph.nodes) conversion in recursive function.

list(graph.nodes) is called on every recursive invocation of _compute_signature_recursive. This creates a new list each time, which is inefficient for deep hierarchies. Consider passing the nodes list as a parameter or computing it once at the entry point.
Proposed refactor
+    @classmethod
+    def from_region(cls, region: Region, graph: gs.Graph) -> "RegionPattern":
+        """Compute a structural pattern for a region."""
+        nodes_list = list(graph.nodes)
+        signature_str = cls._compute_signature_recursive(region, graph, nodes_list)
+        total_size = len(region.get_region_nodes_and_descendants())
+        return cls(signature_str, total_size)

     @staticmethod
-    def _compute_signature_recursive(region: Region, graph: gs.Graph) -> str:
+    def _compute_signature_recursive(
+        region: Region, graph: gs.Graph, nodes_list: list | None = None
+    ) -> str:
         """Recursively compute structural signature for a region."""
-        nodes_list = list(graph.nodes)
+        if nodes_list is None:
+            nodes_list = list(graph.nodes)
         node_indices_set = set(region.get_nodes())
Then update the recursive call on line 225:
         child_sigs = "+".join(
-            [RegionPattern._compute_signature_recursive(child, graph) for child in sorted_children]
+            [RegionPattern._compute_signature_recursive(child, graph, nodes_list) for child in sorted_children]
         )
166-169: Using assert for runtime validation may be stripped in optimized mode.

The assert statement on line 169 validates pattern matching, but asserts are stripped when Python runs with -O flag. For production validation that should always run, consider raising an explicit exception.
Proposed fix
     def get_full_insertion_scheme(self, region: Region, graph: gs.Graph) -> InsertionScheme:
         """Get all possible insertion points for a region in a single InsertionScheme."""
         region_pattern = RegionPattern.from_region(region, graph)
-        assert self == region_pattern, "Region pattern mismatch"
+        if self != region_pattern:
+            raise ValueError(
+                f"Region pattern mismatch: expected {self.signature}, got {region_pattern.signature}"
+            )
224-231: Minor redundancy in signature building.

When there are no node_ops, line 231 rebuilds the joined string with '+'.join(child_sigs) even though child_sigs was already joined on line 224. This is a minor inefficiency.
Proposed fix
-        child_sigs = "+".join(
-            [RegionPattern._compute_signature_recursive(child, graph) for child in sorted_children]
-        )
+        child_sig_list = [
+            RegionPattern._compute_signature_recursive(child, graph) for child in sorted_children
+        ]
+        child_sigs = "+".join(child_sig_list)

         if node_ops:
             node_sig = "->".join(node_ops)
             return f"COMPOSITE({node_sig}|{child_sigs})"
-        return f"COMPOSITE({'+'.join(child_sigs)})"
+        return f"COMPOSITE({child_sigs})"
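
If `child_sigs` is already a `str` at that point, as the quoted snippet suggests, the original return statement does more than redundant work: joining a string iterates over its characters. A minimal sketch with made-up signatures:

```python
child_sigs = "+".join(["A", "B"])  # already a single string: "A+B"

# Joining a str iterates over its characters, so extra separators are inserted.
rejoined = "+".join(child_sigs)
```

Here `rejoined` is `"A+++B"` rather than `"A+B"`, so the proposed change would also alter the generated signature, not only avoid a second join.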

modelopt/onnx/quantization/autotune/__init__.py

modelopt/onnx/quantization/autotune/common.py

modelopt/onnx/quantization/autotune/region_inspect.py

modelopt/onnx/quantization/autotune/region_search.py

modelopt/onnx/quantization/graph_utils.py

modelopt/onnx/quantization/qdq_utils.py

Signed-off-by: Will Guo <willg@nvidia.com>

modelopt/onnx/quantization/autotune/common.py

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>

Signed-off-by: Will Guo <willg@nvidia.com>

modelopt/onnx/quantization/autotune/common.py

modelopt/onnx/quantization/autotune/region_pattern.py

modelopt/onnx/quantization/autotune/common.py

modelopt/onnx/quantization/autotune/region_search.py

tests/unit/onnx/quantization/autotune/test_region_pattern.py

modelopt/onnx/quantization/autotune/region_pattern.py

Signed-off-by: Will Guo <willg@nvidia.com>

willg-nv requested a review from a team as a code owner December 17, 2025 06:29

willg-nv requested a review from ajrasane December 17, 2025 06:29

willg-nv changed the title ~~Dev willg integrate auto qdq placement part2~~ Integrate Automated QDQ placement tool - Part 2 Dec 17, 2025

This was referenced Dec 17, 2025

Integrate Automated QDQ placement tool - Part 3 #703

Open

Integrate Automated QDQ placement tool - Part 4 #704

Open

Integrate Automated QDQ placement tool - Part 1 #701

Open

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch from 3f7ff31 to d3a6765 Compare December 31, 2025 02:16

ajrasane reviewed Jan 8, 2026

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch 2 times, most recently from 616285d to c95939a Compare January 8, 2026 08:35

gcunhase reviewed Jan 8, 2026

modelopt/onnx/quantization/autotune/__init__.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

modelopt/onnx/quantization/autotune/region_pattern.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

modelopt/onnx/quantization/autotune/region_pattern.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

modelopt/onnx/quantization/autotune/region_search.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

modelopt/onnx/quantization/autotune/region_search.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

tests/unit/onnx/quantization/autotune/test_pattern_cache.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

modelopt/onnx/quantization/qdq_utils.py (outdated, resolved)

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch 3 times, most recently from 4468ca2 to bc87ca7 Compare January 9, 2026 05:02

ajrasane reviewed Jan 13, 2026

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch 2 times, most recently from 6f809d7 to a933107 Compare January 15, 2026 09:27

ajrasane reviewed Jan 15, 2026

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch 4 times, most recently from 8c8e685 to dc3ef86 Compare January 16, 2026 07:32

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch 5 times, most recently from 3de8e6f to 4d11c3d Compare January 23, 2026 03:35

gcunhase reviewed Jan 24, 2026

tests/unit/onnx/quantization/autotune/test_region_search.py (outdated, resolved)

Integrate Automated QDQ placement tool - part 2

345a3dc

Signed-off-by: Will Guo <willg@nvidia.com>

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch from 4d11c3d to 29f08db Compare January 26, 2026 05:57

coderabbitai bot reviewed Jan 26, 2026

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch 2 times, most recently from 4e0f4e1 to af26a85 Compare January 26, 2026 06:49

Part-2 recent refactor changes

559d12c

Signed-off-by: Will Guo <willg@nvidia.com>

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch from af26a85 to 559d12c Compare January 26, 2026 07:00

gcunhase reviewed Jan 26, 2026

modelopt/onnx/quantization/autotune/common.py (resolved)

gcunhase added a commit to gcunhase/TensorRT-Model-Optimizer that referenced this pull request Jan 26, 2026

Refactor: PR NVIDIA#702

c0f9ddd

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>

resolve comment

dd41ca2

Signed-off-by: Will Guo <willg@nvidia.com>