-
Notifications
You must be signed in to change notification settings - Fork 247
Integrate Automated QDQ placement tool - Part 2 #702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Integrate Automated QDQ placement tool - Part 2 #702
Conversation
|
Hi @ajrasane , could you help me review this PR, thanks! |
3f7ff31 to
d3a6765
Compare
616285d to
c95939a
Compare
4468ca2 to
bc87ca7
Compare
6f809d7 to
a933107
Compare
8c8e685 to
dc3ef86
Compare
3de8e6f to
4d11c3d
Compare
Signed-off-by: Will Guo <willg@nvidia.com>
4d11c3d to
29f08db
Compare
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
📝 WalkthroughWalkthroughThis PR introduces a comprehensive Q/DQ autotuning framework for ONNX quantization. It adds hierarchical region discovery, pattern-based region grouping, insertion point scheme generation, caching mechanisms, and supporting utilities for analyzing ONNX graphs and identifying quantized tensors. The framework enables systematic exploration of different Q/DQ placement strategies optimized for TensorRT latency. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client Code
participant CRS as CombinedRegionSearch
participant Partitioner as RegionPartitioner
participant Builder as TopDownRegionBuilder
participant Pattern as RegionPattern
participant Cache as PatternCache
participant Config as Config
Client->>Config: create Config(max_seq_size, thresholds)
Config-->>Client: config instance
Client->>CRS: create CombinedRegionSearch(graph, config)
activate CRS
Client->>CRS: search_regions()
activate Partitioner
Partitioner->>Partitioner: partition_graph() - Phase 1
Note over Partitioner: Bottom-up: identify leaf regions<br/>from divergence/convergence
Partitioner-->>CRS: initial regions
deactivate Partitioner
activate Builder
Builder->>Builder: build_composite_region() - Phase 2
Note over Builder: Top-down: create hierarchy,<br/>split sequences, merge converged
Builder-->>CRS: refined region tree
deactivate Builder
CRS-->>Client: list[Region]
deactivate CRS
Client->>Pattern: RegionPattern.from_region(region, graph)
activate Pattern
Pattern->>Pattern: compute_structural_signature(graph)
Pattern-->>Client: RegionPattern
deactivate Pattern
Client->>Cache: PatternCache()
activate Cache
Cache-->>Client: cache instance
deactivate Cache
Client->>Cache: add_pattern_from_region(region, graph)
Cache->>Pattern: (implicitly uses from_region)
Note over Cache: Store pattern with<br/>insertion schemes
Cache-->>Client: void
Client->>Cache: get_pattern_schemes(pattern_signature)
Cache-->>Client: PatternSchemes | None
Note over Client,Cache: Workflow: Search → Pattern → Cache → Config
Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/__init__.py`:
- Around line 78-95: The __all__ list in modelopt.onnx.quantization.autotune
includes "RegionError" which does not exist; remove "RegionError" from the
__all__ declaration so the public export list only references actual symbols
(e.g., keep entries like "Region", "RegionType", "RegionPattern" but delete
"RegionError"). Verify the __all__ variable (named __all__) is updated
accordingly and no other code expects a RegionError symbol.
In `@modelopt/onnx/quantization/autotune/common.py`:
- Around line 556-575: The current similarity-pruning loop around
self.minimum_distance can leave multiple near-duplicates because it breaks out
after removing only the first similar existing_scheme; update the logic in the
block handling filtered_schemes so that when a new scheme is compared you
continue scanning all existing entries (removing any worse existing schemes
whose distance < self.minimum_distance) before deciding to append or skip the
new scheme. Concretely, in the loop that iterates sorted_schemes, replace the
single-break behavior with logic that either (a) iterates over a copy of
filtered_schemes and removes all existing_scheme where
scheme.distance(existing_scheme) < self.minimum_distance and scheme.latency_ms <
existing_scheme.latency_ms, or (b) after removing an existing_scheme, continue
checking the remaining filtered_schemes (do not break) and set a flag to skip
appending only if any remaining existing_scheme is better; use the existing
symbols filtered_schemes, scheme.distance(...), and latency_ms to implement
this.
- Around line 314-320: The repository is missing the
modelopt.onnx.quantization.autotune.insertion_points module which defines
NodeInputInsertionPoint, ChildRegionInputInsertionPoint, and
RegionOutputInsertionPoint, causing imports in common.py, region_pattern.py,
region_inspect.py and the package __init__.py to fail; implement a new
insertion_points module that declares these three classes (with the expected
attributes and equality/hash behavior used by InsertionScheme.distance and the
rest of the autotune code), export them from the autotune package __init__, and
ensure the classes provide the iterable/identifier properties used in set()
operations so existing code (e.g., InsertionScheme.distance) can compute
symmetric differences without further changes.
In `@modelopt/onnx/quantization/autotune/region_inspect.py`:
- Around line 140-142: The function currently computes a filtered list named
all_regions but returns the original regions variable, leading callers to
receive unfiltered data; modify the function to return all_regions instead of
regions (replace the final "return regions" with "return all_regions") so
callers get the filtered/child-trimmed set, and update any surrounding logging
(e.g., the logger.debug line) or docstring if you intentionally want to keep the
original behavior.
- Around line 91-94: The loop in region_inspect.py iterates
region.get_children() and calls region.remove_child(child), which mutates the
children collection; change the iteration to traverse a stable copy (e.g.,
iterate over list(region.get_children()) or similar) so removals don't affect
the ongoing iteration, leaving the logic using has_quantizable_operations(child,
graph) and region.remove_child(child) unchanged.
In `@modelopt/onnx/quantization/autotune/region_search.py`:
- Around line 451-458: The code removes start_node_idx from visited_nodes but
never adds it back, leaving the divergent start node unassigned; change the
logic to explicitly include start_node_idx into the region (call
_append_node_to_region(start_node_idx) and
self.visited_nodes.add(start_node_idx)) before processing the rest, and replace
visited_nodes.remove(start_node_idx) with visited_nodes.discard(start_node_idx)
to avoid KeyError; keep the existing checks for
_is_node_divergent(converge_node_idx) and the subsequent
_append_node_to_region(converge_node_idx) and
_build_sequence_from_node(converge_node_idx,
max_nodes=MAX_PROBE_STEPS_AFTER_CONVERGE).
In `@modelopt/onnx/quantization/graph_utils.py`:
- Around line 305-319: The function get_tensor_consumer_node_indices mishandles
inputs when graph is an onnx.GraphProto because node.input yields string names,
not tensor objects; update the implementation to handle both cases by extracting
the tensor name with something like name = tensor if isinstance(tensor, str)
else tensor.name (or use getattr(tensor, "name", tensor)), and use that name as
the key when populating tensor_consumer_map; also update the docstring to
mention that the function accepts both onnx.GraphProto and onnx-graphsurgeon
gs.Graph.
In `@modelopt/onnx/quantization/qdq_utils.py`:
- Around line 1041-1078: The docstring for get_quantized_tensors incorrectly
says it identifies QuantizeLinear nodes while the implementation scans
DequantizeLinear; update the docstring text (description, Args/Returns and any
examples) to state that it identifies DequantizeLinear nodes and that the
returned set contains tensor names that are inputs to DequantizeLinear nodes
(i.e., tensors being dequantized), and adjust any example wording that mentions
QuantizeLinear accordingly so docstring matches the actual behavior of
get_quantized_tensors.
🧹 Nitpick comments (8)
tests/unit/onnx/quantization/autotune/test_pattern_cache.py (1)
27-28: Consider removingsys.pathmanipulation.The
sys.path.insertis typically unnecessary when the package is properly installed (e.g., viapip install -e .). This pattern can cause import issues in different environments and makes tests less portable. If tests run via pytest with proper package installation, this line can be removed.tests/unit/onnx/quantization/autotune/test_region_pattern.py (2)
29-30: Consider removingsys.pathmanipulation.Same as in other test files - this pattern is unnecessary with proper package installation.
310-314: Consider removing or guarding debug print statements.These print statements produce output during test runs which can clutter test output. Consider either removing them or guarding with a verbosity flag if the visual output is needed for debugging.
Suggested change
- print("\n" + "=" * 60) - print("Region Tree Structure:") - print("=" * 60) - print(tree_output) - print("=" * 60) + # Uncomment for debugging: + # print("\n" + "=" * 60) + # print("Region Tree Structure:") + # print("=" * 60) + # print(tree_output) + # print("=" * 60)tests/unit/onnx/quantization/autotune/test_region_search.py (2)
28-29: Consider removingsys.pathmanipulation.Same pattern as other test files - unnecessary with proper package installation.
413-417: Consider removing debug print statements.These print statements produce output during test runs. Consider removing or commenting them out to keep test output clean.
Suggested change
result = output.getvalue() - print("\n" + "=" * 60) - print("Region Tree Structure:") - print("=" * 60) - print(result) - print("=" * 60) assert "Region" in resultmodelopt/onnx/quantization/autotune/region_pattern.py (3)
204-214: Repeatedlist(graph.nodes)conversion in recursive function.
list(graph.nodes)is called on every recursive invocation of_compute_signature_recursive. This creates a new list each time, which is inefficient for deep hierarchies. Consider passing the nodes list as a parameter or computing it once at the entry point.Proposed refactor
+ `@classmethod` + def from_region(cls, region: Region, graph: gs.Graph) -> "RegionPattern": + """Compute a structural pattern for a region.""" + nodes_list = list(graph.nodes) + signature_str = cls._compute_signature_recursive(region, graph, nodes_list) + total_size = len(region.get_region_nodes_and_descendants()) + return cls(signature_str, total_size) `@staticmethod` - def _compute_signature_recursive(region: Region, graph: gs.Graph) -> str: + def _compute_signature_recursive( + region: Region, graph: gs.Graph, nodes_list: list | None = None + ) -> str: """Recursively compute structural signature for a region.""" - nodes_list = list(graph.nodes) + if nodes_list is None: + nodes_list = list(graph.nodes) node_indices_set = set(region.get_nodes())Then update the recursive call on line 225:
child_sigs = "+".join( - [RegionPattern._compute_signature_recursive(child, graph) for child in sorted_children] + [RegionPattern._compute_signature_recursive(child, graph, nodes_list) for child in sorted_children] )
166-169: Usingassertfor runtime validation may be stripped in optimized mode.The
assertstatement on line 169 validates pattern matching, but asserts are stripped when Python runs with-Oflag. For production validation that should always run, consider raising an explicit exception.Proposed fix
def get_full_insertion_scheme(self, region: Region, graph: gs.Graph) -> InsertionScheme: """Get all possible insertion points for a region in a single InsertionScheme.""" region_pattern = RegionPattern.from_region(region, graph) - assert self == region_pattern, "Region pattern mismatch" + if self != region_pattern: + raise ValueError( + f"Region pattern mismatch: expected {self.signature}, got {region_pattern.signature}" + )
224-231: Minor redundancy in signature building.When there are no
node_ops, line 231 rebuilds the joined string with'+'.join(child_sigs)even thoughchild_sigswas already joined on line 224. This is a minor inefficiency.Proposed fix
- child_sigs = "+".join( - [RegionPattern._compute_signature_recursive(child, graph) for child in sorted_children] - ) + child_sig_list = [ + RegionPattern._compute_signature_recursive(child, graph) for child in sorted_children + ] + child_sigs = "+".join(child_sig_list) if node_ops: node_sig = "->".join(node_ops) return f"COMPOSITE({node_sig}|{child_sigs})" - return f"COMPOSITE({'+'.join(child_sigs)})" + return f"COMPOSITE({child_sigs})"
4e0f4e1 to
af26a85
Compare
Signed-off-by: Will Guo <willg@nvidia.com>
af26a85 to
559d12c
Compare
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
e7ad869 to
f630313
Compare
Signed-off-by: Will Guo <willg@nvidia.com>
f630313 to
4afccc4
Compare
What does this PR do?
Type of change: new feature
Overview: This PR integrate automated Q/DQ placement tool to ModelOpt. This PR is 2/4 parts of the cahnges.
Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704
This PR contains the following changes:
Usage
Example output:
Testing
Implemented unit tests for new classes. All unit tests could get pass locally.
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.