Shape Tensor handling in conversion and design for dynamic converters
TL;DR
We recently added support for aten::size to output shape tensors (nvinfer1::ITensor), which can now pass shape information into the conversion stack. Shape tensors are how TensorRT encodes dynamic shape information, so this is necessary for true dynamic shape support. However, the change introduces violations of the type system which expose the converter and evaluator libraries to intermittent failures in unclear ways.
Background
We recently added support for aten::size to output shape tensors (nvinfer1::ITensor), which can now pass shape information into the conversion stack. The issue is that the schema for aten::size is aten::size(Tensor t) -> int[], so a type confusion is introduced. Fundamentally, converters assume that arguments of a particular type will actually be that type. The exceptions are Tensor, which can be either a torch::Tensor or an nvinfer1::ITensor, and Tensor[], which is a list formed of either variant of tensor. The issue with types such as int, int[], float, etc. is that they are used as compile-time static data, for the most part consumed either in evaluators to calculate other compile-time static data or in converters for layer settings. For example, you might multiply two ints together at compile time to get the window size of a max pooling layer. It therefore does not seem feasible to simply accept this type confusion and add support in converters for cases where an int or int[] is actually an ITensor, since any int or int[] argument to any converter may in fact not contain static data.
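To make the contract concrete, here is a rough sketch of how a static converter consumes an int[] argument today. It follows the shape of the existing converter library (ConversionCtx, Var, args, ITensorOrFreeze), but the exact helper names and signatures should be treated as assumptions for illustration:

```cpp
// Sketch only: illustrates the type assumption converters make today.
// Helper names (ITensorOrFreeze, unwrapToIntList, util::toDims,
// AssociateValueAndTensor) follow the existing converter library but are
// assumptions here, not a definitive implementation.
auto max_pool_like_converter =
    [](ConversionCtx* ctx, const torch::jit::Node* n, args& args) -> bool {
  auto in = args[0].ITensorOrFreeze(ctx); // Tensor args may be torch::Tensor or ITensor

  // The schema promises `int[] kernel_size`, so the converter unwraps static
  // data. If lowering has instead routed an aten::size result here as a shape
  // tensor (nvinfer1::ITensor), there is nothing static to unwrap and this
  // call fails with an opaque unwrapping error deep inside conversion.
  auto kernel_size = util::toDims(args[1].unwrapToIntList());

  auto* pool = ctx->net->addPoolingNd(*in, nvinfer1::PoolingType::kMAX, kernel_size);
  ctx->AssociateValueAndTensor(n->outputs()[0], pool->getOutput(0));
  return true;
};
```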
Proposed Solution
Instead, we could respect the types that schemas say they accept, keeping the contract between converters and the rest of the system (https://pytorch.org/TensorRT/contributors/writing_converters.html). The proposed method to do this is to introduce a new IR to handle operations in a dynamic shape mode.
The IR will contain placeholder operations with schemas that respect both the expectations of TensorRT and TorchScript. For example, the dynamic version of aten::size has a corresponding trt::size_dyn. The difference between these two operations is that aten::size's schema is aten::size(Tensor t) -> int[] while trt::size_dyn's is trt::size_dyn(Tensor t) -> (Tensor) (note the different output type).
Obviously, consumers of aten::size expect an int[] and not a Tensor, so we will incrementally add dyn alternatives to these ops as they are found. For example, aten::unflatten.int(Tensor self, int dim, int[] sizes) -> (Tensor) would have a dyn variant trt::unflatten_dyn.int(Tensor self, int dim, Tensor sizes) -> (Tensor). (#1808)
Now the converter for trt::unflatten_dyn expects a Tensor for the sizes, and the implementation of the converter can operate using this assumption. Converters don't need to handle both static and dynamic cases.
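As a rough illustration of such a converter (not an actual implementation; the registration mechanism and helper names mirror the existing converter library but are assumptions, and the shape arithmetic needed to splice sizes into the input shape at dim is omitted), the key point is that sizes can be fed directly into a shape input:

```cpp
// Sketch of a possible trt::unflatten_dyn converter. Registration mechanism
// and helpers are assumed from the existing converter library; only the
// dynamic-reshape mechanism is shown (the real op would first assemble the
// full target shape by splicing `sizes` into the input shape at `dim`).
auto unflatten_dyn_registration = RegisterNodeConversionPatterns().pattern(
    {"trt::unflatten_dyn.int(Tensor self, int dim, Tensor sizes) -> (Tensor)",
     [](ConversionCtx* ctx, const torch::jit::Node* n, args& args) -> bool {
       auto self = args[0].ITensorOrFreeze(ctx);
       // The schema guarantees `sizes` arrives as a tensor, so the converter
       // can require a shape tensor and never needs an int[] unwrapping path.
       auto sizes = args[2].ITensorOrFreeze(ctx);

       // Dynamic reshape: the target shape is provided as the second input to
       // the shuffle layer (a shape tensor) instead of static Dims.
       auto* shuffle = ctx->net->addShuffle(*self);
       shuffle->setInput(1, *sizes);

       ctx->AssociateValueAndTensor(n->outputs()[0], shuffle->getOutput(0));
       return true;
     }});
```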
TorchScript enforces types on the edges (values) between nodes. This is actually a feature we can leverage. We can have a lowering pass, conditioned on whether inputs are dynamic (information already available at lowering time), that replaces static variants with dyn variants in place and errors out if the graph cannot be reconstructed using the available dyn ops. This will give users a clearer, earlier report of which operations need dynamic support added, as opposed to the opaque unwrapping errors that pop up when the compiler goes to unwrap an int[] from a Var containing an ITensor inside a converter.
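A minimal sketch of what such a pass could look like, assuming a hand-maintained translation map (the map contents and pass name are illustrative; the torch::jit graph-rewriting calls are standard):

```cpp
// Illustrative dyn-replacement lowering pass. The translation map and the
// point at which the pass runs are assumptions; the torch::jit graph APIs
// (Graph::create, replaceAllUsesWith, Node::destroy) are standard.
#include <torch/csrc/jit/ir/ir.h>
#include <string>
#include <unordered_map>
#include <vector>

void MapStaticOpsToDynOps(std::shared_ptr<torch::jit::Graph>& g) {
  // static op -> dynamic placeholder variant (illustrative contents)
  static const std::unordered_map<std::string, std::string> translation = {
      {"aten::size", "trt::size_dyn"},
      {"aten::unflatten", "trt::unflatten_dyn"},
  };

  std::vector<torch::jit::Node*> to_replace;
  for (auto* n : g->nodes()) {
    if (translation.count(n->kind().toQualString())) {
      to_replace.push_back(n);
    }
  }

  for (auto* n : to_replace) {
    auto* dyn = g->create(
        c10::Symbol::fromQualString(translation.at(n->kind().toQualString())),
        n->inputs(),
        n->outputs().size());
    dyn->insertBefore(n);
    // Dynamic shape data now flows as a Tensor-typed value.
    dyn->output()->setType(c10::TensorType::get());
    n->output()->replaceAllUsesWith(dyn->output());
    n->destroy();
  }
  // A full pass would then verify that every consumer of these Tensor-typed
  // shape values has a dyn variant available, and report the offending ops
  // instead of letting a converter fail later on an unwrap.
}
```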
There is already an effort to add dynamic shape and DDS support to key converters by amending them. Instead, we can just add separate converters, and this proposal would provide the infrastructure to do so in a clear, maintainable fashion, allowing converters to remain simple.
Alternative approaches
We can take out aten::size dynamic support; since we are moving to Dynamo, this keeps TorchScript more maintainable as more resources transfer over.
We can start patching converters and evaluators to handle shape tensors wherever these issues pop up today. The problem is that we implicitly throw out the type system, and we also will not be able to tell whether data that is required at compile time has been subsumed into shape tensors. This approach makes converters more complicated and will likely introduce more bugs and model failures, with less clear messaging as to why the system is failing.
We can push the TorchScript frontend to do as much as possible in TensorRT at runtime, i.e. automatically freezing data as soon as it is available. This would still require evaluators to be rewritten as, essentially, new converters. It would also massively bloat the size of produced engines by performing intermediate operations that we can currently do at compile time, and would likely cost performance.
Example Workflow
For any operation that needs dynamic shape support:

1. Create a variant schema trt::[op]_dyn(...) -> (...) where the type of the dynamic shape information is Tensor.
2. Register this op with TorchScript in a manner similar to trt::const (see core/lowering/register_trt_placeholder_ops.cpp, line 14 at 1d78f43), along with a translation map which defines mappings between static aten ops and their dynamic trt::[op]_dyn counterparts.
3. Have a pass, run when dynamic shape support is requested, that replaces all instances of static ops in the translation map with their dynamic counterparts. If the graph cannot be completed after these replacements because some input expecting an int[] or int is being provided a Tensor, emit an error stating which operations need dynamic alternatives.
4. Give each trt::[op]_dyn converter a separate implementation from its static counterpart, written with the assumption that shape information is provided via an ITensor. If necessary, we can relax this so that dyn converters handle both ITensor and int[] cases, but this direction (static -> dynamic) is much easier to handle than the current situation (dynamic -> static): the converter would simply freeze the int[] into a shape tensor, much like ITensorOrFreeze does. In fact, we can add a convenience function, ShapeTensorOrFreeze, which would freeze torch::Tensors or ints/int[] and return a shape tensor (a rough sketch follows this list).
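A minimal sketch of what ShapeTensorOrFreeze could look like; the name is the one proposed above, but the signature, the Var helpers, and the buffer-lifetime handling are assumptions, not existing API:

```cpp
// Hypothetical helper (proposed above, not an existing API): return the
// argument as a shape tensor, freezing static ints into a 1D Int32 constant
// when necessary. Var/ConversionCtx are the existing converter-library types;
// their exact method names are assumptions here.
nvinfer1::ITensor* ShapeTensorOrFreeze(ConversionCtx* ctx, Var& v, std::vector<int32_t>& storage) {
  if (v.isITensor()) {
    // Already a shape tensor produced upstream (e.g. by trt::size_dyn).
    return v.ITensor();
  }

  // Static int[]: freeze it. `storage` must outlive network construction
  // (in practice the ConversionCtx would own such buffers).
  for (auto i : v.unwrapToIntList()) {
    storage.push_back(static_cast<int32_t>(i));
  }
  nvinfer1::Weights w{nvinfer1::DataType::kINT32, storage.data(),
                      static_cast<int64_t>(storage.size())};
  nvinfer1::Dims d;
  d.nbDims = 1;
  d.d[0] = static_cast<int32_t>(storage.size());

  // A 1D Int32 constant acts as a shape tensor when consumed in a shape
  // context (e.g. IShuffleLayer::setInput(1, ...)).
  return ctx->net->addConstant(d, w)->getOutput(0);
}
```

With something like this in place, a dyn converter could call ShapeTensorOrFreeze on its sizes argument and pass the result straight into a shape input, regardless of whether the value was static or dynamic.

Implementation Phases
Prototype - Large
MVP - Large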
> We can take out aten::size dynamic support; since we are moving to Dynamo, this keeps TorchScript more maintainable as more resources transfer over.
I would have to maintain a separate fork in this case.
One concern about replacing ops with their _dyn variants is that if these end up in fallback regions (min_block_size, IO legalization, etc.), they may cause issues with downstream consumers of the output TorchScript. A cleanup pass to revert these to their original forms might be needed.
> Have a pass, run when dynamic shape support is requested, that replaces all instances of static ops in the translation map with their dynamic counterparts.
Would we replace all ops generically? Do we have a path to selectively use the evaluator/converter path based on whether the accessed dim/shape_tensor is actually dynamic?
> One concern about replacing ops with their _dyn variants is that if these end up in fallback regions (min_block_size, IO legalization, etc.), they may cause issues with downstream consumers of the output TorchScript. A cleanup pass to revert these to their original forms might be needed.
This is true. Alternatively, we can run the dyn replacement pass post-partitioning, only on TRT blocks.
> Would we replace all ops generically? Do we have a path to selectively use the evaluator/converter path based on whether the accessed dim/shape_tensor is actually dynamic?
My thought right now is that we maintain a list of operations which produce shape tensors (e.g. aten::size) and use the uses of those ops' outputs to identify operations which might need dyn replacements.
Would we be able to differentiate these two getitems (one of which can be resolved to a constant, the other should be handled dynamically)?

%size = (-1, 2)  # batch dim is dynamic
%dim0 = aten::__getitem__(%size, 0)
%dim1 = aten::__getitem__(%size, 1)
If %size is a shape tensor, then both cases would use the dyn version of getitem, following the rule that all uses of shape tensors should be dyn variants. AFAIK, once the data is in an ITensor, getting it back out isn't straightforward.
The ONNX parser solves this by wrapping the TRT tensors in a class that maintains a copy itself. I'm not sure, though, how you would tell whether a specific dimension is static or dynamic without being partway through compilation, since many of these shapes are constructed on the fly. I'm also not sure this is exactly in scope, since the primary goal here is to keep the Torch types and the types we generate aligned so that the job of converter writing stays simple. There could be a feature which takes in a "wrapped" shape tensor and figures out whether the operation produces static or dynamic data as part of the conversion phase; I have not put much thought into this though.
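To make that idea concrete, a hypothetical wrapper along these lines (purely illustrative; not the ONNX parser's actual class nor existing Torch-TensorRT API) could look like:

```cpp
// Hypothetical "wrapped" shape tensor, illustrating the idea above: keep the
// TensorRT tensor alongside the statically known values (when available), so
// a consumer can choose the static or dynamic path at conversion time.
#include <NvInfer.h>
#include <optional>
#include <vector>

struct WrappedShapeTensor {
  nvinfer1::ITensor* tensor = nullptr;                // always usable in shape contexts
  std::optional<std::vector<int64_t>> static_values;  // set only if every entry is known

  bool isStatic() const {
    return static_values.has_value();
  }
};
```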