Int8Tensor migration cleanup #3407
Conversation
Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow.

Test Plan:

To ensure BC:
```
pytest test/quantization/test_quant_api.py
```
To test new Int8Tensor:
```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
```
Reviewers:
Subscribers:
Tasks:
Tags:
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3407
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 1 Pending, as of commit a665d45 with merge base 3c3515a.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Also, since we have #3391 but there are some infra issues so some CIs are not running, I'd suggest recreating this PR exactly and landing that first; then we can do additional fixes on top of that to recognize contributions from OSS.
    optional_tensor_attribute_names = [
        "block_size",
        "act_quant_kwargs",
        "dtype",
nit: should dtype be optional? or just required?
Oh, I guess this is following Float8Tensor, and it's only used for dequantize for now. One thing to make sure of is that it still works when dtype == None.
    # make scale the correct dim
    if isinstance(granularity, PerRow):
        scale = scale.unsqueeze(1)
    elif isinstance(granularity, PerTensor):
        scale = scale.unsqueeze(0).unsqueeze(1)
these might be confusing, we could align with float8 path and have a keepdim=True variant for choose_qparams_affine I guess:
ao/torchao/quantization/quant_primitives.py, lines 1553 to 1554 at 73730e8:

    min_val = torch.amin(input, dim=reduction_dims, keepdim=False)
    max_val = torch.amax(input, dim=reduction_dims, keepdim=False)
like choose_qparams_affine_keepdim
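For illustration, here is a minimal sketch of what such a keepdim variant could compute for the symmetric int8 case. The helper name and signature are assumptions for this sketch, not the actual torchao API; the real choose_qparams_affine also handles zero_point, target dtypes, and more.

```
import torch

def choose_qparams_symmetric_keepdim(x, reduction_dims, quant_max=127, eps=1e-12):
    # keepdim=True keeps the reduced dims as size 1, so for an (M, N) weight reduced
    # over dim 1 the scale comes out as (M, 1) and no unsqueeze shim is needed.
    min_val = torch.amin(x, dim=reduction_dims, keepdim=True)
    max_val = torch.amax(x, dim=reduction_dims, keepdim=True)
    abs_max = torch.maximum(min_val.abs(), max_val.abs()).clamp(min=eps)
    return abs_max / quant_max  # scale is already in a broadcastable shape
```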
torchao/quantization/quant_api.py (Outdated):

    layout: Optional[Layout] = PlainLayout()
    act_mapping_type: Optional[MappingType] = MappingType.SYMMETRIC
    weight_only_decode: bool = False
    granularity: Optional[Union[PerRow, PerTensor]] = PerRow()
nit: typing needs to be updated to Optional[Union[Granularity, List[Granularity, Granularity]]] I think
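As a side note, `List[Granularity, Granularity]` is not valid typing syntax; a sketch of the suggested annotation in a valid form (the choice of a 2-tuple for separate activation/weight granularity is an assumption here):

```
from typing import Optional, Tuple, Union

from torchao.quantization.granularity import Granularity, PerRow

# either one granularity applied to both activation and weight,
# or an (activation_granularity, weight_granularity) pair
granularity: Optional[Union[Granularity, Tuple[Granularity, Granularity]]] = PerRow()
```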
torchao/quantization/quant_api.py (Outdated):

    activation_granularity, weight_granularity = _normalize_granularity(
        config.granularity
    )
    act_quant_kwargs = QuantizeTensorToInt8Kwargs(
should we also do some validation here on which combinations are supported?
This should just be granularity and activation_mapping_type (symmetric vs. asymmetric), which is what the v1 config supports. Will add a test for these combos.
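A hedged sketch of the kind of validation this could be; the helper name is made up and the field names follow this PR's discussion:

```
from torchao.quantization.granularity import PerRow, PerTensor
from torchao.quantization.quant_primitives import MappingType

def _validate_int8_config(granularity, act_mapping_type):
    # per the discussion above, only per-row/per-tensor granularity and
    # symmetric activation quantization are supported by the new flow
    if not isinstance(granularity, (PerRow, PerTensor)):
        raise ValueError(f"unsupported granularity: {granularity}")
    if act_mapping_type != MappingType.SYMMETRIC:
        raise ValueError("only symmetric activation quantization is supported currently")
```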
jerryzh168 left a comment:
thanks, looks good overall, see some comments inline
torchao/quantization/quant_api.py (Outdated):

    group_size: Optional[int] = None
    granularity: Granularity = PerRow()
    set_inductor_config: bool = True
    version: int = 2
btw, I think we should break BC in a separate PR
torchao/quantization/quant_api.py (Outdated):

    weight_only_decode: bool = False
    granularity: Optional[Union[PerRow, PerTensor]] = PerRow()
    set_inductor_config: bool = True
    version: int = 2
same here, I think we should set this to 1 first and break BC separately to reduce the scope of changes
Also, for all the failed CIs, I think it's because you bumped versions. I think we can keep BC for all these older tests by setting the version to 1 explicitly (see ao/test/dtypes/test_affine_quantized_float.py, line 114 at 73730e8), while migrating the ones we think will be useful to test_int8_tensor.py.
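A sketch of what pinning an older test to the v1 flow could look like, assuming the config exposes the version field shown in the diffs above (the config class and toy model here are just illustrative):

```
import torch
from torchao.quantization import Int8DynamicActivationInt8WeightConfig, quantize_

model = torch.nn.Sequential(torch.nn.Linear(16, 16))
# version=1 keeps the pre-migration flow, so older tests keep exercising it
quantize_(model, Int8DynamicActivationInt8WeightConfig(version=1))
```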
    block_size=block_size,
    scale=self.scale,
    block_size=self.block_size,
    scale=self.scale.squeeze(),
this is probably not very robust, maybe we can add a keepdim for both quantize_affine and dequantize_affine as well for now?
in the future we can just refactor these ops to use expanded scale/zero_point: #3324
Took a look, and the code as currently written will be fine for both the flattened and non-flattened cases; we just need keepdim for quantize_affine.
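For context, a small illustration (this is the underlying arithmetic, not the quantize_affine/dequantize_affine API) of why a kept-dim (M, 1) scale is convenient: dequantization reduces to plain broadcasting, with no squeeze/unsqueeze shims:

```
import torch

M, N = 4, 8
qdata = torch.randint(-128, 128, (M, N), dtype=torch.int8)
scale = torch.rand(M, 1) * 0.01            # per-row scale kept in (M, 1) shape
dequant = qdata.to(torch.float32) * scale  # (M, 1) broadcasts over (M, N)
assert dequant.shape == (M, N)
```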
    kwargs = {
        "device": qdata.device,
        "dtype": dtype or scale.dtype,
        "dtype": dtype,
is dtype becoming required? if so, we can move dtype from optional_tensor_attribute_names to tensor_attribute_names and change its order in the arglist
I was just following how Float8Tensor did it:
    def __new__(
    assert len(old_int8_tensor.qdata.shape) == len(old_int8_tensor.block_size), (
        "unsupported"
    )
    new_int8_tensor = old_int8_tensor.__class__(
nit: we can just use Int8Tensor here to be more explicit I think
    @dataclass
    class QuantizeTensorToInt8Kwargs(QuantizeTensorKwargs):
        """Tensor kwargs for creating int8 tensor (either activation or weight)
        """Tensor kwargs for creating int8 tensor from high precision
this is also just for activation I think, we can include that in the comment as well
    optional_tensor_attribute_names = ["act_quant_kwargs", "block_size", "dtype"]
    tensor_attribute_names = []
    optional_tensor_attribute_names = [
        "block_size",
also, should block_size be optional?
I think it's optional here because it's a kwarg that defaults to None; I'm a little unclear on what the default block_size should be, though. Should we default to PerTensor?
Probably just make it required, it will be simpler I think; the value of block_size depends on the input, so we can't really specify a per-tensor block_size without knowing the shape of the input.
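To make the dependence on the input shape concrete, here is how block_size would look for an (M, N) weight under the two granularities discussed here (values are illustrative):

```
M, N = 128, 256                   # shape of a hypothetical weight tensor
per_row_block_size = (1, N)       # one block (and scale) per row -> M scales
per_tensor_block_size = (M, N)    # a single block covering the whole tensor
```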
    output_dtype = activation_tensor.dtype

    if weight_tensor.act_quant_kwargs is not None:
        activation_tensor = Int8Tensor.from_hp(
mapping_type not passed?
also should this use the _choose... API?
from_hp calls _choose_qparams_affine under the hood.
oh sorry I meant this:
    def _choose_quant_func_and_quantize_tensor(
    weight,
    granularity=config.granularity,
    act_quant_kwargs=QuantizeTensorToInt8Kwargs(granularity=config.granularity),
    granularity=weight_granularity,
mapping_type for weight is not passed?
the config doesn't have an option for the weight_mapping_type, so we just use the default (symmetric)
torchao/quantization/quant_api.py (Outdated):

    # TODO: Revisit for supported granularitys
    # https://github.com/pytorch/ao/pull/3241#discussion_r2551497849
    granularity: Optional[Granularity] = PerRow()
    granularity: Union[PerRow, PerTensor] = PerRow()
nit: just granularity: Granularity?
    cls,
    w_hp: torch.Tensor,
    hp_tensor: torch.Tensor,
    granularity: Granularity = PerRow(),
nit: we can make granularity required I think
    return Int8Tensor.from_hp(
        tensor,
        quant_kwargs.granularity,
        quant_kwargs.mapping_type,
nit: passing the optional arg mapping_type by name is probably better
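Continuing the snippet above, the suggested call style would look roughly like this (fragment only; Int8Tensor and quant_kwargs come from the surrounding code in the diff):

```
Int8Tensor.from_hp(
    tensor,
    quant_kwargs.granularity,
    mapping_type=quant_kwargs.mapping_type,  # optional arg passed by name
)
```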
        mapping_type: whether to use symmetric or asymmetric quant, only symmetric is supported currently
        """

        granularity: Granularity = PerRow()
nit: same here, we can just make granularity required I think
Force-pushed from a5a6140 to a665d45.
Forgot to mention: will this flatten stuff affect performance?
Didn't run benchmarks, but flatten should just be a view for 2d -> 1d, so there shouldn't be a perf hit.
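As a quick, illustrative sanity check that flattening a contiguous 2d tensor is a view (same storage, no copy):

```
import torch

x = torch.randn(64, 128)
y = x.flatten()                      # 2d -> 1d
assert y.data_ptr() == x.data_ptr()  # shares storage: a view, not a copy
```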
* Int8Tensor migration. Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow. Test Plan: to ensure BC: `pytest test/quantization/test_quant_api.py`; to test new Int8Tensor: `pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py`. Reviewers: Subscribers: Tasks: Tags:
* ruff fixes
* add init
* fix ruff again
* update
* wip
* undo update tests
* fix ruff
* fix varname
* fix typing
* add tests
* fix dtype
* fix ci
* address granularity cr
* update _choose_quant_func_and_quantize_tensor
* make block size required attribute
* made dtype required as well
* address nits
* skip per tensor weight only test for now
Summary:
This PR cleans up several things in Int8Tensor:
- keepdim: previously the scale for a 2d weight tensor was a 1d vector; now, for an M x N weight, it is an M x 1 vector. This is the same as Float8Tensor, so we can reuse the same utility functions now.
- granularity and act_mapping_type: currently only symmetric activation quantization is supported.
- _choose_quant_func_and_quantize_tensor with int8 quantize kwargs.

Test Plan:
To ensure BC: `pytest test/quantization/test_quant_api.py`
To test Int8Tensor: `pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py`
Reviewers:
Subscribers:
Tasks:
Tags: