
Conversation

@jcaip
Contributor

@jcaip jcaip commented Dec 1, 2025

Summary:

This PR cleans up several things in Int8Tensor:

  • adds support for act_mapping_type (SYMMETRIC vs ASYMMETRIC) for parity with the v1 config
  • changes the scale logic to keep dims: previously the scale for a 2D weight tensor was a 1D vector; now, for an M x N weight, it is an M x 1 tensor. This matches Float8Tensor, so we can reuse the same utility functions (see the sketch right after this list).
  • adds tests for different combinations of granularity and act_mapping_type; currently only symmetric activation quantization is supported
  • updates _choose_quant_func_and_quantize_tensor to handle the int8 quantize kwargs
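
Below is a minimal sketch of the scale-shape change described above, assuming symmetric per-row int8 quantization; the variable names are illustrative and this is not the torchao implementation:

```python
import torch

# hypothetical shapes for this example
M, N = 4, 8
weight = torch.randn(M, N)
qmax = 127  # int8 symmetric quant range

# before: reducing without keepdim gives a 1D scale of shape (M,)
scale_1d = weight.abs().amax(dim=1) / qmax                 # shape (M,)

# after: keepdim gives an M x 1 scale, matching Float8Tensor,
# so it broadcasts directly against the M x N qdata
scale = weight.abs().amax(dim=1, keepdim=True) / qmax      # shape (M, 1)

qdata = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
dequant = qdata.to(torch.float32) * scale                  # shape (M, N)
```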

Test Plan:

To ensure BC:
```
pytest test/quantization/test_quant_api.py
```

To test Int8Tensor:
```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
```

Reviewers:

Subscribers:

Tasks:

Tags:

Summary:

This PR creates a new Int8Tensor and updates the configs to use the new
Int8Tensor flow

Test Plan:

To ensure BC:
```
pytest test/quantization/test_quant_api.py
```

To test new Int8Tensor:
```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
```

Reviewers:

Subscribers:

Tasks:

Tags:
@pytorch-bot

pytorch-bot bot commented Dec 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3407

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit a665d45 with merge base 3c3515a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Dec 1, 2025
@jerryzh168
Contributor

Also: we have #3391, but there are some infra issues so some CIs are not running. I'd suggest recreating this PR exactly and landing that first; then we can do additional fixes on top of that, to recognize contributions from OSS.

@jcaip jcaip added the topic: improvement label Dec 1, 2025
optional_tensor_attribute_names = [
    "block_size",
    "act_quant_kwargs",
    "dtype",
Contributor

@jerryzh168 jerryzh168 Dec 2, 2025

nit: should dtype be optional? or just required?

Oh, I guess this is following Float8Tensor, and it's only used for dequantize for now. One thing to make sure of is that it still works when dtype == None.

Comment on lines 136 to 140
# make scale the correct dim
if isinstance(granularity, PerRow):
    scale = scale.unsqueeze(1)
elif isinstance(granularity, PerTensor):
    scale = scale.unsqueeze(0).unsqueeze(1)
Contributor

these might be confusing; we could align with the float8 path and have a keepdim=True variant for choose_qparams_affine, I guess:

min_val = torch.amin(input, dim=reduction_dims, keepdim=False)
max_val = torch.amax(input, dim=reduction_dims, keepdim=False)

like choose_qparams_affine_keepdim
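
For illustration, here is a hedged sketch of what such a keepdim-style helper could look like; the name _choose_scale_keepdim and the block_size-to-reduction-dims rule are assumptions made for this sketch, not the actual choose_qparams_affine API:

```python
import torch

def _choose_scale_keepdim(x: torch.Tensor, block_size, qmax: int = 127):
    # reduce over every dim whose block spans more than one element
    reduction_dims = tuple(i for i, b in enumerate(block_size) if b != 1)
    min_val = torch.amin(x, dim=reduction_dims, keepdim=True)
    max_val = torch.amax(x, dim=reduction_dims, keepdim=True)
    # symmetric scale; with keepdim=True no unsqueeze is needed afterwards
    return torch.maximum(max_val.abs(), min_val.abs()) / qmax

w = torch.randn(4, 8)
print(_choose_scale_keepdim(w, block_size=(1, 8)).shape)  # torch.Size([4, 1]), per-row
print(_choose_scale_keepdim(w, block_size=(4, 8)).shape)  # torch.Size([1, 1]), per-tensor
```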

layout: Optional[Layout] = PlainLayout()
act_mapping_type: Optional[MappingType] = MappingType.SYMMETRIC
weight_only_decode: bool = False
granularity: Optional[Union[PerRow, PerTensor]] = PerRow()
Contributor

@jerryzh168 jerryzh168 Dec 2, 2025

nit: typing needs to be updated to Optional[Union[Granularity, List[Granularity, Granularity]]] I think

activation_granularity, weight_granularity = _normalize_granularity(
    config.granularity
)
act_quant_kwargs = QuantizeTensorToInt8Kwargs(
Contributor

should we also do some validation here on which combinations are supported?

Contributor Author

@jcaip jcaip Dec 3, 2025

This should just be granularity and activation_mapping_type (symmetric vs asymmetric), which is what the v1 config supports. Will add a test for these combos.
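
As a rough illustration of the promised combination test, here is a hedged sketch; the import paths, the from_hp signature, and the dequantize() call are assumptions based on the diff hunks in this thread, not the final API:

```python
import pytest
import torch

# assumed import locations; adjust to wherever Int8Tensor actually lives
from torchao.quantization.granularity import PerRow, PerTensor
from torchao.quantization.quant_primitives import MappingType
from torchao.quantization.quantize_.workflows.int8.int8_tensor import Int8Tensor


@pytest.mark.parametrize("granularity", [PerRow(), PerTensor()])
@pytest.mark.parametrize("mapping_type", [MappingType.SYMMETRIC])
def test_granularity_mapping_type_combos(granularity, mapping_type):
    weight = torch.randn(32, 64)
    # mapping_type passed by name for clarity
    qw = Int8Tensor.from_hp(weight, granularity, mapping_type=mapping_type)
    # the dequantized weight should stay close to the original
    torch.testing.assert_close(qw.dequantize(), weight, atol=0.2, rtol=0.2)
```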

Contributor

@jerryzh168 jerryzh168 left a comment

thanks, looks good overall, see some comments inline

group_size: Optional[int] = None
granularity: Granularity = PerRow()
set_inductor_config: bool = True
version: int = 2
Contributor

btw, I think we should break BC in a separate PR

weight_only_decode: bool = False
granularity: Optional[Union[PerRow, PerTensor]] = PerRow()
set_inductor_config: bool = True
version: int = 2
Contributor

@jerryzh168 jerryzh168 Dec 2, 2025

same here, I think we should set this to 1 first and break BC separately to reduce the scope of changes

@jerryzh168
Contributor

Also, for all the failed CIs, I think it's because you bumped the versions. I think we can keep BC for all these older tests by setting version to 1 explicitly, while migrating the ones we think are going to be useful to test_int8_tensor.py.

@jerryzh168 jerryzh168 self-requested a review December 2, 2025 04:49
@jcaip jcaip changed the title Int8Tensor migration Int8Tensor migration cleaniup Dec 3, 2025
@jcaip jcaip changed the title Int8Tensor migration cleaniup Int8Tensor migration cleanup Dec 3, 2025
block_size=block_size,
scale=self.scale,
block_size=self.block_size,
scale=self.scale.squeeze(),
Contributor

@jerryzh168 jerryzh168 Dec 3, 2025

this is probably not very robust, maybe we can add a keepdim for both quantize_affine and dequantize_affine as well for now?

in the future we can just refactor these ops to use expanded scale/zero_point: #3324

Contributor Author

Took a look, and the code as currently written will be fine for both the flattened and non-flattened cases; we just need keepdim for quantize_affine.

kwargs = {
"device": qdata.device,
"dtype": dtype or scale.dtype,
"dtype": dtype,
Contributor

is dtype becoming required? if so, we can move dtype from optional_tensor_attribute_names to tensor_attribute_names and change its order in the arglist

Contributor Author

I was just following how Float8Tensor did it:

assert len(old_int8_tensor.qdata.shape) == len(old_int8_tensor.block_size), (
"unsupported"
)
new_int8_tensor = old_int8_tensor.__class__(
Contributor

nit: we can just use Int8Tensor here to be more explicit I think

@dataclass
class QuantizeTensorToInt8Kwargs(QuantizeTensorKwargs):
"""Tensor kwargs for creating int8 tensor (either activation or weight)
"""Tensor kwargs for creating int8 tensor from high precision
Contributor

@jerryzh168 jerryzh168 Dec 3, 2025

this is also just for activation I think, we can include that in the comment as well

optional_tensor_attribute_names = ["act_quant_kwargs", "block_size", "dtype"]
tensor_attribute_names = []
optional_tensor_attribute_names = [
"block_size",
Contributor

also, should block_size be optional?

Contributor Author

I think it's optional here because it's a kwarg that defaults to None. I'm a little unclear on what the default block_size should be, though. Should we default to PerTensor?

Contributor

@jerryzh168 jerryzh168 Dec 4, 2025

probably just make it required, it will be simpler I think; the value of block_size depends on the input, so we can't really specify a per-tensor block_size without knowing the shape of the input
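
To illustrate why a default block_size can't be picked without knowing the input, here is a hedged sketch; _infer_block_size is a hypothetical helper written for this example, not torchao's API:

```python
import torch
from typing import Tuple

def _infer_block_size(shape: Tuple[int, ...], per_row: bool) -> Tuple[int, ...]:
    if per_row:
        # PerRow: one scale per row, so each block covers a single full row
        return (1,) * (len(shape) - 1) + (shape[-1],)
    # PerTensor: one scale for the whole tensor, so the block covers everything
    return tuple(shape)

w = torch.randn(16, 64)
print(_infer_block_size(w.shape, per_row=True))   # (1, 64)
print(_infer_block_size(w.shape, per_row=False))  # (16, 64)
```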

output_dtype = activation_tensor.dtype

if weight_tensor.act_quant_kwargs is not None:
    activation_tensor = Int8Tensor.from_hp(
Contributor

mapping_type not passed?

Contributor

also should this use the _choose... API?

Contributor Author

from_hp calls _choose_qparams_affine under the hood.

Contributor

oh sorry I meant this:

def _choose_quant_func_and_quantize_tensor(

weight,
granularity=config.granularity,
act_quant_kwargs=QuantizeTensorToInt8Kwargs(granularity=config.granularity),
granularity=weight_granularity,
Contributor

mapping_type for weight is not passed?

Contributor Author

the config doesn't have an option for the weight_mapping_type, so we just use the default (symmetric)

@jcaip jcaip requested a review from jerryzh168 December 4, 2025 20:34
# TODO: Revisit for supported granularitys
# https://github.com/pytorch/ao/pull/3241#discussion_r2551497849
granularity: Optional[Granularity] = PerRow()
granularity: Union[PerRow, PerTensor] = PerRow()
Contributor

nit: just granularity: Granularity?

cls,
w_hp: torch.Tensor,
hp_tensor: torch.Tensor,
granularity: Granularity = PerRow(),
Contributor

nit: we can make granularity required I think

return Int8Tensor.from_hp(
tensor,
quant_kwargs.granularity,
quant_kwargs.mapping_type,
Contributor

nit: passing the optional arg mapping_type by name is probably better

mapping_type: whether to use symmetric or asymmetric quant, only symmetric is supported currently
"""

granularity: Granularity = PerRow()
Contributor

nit: same here, we can just make granularity required I think

@jcaip jcaip force-pushed the jcaip/int8-tensor branch from a5a6140 to a665d45 Compare December 4, 2025 21:50
@jcaip jcaip merged commit c4273fe into main Dec 5, 2025
24 of 36 checks passed
@jerryzh168
Contributor

forgot to mention, will this flatten stuff affect performance?

@jcaip
Contributor Author

jcaip commented Dec 5, 2025

forgot to mention, will this flatten stuff affect performance?

Didn't run benchmarks, but flatten should just be a view for 2d -> 1d so there shouldn't be a perf hit.
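
A quick illustrative check of the no-copy claim (for a contiguous tensor):

```python
import torch

w = torch.randn(128, 256)
flat = w.flatten()
# for a contiguous tensor, flatten returns a view sharing the same storage
print(flat.data_ptr() == w.data_ptr())  # True, so no extra copy is made
```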

@jcaip jcaip mentioned this pull request Dec 5, 2025
avizon-aws pushed a commit to avizon-aws/ao that referenced this pull request Dec 5, 2025
* Int8Tensor migration

Summary:

This PR creates a new Int8Tensor and updates the configs to use the new
Int8Tensor flow

Test Plan:

To ensure BC:
```
pytest test/quantization/test_quant_api.py
```

To test new Int8Tensor:
```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
```

Reviewers:

Subscribers:

Tasks:

Tags:

* ruff fixes

* add init

* fix ruff again

* update

* wip

* undo update tests

* fix ruff

* fix varname

* fix typing

* add tests

* fix dtype

* fix ci

* address granularity cr

* update _choose_quant_func_and_quantize_tensor

* make block size required attribute

* made dtype required as well

* address nits

* skip per tensor weight only test for now