Int8Tensor migration cleanup #3407
Conversation
Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow.

Test Plan:

To ensure BC:
```
pytest test/quantization/test_quant_api.py
```
To test new Int8Tensor:
```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
```
Reviewers:
Subscribers:
Tasks:
Tags:
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3407
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 1 Pending, as of commit a665d45 with merge base 3c3515a.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Also, since we have #3391 but there are some infra issues so some CIs are not running, I'd suggest recreating this PR exactly and landing that first; then we can do additional fixes on top of that to recognize contributions from OSS.
    optional_tensor_attribute_names = [
        "block_size",
        "act_quant_kwargs",
        "dtype",
nit: should dtype be optional? or just required?
Oh, I guess this is following Float8Tensor, and it's only used for dequantize for now. One thing to make sure of is that it still works when dtype == None.
    # make scale the correct dim
    if isinstance(granularity, PerRow):
        scale = scale.unsqueeze(1)
    elif isinstance(granularity, PerTensor):
        scale = scale.unsqueeze(0).unsqueeze(1)
these might be confusing, we could align with float8 path and have a keepdim=True variant for choose_qparams_affine I guess:
ao/torchao/quantization/quant_primitives.py, lines 1553 to 1554 at 73730e8:

    min_val = torch.amin(input, dim=reduction_dims, keepdim=False)
    max_val = torch.amax(input, dim=reduction_dims, keepdim=False)
like choose_qparams_affine_keepdim
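For illustration, here is a minimal sketch of what such a keepdim variant could compute for the symmetric int8 case. The helper name and signature are assumptions for this sketch, not the actual torchao API; the real choose_qparams_affine also handles zero_point, target dtypes, and more.

```
import torch

def choose_qparams_symmetric_keepdim(x, reduction_dims, quant_max=127, eps=1e-12):
    # keepdim=True keeps the reduced dims as size 1, so for an (M, N) weight reduced
    # over dim 1 the scale comes out as (M, 1) and no unsqueeze shim is needed.
    min_val = torch.amin(x, dim=reduction_dims, keepdim=True)
    max_val = torch.amax(x, dim=reduction_dims, keepdim=True)
    abs_max = torch.maximum(min_val.abs(), max_val.abs()).clamp(min=eps)
    return abs_max / quant_max  # scale is already in a broadcastable shape
```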
torchao/quantization/quant_api.py (Outdated):

    layout: Optional[Layout] = PlainLayout()
    act_mapping_type: Optional[MappingType] = MappingType.SYMMETRIC
    weight_only_decode: bool = False
    granularity: Optional[Union[PerRow, PerTensor]] = PerRow()
nit: typing needs to be updated to Optional[Union[Granularity, List[Granularity, Granularity]]] I think
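As a side note, `List[Granularity, Granularity]` is not valid typing syntax; a sketch of the suggested annotation in a valid form (the choice of a 2-tuple for separate activation/weight granularity is an assumption here):

```
from typing import Optional, Tuple, Union

from torchao.quantization.granularity import Granularity, PerRow

# either one granularity applied to both activation and weight,
# or an (activation_granularity, weight_granularity) pair
granularity: Optional[Union[Granularity, Tuple[Granularity, Granularity]]] = PerRow()
```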
torchao/quantization/quant_api.py (Outdated):

    activation_granularity, weight_granularity = _normalize_granularity(
        config.granularity
    )
    act_quant_kwargs = QuantizeTensorToInt8Kwargs(
should we also do some validation here on which combinations are supported?
This should just be granularity and activation_mapping_type (symmetric vs. asymmetric), which is what the v1 config supports. Will add a test for these combos.
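A hedged sketch of the kind of validation this could be; the helper name is made up and the field names follow this PR's discussion:

```
from torchao.quantization.granularity import PerRow, PerTensor
from torchao.quantization.quant_primitives import MappingType

def _validate_int8_config(granularity, act_mapping_type):
    # per the discussion above, only per-row/per-tensor granularity and
    # symmetric activation quantization are supported by the new flow
    if not isinstance(granularity, (PerRow, PerTensor)):
        raise ValueError(f"unsupported granularity: {granularity}")
    if act_mapping_type != MappingType.SYMMETRIC:
        raise ValueError("only symmetric activation quantization is supported currently")
```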
jerryzh168 left a comment:
thanks, looks good overall, see some comments inline
torchao/quantization/quant_api.py (Outdated):

    group_size: Optional[int] = None
    granularity: Granularity = PerRow()
    set_inductor_config: bool = True
    version: int = 2
btw, I think we should break BC in a separate PR
torchao/quantization/quant_api.py (Outdated):

    weight_only_decode: bool = False
    granularity: Optional[Union[PerRow, PerTensor]] = PerRow()
    set_inductor_config: bool = True
    version: int = 2
same here, I think we should set this to 1 first and break BC separately to reduce the scope of changes
Also, for all the failed CIs, I think it's because you bumped versions. I think we can keep BC for all these older tests by setting the version to 1 explicitly (see ao/test/dtypes/test_affine_quantized_float.py, line 114 at 73730e8), while migrating the ones we think will be useful to test_int8_tensor.py.
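A sketch of what pinning an older test to the v1 flow could look like, assuming the config exposes the version field shown in the diffs above (the config class and toy model here are just illustrative):

```
import torch
from torchao.quantization import Int8DynamicActivationInt8WeightConfig, quantize_

model = torch.nn.Sequential(torch.nn.Linear(16, 16))
# version=1 keeps the pre-migration flow, so older tests keep exercising it
quantize_(model, Int8DynamicActivationInt8WeightConfig(version=1))
```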
    block_size=block_size,
    scale=self.scale,
    block_size=self.block_size,
    scale=self.scale.squeeze(),
this is probably not very robust, maybe we can add a keepdim for both quantize_affine and dequantize_affine as well for now?
in the future we can just refactor these ops to use expanded scale/zero_point: #3324
Took a look, and the code as currently written will be fine for both the flattened and non-flattened cases; we just need keepdim for quantize_affine.
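For context, a small illustration (this is the underlying arithmetic, not the quantize_affine/dequantize_affine API) of why a kept-dim (M, 1) scale is convenient: dequantization reduces to plain broadcasting, with no squeeze/unsqueeze shims:

```
import torch

M, N = 4, 8
qdata = torch.randint(-128, 128, (M, N), dtype=torch.int8)
scale = torch.rand(M, 1) * 0.01            # per-row scale kept in (M, 1) shape
dequant = qdata.to(torch.float32) * scale  # (M, 1) broadcasts over (M, N)
assert dequant.shape == (M, N)
```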
    kwargs = {
        "device": qdata.device,
        "dtype": dtype or scale.dtype,
        "dtype": dtype,
is dtype becoming required? if so, we can move dtype from optional_tensor_attribute_names to tensor_attribute_names and change its order in the arglist
I was just following how Float8Tensor did it:
    def __new__(
    assert len(old_int8_tensor.qdata.shape) == len(old_int8_tensor.block_size), (
        "unsupported"
    )
    new_int8_tensor = old_int8_tensor.__class__(
nit: we can just use Int8Tensor here to be more explicit I think
    @dataclass
    class QuantizeTensorToInt8Kwargs(QuantizeTensorKwargs):
        """Tensor kwargs for creating int8 tensor (either activation or weight)
        """Tensor kwargs for creating int8 tensor from high precision
this is also just for activation I think, we can include that in the comment as well
    optional_tensor_attribute_names = ["act_quant_kwargs", "block_size", "dtype"]
    tensor_attribute_names = []
    optional_tensor_attribute_names = [
        "block_size",
also, should block_size be optional?
I think it's optional here because it's a kwarg that defaults to None; I'm a little unclear on what the default block_size should be, though. Should we default to PerTensor?
Probably just make it required, it will be simpler I think; the value of block_size depends on the input, so we can't really specify a per-tensor block_size without knowing the shape of the input.
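To make the dependence on the input shape concrete, here is how block_size would look for an (M, N) weight under the two granularities discussed here (values are illustrative):

```
M, N = 128, 256                   # shape of a hypothetical weight tensor
per_row_block_size = (1, N)       # one block (and scale) per row -> M scales
per_tensor_block_size = (M, N)    # a single block covering the whole tensor
```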
    output_dtype = activation_tensor.dtype

    if weight_tensor.act_quant_kwargs is not None:
        activation_tensor = Int8Tensor.from_hp(
mapping_type not passed?
also should this use the _choose... API?
from_hp calls _choose_qparams_affine under the hood.
oh sorry I meant this:
    def _choose_quant_func_and_quantize_tensor(
    weight,
    granularity=config.granularity,
    act_quant_kwargs=QuantizeTensorToInt8Kwargs(granularity=config.granularity),
    granularity=weight_granularity,
mapping_type for weight is not passed?
the config doesn't have an option for the weight_mapping_type, so we just use the default (symmetric)
torchao/quantization/quant_api.py (Outdated):

    # TODO: Revisit for supported granularitys
    # https://github.com/pytorch/ao/pull/3241#discussion_r2551497849
    granularity: Optional[Granularity] = PerRow()
    granularity: Union[PerRow, PerTensor] = PerRow()
nit: just granularity: Granularity?
    cls,
    w_hp: torch.Tensor,
    hp_tensor: torch.Tensor,
    granularity: Granularity = PerRow(),
nit: we can make granularity required I think
    return Int8Tensor.from_hp(
        tensor,
        quant_kwargs.granularity,
        quant_kwargs.mapping_type,
nit: passing the optional arg mapping_type by name is probably better
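Continuing the snippet above, the suggested call style would look roughly like this (fragment only; Int8Tensor and quant_kwargs come from the surrounding code in the diff):

```
Int8Tensor.from_hp(
    tensor,
    quant_kwargs.granularity,
    mapping_type=quant_kwargs.mapping_type,  # optional arg passed by name
)
```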
        mapping_type: whether to use symmetric or asymmetric quant, only symmetric is supported currently
        """

        granularity: Granularity = PerRow()
nit: same here, we can just make granularity required I think
Force-pushed from a5a6140 to a665d45.
Forgot to mention: will this flatten stuff affect performance?
Didn't run benchmarks, but flatten should just be a view for 2d -> 1d, so there shouldn't be a perf hit.
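As a quick, illustrative sanity check that flattening a contiguous 2d tensor is a view (same storage, no copy):

```
import torch

x = torch.randn(64, 128)
y = x.flatten()                      # 2d -> 1d
assert y.data_ptr() == x.data_ptr()  # shares storage: a view, not a copy
```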
* Int8Tensor migration. Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow. Test Plan: to ensure BC: `pytest test/quantization/test_quant_api.py`; to test new Int8Tensor: `pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py`. Reviewers: Subscribers: Tasks: Tags:
* ruff fixes
* add init
* fix ruff again
* update
* wip
* undo update tests
* fix ruff
* fix varname
* fix typing
* add tests
* fix dtype
* fix ci
* address granularity cr
* update _choose_quant_func_and_quantize_tensor
* make block size required attribute
* made dtype required as well
* address nits
* skip per tensor weight only test for now
Summary:
This PR cleans up several things in Int8Tensor:
- keepdim: previously the scale for a 2d weight tensor was a 1d vector; now, for an M x N weight, it is an M x 1 vector. This is the same as Float8Tensor, so we can reuse the same utility functions now.
- granularity and act_mapping_type: currently only symmetric activation quantization is supported.
- _choose_quant_func_and_quantize_tensor with int8 quantize kwargs.

Test Plan:
To ensure BC: `pytest test/quantization/test_quant_api.py`
To test Int8Tensor: `pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py`
Reviewers:
Subscribers:
Tasks:
Tags: