Conversation

@sahas3 sahas3 commented Nov 11, 2025

This PR enables e2e tests for quantized `torch.mm` and its other variants through the TOSA path.

The torch IR for quantized matmul is shown in the following snippet:

```mlir
%2 = torch.aten._make_per_tensor_quantized_tensor %0, %float2.150000e-02, %int-25 : !torch.vtensor<[3,4],si8>, !torch.float, !torch.int -> !torch.vtensor<[3,4],!torch.qint8>
%3 = torch.aten._make_per_tensor_quantized_tensor %1, %float1.760000e-02, %int18 : !torch.vtensor<[4,3],si8>, !torch.float, !torch.int -> !torch.vtensor<[4,3],!torch.qint8>
%4 = torch.aten.mm %2, %3 : !torch.vtensor<[3,4],!torch.qint8>, !torch.vtensor<[4,3],!torch.qint8> -> !torch.vtensor<[3,3],!torch.qint32>
%5 = torch.aten.int_repr %4 : !torch.vtensor<[3,3],!torch.qint32> -> !torch.vtensor<[3,3],si32>
%6 = torch.aten._make_per_tensor_quantized_tensor %5, %float3.784000e-04, %int0 : !torch.vtensor<[3,3],si32>, !torch.float, !torch.int -> !torch.vtensor<[3,3],!torch.qint32>
%7 = torch.aten.dequantize.tensor %6 : !torch.vtensor<[3,3],!torch.qint32> -> !torch.vtensor<[3,3],f32>
```
1. This change adds legalizations for `_make_per_tensor_quantized_tensor` and `int_repr`, which are essentially cast operations. The former op carries the zero-point/scale information needed for (de)quantizing values.
2. A legalization for `dequantize.tensor`, the standard dequantization op, is also added.
3. The legalization for `matmul` is fixed to infer the zero-point information from the source `_make_per_tensor_quantized_tensor` ops feeding the matmul operands. The scales don't need to be considered here, as they are handled correctly at the output via the `FuseQuantizedOps` transform.
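As a sanity check on the arithmetic the lowering relies on, here is a small NumPy sketch (not code from the PR) using the quantization constants from the IR snippet above. It shows why the matmul legalization only needs the input zero points: subtracting them before the i32 accumulation and rescaling the result by the product of the input scales (the output scale `3.784e-4` above is exactly `2.15e-2 * 1.76e-2`, with zero point 0) matches the float reference.

```python
import numpy as np

# Quantization parameters taken from the IR snippet (scale, zero point).
s1, z1 = 2.15e-2, -25   # lhs operand
s2, z2 = 1.76e-2, 18    # rhs operand

rng = np.random.default_rng(0)
q1 = rng.integers(-128, 127, size=(3, 4), dtype=np.int8)
q2 = rng.integers(-128, 127, size=(4, 3), dtype=np.int8)

# Reference: dequantize each operand, then matmul in float.
ref = (s1 * (q1.astype(np.float64) - z1)) @ (s2 * (q2.astype(np.float64) - z2))

# Integer path: subtract zero points, accumulate in i32, then apply
# the combined output scale s1*s2 with zero point 0 (matching %6 above).
acc = (q1.astype(np.int32) - z1) @ (q2.astype(np.int32) - z2)
out = (s1 * s2) * acc

assert np.allclose(ref, out)
```

Because the two scales factor out of the accumulation entirely, only the zero points affect the integer matmul itself, which is why the scale bookkeeping can be deferred to `FuseQuantizedOps`.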

@sahas3 sahas3 requested a review from sjarus November 11, 2025 13:24

sahas3 commented Nov 17, 2025

Hi @sjarus, a gentle reminder to take a look at this PR when you have a chance. Thanks!

@sjarus sjarus left a comment


Sorry for the delay! My github notifications aren't always reliable.


sahas3 commented Nov 19, 2025

> Sorry for the delay! My github notifications aren't always reliable.

No worries, thanks for the review!

@sahas3 sahas3 merged commit b834f94 into llvm:main Nov 19, 2025
3 checks passed