Conversation

@sahas3 sahas3 commented Nov 11, 2025

This PR enables e2e tests for quantized `torch.mm` and its other variants through the TOSA path.

The torch IR for quantized matmul is shown in the following snippet:

```mlir
%2 = torch.aten._make_per_tensor_quantized_tensor %0, %float2.150000e-02, %int-25 : !torch.vtensor<[3,4],si8>, !torch.float, !torch.int -> !torch.vtensor<[3,4],!torch.qint8>
%3 = torch.aten._make_per_tensor_quantized_tensor %1, %float1.760000e-02, %int18 : !torch.vtensor<[4,3],si8>, !torch.float, !torch.int -> !torch.vtensor<[4,3],!torch.qint8>
%4 = torch.aten.mm %2, %3 : !torch.vtensor<[3,4],!torch.qint8>, !torch.vtensor<[4,3],!torch.qint8> -> !torch.vtensor<[3,3],!torch.qint32>
%5 = torch.aten.int_repr %4 : !torch.vtensor<[3,3],!torch.qint32> -> !torch.vtensor<[3,3],si32>
%6 = torch.aten._make_per_tensor_quantized_tensor %5, %float3.784000e-04, %int0 : !torch.vtensor<[3,3],si32>, !torch.float, !torch.int -> !torch.vtensor<[3,3],!torch.qint32>
%7 = torch.aten.dequantize.tensor %6 : !torch.vtensor<[3,3],!torch.qint32> -> !torch.vtensor<[3,3],f32>
```
1. This change adds legalizations for `_make_per_tensor_quantized_tensor` and `int_repr`, which are essentially cast operations. The former op carries the zero-point/scale information needed for (de)quantizing values.
2. A legalization for `dequantize.tensor`, the standard dequantization op, is also added.
3. The legalization for `matmul` is fixed to infer the zero-point information from the source `_make_per_tensor_quantized_tensor` ops feeding the matmul operands. The scales don't need to be considered here, as they are handled correctly at the output via the `FuseQuantizedOps` transform.
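As a sanity check on the arithmetic the lowering relies on, here is a small NumPy sketch (not code from the PR) using the quantization constants from the IR snippet above. It shows why the matmul legalization only needs the input zero points: subtracting them before the i32 accumulation and rescaling the result by the product of the input scales (the output scale `3.784e-4` above is exactly `2.15e-2 * 1.76e-2`, with zero point 0) matches the float reference.

```python
import numpy as np

# Quantization parameters taken from the IR snippet (scale, zero point).
s1, z1 = 2.15e-2, -25   # lhs operand
s2, z2 = 1.76e-2, 18    # rhs operand

rng = np.random.default_rng(0)
q1 = rng.integers(-128, 127, size=(3, 4), dtype=np.int8)
q2 = rng.integers(-128, 127, size=(4, 3), dtype=np.int8)

# Reference: dequantize each operand, then matmul in float.
ref = (s1 * (q1.astype(np.float64) - z1)) @ (s2 * (q2.astype(np.float64) - z2))

# Integer path: subtract zero points, accumulate in i32, then apply
# the combined output scale s1*s2 with zero point 0 (matching %6 above).
acc = (q1.astype(np.int32) - z1) @ (q2.astype(np.int32) - z2)
out = (s1 * s2) * acc

assert np.allclose(ref, out)
```

Because the two scales factor out of the accumulation entirely, only the zero points affect the integer matmul itself, which is why the scale bookkeeping can be deferred to `FuseQuantizedOps`.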

@sahas3 sahas3 requested a review from sjarus November 11, 2025 13:24

sahas3 commented Nov 17, 2025

Hi @sjarus, a gentle reminder to take a look at this PR when you have a chance. Thanks!

@sjarus sjarus left a comment


Sorry for the delay! My github notifications aren't always reliable.


sahas3 commented Nov 19, 2025

> Sorry for the delay! My github notifications aren't always reliable.

No worries, thanks for the review!

@sahas3 sahas3 merged commit b834f94 into llvm:main Nov 19, 2025
3 checks passed