
The MUL unit of APOT #19

@clevercool

Hi,

Do you have the specific design of the MUL (Multiplication) unit for APOT quantization?

We know that uniform (INT) quantization and POT quantization are both hardware-friendly.

Assume that:
R = real number
S = scale factor
T = quantized number
R1 = S1 * T1
R2 = S2 * T2

Uniform quantization simply uses an INT MUL unit:

T1 = m
T2 = n

So, we have:

R1 * R2 = (S1 * S2) * (m * n) 
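A minimal Python sketch of the uniform case, only to make the equation concrete. `uniform_mul` is a hypothetical helper name, and real hardware would usually keep S1 * S2 in fixed point rather than as a float.

```python
def uniform_mul(s1: float, m: int, s2: float, n: int) -> float:
    """Multiply R1 = S1*m and R2 = S2*n with a single integer MUL.

    m * n is what the INT MUL unit computes; the scale product S1*S2 is
    applied once afterwards (often folded into a later requantization).
    """
    int_product = m * n              # integer MUL unit
    return (s1 * s2) * int_product   # apply the combined scale once

# R1 = 0.5 * 3 = 1.5, R2 = 0.25 * -4 = -1.0
print(uniform_mul(0.5, 3, 0.25, -4))  # -1.5
```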

For POT:

T1 = 2^m
T2 = 2^n

So, we have:

R1 * R2 = (S1 * S2) * (2^m * 2^n) 
             =  (S1 * S2) * 2^(m + n)

POT multiplication is similar to a floating-point MUL that has only an exponent and no mantissa: the exponents are simply added.
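For comparison, a sketch of the POT case under the same assumptions. `pot_mul` is a hypothetical helper, and `math.ldexp` (x * 2^i) stands in for the shifter a hardware unit would use; the only arithmetic on the codes is one exponent addition.

```python
import math

def pot_mul(s1: float, m: int, s2: float, n: int) -> float:
    """R1 * R2 for POT codes T1 = 2^m and T2 = 2^n.

    The codes need no multiplier: their exponents are added, and the
    resulting power of two is applied with ldexp (a shift in hardware).
    """
    e = m + n                        # exponent add, like a mantissa-free float MUL
    return math.ldexp(s1 * s2, e)    # (S1 * S2) * 2^(m + n)

print(pot_mul(1.0, -1, 1.0, -3))  # 0.0625, i.e. 2^-4
```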

However, for APOT I have two questions about the MUL design, because each value is a sum of additive terms.
Assume a 4-bit APOT code.
Decoder table for the first two bits:

00    01    10    11
2^0   2^-1  2^-3  2^-5

And for the last two bits:

00    01    10    11
2^0   2^-2  2^-4  2^-6

For the first two bits, the exponents in the decoder table are not contiguous: 0, -1, -3, -5.
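One possible decoding scheme, sketched here as an assumption rather than the APOT authors' design, is two 4-entry lookup tables, one per bit pair, that feed exponents straight to the exponent adders. The names `HIGH_EXP`, `LOW_EXP`, `decode_apot4` and `apot4_value` are hypothetical; the table contents follow the two tables above exactly.

```python
# Exponent lookup tables from the two decoder tables (index = 2-bit code).
# The high table is deliberately non-contiguous: 0, -1, -3, -5.
HIGH_EXP = (0, -1, -3, -5)   # first two bits: 00, 01, 10, 11
LOW_EXP = (0, -2, -4, -6)    # last two bits:  00, 01, 10, 11

def decode_apot4(code: int) -> tuple[int, int]:
    """Split a 4-bit APOT code into its two exponents via two small LUTs."""
    if not 0 <= code <= 0b1111:
        raise ValueError("expected a 4-bit code")
    hi = (code >> 2) & 0b11
    lo = code & 0b11
    return HIGH_EXP[hi], LOW_EXP[lo]

def apot4_value(code: int) -> float:
    """Real value of the code: sum of the two decoded powers of two."""
    e_hi, e_lo = decode_apot4(code)
    return 2.0 ** e_hi + 2.0 ** e_lo

print(decode_apot4(0b0101), apot4_value(0b0101))  # (-1, -2) 0.75
```

In hardware each table would be a tiny ROM or 4:1 mux, so the non-contiguity itself costs little; the open question is what happens after decoding.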

Q1: How do you efficiently decode the binary code into the APOT value, especially inside the MUL unit?

Assume two numbers in APOT:

0101: T1 = 2^-1 + 2^-2
1010: T2 = 2^-3 + 2^-4
T1 * T2 = (2^-1 + 2^-2) * (2^-3 + 2^-4) 
             = (2^-1 * 2^-3) + (2^-1 * 2^-4) + (2^-2 * 2^-3) + (2^-2 * 2^-4)
             = 2^-4 + 2^-5 + 2^-5 + 2^-6
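The expansion above can be checked mechanically: each partial product is one exponent addition, so k-term operands give k * k of them. A small sketch, with `apot_partial_products` as a hypothetical helper and exact `Fraction` arithmetic used only to verify the sum.

```python
from fractions import Fraction
from itertools import product

def apot_partial_products(exps1, exps2):
    """Exponents of every partial product 2^a * 2^b = 2^(a + b), largest first.

    For k-term APOT operands this yields k * k exponent additions,
    versus the single addition needed for POT.
    """
    return sorted((a + b for a, b in product(exps1, exps2)), reverse=True)

t1 = (-1, -2)   # 0101: 2^-1 + 2^-2
t2 = (-3, -4)   # 1010: 2^-3 + 2^-4
terms = apot_partial_products(t1, t2)
print(terms)  # [-4, -5, -5, -6]

# Sanity check against the exact product (3/4 * 3/16 = 9/64).
exact = sum(Fraction(2) ** a for a in t1) * sum(Fraction(2) ** b for b in t2)
assert sum(Fraction(2) ** e for e in terms) == exact
```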

Obviously, the calculation needs 4x (9x) as many add operations as POT in the 4-bit (6-bit) case, one exponent addition per partial product. Moreover, the result violates the definition of APOT: a valid APOT number never contains the same additive term twice, yet here 2^-5 appears twice.
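One way to see what the repeated term implies is to merge equal terms with ordinary binary carries (2^k + 2^k = 2^(k+1)) and then check whether the merged sum is still a 4-bit level. This is a sketch under that assumption, not something taken from the APOT paper; `normalize_terms`, `value_of` and `encode_apot4_exact` are hypothetical helpers using the two decoder tables above. For 0101 x 1010 the carries happen to land on a valid code (2^-3 + 2^-6, code 1011), but for 0001 x 0001 the result needs three terms, so some rounding step is unavoidable in general.

```python
from collections import Counter
from fractions import Fraction

# 4-bit APOT exponent tables (first two bits, last two bits).
HIGH_EXP = (0, -1, -3, -5)
LOW_EXP = (0, -2, -4, -6)

def normalize_terms(exps):
    """Merge repeated power-of-two terms with binary carries.

    Returns distinct exponents, largest first; the value is unchanged.
    """
    counts = Counter(exps)
    if not counts:
        return []
    out, carry = [], 0
    k, top = min(counts), max(counts)
    while k <= top or carry:
        total = counts.get(k, 0) + carry
        if total % 2:          # an odd count leaves one 2^k term here
            out.append(k)
        carry = total // 2     # each pair of 2^k becomes one 2^(k+1)
        k += 1
    return sorted(out, reverse=True)

def value_of(exps):
    """Exact value of a sum of powers of two."""
    return sum(Fraction(2) ** e for e in exps)

def encode_apot4_exact(value):
    """Return the 4-bit code whose level equals `value` exactly, or None."""
    for code in range(16):
        if value_of((HIGH_EXP[code >> 2], LOW_EXP[code & 0b11])) == value:
            return code
    return None

# 0101 x 1010: 2^-4 + 2^-5 + 2^-5 + 2^-6
merged = normalize_terms([-4, -5, -5, -6])
print(merged, encode_apot4_exact(value_of(merged)))  # [-3, -6] 11  (code 1011)

# 0001 x 0001: (2^0 + 2^-2)^2 = 2^0 + 2^-2 + 2^-2 + 2^-4
merged = normalize_terms([0, -2, -2, -4])
print(merged, encode_apot4_exact(value_of(merged)))  # [0, -1, -4] None
```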

Q2: How do you deal with this extra computation, and with "subnormal" results like the one above that are no longer valid APOT numbers?

One direct solution is to convert the values to float and use fake quantization. But doesn't that violate the principle of quantization?
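For reference, a minimal sketch of that direct fake-quantization route, assuming scales S1 = S2 = 1 and round-to-nearest requantization (neither is specified above): both codes are dequantized, multiplied with a float MUL, and snapped back to the nearest 4-bit level. `LEVELS` and `fake_quant_mul` are hypothetical names. The sketch makes the concern concrete: the multiply itself is a floating-point operation, not an APOT one.

```python
# All 16 levels of the 4-bit code from the two decoder tables, indexed by code.
HIGH_EXP = (0, -1, -3, -5)
LOW_EXP = (0, -2, -4, -6)
LEVELS = [2.0 ** HIGH_EXP[c >> 2] + 2.0 ** LOW_EXP[c & 0b11] for c in range(16)]

def fake_quant_mul(code1: int, code2: int) -> tuple[float, int]:
    """Dequantize both codes, multiply in float, then snap to the nearest level."""
    product = LEVELS[code1] * LEVELS[code2]                    # float MUL
    nearest = min(range(16), key=lambda c: abs(LEVELS[c] - product))
    return product, nearest

print(fake_quant_mul(0b0101, 0b1010))  # (0.140625, 11)
```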
