Feature request
Integrate AWQ models with TGI. AWQ is a quantization method that achieves better speedups than GPTQ. It mainly quantizes the linear layers, replacing them with an optimized GEMM kernel. It is W4A16 quantization (4-bit weights, fp16 activations); a reference sketch of the arithmetic is included below.
Code: https://github.com/mit-han-lab/llm-awq
Paper: https://arxiv.org/pdf/2306.00978.pdf
cc @michaelfeil @Atry
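To make the W4A16 idea concrete, here is a minimal reference sketch of the arithmetic (these function names are illustrative, not from llm-awq or TGI): weights are stored as 4-bit integers with per-group fp16 scales and zero points, and the matmul dequantizes them back to fp16 at compute time. The real AWQ implementation also rescales salient channels before quantizing and fuses the dequantization into a custom GEMM kernel, which is where the speedup comes from; this sketch only shows the reference math.

```python
import torch

def quantize_w4_groupwise(w: torch.Tensor, group_size: int = 128):
    """Quantize an fp16 weight matrix [out_features, in_features] to 4-bit
    codes with per-group scales and zero points (asymmetric, 16 levels)."""
    out_f, in_f = w.shape
    assert in_f % group_size == 0
    g = w.reshape(out_f, in_f // group_size, group_size)
    w_max = g.amax(dim=-1, keepdim=True)
    w_min = g.amin(dim=-1, keepdim=True)
    scales = (w_max - w_min).clamp(min=1e-5) / 15  # 4 bits -> 16 levels
    zeros = (-w_min / scales).round()
    q = torch.clamp((g / scales).round() + zeros, 0, 15).to(torch.uint8)
    return q, scales.half(), zeros.half()

def w4a16_linear(x: torch.Tensor, q, scales, zeros) -> torch.Tensor:
    """W4A16 matmul: dequantize 4-bit weights to fp16 on the fly, keep
    activations in fp16. A fused kernel would do this inside the GEMM."""
    w = ((q.float() - zeros.float()) * scales.float())
    w = w.reshape(q.shape[0], -1).half()
    return x @ w.t()
```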
Motivation
The main motivation is simply to speed up models further. I achieved 134 tokens/s with AWQ quantization on an MPT 7B model on an RTX 4090 + i9-13900K (LLaMA reaches 100+ tokens/s).
Your contribution
Currently, I am not able to contribute.