## LLMs Quantization Recipes

Intel® Neural Compressor supports advanced large language model (LLM) quantization technologies, including SmoothQuant (SQ) and Weight-Only Quantization (WOQ),
and has verified a list of LLMs on the 4th Gen Intel® Xeon® Scalable Processor (code-named Sapphire Rapids) with [PyTorch](https://pytorch.org/),
[Intel® Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) and [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).
This document publishes the specific recipes we achieved for popular LLMs, helping users quickly obtain an optimized LLM with less than 1% accuracy loss.

> Notes:
>
> - The quantization algorithms are provided by [Intel® Neural Compressor](https://github.com/intel/neural-compressor) and the evaluation functions by [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).
> - The model list is continuously updated; expect to find more LLMs here in the future.

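The steps behind a SmoothQuant INT8 recipe can be sketched with Intel® Neural Compressor's post-training quantization API. The model choice, calibration texts, and `alpha` value below are illustrative assumptions, not the verified per-model recipe settings (those live in the detailed recipes linked further down):

```python
# Hedged sketch: driving a SmoothQuant INT8 pass through Intel Neural
# Compressor's PostTrainingQuantConfig. Calibration data and alpha are
# placeholders; the published recipes carry the tuned per-model values.
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

model_name = "facebook/opt-1.3b"  # any SQ-enabled model from the table below
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tiny stand-in calibration set; real recipes calibrate on a proper dataset.
texts = ["Intel Neural Compressor quantizes large language models."] * 8
calib = [tokenizer(t, return_tensors="pt").input_ids.squeeze(0) for t in texts]
calib_dataloader = DataLoader(calib, batch_size=1)

conf = PostTrainingQuantConfig(
    backend="ipex",  # execute through Intel Extension for PyTorch
    recipes={
        "smooth_quant": True,
        "smooth_quant_args": {"alpha": 0.5},  # alpha is tuned per model
    },
)
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
q_model.save("./saved_results")
```

In practice each model gets its own `alpha` (or an auto-tuned one), which is the heart of the per-model recipes this document collects.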
## IPEX key models

| Models                          | SQ INT8 | WOQ INT8 | WOQ INT4 |
| :-----------------------------: | :-----: | :------: | :------: |
| EleutherAI/gpt-j-6b             | ✔       | ✔        | ✔        |
| facebook/opt-1.3b               | ✔       | ✔        | ✔        |
| facebook/opt-30b                | ✔       | ✔        | ✔        |
| meta-llama/Llama-2-7b-hf        | WIP     | ✔        | ✔        |
| meta-llama/Llama-2-13b-hf       | ✔       | ✔        | ✔        |
| meta-llama/Llama-2-70b-hf       | ✔       | ✔        | ✔        |
| tiiuae/falcon-7b                | ✔       | ✔        | ✔        |
| tiiuae/falcon-40b               | ✔       | ✔        | ✔        |
| baichuan-inc/Baichuan-13B-Chat  | ✔       | ✔        | ✔        |
| baichuan-inc/Baichuan2-13B-Chat | ✔       | ✔        | ✔        |
| baichuan-inc/Baichuan2-7B-Chat  | ✔       | ✔        | ✔        |
| bigscience/bloom-1b7            | ✔       | ✔        | ✔        |
| databricks/dolly-v2-12b         | ✖       | ✔        | ✖        |
| EleutherAI/gpt-neox-20b         | ✖       | ✔        | ✖        |
| mistralai/Mistral-7B-v0.1       | ✖       | ✔        | ✔        |
| THUDM/chatglm2-6b               | WIP     | ✔        | WIP      |
| THUDM/chatglm3-6b               | WIP     | ✔        | WIP      |

**Detailed recipes can be found [HERE](https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/llm_quantization_recipes.md).**

> Notes:
>
> - This model list comes from [IPEX](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html).
> - The WIP recipes will be published soon.
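
The WOQ INT8/INT4 columns in the table above correspond to weight-only quantization, which can likewise be sketched with Intel® Neural Compressor. The bit width, group size, and scheme below are common defaults shown for illustration, not the verified recipe values for any listed model:

```python
# Hedged sketch: a weight-only INT4 round-to-nearest (RTN) pass with
# Intel Neural Compressor. All numeric settings here are illustrative
# defaults, not the tuned per-model recipe values.
from transformers import AutoModelForCausalLM
from neural_compressor import PostTrainingQuantConfig, quantization

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # apply to all weight-bearing ops
            "weight": {
                "bits": 4,           # the WOQ INT4 column; use 8 for WOQ INT8
                "group_size": 128,   # one scale per 128-weight group
                "scheme": "asym",
                "algorithm": "RTN",  # GPTQ, AWQ, and TEQ are alternatives
            },
        },
    },
)
q_model = quantization.fit(model, conf)
q_model.save("./saved_results")
```

Unlike SmoothQuant, weight-only quantization needs no activation calibration for the RTN algorithm, which is why WOQ columns in the table are filled in for models whose SQ recipes are still WIP.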