examples/pytorch/nlp/huggingface_models/language-modeling/pruning/eager/README.md

The retrain-free pruning feature is still in development, please stay tuned.

## 1. Environment

PyTorch 1.8 or a higher version is needed, with the pytorch_fx backend.

The required transformers version varies across different types of models. For reference, the table below lists the transformers versions used to run each model in our experiments.

| Model | Transformers version |
| :----: | :----: |
| EleutherAI/gpt-j-6b | 4.28/4.30/4.34/4.36 |
| huggyllama/llama-7b | 4.28/4.30/4.34/4.36 |
| meta-llama/Llama-2-7b-hf | 4.30/4.34/4.36 |
| facebook/opt-6.7b | 4.28/4.30/4.34/4.36 |
| databricks/dolly-v2-3b | 4.28/4.30/4.34/4.36 |
| tiiuae/falcon-7b | 4.28/4.30/4.34/4.36 |
| mosaicml/mpt-7b | 4.28/4.30/4.34/4.36 |
| bigscience/bloom-7b1 | 4.28/4.30/4.34/4.36 |
| baichuan-inc/Baichuan-7B | 4.28/4.30 |
| Qwen/Qwen-7B | 4.28/4.30/4.34/4.36 |
| THUDM/chatglm3-6b | 4.34/4.36 |
| mistralai/Mistral-7B-v0.1 | 4.34/4.36 |

```shell
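# Minimal setup sketch (assumed commands, not the project's official ones):
# install PyTorch and pin a transformers version from the table above that
# matches your model. The exact pins below are illustrative assumptions.
pip install "torch>=1.8"
pip install transformers==4.36.0   # e.g. for meta-llama/Llama-2-7b-hf
pip install neural-compressor
```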

Pruning scripts are available for LLM sparse models such as GPT-j, BLOOM, OPT, and LLaMA.

## Retrain-free Results

The last token accuracy for channel pruning using [the retrain-free scripts](https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/pruning/eager/scripts/run_gptj_pruning.sh) is presented in the following table.

| Model | Calibration dataset | Evaluation dataset | Sparsity pattern | Over MLP block sparsity | Element-wise/matmul, Gemm, conv ratio | Dense last token accuracy | Sparse last token accuracy | Relative drop |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |

The last word accuracy of the 1x1 pattern sparse model using [the sparseGPT script](https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/pruning/eager/scripts/run_llm_sparsegpt.sh) is shown in the following table.

| Model | Task | Calibration dataset | Evaluation dataset | Sparsity | Precision | Dense last word accuracy | Sparse last word accuracy | Relative drop |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
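
The linked shell scripts drive both sets of experiments. The invocation below is only a rough sketch, assuming the scripts are launched from the `eager` example directory with their built-in defaults; consult each script for the actual arguments and model settings.

```shell
# Hypothetical invocation; argument handling lives inside each script.
cd examples/pytorch/nlp/huggingface_models/language-modeling/pruning/eager
bash scripts/run_gptj_pruning.sh      # retrain-free channel pruning (GPT-J)
bash scripts/run_llm_sparsegpt.sh     # 1x1 pattern SparseGPT pruning
```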