Commit 0ca51a1

yuwenzho authored and mengniwang95 committed
Update ITREX version in ONNXRT WOQ example and fix bugs in hf models (#1333)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
1 parent f760298 commit 0ca51a1

4 files changed: +6 -2 lines changed

examples/onnxrt/nlp/huggingface_model/question_answering/quantization/ptq_static/main.py

Lines changed: 2 additions & 0 deletions
@@ -483,6 +483,8 @@ def eval_func(model, *args):
 if model_args.model_name_or_path == 'mrm8488/spanbert-finetuned-squadv1':
     fp32_op_names = ['/bert/embeddings/word_embeddings/Gather',
                      '/bert/encoder/layer.[5-7|9]/output/dense/MatMul']
+elif model_args.model_name_or_path == 'salti/bert-base-multilingual-cased-finetuned-squad':
+    fp32_op_names = ['/bert/encoder/layer.[4-5]/output/dense/MatMul']
 elif model_args.model_name_or_path == 'distilbert-base-uncased-distilled-squad':
     fp32_op_names = ['/distilbert/transformer/layer.[1-5]/ffn/lin[1-2]/MatMul']
 elif model_args.model_name_or_path == 'deepset/roberta-large-squad2':
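
The entries in `fp32_op_names` look like regular expressions over ONNX node names. The sketch below assumes they are matched with Python's `re.match` (an assumption about how the quantizer consumes them, not something this diff shows) and uses hypothetical node names for a 12-layer BERT encoder to show which layers each pattern would keep in FP32. Note that inside a character class `|` is a literal character, so `[5-7|9]` selects layers 5, 6, 7 and 9.

```python
import re

# Patterns copied from the hunk above.
MULTILINGUAL_BERT = '/bert/encoder/layer.[4-5]/output/dense/MatMul'
SPANBERT = '/bert/encoder/layer.[5-7|9]/output/dense/MatMul'

# Hypothetical node names for a 12-layer encoder (illustration only).
node_names = [f'/bert/encoder/layer.{i}/output/dense/MatMul' for i in range(12)]


def select_nodes(pattern, names):
    """Return the names that `pattern` matches from the start of the string."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.match(name)]


def layer_index(name):
    """Extract the integer layer index from a name like '/bert/encoder/layer.4/...'."""
    return int(name.split('/')[3].split('.')[1])


multilingual_layers = [layer_index(n) for n in select_nodes(MULTILINGUAL_BERT, node_names)]
spanbert_layers = [layer_index(n) for n in select_nodes(SPANBERT, node_names)]

print(multilingual_layers)  # [4, 5]
print(spanbert_layers)      # [5, 6, 7, 9]
```

Because `[4-5]` matches exactly one character, layers 10 and 11 are not selected by either pattern.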

examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/ptq_static/main.py

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@
 import onnxruntime as ort
 from torch.nn.functional import pad
 from torch.utils.data import DataLoader
-from intel_extension_for_transformers.evaluation.lm_eval import evaluate
+from intel_extension_for_transformers.llm.evaluation.lm_eval import evaluate
 from optimum.onnxruntime import ORTModelForCausalLM
 from transformers import LlamaConfig, LlamaTokenizer

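
The commit replaces the old import path outright. A script that has to run against both package layouts could instead try the new path first and fall back to the old one. The following is a minimal sketch of that idea, not what the commit does; `import_first_available` is a hypothetical helper, and only the two module paths come from the diff.

```python
import importlib

# Module paths taken from the added and removed import lines in the diff.
CANDIDATE_MODULES = (
    "intel_extension_for_transformers.llm.evaluation.lm_eval",  # path added by this commit
    "intel_extension_for_transformers.evaluation.lm_eval",      # path removed by this commit
)


def import_first_available(module_paths):
    """Return (path, module) for the first importable path, or (None, None)."""
    for path in module_paths:
        try:
            return path, importlib.import_module(path)
        except ImportError:
            # Covers ModuleNotFoundError too; try the next candidate.
            continue
    return None, None


found_path, lm_eval = import_first_available(CANDIDATE_MODULES)
if lm_eval is None:
    print("intel_extension_for_transformers is not installed; evaluate() is unavailable")
else:
    evaluate = lm_eval.evaluate
    print(f"evaluate imported from {found_path}")
```

Pinning the ITREX version in `requirements.txt`, as the commit title suggests, avoids the need for such a fallback.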

examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/weight_only/README.md

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ pip install -r requirements.txt
 ```
 > Note: Validated ONNX Runtime [Version](/docs/source/installation_guide.md#validated-software-environment).

+> Note: Weight-only quantization in Intel® Neural Compressor is still under development. We encourage you to use the `master` branch to access the latest features.
+
 ## 2. Prepare Model

 Note that this README.md uses meta-llama/Llama-2-7b-hf as an example. There are other models available that can be used for weight-only quantization. The following table shows a few models' configurations:

examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/weight_only/main.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 import onnxruntime as ort
 from torch.nn.functional import pad
 from torch.utils.data import DataLoader
-from intel_extension_for_transformers.evaluation.lm_eval import evaluate
+from intel_extension_for_transformers.llm.evaluation.lm_eval import evaluate
 from optimum.onnxruntime import ORTModelForCausalLM
 from transformers import LlamaConfig, LlamaTokenizer

