
Commit 5fb2184

xin3he, Yantom1, ulivne, dudilester, and HolyFalafel authored
Cherry pick Habana software 1.18.0 update (#2025)
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Dudi Lester <dlester@habana.ai>
Co-authored-by: Danny <dsemiat@habana.ai>
Co-authored-by: Tomer Gafni <tgafni@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
Co-authored-by: Daniel Ohayon <danielohayon444@gmail.com>
Co-authored-by: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Co-authored-by: Kamil Felskowski <kfelskowskix@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent: d6149aa · commit: 5fb2184

File tree

67 files changed: +99756 −945 lines


.azure-pipelines/scripts/models/run_model_trigger_common.sh

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ elif [ "${mode}" == "tuning" ]; then
     cd ${WORK_SOURCE_DIR}/${model_src_dir}
     # for int4 models add "--accuracy" to run tuning after quantize
     if [[ "${model}" == *"int4"* ]]; then
-        sed -i "s|--quantize|--quantize --accuracy --int8|g" run_quant.sh
+        sed -i "s|--quantize|--quantize --accuracy --load|g" run_quant.sh
     fi

     $BOLD_YELLOW && echo "workspace ${WORK_SOURCE_DIR}/${model_src_dir}" && $RESET

.azure-pipelines/scripts/ut/3x/coverage.3x_pt

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ include =
     */neural_compressor/torch/*
 omit =
     */neural_compressor/torch/algorithms/fp8_quant/*
+    */neural_compressor/torch/algorithms/mixed_low_precision/*
     */neural_compressor/torch/amp/*
 exclude_lines =
     pragma: no cover

.azure-pipelines/scripts/ut/3x/coverage.3x_pt_fp8

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ branch = True
 [report]
 include =
     */neural_compressor/torch/algorithms/fp8_quant/*
+    */neural_compressor/torch/algorithms/mixed_low_precision/*
 exclude_lines =
     pragma: no cover
     raise NotImplementedError

.azure-pipelines/scripts/ut/3x/run_3x_pt_fp8.sh

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,6 @@ sed -i '/^intel_extension_for_pytorch/d' /neural-compressor/test/3x/torch/requirements.txt
 sed -i '/^auto_round/d' /neural-compressor/test/3x/torch/requirements.txt
 cat /neural-compressor/test/3x/torch/requirements.txt
 pip install -r /neural-compressor/test/3x/torch/requirements.txt
-pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.16.0
 pip install pytest-cov
 pip install pytest-html
 pip install pytest-html-merger
@@ -27,6 +26,7 @@ pytest --cov="${inc_path}" -vs --disable-warnings --html=report_1.html --self-contained-html ...
 pytest --cov="${inc_path}" -vs --disable-warnings --html=report_2.html --self-contained-html torch/quantization/weight_only/test_rtn.py 2>&1 | tee -a ${ut_log_name}
 # pytest --cov="${inc_path}" -vs --disable-warnings --html=report_3.html --self-contained-html torch/quantization/weight_only/test_autoround.py 2>&1 | tee -a ${ut_log_name}
 pytest --cov="${inc_path}" -vs --disable-warnings --html=report_4.html --self-contained-html torch/quantization/fp8_quant 2>&1 | tee -a ${ut_log_name}
+pytest --cov="${inc_path}" -vs --disable-warnings --html=report_5.html --self-contained-html torch/algorithms/fp8_quant 2>&1 | tee -a ${ut_log_name}

 mkdir -p report && mv *.html report
 pytest_html_merger -i ./report -o ./report.html

.azure-pipelines/template/docker-template.yml

Lines changed: 2 additions & 2 deletions
@@ -74,7 +74,7 @@ steps:

   - ${{ if eq(parameters.imageSource, 'pull') }}:
     - script: |
-        docker pull vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
+        docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
       displayName: "Pull habana docker image"

   - script: |
@@ -95,7 +95,7 @@ steps:
       else
         docker run -dit --disable-content-trust --privileged --name=${{ parameters.containerName }} --shm-size="2g" \
           --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host \
-          -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
+          -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
       fi
       echo "Show the container list after docker run ... "
       docker ps -a

docs/source/3x/PT_FP8Quant.md

Lines changed: 1 addition & 10 deletions
@@ -20,15 +20,6 @@ Intel Neural Compressor provides general quantization APIs to leverage HPU FP8 c

 ## Supported Parameters

-<style type="text/css">
-.tg {border-collapse:collapse;border-spacing:0;}
-.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
-  overflow:hidden;padding:10px 5px;word-break:normal;}
-.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
-  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
-.tg .tg-fymr{border-color:inherit;font-weight:bold;text-align:left;vertical-align:top}
-.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
-</style>
 <table class="tg"><thead>
 <tr>
 <th class="tg-fymr">Attribute</th>
@@ -74,7 +65,7 @@ Intel Neural Compressor provides general quantization APIs to leverage HPU FP8 c
 <tr>
 <td class="tg-0pky">scale_method</td>
 <td class="tg-0pky">The method for calculating the scale from the measurement.</td>
-<td class="tg-0pky">- without_scale - Convert to/from FP8 without scaling.<br>- unit_scale - Always use scale of 1.<br>- maxabs_hw (default) - Scale is calculated to stretch/compress the maxabs measurement to the full-scale of FP8 and then aligned to the corresponding HW accelerated scale.<br>- maxabs_pow2 - Scale is calculated to stretch/compress the maxabs measurement to the full-scale of FP8 and then rounded to the power of 2.<br>- maxabs_hw_opt_weight - Scale of model params (weights) is chosen as the scale that provides minimal mean-square-error between quantized and non-quantized weights, from all possible HW accelerated scales. Scale of activations is calculated the same as maxabs_hw.<br>- act_maxabs_pow2_weights_pcs_opt_pow2 - Scale of model params (weights) is calculated per-channel of the params tensor. The scale per-channel is calculated the same as maxabs_hw_opt_weight. Scale of activations is calculated the same as maxabs_pow2.<br>- act_maxabs_hw_weights_pcs_maxabs_pow2 - Scale of model params (weights) is calculated per-channel of the params tensor. The scale per-channel is calculated the same as maxabs_pow2. Scale of activations is calculated the same as maxabs_hw.</td>
+<td class="tg-0pky">- unit_scale - Always use scale of 1.<br>- hw_aligned_single_scale - Always use scale that's aligned to the corresponding HW accelerated scale.<br>- maxabs_hw (default) - Scale is calculated to stretch/compress the maxabs measurement to the full-scale of FP8 and then aligned to the corresponding HW accelerated scale.<br>- maxabs_pow2 - Scale is calculated to stretch/compress the maxabs measurement to the full-scale of FP8 and then rounded to the power of 2.<br>- maxabs_hw_opt_weight - Scale of model params (weights) is chosen as the scale that provides minimal mean-square-error between quantized and non-quantized weights, from all possible HW accelerated scales. Scale of activations is calculated the same as maxabs_hw.<br>- act_maxabs_pow2_weights_pcs_opt_pow2 - Scale of model params (weights) is calculated per-channel of the params tensor. The scale per-channel is calculated the same as maxabs_hw_opt_weight. Scale of activations is calculated the same as maxabs_pow2.<br>- act_maxabs_hw_weights_pcs_maxabs_pow2 - Scale of model params (weights) is calculated per-channel of the params tensor. The scale per-channel is calculated the same as maxabs_pow2. Scale of activations is calculated the same as maxabs_hw.</td>
 </tr>
 <tr>
 <td class="tg-0pky">measure_exclude</td>
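
Note: the scale_method values documented above are consumed through the FP8Config API that PT_FP8Quant.md describes. The following is a minimal sketch of the measure-then-quantize flow, assuming a Gaudi (HPU) runtime; the keyword names (mode, observer, scale_method, dump_stats_path) mirror the doc's parameter table, so treat the exact signatures as assumptions rather than a verified recipe.

# Hypothetical sketch, not part of this commit: assumes the FP8Config /
# prepare / convert / finalize_calibration API documented in PT_FP8Quant.md
# and an HPU runtime; keyword names may differ between releases.
import torch
from neural_compressor.torch.quantization import (
    FP8Config, prepare, convert, finalize_calibration
)

model = torch.nn.Sequential(torch.nn.Linear(16, 16))  # stand-in model

# Step 1 (MEASURE): attach maxabs observers and run calibration data.
measure_cfg = FP8Config(mode="MEASURE", observer="maxabs",
                        dump_stats_path="./hqt_output/measure")
model = prepare(model, measure_cfg)
for _ in range(4):                       # tiny stand-in calibration loop
    model(torch.randn(2, 16))
finalize_calibration(model)              # flush measurement statistics

# Step 2 (QUANTIZE): convert using a scale_method from the table above,
# e.g. the default maxabs_hw (maxabs stretched to FP8 range, HW-aligned).
quant_cfg = FP8Config(mode="QUANTIZE", scale_method="maxabs_hw",
                      dump_stats_path="./hqt_output/measure")
model = convert(model, quant_cfg)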

examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/smooth_quant/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ neural-compressor
 lm_eval==0.4.3
 peft
 optimum-intel
+intel_extension_for_pytorch

examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/smooth_quant/run_clm_no_trainer.py

Lines changed: 0 additions & 1 deletion
@@ -217,7 +217,6 @@ def eval_func(model):


 if args.load:
-    # TODO: we need run_benchmark.sh for loading and remove --accuracy in run_quant.sh, currently run_quant.sh will get fp32 result
     if args.int8 or args.int8_bf16_mixed:
         print("load int8 model")
         from neural_compressor.torch.quantization import load
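
The import in this hunk is the restore half of the save/load pair these example scripts use: the script saves the quantized model, and the new --load flag (see the run_model_trigger_common.sh change above) re-runs it so the saved model is reloaded for accuracy measurement. A minimal hypothetical sketch follows; the single-argument load(path) call and the saved_results path are illustrative assumptions, not taken from the diff.

# Hypothetical sketch, not part of this commit: restores a quantized model
# previously written with user_model.save(args.output_dir).
from neural_compressor.torch.quantization import load

saved_dir = "./saved_results"   # illustrative path, not from the diff
user_model = load(saved_dir)    # rebuild the quantized model (assumed signature)
user_model.eval()               # ready for the accuracy evaluation pass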

examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/static_quant/ipex/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -10,3 +10,4 @@ einops
 neural-compressor
 lm_eval==0.4.3
 peft
+intel_extension_for_pytorch

examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/static_quant/ipex/run_clm_no_trainer.py

Lines changed: 0 additions & 1 deletion
@@ -198,7 +198,6 @@ def run_fn(model):
     user_model.save(args.output_dir)

 if args.load:
-    # TODO: we need run_benchmark.sh for loading and remove --accuracy in run_quant.sh, currently run_quant.sh will get fp32 result
     if args.int8 or args.int8_bf16_mixed:
         print("load int8 model")
         from neural_compressor.torch.quantization import load
