41 commits
66d1104
feat(tools/quantize): support .ini format
tpoisonooo Jun 13, 2022
e0a0ca6
apply code-format changes
tpoisonooo Jul 28, 2022
7565af0
fix(CI): rebase error
tpoisonooo Jul 28, 2022
7a3841f
Merge branch 'improve-mha-int8' of https://github.com/tpoisonooo/ncnn…
tpoisonooo Jul 28, 2022
d5f7835
apply code-format changes
tpoisonooo Jul 28, 2022
fa8b0bc
fix(CI): test mha exceeding
tpoisonooo Aug 1, 2022
9d3fb10
improvement(src/mha): add file structure
tpoisonooo Aug 1, 2022
f38ca73
apply code-format changes
tpoisonooo Aug 1, 2022
4a05da2
improvement(src/layer/mha): add const fold
tpoisonooo Aug 2, 2022
8b9c6a7
Merge branch 'improve-mha-int8' of https://github.com/tpoisonooo/ncnn…
tpoisonooo Aug 2, 2022
49cbb14
apply code-format changes
tpoisonooo Aug 2, 2022
37848f1
improvement(src/layer/mha): update
tpoisonooo Aug 2, 2022
d76eb0d
Merge branch 'improve-mha-int8' of https://github.com/tpoisonooo/ncnn…
tpoisonooo Aug 2, 2022
95692d7
fix(src/layer/mha): miss convert weight to int8
tpoisonooo Aug 3, 2022
07a1424
fix(src/layer/x86/mha): update int8
tpoisonooo Aug 3, 2022
9c1c2c9
apply code-format changes
tpoisonooo Aug 3, 2022
9454c51
improvement(src/x86/mha): int8 optimization
tpoisonooo Aug 4, 2022
42ad426
apply code-format changes
tpoisonooo Aug 4, 2022
6854ef7
feat(src/layer): layernorm support int8
tpoisonooo Aug 5, 2022
9bd2ac4
apply code-format changes
tpoisonooo Aug 5, 2022
a6a818c
feat(src/layer/layernorm.cpp): add int8 support
tpoisonooo Aug 8, 2022
ac0d745
apply code-format changes
tpoisonooo Aug 8, 2022
4d19507
fix(src/layer): update layernorm OMP option
tpoisonooo Aug 8, 2022
fa00976
Merge branch 'full-integer-vit' of https://github.com/tpoisonooo/ncnn…
tpoisonooo Aug 8, 2022
6ddf7cc
feat(src/layer): binaryop support int8
tpoisonooo Aug 10, 2022
5727e19
feat(src/layer): test add int8 failed
tpoisonooo Aug 11, 2022
86148bd
feat(src/layer): revert int8 BinaryOp
tpoisonooo Aug 11, 2022
e7f84d0
apply code-format changes
tpoisonooo Aug 11, 2022
4588921
feat(src/layer/x86): revert binaryop
tpoisonooo Aug 11, 2022
a544565
Merge branch 'vit-add-layernorm-int8' of https://github.com/tpoisonoo…
tpoisonooo Aug 11, 2022
9277a1f
docs(src): update docs
tpoisonooo Aug 11, 2022
a666997
fix(CI): layernorm int8 build error
tpoisonooo Aug 15, 2022
8226eee
fix(src/layer): fixme warning
tpoisonooo Aug 15, 2022
259ca0b
fix(CI): sse2 run error
tpoisonooo Aug 15, 2022
2c1d9b0
fix(CI): build error
tpoisonooo Aug 15, 2022
45f9207
fix(CI): test_layernorm
tpoisonooo Aug 15, 2022
a2c48aa
fix(CI): test_layernorm
tpoisonooo Aug 15, 2022
102e1dd
fix(CI): add hook
tpoisonooo Aug 15, 2022
4c9c906
fix(layernorm): add debug hook
tpoisonooo Aug 15, 2022
e9c6792
fix(CI): update
tpoisonooo Aug 15, 2022
81a1149
fix(test): update
tpoisonooo Aug 15, 2022
3 changes: 3 additions & 0 deletions benchmark/benchncnn.cpp
@@ -321,6 +321,9 @@ int main(int argc, char** argv)
benchmark("vision_transformer", ncnn::Mat(384, 384, 3), opt);

benchmark("FastestDet", ncnn::Mat(352, 352, 3), opt);

benchmark("vision_transformer_int8", ncnn::Mat(384, 384, 3), opt);

#if NCNN_VULKAN
delete g_blob_vkallocator;
delete g_staging_vkallocator;
146 changes: 146 additions & 0 deletions benchmark/vision_transformer_int8.param
@@ -0,0 +1,146 @@
7767517
144 192
Input input 0 1 input
MemoryData backbone.cls_token 0 1 backbone.cls_token 0=768 1=1
MemoryData backbone.pos_embed 0 1 backbone.pos_embed 0=768 1=145
Convolution Conv_0 1 1 input onnx::Reshape_153 0=768 1=32 3=32 5=1 6=2359296 8=2
Reshape Reshape_2 1 1 onnx::Reshape_153 onnx::Transpose_155 0=-1 1=768
Permute Transpose_3 1 1 onnx::Transpose_155 onnx::Concat_156 0=1
Concat Concat_4 2 1 backbone.cls_token onnx::Concat_156 onnx::Add_157
BinaryOp Add_5 2 1 onnx::Add_157 backbone.pos_embed input.1
Split splitncnn_0 1 2 input.1 input.1_splitncnn_0 input.1_splitncnn_1
LayerNorm LayerNorm_6 1 1 input.1_splitncnn_1 qkv_input 0=768 1=1.000000e-06 3=101
Split splitncnn_1 1 3 qkv_input qkv_input_splitncnn_0 qkv_input_splitncnn_1 qkv_input_splitncnn_2
MultiHeadAttention MultiHeadAttention_15 3 1 qkv_input_splitncnn_2 qkv_input_splitncnn_1 qkv_input_splitncnn_0 onnx::Add_168 0=768 1=12 2=589824 3=1
BinaryOp Add_16 2 1 input.1_splitncnn_0 onnx::Add_168 input.4
Split splitncnn_2 1 2 input.4 input.4_splitncnn_0 input.4_splitncnn_1
LayerNorm LayerNorm_17 1 1 input.4_splitncnn_1 A.1 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_18 1 1 A.1 mmdeploy::Gelu_171 0=3072 1=1 2=2359296 8=2
GELU Gelu_19 1 1 mmdeploy::Gelu_171 A.3 0=1
InnerProduct Gemm_20 1 1 A.3 input.16 0=768 1=1 2=2359296 8=2
BinaryOp Add_21 2 1 input.4_splitncnn_0 input.16 input.20
Split splitncnn_3 1 2 input.20 input.20_splitncnn_0 input.20_splitncnn_1
LayerNorm LayerNorm_22 1 1 input.20_splitncnn_1 qkv_input.3 0=768 1=1.000000e-06 3=101
Split splitncnn_4 1 3 qkv_input.3 qkv_input.3_splitncnn_0 qkv_input.3_splitncnn_1 qkv_input.3_splitncnn_2
MultiHeadAttention MultiHeadAttention_31 3 1 qkv_input.3_splitncnn_2 qkv_input.3_splitncnn_1 qkv_input.3_splitncnn_0 onnx::Add_184 0=768 1=12 2=589824 3=1
BinaryOp Add_32 2 1 input.20_splitncnn_0 onnx::Add_184 input.24
Split splitncnn_5 1 2 input.24 input.24_splitncnn_0 input.24_splitncnn_1
LayerNorm LayerNorm_33 1 1 input.24_splitncnn_1 A.5 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_34 1 1 A.5 mmdeploy::Gelu_187 0=3072 1=1 2=2359296 8=2
GELU Gelu_35 1 1 mmdeploy::Gelu_187 A.7 0=1
InnerProduct Gemm_36 1 1 A.7 input.36 0=768 1=1 2=2359296 8=2
BinaryOp Add_37 2 1 input.24_splitncnn_0 input.36 input.40
Split splitncnn_6 1 2 input.40 input.40_splitncnn_0 input.40_splitncnn_1
LayerNorm LayerNorm_38 1 1 input.40_splitncnn_1 qkv_input.7 0=768 1=1.000000e-06 3=101
Split splitncnn_7 1 3 qkv_input.7 qkv_input.7_splitncnn_0 qkv_input.7_splitncnn_1 qkv_input.7_splitncnn_2
MultiHeadAttention MultiHeadAttention_47 3 1 qkv_input.7_splitncnn_2 qkv_input.7_splitncnn_1 qkv_input.7_splitncnn_0 onnx::Add_200 0=768 1=12 2=589824 3=1
BinaryOp Add_48 2 1 input.40_splitncnn_0 onnx::Add_200 input.44
Split splitncnn_8 1 2 input.44 input.44_splitncnn_0 input.44_splitncnn_1
LayerNorm LayerNorm_49 1 1 input.44_splitncnn_1 A.9 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_50 1 1 A.9 mmdeploy::Gelu_203 0=3072 1=1 2=2359296 8=2
GELU Gelu_51 1 1 mmdeploy::Gelu_203 A.11 0=1
InnerProduct Gemm_52 1 1 A.11 input.56 0=768 1=1 2=2359296 8=2
BinaryOp Add_53 2 1 input.44_splitncnn_0 input.56 input.60
Split splitncnn_9 1 2 input.60 input.60_splitncnn_0 input.60_splitncnn_1
LayerNorm LayerNorm_54 1 1 input.60_splitncnn_1 qkv_input.11 0=768 1=1.000000e-06 3=101
Split splitncnn_10 1 3 qkv_input.11 qkv_input.11_splitncnn_0 qkv_input.11_splitncnn_1 qkv_input.11_splitncnn_2
MultiHeadAttention MultiHeadAttention_63 3 1 qkv_input.11_splitncnn_2 qkv_input.11_splitncnn_1 qkv_input.11_splitncnn_0 onnx::Add_216 0=768 1=12 2=589824 3=1
BinaryOp Add_64 2 1 input.60_splitncnn_0 onnx::Add_216 input.64
Split splitncnn_11 1 2 input.64 input.64_splitncnn_0 input.64_splitncnn_1
LayerNorm LayerNorm_65 1 1 input.64_splitncnn_1 A.13 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_66 1 1 A.13 mmdeploy::Gelu_219 0=3072 1=1 2=2359296 8=2
GELU Gelu_67 1 1 mmdeploy::Gelu_219 A.15 0=1
InnerProduct Gemm_68 1 1 A.15 input.76 0=768 1=1 2=2359296 8=2
BinaryOp Add_69 2 1 input.64_splitncnn_0 input.76 input.80
Split splitncnn_12 1 2 input.80 input.80_splitncnn_0 input.80_splitncnn_1
LayerNorm LayerNorm_70 1 1 input.80_splitncnn_1 qkv_input.15 0=768 1=1.000000e-06 3=101
Split splitncnn_13 1 3 qkv_input.15 qkv_input.15_splitncnn_0 qkv_input.15_splitncnn_1 qkv_input.15_splitncnn_2
MultiHeadAttention MultiHeadAttention_79 3 1 qkv_input.15_splitncnn_2 qkv_input.15_splitncnn_1 qkv_input.15_splitncnn_0 onnx::Add_232 0=768 1=12 2=589824 3=1
BinaryOp Add_80 2 1 input.80_splitncnn_0 onnx::Add_232 input.84
Split splitncnn_14 1 2 input.84 input.84_splitncnn_0 input.84_splitncnn_1
LayerNorm LayerNorm_81 1 1 input.84_splitncnn_1 A.17 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_82 1 1 A.17 mmdeploy::Gelu_235 0=3072 1=1 2=2359296 8=2
GELU Gelu_83 1 1 mmdeploy::Gelu_235 A.19 0=1
InnerProduct Gemm_84 1 1 A.19 input.96 0=768 1=1 2=2359296 8=2
BinaryOp Add_85 2 1 input.84_splitncnn_0 input.96 input.100
Split splitncnn_15 1 2 input.100 input.100_splitncnn_0 input.100_splitncnn_1
LayerNorm LayerNorm_86 1 1 input.100_splitncnn_1 qkv_input.19 0=768 1=1.000000e-06 3=101
Split splitncnn_16 1 3 qkv_input.19 qkv_input.19_splitncnn_0 qkv_input.19_splitncnn_1 qkv_input.19_splitncnn_2
MultiHeadAttention MultiHeadAttention_95 3 1 qkv_input.19_splitncnn_2 qkv_input.19_splitncnn_1 qkv_input.19_splitncnn_0 onnx::Add_248 0=768 1=12 2=589824 3=1
BinaryOp Add_96 2 1 input.100_splitncnn_0 onnx::Add_248 input.104
Split splitncnn_17 1 2 input.104 input.104_splitncnn_0 input.104_splitncnn_1
LayerNorm LayerNorm_97 1 1 input.104_splitncnn_1 A.21 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_98 1 1 A.21 mmdeploy::Gelu_251 0=3072 1=1 2=2359296 8=2
GELU Gelu_99 1 1 mmdeploy::Gelu_251 A.23 0=1
InnerProduct Gemm_100 1 1 A.23 input.116 0=768 1=1 2=2359296 8=2
BinaryOp Add_101 2 1 input.104_splitncnn_0 input.116 input.120
Split splitncnn_18 1 2 input.120 input.120_splitncnn_0 input.120_splitncnn_1
LayerNorm LayerNorm_102 1 1 input.120_splitncnn_1 qkv_input.23 0=768 1=1.000000e-06 3=101
Split splitncnn_19 1 3 qkv_input.23 qkv_input.23_splitncnn_0 qkv_input.23_splitncnn_1 qkv_input.23_splitncnn_2
MultiHeadAttention MultiHeadAttention_111 3 1 qkv_input.23_splitncnn_2 qkv_input.23_splitncnn_1 qkv_input.23_splitncnn_0 onnx::Add_264 0=768 1=12 2=589824 3=1
BinaryOp Add_112 2 1 input.120_splitncnn_0 onnx::Add_264 input.124
Split splitncnn_20 1 2 input.124 input.124_splitncnn_0 input.124_splitncnn_1
LayerNorm LayerNorm_113 1 1 input.124_splitncnn_1 A.25 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_114 1 1 A.25 mmdeploy::Gelu_267 0=3072 1=1 2=2359296 8=2
GELU Gelu_115 1 1 mmdeploy::Gelu_267 A.27 0=1
InnerProduct Gemm_116 1 1 A.27 input.136 0=768 1=1 2=2359296 8=2
BinaryOp Add_117 2 1 input.124_splitncnn_0 input.136 input.140
Split splitncnn_21 1 2 input.140 input.140_splitncnn_0 input.140_splitncnn_1
LayerNorm LayerNorm_118 1 1 input.140_splitncnn_1 qkv_input.27 0=768 1=1.000000e-06 3=101
Split splitncnn_22 1 3 qkv_input.27 qkv_input.27_splitncnn_0 qkv_input.27_splitncnn_1 qkv_input.27_splitncnn_2
MultiHeadAttention MultiHeadAttention_127 3 1 qkv_input.27_splitncnn_2 qkv_input.27_splitncnn_1 qkv_input.27_splitncnn_0 onnx::Add_280 0=768 1=12 2=589824 3=1
BinaryOp Add_128 2 1 input.140_splitncnn_0 onnx::Add_280 input.144
Split splitncnn_23 1 2 input.144 input.144_splitncnn_0 input.144_splitncnn_1
LayerNorm LayerNorm_129 1 1 input.144_splitncnn_1 A.29 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_130 1 1 A.29 mmdeploy::Gelu_283 0=3072 1=1 2=2359296 8=2
GELU Gelu_131 1 1 mmdeploy::Gelu_283 A.31 0=1
InnerProduct Gemm_132 1 1 A.31 input.156 0=768 1=1 2=2359296 8=2
BinaryOp Add_133 2 1 input.144_splitncnn_0 input.156 input.160
Split splitncnn_24 1 2 input.160 input.160_splitncnn_0 input.160_splitncnn_1
LayerNorm LayerNorm_134 1 1 input.160_splitncnn_1 qkv_input.31 0=768 1=1.000000e-06 3=101
Split splitncnn_25 1 3 qkv_input.31 qkv_input.31_splitncnn_0 qkv_input.31_splitncnn_1 qkv_input.31_splitncnn_2
MultiHeadAttention MultiHeadAttention_143 3 1 qkv_input.31_splitncnn_2 qkv_input.31_splitncnn_1 qkv_input.31_splitncnn_0 onnx::Add_296 0=768 1=12 2=589824 3=1
BinaryOp Add_144 2 1 input.160_splitncnn_0 onnx::Add_296 input.164
Split splitncnn_26 1 2 input.164 input.164_splitncnn_0 input.164_splitncnn_1
LayerNorm LayerNorm_145 1 1 input.164_splitncnn_1 A.33 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_146 1 1 A.33 mmdeploy::Gelu_299 0=3072 1=1 2=2359296 8=2
GELU Gelu_147 1 1 mmdeploy::Gelu_299 A.35 0=1
InnerProduct Gemm_148 1 1 A.35 input.176 0=768 1=1 2=2359296 8=2
BinaryOp Add_149 2 1 input.164_splitncnn_0 input.176 input.180
Split splitncnn_27 1 2 input.180 input.180_splitncnn_0 input.180_splitncnn_1
LayerNorm LayerNorm_150 1 1 input.180_splitncnn_1 qkv_input.35 0=768 1=1.000000e-06 3=101
Split splitncnn_28 1 3 qkv_input.35 qkv_input.35_splitncnn_0 qkv_input.35_splitncnn_1 qkv_input.35_splitncnn_2
MultiHeadAttention MultiHeadAttention_159 3 1 qkv_input.35_splitncnn_2 qkv_input.35_splitncnn_1 qkv_input.35_splitncnn_0 onnx::Add_312 0=768 1=12 2=589824 3=1
BinaryOp Add_160 2 1 input.180_splitncnn_0 onnx::Add_312 input.184
Split splitncnn_29 1 2 input.184 input.184_splitncnn_0 input.184_splitncnn_1
LayerNorm LayerNorm_161 1 1 input.184_splitncnn_1 A.37 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_162 1 1 A.37 mmdeploy::Gelu_315 0=3072 1=1 2=2359296 8=2
GELU Gelu_163 1 1 mmdeploy::Gelu_315 A.39 0=1
InnerProduct Gemm_164 1 1 A.39 input.196 0=768 1=1 2=2359296 8=2
BinaryOp Add_165 2 1 input.184_splitncnn_0 input.196 input.200
Split splitncnn_30 1 2 input.200 input.200_splitncnn_0 input.200_splitncnn_1
LayerNorm LayerNorm_166 1 1 input.200_splitncnn_1 qkv_input.39 0=768 1=1.000000e-06 3=101
Split splitncnn_31 1 3 qkv_input.39 qkv_input.39_splitncnn_0 qkv_input.39_splitncnn_1 qkv_input.39_splitncnn_2
MultiHeadAttention MultiHeadAttention_175 3 1 qkv_input.39_splitncnn_2 qkv_input.39_splitncnn_1 qkv_input.39_splitncnn_0 onnx::Add_328 0=768 1=12 2=589824 3=1
BinaryOp Add_176 2 1 input.200_splitncnn_0 onnx::Add_328 input.204
Split splitncnn_32 1 2 input.204 input.204_splitncnn_0 input.204_splitncnn_1
LayerNorm LayerNorm_177 1 1 input.204_splitncnn_1 A.41 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_178 1 1 A.41 mmdeploy::Gelu_331 0=3072 1=1 2=2359296 8=2
GELU Gelu_179 1 1 mmdeploy::Gelu_331 A.43 0=1
InnerProduct Gemm_180 1 1 A.43 input.216 0=768 1=1 2=2359296 8=2
BinaryOp Add_181 2 1 input.204_splitncnn_0 input.216 input.220
Split splitncnn_33 1 2 input.220 input.220_splitncnn_0 input.220_splitncnn_1
LayerNorm LayerNorm_182 1 1 input.220_splitncnn_1 qkv_input.43 0=768 1=1.000000e-06 3=101
Split splitncnn_34 1 3 qkv_input.43 qkv_input.43_splitncnn_0 qkv_input.43_splitncnn_1 qkv_input.43_splitncnn_2
MultiHeadAttention MultiHeadAttention_191 3 1 qkv_input.43_splitncnn_2 qkv_input.43_splitncnn_1 qkv_input.43_splitncnn_0 onnx::Add_344 0=768 1=12 2=589824 3=1
BinaryOp Add_192 2 1 input.220_splitncnn_0 onnx::Add_344 input.224
Split splitncnn_35 1 2 input.224 input.224_splitncnn_0 input.224_splitncnn_1
LayerNorm LayerNorm_193 1 1 input.224_splitncnn_1 A.45 0=768 1=1.000000e-06 3=101
InnerProduct Gemm_194 1 1 A.45 mmdeploy::Gelu_347 0=3072 1=1 2=2359296 8=2
GELU Gelu_195 1 1 mmdeploy::Gelu_347 A.47 0=1
InnerProduct Gemm_196 1 1 A.47 input.236 0=768 1=1 2=2359296 8=2
BinaryOp Add_197 2 1 input.224_splitncnn_0 input.236 input.240
LayerNorm LayerNorm_198 1 1 input.240 onnx::Gather_351 0=768 1=1.000000e-06 3=1
Crop Gather_200 1 1 onnx::Gather_351 A -23309=1,0 -23310=1,1 -23311=1,0
InnerProduct Gemm_201 1 1 A cls_score 0=1000 1=1 2=768000 8=2
Softmax Softmax_202 1 1 cls_score output
16 changes: 14 additions & 2 deletions docs/developer-guide/operators.md
@@ -971,11 +971,14 @@ y = x * gamma + beta by elementwise
| 0 | affine_size | int | 0 | |
| 1 | eps | float | 0.001f | x = x / sqrt(var + eps) |
| 2 | affine | int | 1 | |
| 3 | int8_scale_term | int | 0 | |

| weight | type | shape |
| ------------- | ----- | --------------------- |
| gamma_data | float | [affine_size] |
| beta_data | float | [affine_size] |
| input_scales | float | [affine_size] |
| output_scale | float | [affine_size] |
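
As a rough mental model of how these parameters fit together (a hedged sketch only, not the actual ncnn kernel; the per-element `input_scales`, the single effective `output_scale`, and the `q = round(f * scale)` quantization convention are assumptions here), int8 LayerNorm can be pictured as dequantize, normalize in float, apply the affine, then requantize:

```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hedged sketch of an int8 LayerNorm data flow; the real ncnn implementation
// is packed, vectorized and organized differently.
void layernorm_int8_sketch(const std::vector<int8_t>& x,
                           const std::vector<float>& input_scales, // per-element, [affine_size]
                           const std::vector<float>& gamma,
                           const std::vector<float>& beta,
                           float output_scale, float eps,
                           std::vector<int8_t>& y)
{
    const size_t n = x.size();
    std::vector<float> f(n);
    for (size_t i = 0; i < n; i++)
        f[i] = x[i] / input_scales[i]; // dequantize: q = round(f * scale) => f = q / scale

    float mean = 0.f;
    for (float v : f) mean += v;
    mean /= (float)n;

    float var = 0.f;
    for (float v : f) var += (v - mean) * (v - mean);
    var /= (float)n;

    y.resize(n);
    for (size_t i = 0; i < n; i++)
    {
        float out = (f[i] - mean) / std::sqrt(var + eps) * gamma[i] + beta[i];
        int q = (int)std::lround(out * output_scale); // requantize
        y[i] = (int8_t)std::min(127, std::max(-127, q));
    }
}
```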

# Log
```
@@ -1084,9 +1087,10 @@ y = affine(out)
| 0 | embed_dim | int | 0 | |
| 1 | num_head | int | 1 | |
| 2 | weight_data_size| int | 0 | |
| 3 | int8_scale_term| int | 0 | |

| weight | type | shape |
| ------------- | ----- | --------------------- |
| weight | type | shape | description |
| ------------- | ----- | --- | --------------------- |
| q_weight_data | float/fp16/int8 | [weight_data_size] |
| q_bias_data | float | [embed_dim] |
| k_weight_data | float/fp16/int8 | [weight_data_size] |
@@ -1095,6 +1099,14 @@ y = affine(out)
| v_bias_data | float | [embed_dim] |
| out_weight_data| float/fp16/int8 | [weight_data_size] |
| out_bias_data | float | [embed_dim] |
| q_input_scale | float | [1] |
| k_input_scale | float | [1] |
| v_input_scale | float | [1] |
| q_weight_scales | float | [embed_dim] |
| k_weight_scales | float | [embed_dim] |
| v_weight_scales | float | [embed_dim] |
| internal_scales | float | [5] | scales for xq/xk/xv/before_softmax/before_output |
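
Read loosely (this is an assumption about how the scales compose, not a statement of the actual ncnn kernel), each projection quantizes its float activation with `*_input_scale`, runs an int8 GEMM against the int8 weights, and dequantizes the int32 accumulator with the product of the input scale and the per-output-channel weight scale; `internal_scales` then requantize the intermediate xq/xk/xv, pre-softmax and pre-output buffers. A minimal sketch of the accumulator dequantization:

```
#include <cstdint>

// Hedged sketch: dequantize one int32 accumulator of an int8 GEMM,
// assuming q_x = round(x * input_scale) and q_w[j] = round(w[j] * weight_scales[j]).
float dequantize_accumulator(int32_t acc, float input_scale,
                             const float* weight_scales, int out_channel, float bias)
{
    return acc / (input_scale * weight_scales[out_channel]) + bias;
}
```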


# MVN
```
28 changes: 26 additions & 2 deletions docs/how-to-use-and-FAQ/quantized-int8-inference.md
@@ -20,7 +20,7 @@ Some imagenet sample images here https://github.com/nihui/imagenet-sample-images

```shell
find images/ -type f > imagelist.txt
./ncnn2table mobilenet-opt.param mobilenet-opt.bin imagelist.txt mobilenet.table mean=[104,117,123] norm=[0.017,0.017,0.017] shape=[224,224,3] pixel=BGR thread=8 method=kl
./ncnn2table mobilenet-opt.param mobilenet-opt.bin imagelist.txt mobilenet.table mean=[104,117,123] norm=[0.017,0.017,0.017] shape=[224,224,3] pixel=BGR thread=8 method=kl format=txt
```

* mean and norm are the values you passed to ```Mat::substract_mean_normalize()```
@@ -35,6 +35,7 @@ find images/ -type f > imagelist.txt
* pixel is the pixel format of your model, image pixels will be converted to this type before ```Extractor::input()```
* thread is the CPU thread count that could be used for parallel inference
* method is the post training quantization algorithm, kl and aciq are currently supported
* format is the output file type of the quantization table, either `txt` or `ini`; the default is `txt` (see the `ini` example in the mixed precision section below)

If your model has multiple input nodes, you can use multiple list files and other parameters

@@ -60,7 +61,7 @@ mobilenet.load_model("mobilenet-int8.bin");
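
For context, a minimal end-to-end loading sketch (the blob names `data`/`prob`, the 224x224 input and the explicit `opt.use_int8_inference` flag are assumptions for the mobilenet example, not taken from this page):

```
#include "net.h"

int main()
{
    ncnn::Net mobilenet;
    mobilenet.opt.use_int8_inference = true; // make sure the int8 path is enabled

    mobilenet.load_param("mobilenet-int8.param");
    mobilenet.load_model("mobilenet-int8.bin");

    // assumed 224x224 BGR input, filled elsewhere (e.g. ncnn::Mat::from_pixels_resize)
    ncnn::Mat in(224, 224, 3);
    const float mean_vals[3] = {104.f, 117.f, 123.f};
    const float norm_vals[3] = {0.017f, 0.017f, 0.017f};
    in.substract_mean_normalize(mean_vals, norm_vals);

    ncnn::Extractor ex = mobilenet.create_extractor();
    ex.input("data", in);

    ncnn::Mat out;
    ex.extract("prob", out);
    return 0;
}
```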

## mixed precision inference

Before quantizing your model, comment out the layer weight scale line in the table file, and that layer will then run float32 inference
Before quantizing your model, comment out the layer weight scale line in a `txt`-format table file, and that layer will then run float32 inference

```
conv1_param_0 156.639840536
@@ -69,3 +70,26 @@ conv1_param_0 156.639840536
```
#conv1_param_0 156.639840536
```

If you are using the `ini` format, remove the whole quantization section for that layer instead. For example, change:

```
[conv0]
type = "Conv"
weight = [ 156.639840536 ]
input_scale = 1.23

[fire]
type = "Gemm"
weight = [ 156.639840536 ]
input_scale = 1.23
```

to

```
[fire]
type = "Gemm"
weight = [ 156.639840536 ]
input_scale = 1.23
```
1 change: 1 addition & 0 deletions examples/CMakeLists.txt
@@ -65,6 +65,7 @@ if(NCNN_PIXEL)
ncnn_add_example(nanodetplus_pnnx)
ncnn_add_example(scrfd)
ncnn_add_example(scrfd_crowdhuman)
ncnn_add_example(vision_transformer_int8)
if(OpenCV_FOUND)
ncnn_add_example(yolov4)
ncnn_add_example(rvm)