
Model Builder: Add post-processing script to convert fp16/32 LM_HEAD to int8 and use tied embeddings #1437


Open
sushraja-msft wants to merge 2 commits into main

Conversation

sushraja-msft (Contributor) commented Apr 30, 2025

With the previous change (#1436), one can produce ONNX models with an fp16/fp32 lm_head. With this additional post-processing script, such models can be converted to an int8 lm_head and can also use tied embeddings to save disk space.

Usage:

```
python PostProcess_LM_HEAD_int8_quantization_and_tied_embeddings.py phi4_fp16_lm_head\model.onnx C:\AI\final_model\model.onnx --block_size 32 --accuracy_level 4 --fp16 --tie_embeddings
```
