
Model Builder: Add post-processing script to convert fp16/32 LM_HEAD to int8 and use tied embeddings #1437


Open
sushraja-msft wants to merge 2 commits into main

Conversation

sushraja-msft (Contributor) commented Apr 30, 2025

With the previous change (#1436), one can produce ONNX models with an fp16/fp32 lm_head. With this additional post-processing script, such models can be converted to an int8 lm_head and can also use tied embeddings to save disk space.

Usage:

```
python PostProcess_LM_HEAD_int8_quantization_and_tied_embeddings.py phi4_fp16_lm_head\model.onnx C:\AI\final_model\model.onnx --block_size 32 --accuracy_level 4 --fp16 --tie_embeddings
```
