Description

I tried the quantization configuration from #4 and #5, i.e. v6k4096, but could not reproduce similar results on the Llama-2 13B model. I also used the script configuration from the README, i.e. v8k65536res256, but only got poor perplexity (ppl) results. How can I reproduce the unfinetuned results in Table 10?
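If I understand the config naming correctly, a vector of `v` weights is replaced by an index into a codebook of `k` centroids (plus an optional residual codebook of `k_res` centroids), so the index payload is `(log2(k) + log2(k_res)) / v` bits per weight. A quick sketch of that arithmetic for the two configurations (ignoring codebook storage and norm/scale overheads):

```python
import math

def bits_per_weight(vector_len: int, num_centroids: int, num_res_centroids: int = 0) -> float:
    """Index bits per weight for vector quantization with an optional residual codebook."""
    bits = math.log2(num_centroids)
    if num_res_centroids > 0:
        bits += math.log2(num_res_centroids)
    return bits / vector_len

# v6k4096: 6-element vectors, 4096 centroids, no residual codebook
print(bits_per_weight(6, 4096))        # 12 / 6       -> 2.0 bits per weight
# v8k65536res256: 8-element vectors, 65536 main + 256 residual centroids
print(bits_per_weight(8, 65536, 256))  # (16 + 8) / 8 -> 3.0 bits per weight
```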
Here are the scripts and results from the algorithm branch that I used:
```
python run_vptq.py \
    --model_name /Llama-2-13b-hf \
    --output_dir outputs/llama2_13b_v6k4096 \
    --vector_lens -1 6 \
    --group_num 1 \
    --num_centroids -1 4096 \
    --num_res_centroids -1 -1 \
    --npercent 0 \
    --blocksize 128 \
    --new_eval \
    --seq_len 4096 \
    --kmeans_mode hessian \
    --num_gpus 1 \
    --enable_perm \
    --enable_norm \
    --save_model \
    --save_packed_model \
    --hessian_path /hess/llama2_13b_6144/ \
    --inv_hessian_path /invhess/llama2_13b_6144/ \
    --ktol 1e-5 \
    --kiter 100
```
results:

```
"wikitext2": 253.8889923095703,
"c4-new": 180.23741149902344
```
```
python run_vptq.py \
    --model_name /Llama-2-13b-hf \
    --output_dir outputs/llama2_13b_v8k65536r256 \
    --vector_lens -1 8 \
    --group_num 1 \
    --num_centroids -1 65536 \
    --num_res_centroids -1 -256 \
    --npercent 0 \
    --blocksize 128 \
    --new_eval \
    --seq_len 4096 \
    --kmeans_mode hessian \
    --num_gpus 1 \
    --enable_perm \
    --enable_norm \
    --save_model \
    --save_packed_model \
    --hessian_path /hess/llama2_13b_6144/ \
    --inv_hessian_path /invhess/llama2_13b_6144/ \
    --ktol 1e-5 \
    --kiter 100
```
results:

```
"wikitext2": 71.66665649414062,
"c4-new": 64.28524017333984
```
Thanks, and I look forward to your answer.