If the retained-token count is set to 192 at the start, the number of tokens kept at each layer is preset here, and the equivalent is 191:
SparseVLMs/llava/model/language_model/score.py, lines 11 to 14 in 87fe431:

```python
sparse_token_list_192 = [300, 200, 110] if not V2_0 else [300, 200, 118] # 2*576 4*300 10*200 16*110
sparse_token_list_128 = [303, 110, 36] if not V2_0 else [238, 108, 60]
sparse_token_list_96 = [238, 48, 26] if not V2_0 else [246, 54, 28]
sparse_token_list_64 = [66, 30, 17] if not V2_0 else [66, 34, 20]
```
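For reference, the 191 figure seems to come from averaging this schedule over the 32 decoder layers, per the comment above (2 layers see the full 576 tokens, then 4 layers keep 300, 10 keep 200, and 16 keep 110). A minimal sanity-check sketch, with that layer split hard-coded as an assumption:

```python
# Back-of-the-envelope check of the "equivalent to 191" claim, assuming the
# layer split from the comment: 2*576, 4*300, 10*200, 16*110 (32 layers total).
schedule = [(2, 576), (4, 300), (10, 200), (16, 110)]
layers = sum(n for n, _ in schedule)               # 32
token_layer_sum = sum(n * k for n, k in schedule)  # 6112
print(token_layer_sum / layers)                    # 191.0
```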
But doesn't the cluster-and-merge step then increase the token count again? It takes the top 30% of the tokens that were not selected (defined as merge_token_stage2) and clusters them into int(merge_token_stage2.shape[1] / 10) + 1 merged tokens:
SparseVLMs/llava/model/language_model/modelling_sparse_llama.py, lines 280 to 290 in 87fe431:

```python
merge_token_idx_stage1 = torch.where(pred_score_vis==0)[1]
merge_token_stage1 = relation_vis_text[0][merge_token_idx_stage1]
merge_token_num_stage1 = int(merge_token_idx_stage1.shape[0] * 0.3) + 1 # Top 30%
merge_token_stage2_idx = merge_token_stage1.topk(merge_token_num_stage1)[1]
merge_token_stage2 = total_sparse_token[:,merge_token_stage2_idx,:]
cluster_num = int(merge_token_stage2.shape[1] / 10) + 1
if (cluster_num == 0) :
    cluster_num = merge_token_stage2.shape[1]
merge_sparse_token = cluster_and_merge(merge_token_stage2, cluster_num)
```
Following the top-30% and /10 formula above, the number of tokens added back at each pruning layer then works out to (see the sketch below):
576-300=276 dropped → 9 added (309 kept), 309-200=109 dropped → 4 added (204 kept), 204-110=94 dropped → 3 added (113 kept).
So what is the actual equivalent token count in that case?
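For concreteness, here is a minimal sketch of that bookkeeping. It is my reading of the snippet above, not code from the repo; the helper name is hypothetical, and it assumes pruning happens three times with the merged tokens carried into the next stage:

```python
# Hypothetical trace of how many merged tokens cluster_and_merge adds back
# per pruning stage, under the formula quoted from modelling_sparse_llama.py.
def tokens_added_back(n_dropped: int) -> int:
    stage2_pool = int(n_dropped * 0.3) + 1  # top 30% of the dropped tokens
    return int(stage2_pool / 10) + 1        # one merged token per cluster

kept = 576
budgets = [300, 200, 110]                   # sparse_token_list_192 (not V2_0)
for budget in budgets:
    added = tokens_added_back(kept - budget)
    kept = budget + added
    print(budget, added, kept)              # 300 9 309 / 200 4 204 / 110 3 113
```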
Also, since the token counts are set to fixed values, what is the purpose of the attention-score rank described in Section 3.2 (Sparsification Level Adaptation) of the paper?