About the effective token count #33

@betacatZ

If retain token is initially set to 192, the number of tokens kept at each layer is preset here, and the equivalent token count works out to 191:

$$ (2\times 576+4\times300+10\times200+16\times110)/32=191 $$

    sparse_token_list_192 = [300, 200, 110] if not V2_0 else [300, 200, 118]  # 2*576 4*300 10*200 16*110
    sparse_token_list_128 = [303, 110, 36] if not V2_0 else [238, 108, 60]
    sparse_token_list_96 = [238, 48, 26] if not V2_0 else [246, 54, 28]
    sparse_token_list_64 = [66, 30, 17] if not V2_0 else [66, 34, 20]
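
The layer-weighted average above can be reproduced with a small sketch. The function name `effective_token_count` is hypothetical and not part of the repository. It assumes 576 initial visual tokens and the 2/4/10/16 layer split taken from the comment on `sparse_token_list_192`; whether the same split applies to the other presets is not stated here.

```python
def effective_token_count(keep_list, initial_tokens=576, layer_spans=(2, 4, 10, 16)):
    """Layer-weighted average of the token count over all layers.

    Assumption: the first span runs on all `initial_tokens`, and each
    later span runs on the corresponding entry of `keep_list`.
    """
    per_stage = [initial_tokens, *keep_list]
    if len(per_stage) != len(layer_spans):
        raise ValueError("expected one token count per layer span")
    weighted_sum = sum(n_layers * n_tokens
                       for n_layers, n_tokens in zip(layer_spans, per_stage))
    return weighted_sum / sum(layer_spans)


# (2*576 + 4*300 + 10*200 + 16*110) / 32
print(effective_token_count([300, 200, 110]))  # 191.0
```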

However, doesn't the cluster-and-merge step increase the token count? It takes the top 30% of the tokens that were not selected (defined as merge_token_stage2) and clusters them into int(merge_token_stage2.shape[1] / 10) + 1 tokens:

    merge_token_idx_stage1 = torch.where(pred_score_vis==0)[1]
    merge_token_stage1 = relation_vis_text[0][merge_token_idx_stage1]
    merge_token_num_stage1 = int(merge_token_idx_stage1.shape[0] * 0.3 ) + 1 # Top 30%
    merge_token_stage2_idx = merge_token_stage1.topk(merge_token_num_stage1)[1]
    merge_token_stage2 = total_sparse_token[:,merge_token_stage2_idx,:]
    cluster_num = int(merge_token_stage2.shape[1] / 10) + 1
    if (cluster_num == 0) :
        cluster_num = merge_token_stage2.shape[1]
    merge_sparse_token = cluster_and_merge(merge_token_stage2, cluster_num)

The number of tokens added at each pruning layer is then (with integer division):

$$ \lfloor 276/10 \rfloor+1=28,\quad \lfloor (328-200)/10 \rfloor+1=13,\quad \lfloor (213-110)/10 \rfloor+1=11 $$

So the equivalent token count should be:

$$ (2\times576+4\times328+10\times213+16\times121)/32\approx204 $$
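
The arithmetic in this issue can be checked with a rough sketch. All names below (`merged_count`, `simulate`, `apply_top_ratio`) are hypothetical and not taken from the repository. It assumes that the number of unselected tokens at each pruning layer equals the incoming token count minus the preset keep count, and that the merged tokens are carried into the next stage; the real `pred_score_vis` mask may behave differently. With `apply_top_ratio=False` the sketch follows the numbers written in this issue, which divide all unselected tokens by 10. With `apply_top_ratio=True` it first takes the top 30%, as the quoted snippet does.

```python
def merged_count(num_unselected, apply_top_ratio):
    """Number of merged tokens produced from the unselected ones."""
    if apply_top_ratio:
        # Mirrors `int(n * 0.3) + 1` from the quoted snippet.
        candidates = int(num_unselected * 0.3) + 1
    else:
        # The issue's arithmetic: every unselected token is a candidate.
        candidates = num_unselected
    # Mirrors `int(merge_token_stage2.shape[1] / 10) + 1`.
    return int(candidates / 10) + 1


def simulate(keep_list, apply_top_ratio, initial_tokens=576, layer_spans=(2, 4, 10, 16)):
    """Return (added per stage, tokens per stage, layer-weighted average)."""
    current = initial_tokens
    per_stage = [initial_tokens]
    added = []
    for keep in keep_list:
        extra = merged_count(current - keep, apply_top_ratio)
        added.append(extra)
        current = keep + extra  # assumed: merged tokens join the kept ones
        per_stage.append(current)
    effective = sum(n * t for n, t in zip(layer_spans, per_stage)) / sum(layer_spans)
    return added, per_stage, effective


added, per_stage, effective = simulate([300, 200, 110], apply_top_ratio=False)
print(added)      # [28, 13, 11]
print(per_stage)  # [576, 328, 213, 121]
print(effective)  # 204.0625

added, per_stage, effective = simulate([300, 200, 110], apply_top_ratio=True)
print(added)      # [9, 4, 3]
print(effective)  # 194.875
```

Under these assumptions, both variants land above the preset 191, although the gap is smaller once the 30% step is included.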

Also, if the token count is set to a fixed value, what is the purpose of the rank of the attention score described in Section 3.2, Sparsification Level Adaptation, of the paper?
