training scripts for flickr30k #9

@AllenLiang96

To reproduce the experiments quickly, I used the Flickr30k dataset as the benchmark and ran the code with DDP.
The modified training script for Flickr30k is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py \
    --data_name f30k --cnn_type resnet152 --wemb_type glove \
    --margin 0.2 --max_violation --img_num_embeds 4 --txt_num_embeds 4 \
    --img_attention --txt_attention --img_finetune --txt_finetune \
    --mmd_weight 0.01 --unif_weight 0.01 \
    --batch_size 200 --warm_epoch 0 --num_epochs 80 \
    --optimizer adamw --lr_scheduler cosine --lr_step_size 30 --lr_step_gamma 0.1 \
    --warm_img --finetune_lr_lower 1 \
    --lr 1e-3 --txt_lr_scale 1 --img_pie_lr_scale 0.1 --txt_pie_lr_scale 0.1 \
    --eval_on_gpu --sync_bn --amp \
    --loss smooth_chamfer --eval_similarity smooth_chamfer --temperature 16 \
    --txt_pooling rnn --arch slot --txt_attention_input wemb \
    --spm_img_pos_enc_type none --spm_txt_pos_enc_type sine \
    --spm_1x1 --spm_residual --spm_residual_norm --spm_residual_activation none \
    --spm_activation gelu \
    --spm_ff_mult 4 --spm_last_ln \
    --img_res_pool max --img_res_first_fc \
    --spm_input_dim 1024 --spm_query_dim 1024 \
    --spm_depth 4 --spm_weight_sharing \
    --remark coco_butd_bigru \
    --res_only_norm --img_1x1_dropout 0.1 --spm_pre_norm \
    --gpo_1x1 --gpo_rnn \
    --weight_decay 1e-4 --grad_clip 1 --lr_warmup -1 --unif_residual \
    --workers 4 --dropout 0.1 --caption_drop_prob 0.2 --butd_drop_prob 0.2
However, the performance is unsatisfactory. The results are as follows:
image

What is wrong with this training script? I would be very grateful if you could point out my mistake. Thank you!
