Hi! For all datasets in the DialoGLUE benchmark, I can reproduce similar results except for MultiWOZ.
For ConvBERT-DG, your reported joint goal accuracy is around 58, but I can only get 56, which matches what the original TripPy paper reported.
I wonder if you used different hyperparameters for TripPy? If so, could you share them?
Thank you!
The original hyperparameters for TripPy are as follows:

```shell
--do_lower_case \
--learning_rate=1e-4 \
--num_train_epochs=10 \
--max_seq_length=180 \
--per_gpu_train_batch_size=48 \
--per_gpu_eval_batch_size=1 \
--output_dir=${OUT_DIR} \
--save_epochs=2 \
--logging_steps=10 \
--warmup_proportion=0.1 \
--eval_all_checkpoints \
--adam_epsilon=1e-6 \
--label_value_repetitions \
--swap_utterances \
--append_history \
--use_history_labels \
--delexicalize_sys_utts \
--class_aux_feats_inform \
--class_aux_feats_ds \
```
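For context, these flags would be passed to the TripPy training script as part of a full invocation. A minimal sketch is below; the script name `run_dst.py` and the `--task_name`, `--data_dir`, `--dataset_config`, and `--model_*` arguments are assumptions based on the public TripPy codebase, so adapt them to your checkout:

```shell
# Hypothetical full TripPy training command; only the flags listed in this
# thread are confirmed, the rest are placeholders for a typical setup.
OUT_DIR=results/multiwoz21.trippy

python3 run_dst.py \
    --task_name="multiwoz21" \
    --data_dir="data/MULTIWOZ2.1" \
    --dataset_config="dataset_config/multiwoz21.json" \
    --model_type="bert" \
    --model_name_or_path="bert-base-uncased" \
    --do_train \
    --do_eval \
    --do_lower_case \
    --learning_rate=1e-4 \
    --num_train_epochs=10 \
    ...   # remaining flags exactly as listed above
```

Note that `--per_gpu_train_batch_size=48` assumes a fairly large GPU; with less memory you would typically lower it and compensate with gradient accumulation.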