
Fine-tuned models have relatively bad performance when using my own base-size pretrained model #111

@652994331

Description


Hi guys, thank you for your ELECTRA models. Recently, I used my own data to continue pretraining a base-size ELECTRA model (this one: https://github.com/ymcui/Chinese-ELECTRA). This pretrained model is a Chinese ELECTRA, so it has a different vocab.txt (the same as the BERT-base Chinese vocab, 21128 lines). What I did:

1. Used `build_pretraining_dataset.py` (21128-entry vocab) to generate the tfrecords.
2. Added `init_checkpoint` support following #74.
3. Continued pretraining my own base-size Chinese ELECTRA model from the Chinese-ELECTRA checkpoint, with learning rate 2e-4, 1,000,000 training steps, and the base model size. The command line was:

`python3 run_pretraining.py --data-dir pretrain_chinese_model/ --model-name my_model --init_checkpoint pretrain_chinese_model/models/Chinese-Electra`
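One sanity check worth running for this setup is that the vocab file used by `build_pretraining_dataset.py` actually matches the word-embedding table in the checkpoint passed as `--init_checkpoint`; a size mismatch there would quietly corrupt continued pretraining. Below is a minimal sketch, assuming a TF1-style checkpoint, a hypothetical vocab path, and the usual ELECTRA variable name `electra/embeddings/word_embeddings` (all three are assumptions, not taken from the issue):

```python
# Sanity check (a sketch, not from the original issue): verify that the
# vocabulary used to build the tfrecords matches the word-embedding table
# in the initialization checkpoint.
import tensorflow as tf  # TF 1.x, as used by the ELECTRA codebase

VOCAB_FILE = "pretrain_chinese_model/vocab.txt"          # hypothetical path
CKPT = "pretrain_chinese_model/models/Chinese-Electra"   # from the command above

# Count vocab entries; for the BERT-base Chinese vocab this should be 21128.
with open(VOCAB_FILE, encoding="utf-8") as f:
    vocab_size = sum(1 for _ in f)

reader = tf.train.load_checkpoint(CKPT)
# "electra/embeddings/word_embeddings" is the usual variable name in the
# ELECTRA codebase; adjust if this checkpoint uses a different scope.
emb = reader.get_tensor("electra/embeddings/word_embeddings")
print("vocab.txt lines:", vocab_size)
print("embedding table shape:", emb.shape)
assert emb.shape[0] == vocab_size, "vocab / embedding size mismatch"
```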

The loss after 100,000 steps was around 3.4. However, when I used the 100,000-step checkpoint to fine-tune a classification model, the performance was much worse than with the original Chinese-ELECTRA. I am wondering why even 100,000 steps of continued pretraining from Chinese-ELECTRA gives such bad performance. Did I make any mistakes?
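One possible cause worth ruling out is that `--init_checkpoint` was silently ignored, so the run actually restarted from random weights, and 100,000 steps is nowhere near enough to recover the original quality. A quick hedged check is to compare shared variables between the original checkpoint and an early checkpoint of the continued run; the checkpoint paths below are assumptions for illustration:

```python
# Sketch: check whether the continued run actually warm-started from
# Chinese-ELECTRA. If initialization silently failed, even early-step
# weights will be uncorrelated with the original checkpoint.
import numpy as np
import tensorflow as tf

ORIG = "pretrain_chinese_model/models/Chinese-Electra"            # original ckpt
CONT = "pretrain_chinese_model/models/my_model/model.ckpt-1000"   # hypothetical early ckpt

orig_reader = tf.train.load_checkpoint(ORIG)
cont_reader = tf.train.load_checkpoint(CONT)
orig_vars = dict(tf.train.list_variables(ORIG))

# Compare every variable present in both checkpoints, skipping optimizer
# slots and the step counter, which are expected to differ.
for name, _ in tf.train.list_variables(CONT):
    if name not in orig_vars or "adam" in name.lower() or name == "global_step":
        continue
    a = orig_reader.get_tensor(name)
    b = cont_reader.get_tensor(name)
    if a.shape != b.shape:
        print(f"{name}: shape mismatch {a.shape} vs {b.shape}")
        continue
    print(f"{name}: mean |delta| = {float(np.mean(np.abs(a - b))):.4f}")

# Small deltas at step ~1000 are expected after a genuine warm start;
# large deltas on every variable suggest init_checkpoint was never loaded.
```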
