Questions about discriminative_fine_tuning #5

In Section 5.4.3 the paper says: "We find that assign a lower learning rate to the lower layer is effective to fine-tuning BERT, and an appropriate setting is ξ=0.95 and lr=2.0e-5."
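One way to read the quoted setting, assuming a per-layer geometric decay in which the top layer L gets the base rate and each layer below it is scaled by ξ (this reading is an assumption, not a quote), gives the following for the 12-layer BERT-base:

```latex
\eta^{k-1} = \xi\,\eta^{k}
\quad\Longrightarrow\quad
\eta^{k} = \xi^{\,L-k}\,\eta^{L},
\qquad
\eta^{1} = 0.95^{11} \times 2.0\times10^{-5} \approx 1.14\times10^{-5} \quad (L = 12)
```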
Compared with the code at https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier.py#L812, it seems that you divide the BERT layers into 3 parts (4 layers per part) and set a different learning rate for each part.
Some questions about this:
1. How does the decay factor 0.95 correspond to the number 2.6 in the code?
2. The final classification layer does not seem to be included. Is there no need to set a learning rate for it?
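A small sketch to put the two schedules side by side. The grouped scheme below is a hypothetical reading of the code (three equal groups of four layers, each group's rate being the group above it divided by 2.6); it is not taken from run_classifier.py, and all function names are illustrative only. `equivalent_per_layer_decay` computes the per-layer factor that would produce the same drop across one group of four layers.

```python
def layerwise_lrs(base_lr, decay, num_layers=12):
    """Per-layer rates, ASSUMING eta^{k-1} = decay * eta^k.

    The top encoder layer gets base_lr; index 0 is the bottom layer.
    The classifier head is not covered here.
    """
    return [base_lr * decay ** (num_layers - k) for k in range(1, num_layers + 1)]


def grouped_lrs(base_lr, divisor, num_layers=12, num_groups=3):
    """HYPOTHETICAL grouped scheme: equal-sized groups of layers, each group's
    rate equal to the rate of the group above it divided by `divisor`.

    Index 0 is the bottom layer; the top group gets base_lr.
    """
    per_group = num_layers // num_groups
    lrs = []
    for k in range(num_layers):
        group = k // per_group                   # 0 = bottom group
        steps_from_top = num_groups - 1 - group  # 0 for the top group
        lrs.append(base_lr / divisor ** steps_from_top)
    return lrs


def equivalent_per_layer_decay(divisor, layers_per_group):
    """Per-layer factor xi such that xi ** layers_per_group == 1 / divisor."""
    return (1.0 / divisor) ** (1.0 / layers_per_group)


if __name__ == "__main__":
    base_lr = 2.0e-5
    pairs = zip(layerwise_lrs(base_lr, 0.95), grouped_lrs(base_lr, 2.6))
    for k, (a, b) in enumerate(pairs, start=1):
        print(f"layer {k:2d}: xi=0.95 -> {a:.3e}   /2.6 per group -> {b:.3e}")
    print("per-layer factor equivalent to /2.6 per 4 layers:",
          round(equivalent_per_layer_decay(2.6, 4), 3))
```

Under these assumptions, dividing by 2.6 every four layers corresponds to a per-layer factor of about 0.79, a noticeably stronger decay than ξ=0.95, so the two settings would not be equivalent under this reading.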