Description
Here is a set of tokens that should not be masked during dynamic masking.
electra/pretrain/pretrain_helpers.py, line 121 in 7911132:

```python
ignore_ids = [vocab["[SEP]"], vocab["[CLS]"], vocab["[MASK]"]]
```
But should we also avoid masking all the [PAD] tokens at the end of a sentence (when the sentence is shorter than max_seq_length and there is no second sentence segment)?
I understand that [PAD] itself has token_id = 0, but I do not see this being used to prevent masking in downstream steps. If we do not ignore it, it will affect the probability calculation here:
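One possible change would be to also add the padding token to the ignore list. This is just an illustrative sketch, assuming the vocab maps "[PAD]" to its id (0 in BERT-style vocabs); the token ids below are made up for the example:

```python
# Illustrative vocab with made-up ids; in the real code, vocab is loaded
# from the vocabulary file and "[PAD]" maps to 0.
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, "[MASK]": 103}

# Hypothetical extension of the existing ignore list to exclude padding
# positions from dynamic masking as well.
ignore_ids = [vocab["[SEP]"], vocab["[CLS]"], vocab["[MASK]"], vocab["[PAD]"]]
```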
electra/pretrain/pretrain_helpers.py, lines 167 to 170 in 7911132:

```python
# Get a probability of masking each position in the sequence
candidate_mask_float = tf.cast(candidates_mask, tf.float32)
sample_prob = (proposal_distribution * candidate_mask_float)
sample_prob /= tf.reduce_sum(sample_prob, axis=-1, keepdims=True)
```
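To illustrate the effect on the probability calculation, here is a minimal NumPy sketch (not the actual ELECTRA code) of the same normalization, assuming a uniform proposal distribution over a sequence of length 8 whose last 3 positions are padding:

```python
import numpy as np

seq_len = 8
# Assumed uniform proposal distribution for illustration.
proposal_distribution = np.ones(seq_len, dtype=np.float32)

# Candidate mask that leaves the 3 trailing [PAD] positions in.
candidates_with_pad = np.ones(seq_len, dtype=np.float32)
# Candidate mask that also zeroes out the padding positions.
candidates_no_pad = np.array([1, 1, 1, 1, 1, 0, 0, 0], dtype=np.float32)

def sample_prob(candidates_mask):
    # Mirrors the normalization in pretrain_helpers.py:
    # mask out non-candidates, then renormalize to a distribution.
    p = proposal_distribution * candidates_mask
    return p / p.sum()

print(sample_prob(candidates_with_pad))  # every position, including [PAD], gets 1/8
print(sample_prob(candidates_no_pad))    # each real token gets 1/5, [PAD] gets 0
```

With padding left in, probability mass that should go to real tokens is diverted to [PAD] positions, so real tokens are masked less often than intended.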
Also, we would be trying to predict [PAD] tokens that lie outside the actual sequence, which is a bit unintuitive.
Maybe I am missing something here. Thanks again for putting together such great work!