
Should dynamic masking also ignore [PAD]? #59

@ccchang0111


Here is a set of tokens that should not be masked during dynamic masking.

ignore_ids = [vocab["[SEP]"], vocab["[CLS]"], vocab["[MASK]"]]
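For illustration, a minimal sketch of extending this set with [PAD] — the `vocab` dict below is hypothetical (the real one comes from the tokenizer's vocabulary file), but the ids follow the usual BERT convention where [PAD] is 0:

```python
# Hypothetical vocab mapping, for illustration only; in practice this is
# loaded from the tokenizer's vocab file.
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, "[MASK]": 103}

# Same set as above, extended so padding positions are never candidates.
ignore_ids = [vocab["[SEP]"], vocab["[CLS]"], vocab["[MASK]"], vocab["[PAD]"]]
```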

But should we also avoid masking all the [PAD] tokens at the end of a sentence (when the sentence is shorter than max_seq_length and there is no second sentence segment)?

I understand [PAD] itself has token_id = 0, but I do not see that being used to prevent masking in the downstream steps. If we do not ignore it, it will affect the probability calculation here:

# Get a probability of masking each position in the sequence
candidate_mask_float = tf.cast(candidates_mask, tf.float32)
sample_prob = (proposal_distribution * candidate_mask_float)
sample_prob /= tf.reduce_sum(sample_prob, axis=-1, keepdims=True)
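To make the effect concrete, here is a small NumPy sketch (not the repo's actual code) mirroring the normalization above. With a uniform proposal distribution, any [PAD] positions left inside `candidates_mask` absorb probability mass that would otherwise go to real tokens; the toy ids and sequence below are invented for illustration:

```python
import numpy as np

# Toy sequence: 4 real tokens followed by 4 [PAD] tokens (id 0).
input_ids = np.array([101, 7592, 2088, 102, 0, 0, 0, 0])
proposal_distribution = np.ones_like(input_ids, dtype=np.float32)  # uniform

def masking_prob(candidates_mask):
    # Mirrors the snippet above: zero out non-candidates, then renormalize.
    sample_prob = proposal_distribution * candidates_mask.astype(np.float32)
    return sample_prob / sample_prob.sum(axis=-1, keepdims=True)

# Without ignoring [PAD]: every position, padding included, is a candidate,
# so each of the 8 positions gets probability 1/8.
with_pad = masking_prob(np.ones(8, dtype=bool))

# Also ignoring [PAD] (id 0): only the 4 real tokens can be masked, and each
# gets probability 1/4. ([CLS]/[SEP] would be excluded too in the real code;
# this sketch isolates the [PAD] effect.)
without_pad = masking_prob(input_ids != 0)
```

So a quarter of the mask budget here would land on padding unless [PAD] is excluded.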

Also, we would be trying to predict [PAD] tokens that lie outside the actual sequence, which is a bit unintuitive.

Maybe I am missing something here. Thanks again for putting together such great work!
