
W2v2 convdebug #3

Open

taylanbil wants to merge 19 commits into orig-w2v2 from w2v2-convdebug

Conversation

@taylanbil
Owner

No description provided.

@taylanbil taylanbil requested a review from alexeib November 2, 2020 20:41
loss = F.binary_cross_entropy_with_logits(
logits, target.float(), weights,
reduction="sum" if reduce else "none",
ignore_index=-1,
Collaborator

This won't actually work, because `binary_cross_entropy_with_logits` does not have an `ignore_index` argument. You need to use `reduction="none"` here and then zero out the loss coming from unmasked states.
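A minimal sketch of the workaround the comment suggests, assuming targets use -1 as the "ignore" marker as in the quoted diff (shapes and the clamp are illustrative, not the exact tensors in this PR):

```python
import torch
import torch.nn.functional as F

# F.binary_cross_entropy_with_logits has no ignore_index, so compute
# per-element losses and zero out positions marked -1 before summing.
logits = torch.randn(4, 6)
target = torch.randint(-1, 2, (4, 6))  # -1 marks unmasked/ignored states

# clamp(min=0) makes the -1 entries valid BCE targets; their loss is
# discarded by the mask below anyway
loss = F.binary_cross_entropy_with_logits(
    logits, target.clamp(min=0).float(), reduction="none"
)
keep = (target != -1).float()
loss = (loss * keep).sum()  # only masked states contribute to the loss
```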

sample_size = target.numel() if self.infonce else target.long().sum().item()
if 'sample_size' in sample and self.infonce:
sample_size = sample['sample_size']
elif 'mask_indices' in sample['net_input'] and self.infonce:
Collaborator

Maybe remove the `and self.infonce` part?

across workers prior to calling `reduce_metrics`. Setting this
to True will improve distributed training speed.
"""
return False
Collaborator

If you keep it at False, do you still see larger accuracy etc.? (I know we tried 1 node, but still.)

return y.new(0)

(bsz, tsz), fsz = mask_indices.shape, self.args.final_dim
high = mask_indices.sum(-1).max().item()
Collaborator

This means that sometimes you will sample negatives from masked timesteps for examples that are shorter than the longest one. Why not sample separately for each example in the batch and use the correct `high` for each example?


if self.n_negatives > 0:
for i in range(1, bsz):
neg_idxs[i] += i * high
Collaborator

This is problematic because previously it was assumed that `high` is the number of timesteps, but we've redefined `high` above to be something smaller. You need to do `neg_idxs[i] += i * tsz` here.
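A small sketch of the flat-index bookkeeping this comment describes, under the assumption (matching the diff) that negatives are later gathered from `y.view(-1, fsz)`: that flattened tensor has `tsz` rows per example, so the per-example offset must be `i * tsz`; offsetting by the smaller redefined `high` would land in the wrong example's rows.

```python
import torch

bsz, tsz, fsz = 2, 5, 3
# arange values make it easy to check which example a row came from
y = torch.arange(bsz * tsz * fsz, dtype=torch.float).view(bsz, tsz, fsz)

neg_idxs = torch.randint(low=0, high=tsz, size=(bsz, 4))
for i in range(1, bsz):
    neg_idxs[i] += i * tsz  # shift into example i's block of flattened rows

negs = y.view(-1, fsz)[neg_idxs.view(-1)].view(bsz, 4, fsz)
# rows gathered for example 1 all come from example 1's timesteps
```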

neg_idxs = torch.randint(
low=0, high=high - 1, size=(bsz, self.n_negatives * num)
)
neg_idxs = torch.randint(low=0, high=high-1, size=(bsz, self.n_negatives * tsz))
Collaborator

Here, for XLA, you should loop over `bsz` and sample negatives individually for each example, setting `high` to be `tsz - sum(padding[b])` as we discussed. Otherwise you might be sampling negatives from states that are padded. I guess `padding[b]` would come from `padding_counts`, which is currently not used.
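A hedged sketch of the per-example sampling suggested here: loop over the batch and draw each example's indices with `high` reduced by that example's padded timesteps, so padded states are never picked. The helper name and the `padding_counts` list are assumptions for illustration, mirroring the names in the diff.

```python
import torch

def sample_negative_idxs(bsz, tsz, n_negatives, padding_counts):
    """Sample negative indices per example, excluding padded timesteps."""
    rows = []
    for b in range(bsz):
        high = tsz - padding_counts[b]  # number of valid (unpadded) steps
        rows.append(torch.randint(low=0, high=high, size=(n_negatives * tsz,)))
    return torch.stack(rows)  # shape: (bsz, n_negatives * tsz)
```

This would replace the single batched `torch.randint(low=0, high=high-1, ...)` call in the quoted diff; the per-example offsets into the flattened feature tensor would then be applied afterwards as before.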

pdb.set_trace()
negs = negs.view(
bsz, num, self.n_negatives + self.cross_sample_negatives, fsz
bsz, tsz, self.n_negatives + self.cross_sample_negatives, fsz
Collaborator

no need to change this


y = self.project_q(y)

num = y.size(1) if tszs_after_mask is None else max(tszs_after_mask)
Collaborator

You don't need to do this; just set it to `y.size(1)` always.

if self.negatives_from_everywhere:
negs, _ = self.sample_negatives(unmasked_features, y.size(1))
negs, _ = self.sample_negatives(
unmasked_features, num, padding_counts=padding_counts,
Collaborator

No need to change to `num` here.

else:
negs, _ = self.sample_negatives(y, y.size(1))
negs, _ = self.sample_negatives(
y, num,
Collaborator

or here
