… fix filling w/ -inf in wav2vec2, minor cleanups
…item in log_scalar, minor cleanups
```python
loss = F.binary_cross_entropy_with_logits(
    logits, target.float(), weights,
    reduction="sum" if reduce else "none",
    ignore_index=-1,
```
This won't actually work because `binary_cross_entropy_with_logits` does not have an `ignore_index` argument. You need to use `reduction="none"` here and then zero out the loss coming from the unmasked states.
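A minimal sketch of that fix (the shapes and the use of `-1` as the ignore value are assumptions for illustration, not from this PR):

```python
import torch
import torch.nn.functional as F

# Toy logits/targets of shape (B, T); -1 marks positions to ignore.
logits = torch.tensor([[0.5, -1.2, 2.0]])
target = torch.tensor([[1.0, -1.0, 0.0]])

keep = target != -1
# reduction="none" gives a per-element loss we can mask ourselves,
# since binary_cross_entropy_with_logits has no ignore_index.
per_elem = F.binary_cross_entropy_with_logits(
    logits, target.clamp(min=0.0), reduction="none"
)
per_elem = per_elem * keep  # zero out loss from ignored states
loss = per_elem.sum()
```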
```python
sample_size = target.numel() if self.infonce else target.long().sum().item()
if 'sample_size' in sample and self.infonce:
    sample_size = sample['sample_size']
elif 'mask_indices' in sample['net_input'] and self.infonce:
```
Maybe remove the `and self.infonce` part?
```python
    across workers prior to calling `reduce_metrics`. Setting this
    to True will improve distributed training speed.
    """
    return False
```
If you keep it at False, do you still see the larger accuracy etc.? (I know we tried 1 node, but still.)
```python
return y.new(0)
```

```python
(bsz, tsz), fsz = mask_indices.shape, self.args.final_dim
high = mask_indices.sum(-1).max().item()
```
This means that sometimes you will sample negatives from masked timesteps for examples that are shorter than the longest one. Why not sample separately for each example in the batch and use the correct `high` for each example?
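One way to sketch the per-example version (`num_masked`, the per-example count of masked timesteps, is a hypothetical name for illustration):

```python
import torch

torch.manual_seed(0)
bsz, n_negatives = 2, 4
# hypothetical per-example counts of masked timesteps
num_masked = torch.tensor([5, 3])

neg_idxs = []
for b in range(bsz):
    high = int(num_masked[b])
    # sample negatives only from this example's own masked range
    neg_idxs.append(torch.randint(low=0, high=high, size=(n_negatives,)))
neg_idxs = torch.stack(neg_idxs)
```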
```python
if self.n_negatives > 0:
    for i in range(1, bsz):
        neg_idxs[i] += i * high
```
This is problematic because previously this assumed `high` is the number of timesteps, but we've redefined `high` above to be something smaller. You need to do `neg_idxs[i] += i * tsz` here.
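A toy illustration of why the offset has to be the row stride `tsz` rather than the redefined `high` when indexing into a tensor flattened over `(bsz, tsz)` (all shapes and names here are illustrative):

```python
import torch

bsz, tsz, fsz = 3, 7, 2
feats = torch.arange(bsz * tsz * fsz, dtype=torch.float).view(bsz, tsz, fsz)
flat = feats.view(-1, fsz)  # flattened over (bsz * tsz)

# per-example timestep indices into that example's own row
neg_idxs = torch.tensor([[0, 2], [1, 3], [4, 5]])
for i in range(1, bsz):
    neg_idxs[i] += i * tsz  # offset by tsz, the true row stride

gathered = flat[neg_idxs.view(-1)].view(bsz, 2, fsz)
```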
```python
neg_idxs = torch.randint(
    low=0, high=high - 1, size=(bsz, self.n_negatives * num)
)
neg_idxs = torch.randint(low=0, high=high-1, size=(bsz, self.n_negatives * tsz))
```
Here, for XLA, you should loop over `bsz` and sample negatives individually for each example, setting `high` to be `tsz - sum(padding[b])`, as we discussed. Otherwise you might be sampling negatives from states that are padded. I guess `padding[b]` would come from `padding_counts`, which is currently not used.
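A rough sketch of that per-example loop (reusing the `padding_counts` name from the diff; the concrete shapes and the `high - 1` bound mirroring the quoted code are assumptions):

```python
import torch

torch.manual_seed(0)
bsz, tsz, n_negatives = 2, 6, 3
# hypothetical number of padded timesteps per example
padding_counts = torch.tensor([0, 2])

neg_idxs = []
for b in range(bsz):
    high = tsz - int(padding_counts[b])  # exclude padded timesteps
    neg_idxs.append(torch.randint(low=0, high=high - 1, size=(n_negatives,)))
neg_idxs = torch.stack(neg_idxs)
```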
```python
pdb.set_trace()
negs = negs.view(
    bsz, num, self.n_negatives + self.cross_sample_negatives, fsz
    bsz, tsz, self.n_negatives + self.cross_sample_negatives, fsz
```
```python
y = self.project_q(y)
```

```python
num = y.size(1) if tszs_after_mask is None else max(tszs_after_mask)
```
You don't need to do this; just set it to `y.size(1)` always.
```python
if self.negatives_from_everywhere:
    negs, _ = self.sample_negatives(unmasked_features, y.size(1))
    negs, _ = self.sample_negatives(
        unmasked_features, num, padding_counts=padding_counts,
```
No need to change it to `num` here.
```python
else:
    negs, _ = self.sample_negatives(y, y.size(1))
    negs, _ = self.sample_negatives(
        y, num,
```