Enable w2v2 tpu#7

Open
taylanbil wants to merge 82 commits into enable-w2v2-tpu-BASECOMMIT from enable-w2v2-tpu
Conversation

@taylanbil
Owner

Commits are separated into logical pieces and organized, with informative messages.

patrickvonplaten and others added 14 commits February 10, 2021 14:04
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes facebookresearch#3227

All models that do **not** make use of group norm, such as
- Wav2Vec 2.0 Large (LV-60)*
- Wav2Vec 2.0 Large (LV-60) + Self Training *

do need this fix, IMO, to be able to correctly run batches through the model. Before this PR, the
following code snippet failed:

```python
import fairseq
import torch

# get model
wav2vec_path = "data/wav2vec2_vox_960h_new.pt"
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    [wav2vec_path], arg_overrides={"data": "./data"}
)
model = model[0]
model.eval()

# create single input
input_wav_0 = torch.randn((1, 2000))
input_wav_1 = torch.randn((1, 3000))

# create batched input
batch_input_wav = torch.zeros((2, 3000))
batch_input_wav[0, :input_wav_0.shape[-1]] = input_wav_0
batch_input_wav[1, :input_wav_1.shape[-1]] = input_wav_1

# create padding mask
padding_mask = torch.zeros((2, 3000), dtype=torch.bool)
padding_mask[0, input_wav_0.shape[-1]:] = True

# run batch & single
output = model(source=input_wav_0, padding_mask=None)["encoder_out"]
batch_output = model(source=batch_input_wav, padding_mask=padding_mask)["encoder_out"]

# is equal?
print("Is batched forward and simple forward equal?", torch.allclose(output[:,0], batch_output[:output.shape[0], 0], atol=1e-3))
```
Note: It is assumed that both https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt and https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec2_vox_960h_new.pt were downloaded and stored in the `data` folder.

Also, see [this](https://colab.research.google.com/drive/1ASZ4lVZbKkj-dvRHDl1lo0mCcsaOERlG?usp=sharing) notebook for reproducibility.

This PR should fix the behavior and make the above code snippet / notebook run successfully.

## PR review

Gently pinging alexeib for Wav2Vec2

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: facebookresearch#3228

Reviewed By: aconneau

Differential Revision: D26373721

Pulled By: alexeib

fbshipit-source-id: 3d5aca2f8136d1a8c4b5b4bc9c03cd05a69a3b52
Reviewed By: myleott, chtran

Differential Revision: D26348808

fbshipit-source-id: 010ef00024e02c09ec35b624f0713ce5f1f387b4
Summary:
At the start of the half there were some expired handles, and it was annoying to track down which datasets were responsible (when sampling data among multiple datasets) and which flows were running them. Let's improve the error message to address several pain points:

1. Explicitly tell the user which dataset has expired handles
2. Link to a scuba query to enable the user to find all flows that have expired handles
3. Fail job if 10k handles have expired, rather than if 10k handles in a row have expired. This can detect failures from datasets that have for example 50% expired handles
4. Add logging when handles fail
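The distinction in point 3 (a total budget of expirations, rather than a consecutive run) can be sketched as follows. This is an illustrative counter with hypothetical names, not the actual internal code:

```python
class ExpiredHandleTracker:
    """Fail after a total budget of expired handles, not a consecutive run.

    A consecutive-run check misses datasets where, say, 50% of handles are
    expired, because healthy handles keep resetting the streak.
    """

    def __init__(self, max_expired=10_000):
        self.max_expired = max_expired
        self.total_expired = 0

    def record(self, expired, dataset_name="unknown"):
        # count every expiration; never reset on success
        if expired:
            self.total_expired += 1
            if self.total_expired >= self.max_expired:
                raise RuntimeError(
                    f"{self.total_expired} handles expired; "
                    f"dataset '{dataset_name}' has expired handles"
                )
```

With a budget of 3, two expirations interleaved with successes do not fail the job, but the third does.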

Reviewed By: cruvadom

Differential Revision: D26187820

fbshipit-source-id: 771a359ea01de80b38932921346e98cff812f2f7
Summary:
fairscale.nn.Pipe has been ported to PyTorch:
https://github.com/pytorch/pytorch/blob/master/torch/distributed/pipeline/sync/pipe.py#L138.
As a result, modifying the pipeline transformer to use PyTorch pipe if available. This change depends on pytorch/pytorch#50860.
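The "use PyTorch Pipe if available" behavior presumably amounts to an import fallback along these lines (a sketch, not the exact fairseq code; the final `None` fallback is only for environments with neither backend):

```python
try:
    # PyTorch >= 1.8 ships the pipeline ported from fairscale
    from torch.distributed.pipeline.sync import Pipe
    TORCH_PIPE = True
except ImportError:
    TORCH_PIPE = False
    try:
        # fall back to the original fairscale implementation
        from fairscale.nn import Pipe
    except ImportError:
        Pipe = None  # neither backend available in this environment
```

Callers can then branch on `TORCH_PIPE` for any API differences between the two implementations.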

Pull Request resolved: facebookresearch#3149

Test Plan:
```
python train.py ru_en_bin/ --arch transformer_iwslt_de_en_pipeline_parallel --share-decoder-input-output-embed --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 --dropout 0.3 --weight-decay 0.0001 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-tokens 4096 --eval-bleu --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' --eval-bleu-detok moses --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --maximize-best-checkpoint-metric --pipeline-model-parallel --pipeline-balance '[1,3,5,3,3,1]' --pipeline-devices '[0,1,0,2,3,0]' --pipeline-chunks 16 --distributed-world-size 1 --distributed-no-spawn --disable-validation --max-epoch 1
```

Output with torch pipe:
```
2021-01-20 16:13:35 | INFO | train | epoch 001 | loss 12.676 | nll_loss 12.331 | ppl 5151.97 | wps 5108 | ups 1.66 | wpb 3081.6 | bsz 131.6 | num_updates 380 | lr 4.75e-05 | gnorm 2.08 | train_wall 229 | wall 233
2021-01-20 16:13:36 | INFO | fairseq_cli.train | done training in 233.1 seconds
```

Output with fairscale pipe:
```
2021-01-20 14:13:59 | INFO | train | epoch 001 | loss 12.677 | nll_loss 12.331 | ppl 5152.07 | wps 5198.9 | ups 1.69 | wpb 3081.6 | bsz 131.6 | num_updates 380 | lr 4.75e-05 | gnorm 2.08 | train_wall 224 | wall 228
2021-01-20 14:13:59 | INFO | fairseq_cli.train | done training in 228.0 seconds
```

Reviewed By: myleott

Differential Revision: D26204633

Pulled By: shruti-bh

fbshipit-source-id: 535f816e8d149b47fc6ba8385981accf67257257
…ch#3231)

Summary:
More informative exception when numpy version changes to ask the user to recompile Cython files

# Before submitting

- [With myleott  ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [N/A ] Did you make sure to update the docs?
- [N/A ] Did you write any new necessary tests?

## What does this PR do?
Raises a more informative error to tell the user to recompile Cython files after an update to the numpy version.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: facebookresearch#3231

Reviewed By: myleott

Differential Revision: D26375174

Pulled By: mwillwork

fbshipit-source-id: f0a93e162bc4cf84619581110d21bea907baf7fc
Summary:
this allows tasks to declare some properties they'd like to save in the checkpoint (such as a dictionary), which are loaded when checkpoint is restored.
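The mechanism described can be sketched as a `state_dict`/`load_state_dict` pair on the task, mirroring how modules and optimizers persist state (an illustrative class, not the actual fairseq API):

```python
class TaskWithState:
    """Sketch: a task exposing properties to be stored in the checkpoint,
    e.g. a dictionary built during pre-training that fine-tuning and
    decoding should reuse instead of rebuilding."""

    def __init__(self, dictionary=None):
        self.dictionary = dictionary

    def state_dict(self):
        # properties the task wants persisted in the checkpoint
        return {"dictionary": self.dictionary}

    def load_state_dict(self, state):
        # restore the saved properties when the checkpoint is loaded
        self.dictionary = state.get("dictionary")
```

The trainer would call `state_dict()` when saving a checkpoint and `load_state_dict()` when restoring it, so the dictionary is loaded once and then travels with the checkpoint.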

Pull Request resolved: fairinternal/fairseq-py#1562

Test Plan: tested by training a new wav2vec model, then finetuning it, then decoding it and making sure the dict only loaded once, during fine tuning process (and was obtained from checkpoint for decoding)

Reviewed By: myleott, gwenzek

Differential Revision: D25937974

Pulled By: alexeib

fbshipit-source-id: b9908042f76ec8cda943f33885eb9b1f121662ae
Summary:
- I don't think there is a convention for the shapes of `encoder_out` and `encoder_padding_mask` in fairseq but `fst_external_decoder.py` expects `encoder_padding_mask` to be of shape T x B. `encoder_padding_mask` also seems unused in the fairseq [CTC criterion and w2l decoder integration](https://fburl.com/diffusion/ms1zi2px) so taking the easy way out and changing its shape.
- Also checking in some changes to the pyspeech audio_pretraining task required to make decoding work
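The shape change amounts to transposing the usual B x T padding mask to the T x B layout that `fst_external_decoder.py` expects; in plain PyTorch terms:

```python
import torch

# batch of 2 sequences, max length 4; True marks padded positions
padding_mask = torch.tensor([
    [False, False, True, True],    # sequence 0: length 2
    [False, False, False, False],  # sequence 1: length 4
])
assert padding_mask.shape == (2, 4)      # B x T

encoder_padding_mask = padding_mask.t()  # T x B
assert encoder_padding_mask.shape == (4, 2)
```

Since the mask is reportedly unused in the CTC criterion and w2l decoder integration, changing its shape here is safe for those consumers.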

Reviewed By: alexeib

Differential Revision: D26382442

fbshipit-source-id: 87c8f9433026c0e011847f4e2e094beb2cd2182c
Summary:
fixes fairseqlm integration with flashlight (formerly wav2letter) decoder

Pull Request resolved: fairinternal/fairseq-py#1617

Reviewed By: xuqiantong

Differential Revision: D26415650

Pulled By: alexeib

fbshipit-source-id: 813684ba55047e92378f508101ff1eec55754420
Summary:
raise an exception if trying to use wav2vec seq2seq finetuning without autoregressive flag
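The guard presumably looks something like this (a sketch with illustrative names; the exact config attribute and message are assumptions):

```python
def require_autoregressive(task_cfg):
    """Sketch of the new check: wav2vec seq2seq fine-tuning only makes
    sense when the task is configured as autoregressive."""
    if not getattr(task_cfg, "autoregressive", False):
        raise ValueError(
            "wav2vec seq2seq fine-tuning requires the autoregressive flag; "
            "please enable it in the task config"
        )
```

Failing fast here turns a confusing downstream error into an actionable message at startup.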

Pull Request resolved: fairinternal/fairseq-py#1618

Reviewed By: xuqiantong

Differential Revision: D26417249

Pulled By: alexeib

fbshipit-source-id: 777b6d170b0f8196746e03b399e4d7c21ac0b837
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: facebookresearch#3240

Reviewed By: aconneau

Differential Revision: D26420073

Pulled By: alexeib

fbshipit-source-id: 5939535b945a64e61d655cd36dc955ae46410bfb
Summary:
somehow merging previous pull request deleted the readme

Pull Request resolved: fairinternal/fairseq-py#1621

Reviewed By: michaelauli

Differential Revision: D26429893

Pulled By: alexeib

fbshipit-source-id: 3e6ed1e4698e67e56e0b88d304f42907a4f6cf41
Summary:
OSS removed the 'partition' key in their state dict to accommodate changing partition sizes. This requires an update on the fairseq side to not look into the parameter partition, just broadcast everything, and let the optimizer on each rank decide which parameters are relevant.

This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383

Reviewed By: myleott

Differential Revision: D26382917

fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b
…s. (facebookresearch#3237)

Summary:
…ith BLEU scores

# Before submitting

- [no] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [yes] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [no need] Did you make sure to update the docs?
- [no need] Did you write any new necessary tests?

## What does this PR do?
Fixes bugs in evaluation with BLEU score when training with multiple GPUs. No error occurs if there is no distributed training.

When `--eval-bleu` is set to `True` (by default it is `False`, and the best checkpoint is selected according to loss) and training uses multiple GPUs (i.e., the number of GPUs participating in distributed training is greater than 1), the following error occurs:

```bash
Traceback (most recent call last):
  File "/data/cordercorder/anaconda3/envs/nmt/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/data1/cordercorder/fairseq/fairseq_cli/train.py", line 450, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/data1/cordercorder/fairseq/fairseq/distributed/utils.py", line 349, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/data1/cordercorder/fairseq/fairseq/distributed/utils.py", line 326, in distributed_main
    main(cfg, **kwargs)
  File "/data1/cordercorder/fairseq/fairseq_cli/train.py", line 143, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/data1/cordercorder/fairseq/fairseq_cli/train.py", line 259, in train
    cfg, trainer, task, epoch_itr, valid_subsets, end_of_epoch
  File "/data1/cordercorder/fairseq/fairseq_cli/train.py", line 345, in validate_and_save
    valid_losses = validate(cfg, trainer, task, epoch_itr, valid_subsets)
  File "/data1/cordercorder/fairseq/fairseq_cli/train.py", line 413, in validate
    trainer.valid_step(sample)
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/data1/cordercorder/fairseq/fairseq/trainer.py", line 834, in valid_step
    logging_output = self._reduce_and_log_stats(logging_outputs, sample_size)
  File "/data1/cordercorder/fairseq/fairseq/trainer.py", line 1157, in _reduce_and_log_stats
    self.task.reduce_metrics(logging_outputs, self.get_criterion())
  File "/data1/cordercorder/fairseq/fairseq/tasks/translation.py", line 410, in reduce_metrics
    metrics.log_scalar("_bleu_counts", np.array(counts))
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/site-packages/torch/tensor.py", line 480, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

(The same traceback is raised on every rank, with cuda:0 through cuda:3; the interleaved per-rank output has been deduplicated here.)

Traceback (most recent call last):
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/data/cordercorder/anaconda3/envs/nmt/bin/python', '-u', '/data/cordercorder/anaconda3/envs/nmt/bin/fairseq-train', '--local_rank=3', 'tiny_data_bin', '--distributed-world-size', '4', '--arch', 'transformer', '--share-decoder-input-output-embed', '--optimizer', 'adam', '--adam-betas', '(0.9, 0.98)', '--clip-norm', '0.0', '--lr-scheduler', 'inverse_sqrt', '--warmup-init-lr', '1e-07', '--warmup-updates', '3000', '--lr', '0.0005', '--stop-min-lr', '1e-09', '--dropout', '0.25', '--weight-decay', '0.0001', '--criterion', 'label_smoothed_cross_entropy', '--label-smoothing', '0.1', '--max-tokens', '5000', '--batch-size', '64', '--update-freq', '4', '--max-epoch', '30', '--save-dir', 'checkpoint', '--skip-invalid-size-inputs-valid-test', '--eval-bleu', '--eval-bleu-args', '{"beam": 5}', '--eval-bleu-remove-bpe', 'sentencepiece', '--eval-bleu-print-samples', '--eval-tokenized-bleu', '--best-checkpoint-metric', 'bleu', '--maximize-best-checkpoint-metric', '--validate-interval-updates', '1']' returned non-zero exit status 1.
```

The error is caused by the fact that numpy version 1.20.1 does not support code like the following:
```python
import torch
import numpy as np
a = torch.tensor(0, device="cuda:0")
b = np.array([a])
```
The above code leads to the error "TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.", but it runs fine if the numpy version is 1.18.1 or 1.17.0 (any version below 1.20.0 is probably OK). However, it seems that the latest version of fairseq needs a numpy package of version 1.20.0 or higher (issue facebookresearch#3203).
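The standard workaround, and roughly what the fix amounts to inside `reduce_metrics`, is to move the tensor to host memory before handing it to numpy. Sketched on the toy example above (with a CPU fallback so it also runs on machines without CUDA):

```python
import numpy as np
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"
a = torch.tensor(0, device=device)

# .cpu() copies the tensor to host memory first, so numpy's __array__
# conversion succeeds regardless of the numpy version
b = np.array([a.cpu()])
```

Calling `.cpu()` on a tensor that is already on the CPU is a cheap no-op, so the fix is safe in the single-GPU and CPU cases too.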

### Reproduce the error
Download the source code of fairseq (commit ID: 7061a0f) and run the following commands:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
data_bin_dir=tiny_data_bin

python -m torch.distributed.launch --nproc_per_node=4 \
    --master_addr="127.0.0.1" \
    --master_port=12345 \
    $(which fairseq-train) ${data_bin_dir} \
    --distributed-world-size 4 \
    --arch transformer \
    --share-decoder-input-output-embed \
    --optimizer adam \
    --adam-betas '(0.9, 0.98)' \
    --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt \
    --warmup-init-lr 1e-07 \
    --warmup-updates 3000 \
    --lr 0.0005 \
    --stop-min-lr 1e-09 \
    --dropout 0.25 \
    --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy \
    --label-smoothing 0.1 \
    --max-tokens 5000 \
    --batch-size 64 \
    --update-freq 4 \
    --max-epoch 30 \
    --save-dir checkpoint \
    --skip-invalid-size-inputs-valid-test \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5}' \
    --eval-bleu-remove-bpe sentencepiece \
    --eval-bleu-print-samples \
    --eval-tokenized-bleu \
    --best-checkpoint-metric bleu \
    --maximize-best-checkpoint-metric \
    --validate-interval-updates 1
```

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: facebookresearch#3237

Reviewed By: myleott

Differential Revision: D26429732

Pulled By: alexeib

fbshipit-source-id: bc887ce952d28541cb07dbbdc7e80e99428a6b34
Summary:
fixes a previous change that turned state/dataset/etc. into class variables instead of instance variables

Pull Request resolved: fairinternal/fairseq-py#1623

Reviewed By: michaelauli

Differential Revision: D26439560

Pulled By: alexeib

fbshipit-source-id: ab9e75a425a47ac7ace006419259e254770e560e
Myle Ott and others added 14 commits February 16, 2021 15:52
…coder (facebookresearch#1559)

Summary:
Pull Request resolved: fairinternal/fairseq-py#1559

This matches the behavior of RobertaEncoder.

Test Plan: Imported from OSS

Reviewed By: gwenzek

Differential Revision: D25936937

Pulled By: myleott

fbshipit-source-id: 795ec8d50298a41d9e9638101436faa01cdf1586
Summary:
This is long overdue, but finally deprecating the RobertaEncoder components and just using TransformerEncoder directly. This will make it easier for some upcoming online backtranslation changes, and will eventually make migrating it to dataclasses/Hydra easier too. It also fixes some longstanding inconsistencies in layernorm placement in the model parallel roberta code.

Pull Request resolved: fairinternal/fairseq-py#1560

Test Plan:
- confirmed that training gives identical losses as before:
https://gist.github.com/myleott/9a4d213fb88a02b00094ea074f5a2e2d
- confirmed that old roberta models can be loaded and produce identical results
- confirmed that old linformer models can be loaded and produce identical results (reran commands from D25938236 (facebookresearch@bf54551))
- confirmed that old model parallel models can be loaded and produce identical results:
```
python -m fairseq_cli.validate --path checkpoint.mp1/checkpoint_last.pt --task dummy_masked_lm --criterion masked_lm --max-sentences 8 --dataset-size 100 --model-parallel-size 2 --distributed-world-size 2

before:
2021-01-19 19:04:14 | INFO | valid |  | valid on 'valid' subset | loss 14.62 | ppl 25174.3 | wps 0 | wpb 53248 | bsz 104

after:
2021-01-19 19:06:59 | INFO | valid |  | valid on 'valid' subset | loss 14.62 | ppl 25174.3 | wps 0 | wpb 53248 | bsz 104
```

Reviewed By: gwenzek, ngoyal2707

Differential Revision: D25937145

Pulled By: myleott

fbshipit-source-id: 1ce0bc93e28e03fb926534ea4134684a49232599
Summary: Pull Request resolved: fairinternal/fairseq-py#1570

Test Plan: Imported from OSS

Reviewed By: gwenzek, ngoyal2707

Differential Revision: D25967675

Pulled By: myleott

fbshipit-source-id: 7c7f8d25b87ef9b4f0a85331548bb3a2886a1e92
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: fairinternal/fairseq-py#1629

Reviewed By: myleott

Differential Revision: D26484942

Pulled By: sshleifer

fbshipit-source-id: 9dcbab5c404c14d8f35628d823102ad9ce59dffd
Summary:
Integrating LASER (Language-Agnostic SEntence Representations) training code

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ N/A] Did you make sure to update the docs?
- [ Y] Did you write any new necessary tests?  => an additional test in `test_iterators.py`

## What does this PR do?

This diff introduces the training code for LASER.
It includes a specific `laser` task in `laser_task.py` which reads a
json configuration file describing the binarized datasets of language
pairs.

`multitask_data_utils.py` defines dataset wrappers and iterators used by
`laser` task.
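The JSON configuration presumably maps language pairs to their binarized data, along these lines (field names and layout are illustrative, not the actual schema read by `laser_task.py`):

```python
import json

# hypothetical layout: one entry per language pair per split,
# pointing at the binarized dataset for that pair
config = {
    "train": [
        {"src": "en", "tgt": "fr", "data": "bin/en-fr"},
        {"src": "en", "tgt": "de", "data": "bin/en-de"},
    ],
    "valid": [
        {"src": "en", "tgt": "fr", "data": "bin/en-fr"},
    ],
}

text = json.dumps(config, indent=2)   # what would live on disk
loaded = json.loads(text)             # what the task would read back
```

The task would iterate over the entries per split and wrap each pair's dataset with the iterators defined in `multitask_data_utils.py`.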

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Yes!

Pull Request resolved: fairinternal/fairseq-py#1207

Reviewed By: myleott

Differential Revision: D26454296

Pulled By: Celebio

fbshipit-source-id: c987672aa66abf31b039ee11867b06912d3486e5
…1626)

Summary:
Add back a couple speed optimizations in the original roberta code that got lost in the refactor

Pull Request resolved: fairinternal/fairseq-py#1626

Reviewed By: gwenzek

Differential Revision: D26478534

Pulled By: myleott

fbshipit-source-id: b945de5e9bffd51cd63630cc3aa1f0078a41cca8
…ookresearch#3253)

Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
- updates audio_utils to handle multi-channel audio as well as mono, with no change needed for existing recipes
- adds speech-to-text example for Multilingual TEDx (http://openslr.org/100) data

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: facebookresearch#3253

Reviewed By: yuntang

Differential Revision: D26514419

Pulled By: kahne

fbshipit-source-id: 699e428affda5b1347f96a8310691ab152dd6769
Summary: after D26382917 (facebookresearch@02803a1) shipped, somehow self._device was removed in the optimizer (or maybe I didn't test it the right way in the previous diff?). Fortunately OSS doesn't need it anyway.

Reviewed By: myleott

Differential Revision: D26523538

fbshipit-source-id: 637c1e344670340ae40b32635ef51f5501966b0c
Summary:
This is the pull request for the code for the paper
[SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation](https://www.aclweb.org/anthology/2020.aacl-main.58/)

The model will also be used for [IWSLT 2021 shared task on simultaneous translation
](https://iwslt.org/2021/simultaneous)
This pull request includes

- Convtransformer offline model
- Convtransformer simultaneous translation model with fixed pre-decision module
- The agent files for inference for the convtransformer simultaneous translation model

jmp84: The README is still missing. Just curious, where should I place it?

Pull Request resolved: fairinternal/fairseq-py#1607

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

**********
One of the failing landing integration tests
```
buck test mode/dev //multimo/fb/models/test:multimo_fb_model_test
https://fburl.com/testinfra/oxq2cn5n
```

Reviewed By: jmp84

Differential Revision: D26439663

Pulled By: sravyapopuri388

fbshipit-source-id: b127cb4962756af221b65e3ccb6598a42fc75f7f
Summary:
This diff integrates simul ST training into pyspeech with very minor modifications to the open sourced code. Specific changes made are
- In fixed_pre_decision.py remove self as argument to p_choose function as it is already called with super in line 101
- In monotonic_multihead_attention.py remove pdb.set_trace()
- Move label_smoothed_cross_entropy_latency_augmented.py to fairseq/criterions folder and add missing arguments to parser
- In fairseq/data/data_utils.py type cast max_tokens to int to avoid type error.
- Update fairseq/convtransformer.py to pyspeech/convtransformer.py

# Next steps:
- Verify decoding using the model trained
- Support everstore handle based decoding in simuleval and integrate it into pyspeech.

Reviewed By: jmp84

Differential Revision: D26478861

fbshipit-source-id: 3b02b2aee757e5464b71dbdd7ebdba42659faee5
Summary:
Fix LibriSpeech data prep script
* Lowercasing transcript to be consistent with the pre-trained models
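The normalization step can be sketched as follows (a minimal sketch of the lowercasing, not the full data-prep script):

```python
def normalize_transcript(text):
    """Lowercase a LibriSpeech transcript so it matches the casing
    convention the pre-trained models were trained with."""
    return text.lower().strip()
```

Applied uniformly during data prep, this keeps fine-tuning labels consistent with the pre-trained vocabulary.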

Reviewed By: jmp84

Differential Revision: D26538845

fbshipit-source-id: 0885f99e2c85f0e722a24f3cb83f2635ce9429bc
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes the KeyError mentioned in #3211.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: facebookresearch#3212

Reviewed By: alexeib

Differential Revision: D26513255

Pulled By: myleott

fbshipit-source-id: 5a11cb369c9d4202fab6998d269e7da5f3d3e534
…kresearch#3249)

Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes facebookresearch#3178 (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding! (I did ;)

Pull Request resolved: facebookresearch#3249

Reviewed By: alexeib

Differential Revision: D26513275

Pulled By: myleott

fbshipit-source-id: 2785098a945404c07eb72c079177654b1739a7a2
Summary:
I tried resuming a run from a checkpoint in f250883864, but ran into:

AssertionError: Criterion does not match; please reset the optimizer (--reset-optimizer). DistributedTimeoutWrapper vs ContrastiveLabelsCriterion

Based on this, I believe since D25836853 (facebookresearch@d68a353) we are no longer saving the actual criterion's name, but DistributedTimeoutWrapper in the checkpoint.

This is kind of weird though, as I would expect more people to run into this issue. Not sure if I am doing something wrong, let me know if so, thanks!

Reviewed By: myleott

Differential Revision: D26478656

fbshipit-source-id: bc3c7c925f5505140d9df4438af3a73d65d4f531
EricZLou and others added 10 commits March 3, 2021 10:50
Summary:
Pull Request resolved: fairinternal/fairseq-py#1669

Unit tests for async writes integration done in D26467815 (facebookresearch@3100d0b).

Ongoing performance tests: https://fb.quip.com/kjM7Atb1kKbO

Reviewed By: myleott

Differential Revision: D26732660

fbshipit-source-id: faf8cac67b9167af4195358c1a2592804c13562c
Reviewed By: vimalmanohar

Differential Revision: D26220694

fbshipit-source-id: ed13f8527a1b203e1a9d004fa8a86e1ad6423d60
Summary:
The sampling process in multi_corpus_dataset is very inefficient. It turns out we can significantly optimize it by sampling in batches rather than one by one. This allows:

1. fast local development and iteration with corpus sampling, since the turnaround time was long before
2. jobs to start training sooner, giving earlier signal if, for example, there is a configuration issue
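The batched-sampling idea can be sketched outside fairseq with plain numpy (the function name and signature below are illustrative, not the actual `MultiCorpusDataset` API): drawing many indices per RNG call amortizes the per-call overhead that makes one-by-one sampling slow.

```python
import numpy as np

def sample_corpora_batched(dist, num_samples, batch_size=10000, seed=0):
    """Sample corpus indices according to the distribution `dist`.

    Instead of one np.random call per sample, draw up to `batch_size`
    indices at a time and concatenate, which amortizes RNG overhead.
    """
    rng = np.random.RandomState(seed)
    out = []
    remaining = num_samples
    while remaining > 0:
        n = min(batch_size, remaining)
        out.append(rng.choice(len(dist), size=n, p=dist))
        remaining -= n
    return np.concatenate(out)
```

Fixing the seed keeps sampling deterministic across restarts, which matters for resumable training.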

Reviewed By: zhengwy888

Differential Revision: D26187821

fbshipit-source-id: b4f7f6b7c187b3785499308226e2af671a6c354f
Summary:
there are a few changes here:
- convert config persisted in checkpoints into a plain dict when saving and back to omegaconf config when loading: this helps avoid compatibility issues between different versions of python, omegaconf, etc
- update checkpoints that have old print_alignment saved
- add lr_float to composite optimizer to enable sweeping on lr with auto sweepers like ax
- fixing some edge cases for config loading
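The "persist config as a plain dict" change can be illustrated with a minimal, framework-free sketch. In fairseq this conversion goes through omegaconf; the helper below is a hypothetical stand-in showing why a plain container avoids pinning the checkpoint to a particular omegaconf/python version.

```python
def to_plain_container(cfg):
    """Recursively convert a nested config into plain dicts/lists so the
    pickled checkpoint carries no omegaconf class references.
    Hypothetical stand-in for an OmegaConf-based conversion."""
    if isinstance(cfg, dict):
        return {k: to_plain_container(v) for k, v in cfg.items()}
    if isinstance(cfg, (list, tuple)):
        return [to_plain_container(v) for v in cfg]
    return cfg
```

On load, the plain dict is converted back into an omegaconf config, so callers see the same interface as before.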

Pull Request resolved: fairinternal/fairseq-py#1671

Reviewed By: myleott

Differential Revision: D26791583

Pulled By: alexeib

fbshipit-source-id: 124dec74932052925c43b6a93130f4428803cb46
Summary:
Provide an ability to pass attn_mask to TransformerSentenceEncoder. The default is None and hence this is backwards compatible.

The attention mask can either be a 2D tensor of shape [tgt_seq_len, src_seq_len] or a 3D tensor of shape [bsz * num_heads, tgt_seq_len, src_seq_len].

In the case of self-attention, tgt_seq_len = src_seq_len.
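A small numpy sketch of the two mask shapes described above (illustrative only; the encoder itself consumes torch tensors): a 2D causal mask of shape [tgt_seq_len, src_seq_len], and its expansion to the 3D per-head form.

```python
import numpy as np

def causal_attn_mask(tgt_len, src_len):
    # 2D mask of shape [tgt_seq_len, src_seq_len]: -inf strictly above the
    # diagonal blocks attention to future positions, 0 elsewhere.
    return np.triu(np.full((tgt_len, src_len), float("-inf")), k=1)

def expand_attn_mask(mask_2d, bsz, num_heads):
    # 3D mask of shape [bsz * num_heads, tgt_seq_len, src_seq_len]: the same
    # 2D mask replicated for every head of every sequence in the batch.
    return np.broadcast_to(mask_2d, (bsz * num_heads,) + mask_2d.shape).copy()
```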

Reviewed By: myleott

Differential Revision: D26790767

fbshipit-source-id: 937d6c6cf08790c7d43d33fda97a30425f31ea06
Summary:
Pull Request resolved: fairinternal/fairseq-py#1666

Context: the checkpoint saving call stack has become a bit convoluted:
```
train.py
+ checkpoint_utils.save_checkpoint
 + trainer.save_checkpoint
  + checkpoint_utils.save_state
   + checkpoint_utils.torch_persistent_save
```

This diff slightly simplifies the checkpoint saving logic by exposing a `state_dict` method inside the Trainer. This simplifies the call stack to:
```
train.py
+ checkpoint_utils.save_checkpoint
 + trainer.save_checkpoint
  + checkpoint_utils.torch_persistent_save
```

This new structure is important for the FullyShardedDataParallel diff (next diff in the stack), since it enables the Trainer to save multiple checkpoints for the different optimizer state shards.

Test Plan:
- unit tests
- trained WMT En-De models; confirmed checkpoints save/load properly, resuming from a checkpoint gives identical results
- `buck test fblearner/flow/projects/langtech/translation:tests` (2 failures are in trunk too): https://www.internalfb.com/intern/testinfra/testconsole/testrun/2533274840914654/

Reviewed By: zhengwy888

Differential Revision: D26771146

Pulled By: myleott

fbshipit-source-id: 10f91979cd42205c1d8abcaa9ab56f63eba31e93
facebookresearch#1667)

Summary:
Pull Request resolved: fairinternal/fairseq-py#1667

Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded)

This enables fully parameter + optimizer state sharding by using
FullyShardedDataParallel (FSDP) from fairscale. The user just needs to provide
`--ddp-backend=fully_sharded` to enable. Other common options work
out-of-the-box (e.g., `--fp16`, `--memory-efficient-fp16`, `--update-freq`,
etc.). This should be a drop-in replacement for the "c10d" backend.

This yields pretty big speedups for small models and enables training ~13B
parameter models on 8 GPUs and 175B parameter models on 128 GPUs, without model
parallelism.

This also adds a new option `--cpu-offload` that offloads the optimizer state
and FP32 model copy to CPU, which is particularly useful when combined with
`--optimizer=cpu_adam`.

Note: after enabling this, each GPU will save a checkpoint file, since the
optimizer state is sharded. Each checkpoint will contain a single shard of the
optimizer state and the rank 0 checkpoint will contain the full model weights.

Note: a known limitation of the current implementation is that you cannot
resume training on a different world_size. This constraint will be relaxed in
future iterations.
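Putting the options above together, a fully sharded run might be launched like this (a command sketch only: the data path, task, and architecture are placeholders, not part of this diff):

```shell
# Illustrative invocation: paths and --arch are placeholders.
fairseq-train /path/to/data-bin \
  --task language_modeling --arch transformer_lm_gpt \
  --ddp-backend fully_sharded --fp16 \
  --cpu-offload --optimizer cpu_adam \
  --tokens-per-sample 512 --batch-size 8 --lr 0.0001
```

As noted above, `--cpu-offload` pairs with `--optimizer cpu_adam` so the offloaded FP32 state is updated where it lives.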

Test Plan: Imported from OSS

Reviewed By: sshleifer

Differential Revision: D26771144

Pulled By: myleott

fbshipit-source-id: 74c2f46f57719e24e2dcfc9d9ee7c2fc0aeedb46
Summary:
1. In fblearner flow we are dumping cmvn stats into a json file (e.g. f253830726). Previously there was only the --config option, which took the .npz path from a yaml file, and that was the only usage for the config. This diff adds a --global-stats option to import from json.

2. Inherit FairseqSimulSTAgent from nn.Module instead of SpeechAgent (whose root class is object) to prepare for scripting methods. Copy over and simplify all the necessary methods from SpeechAgent/Agent.

Reviewed By: jmp84

Differential Revision: D26800957

fbshipit-source-id: 74be527f8473c13405a60bb16ce6da5a7dc0b888
Summary:
Fix bug on converting stereo audio in audio_utils.py
- Github issue: facebookresearch#3303

Reviewed By: jmp84

Differential Revision: D26825964

fbshipit-source-id: 26905e71540bc52e98d76996b199ac0fbe78357b
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fix a typo in the gcmvn path key used for config yaml generation (actual: gcvmn_cvmn_path, correct: gcmvn_path)

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: facebookresearch#3307

Reviewed By: jmp84

Differential Revision: D26826231

Pulled By: kahne

fbshipit-source-id: 6b60f2a8a8b4ba1c0c088299a08ef04fdfe870a8

@alexeib alexeib left a comment


sorry for the delay in reviewing. this is looking very good. could you please add a usage example to examples/wav2vec/README.md and then i will merge

alexeib commented Mar 9, 2021

also can you do this as a PR to fairseq repo at https://github.com/pytorch/fairseq/?

Myle Ott and others added 13 commits March 9, 2021 06:31
…#3327)

Summary: Pull Request resolved: facebookresearch#3327

Reviewed By: sshleifer

Differential Revision: D26899416

Pulled By: myleott

fbshipit-source-id: bbb493a5c4e0a51f3b26fe8f94e3962b6206d6f6
Summary: Pull Request resolved: facebookresearch#3331

Reviewed By: sshleifer

Differential Revision: D26912554

Pulled By: myleott

fbshipit-source-id: b45a161fbd52a12da13d7e011d562d35a5b5a1a7
Summary:
update audio_utils and fix mTEDx example
- Updated `audio_utils`
  - Added support for OGG Vorbis (the only supported lossy compressed format)
  - Added a separate `convert_to_mono()` helper function
  - Updated `get_waveform()`
    - added new arguments `frames` and `start` for reading part of audios
    - added new argument `mono` for auto conversion to mono-channel audio
    - unified returned waveform shape to channels x length (same as torchaudio default)
- Updated mTEDx and MUST-C data prep scripts
  - Replaced `torchaudio.info()` with `soundfile.info()` (the latter is faster, and the former has an incompatible interface between versions <0.8 and the latest 0.8)
  - Replaced `torchaudio.load()` with `get_waveform` for auto conversion to mono channel
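A minimal sketch of what a channels-first `convert_to_mono()` helper can look like under the conventions described above (the actual fairseq implementation may differ; this version just averages channels):

```python
import numpy as np

def convert_to_mono(waveform):
    """Downmix a (channels, length) waveform to (1, length) by averaging
    across channels. Matches the channels-first convention that
    get_waveform() is described as returning."""
    if waveform.shape[0] > 1:
        waveform = waveform.mean(axis=0, keepdims=True)
    return waveform
```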

Reviewed By: jmp84

Differential Revision: D26901114

fbshipit-source-id: fa9560c9714d51a91157d5141564574d4eee454d
Summary: Pull Request resolved: fairinternal/fairseq-py#1683

Reviewed By: jmp84

Differential Revision: D26914869

Pulled By: xutaima

fbshipit-source-id: a5d2efdcff1852e56304e77838840b3aad5124b0
Summary:
### Changes:
- `PlasmaArray` saves the underlying data to `self.array`, `PlasmaView` never does that, instead it fetches the data from `plasma_store` shared memory when it is needed.
- `PlasmaArray` starts a new, ephemeral plasma_store and puts a new array in it when it is pickled. If `--use-plasma-view`, there is one server started before `spawn` and arrays are only put into it once, in `PlasmaArray.__init__` to accommodate this.
- user can now pass `--plasma-path` to explicitly control where server is started.
- We now make plasma keys based on `(split_path, (block_size, document_sep_len, str(break_mode), len(dataset)))`, so two jobs sharing a plasma server but with different datasets, or the same dataset but different clargs, will not read each other's arrays.
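One way to turn that identity tuple into a plasma key can be sketched as follows (the function name is hypothetical; plasma ObjectIDs are 20 bytes, which is exactly the size of a sha1 digest):

```python
import hashlib

def make_plasma_key(split_path, block_size, document_sep_len, break_mode, dataset_len):
    """Derive a deterministic 20-byte key from the dataset identity tuple,
    so jobs sharing one plasma server only map to the same object when
    they would produce an identical array."""
    ident = repr((split_path, (block_size, document_sep_len, str(break_mode), dataset_len)))
    return hashlib.sha1(ident.encode("utf-8")).digest()
```

Determinism matters here: every worker process must compute the same key for the same dataset without coordination.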

### Results [pre March 1]
This saves some CPU memory (5-15%), according to both `psutil` and `psrecord`:
here we run base_cmd (below) with num_workers=0,2,8, 2 GPUS and collect the logs. `branch` refers to `--use-plasma-view`, `master` uses `PlasmaArray`

```
+-------------------------+----------------+---------+-------+
| setting                 |   cpu_mem_used |     wps |   ppl |
+=========================+================+=========+=======+
| branch_nw0_gpu2_ddm.log |          12    | 55143.2 | 429.1 |
+-------------------------+----------------+---------+-------+
| branch_nw2_gpu2_ddm.log |          13.67 | 43377.6 | 429.1 |
+-------------------------+----------------+---------+-------+
| branch_nw8_gpu2_ddm.log |          18.36 | 53019.9 | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw0_gpu2_ddm.log |          12.26 | 56733   | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw2_gpu2_ddm.log |          14.58 | 53337.9 | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw8_gpu2_ddm.log |          21.1  | 53217.2 | 429.1 |
+-------------------------+----------------+---------+-------+
```

### Replication

1) get this branch
```bash
git fetch && git checkout share-plasma-server
```

2) Train tiny model and save logs

```bash

base_cmd () {
  fairseq-train --fp16 /private/home/sshleifer/data-bin/stories_mmap \
            --task language_modeling \
            --arch transformer_lm_gpt2_tiny \
            --sample-break-mode complete --tokens-per-sample 512 \
            --optimizer adam --clip-norm 0.0 --lr 0.0005 \
            --batch-size 1 \
            --max-update 200 --max-epoch 1 \
            --log-format simple --log-interval 100 \
            --restore-file x.pt --no-save \
            --skip-invalid-size-inputs-valid-test --disable-validation $@
}

USE_LOCK=1 CUDA_VISIBLE_DEVICES=0,1 base_cmd --num-workers 0 --use-plasma-view | tee branch_nw0_gpu2_ddm.log
```

### TODO:

- [x] test larger dataset
- [x] make it optional, cleanup
- [x] 1 GPU
- [x] unit-tests
- [x] ask hashing Q on stackoverflow https://stackoverflow.com/questions/66354598/deterministic-method-to-hash-np-array-int
- [ ] measure whether `PlasmaArray` disable for small array's logic helps
- [x] test with fb_sweep
- [x] measure 4 GPU savings

Pull Request resolved: fairinternal/fairseq-py#1645

Test Plan: Read github PR description: fairinternal/fairseq-py#1645

Reviewed By: myleott

Differential Revision: D26630365

Pulled By: sshleifer

fbshipit-source-id: b0c4163fbc97a7aefb116de70265fba11f6d7b42
…1690)

Summary: Pull Request resolved: fairinternal/fairseq-py#1690

Reviewed By: jmp84

Differential Revision: D27025669

Pulled By: xutaima

fbshipit-source-id: 8125365adedfdc938813d08e911e1f6ebe4f584b
… early

Summary: I had some issues with loading checkpoints from 5B parameter models (60 GB checkpoint files) due to OOM.

Reviewed By: myleott

Differential Revision: D27027616

fbshipit-source-id: 2b816e8e46ec80f0ec721aa7a6702cee531b94eb
- xmp.spawn 8 or 1 processes instead of always 8.
- util function to get the xla metrics report.
- util functions to move stuff to/from tpu.
- make utils.item a no-op for xla; it is not critical on xla and causes a big performance hit.
- util function to check if a tensor is on an xla device.
- util function to do torch.index_put efficiently on xla.
- add util function to mark step and send a given tensor/container to cpu.
  - instead of 1 transfer per tensor (N total) in `logging_outputs`, we can do 1 transfer total.
- remove redundant mark_step's
- remove redundant compilation check on each device. XLA metrics are global even if they come from one device.
- s/GPU/device/g
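The "move stuff to/from tpu" and "send a given tensor/container to cpu" utilities both reduce to one recursive walk over nested containers. A framework-free sketch (in the real code `fn` would move a torch tensor to or from the xla device):

```python
def apply_to_container(fn, obj):
    """Recursively apply `fn` to every leaf of a nested dict/list/tuple,
    preserving container types. With fn = "move to device", one call
    relocates an entire logging_outputs structure at once."""
    if isinstance(obj, dict):
        return {k: apply_to_container(fn, v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(apply_to_container(fn, v) for v in obj)
    return fn(obj)
```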
- XLA compiles every time it sees a new graph, this includes dynamic
input shapes.
  - This commit introduces bucketing to raw_audio_dataset.
  - Tweaks bucket_pad_length_dataset and data_utils.py to enable this.
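The bucketing idea can be sketched in a few lines (names are illustrative, not the `BucketPadLengthDataset` API): round every raw-audio length up to one of a small fixed set of bucket sizes, so XLA compiles at most one graph per bucket instead of one per distinct utterance length.

```python
import bisect

def bucket_length(length, buckets):
    """Round `length` up to the nearest bucket size in the sorted list
    `buckets`, so the padded input takes only len(buckets) distinct
    shapes. Raises if the sample is longer than the largest bucket."""
    i = bisect.bisect_left(buckets, length)
    if i == len(buckets):
        raise ValueError(f"length {length} exceeds largest bucket {buckets[-1]}")
    return buckets[i]
```

The trade-off is a little extra padding per sample in exchange for a bounded number of compilations.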
- Computing mask indices in wav2vec2's `forward` is costly on XLA.
  - Move the computation to the data preparation phase: optional for gpus, forced for tpus.
- Use the util functions from previous commits in order to route the XLA codepath better.
- In model
  - Compute mask_indices only if it's not pre-computed in data prep phase.
  - Remove the dynamicity in model's forward caused by mask_indices.
    - adjust loss computation in criterion accordingly.
  - Adjust sampling of negatives, by integrating the padding_count that
  comes from data prep phase.
    - future work; sampling of negatives could also be taken out of
    model and to the data prep phase. I experimented w/ this and
    observed speed gains.
  - Copy hydra params from model to task, in order for dataset's to have
  the necessary mask arguments to enable mask indices creation.
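A much-simplified sketch of what precomputing span masks in the data pipeline looks like (this is not fairseq's `compute_mask_indices`, which handles overlap and min-space constraints; the point is that a fixed number of spans yields a static shape for the on-device graph):

```python
import numpy as np

def precompute_mask_indices(seq_len, mask_prob, mask_length, rng):
    """Pick a fixed number of span starts and mark `mask_length` frames
    from each. Doing this on the host keeps dynamic shapes out of the
    model's forward(), which would otherwise trigger XLA recompilation."""
    num_spans = int(mask_prob * seq_len / mask_length)
    mask = np.zeros(seq_len, dtype=bool)
    starts = rng.choice(seq_len - mask_length + 1, size=num_spans, replace=False)
    for s in starts:
        mask[s : s + mask_length] = True
    return mask
```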
Per the previous commit, the audio_pretraining task tries to copy mask preparation related arguments to pass on to fairseq_dataset.

For the downstream finetuning job, fairseq uses the same task, and even though
the task arguments are optional, it errors when it tries to copy them from the
model and can't (for a GPU-built model). Maybe there's a better way to do this
in hydra, by passing a kwarg to `II`?