Changes from all commits (82 commits)
4fed0be
Fix padding mask for new architectures (#3228)
patrickvonplaten Feb 10, 2021
ac90cb3
Extra logging to confirm OOM source
Feb 10, 2021
7061a0f
better error handling for expired handles
Feb 11, 2021
ee48d1b
Use torch pipe if available in fairseq. (#3149)
pritamdamania Feb 11, 2021
fd7c2a8
More informative exception when numpy version changes (#3231)
mwillwork Feb 11, 2021
66e1803
save task state in the checkpoint (#1562)
alexeib Feb 11, 2021
138265c
Make wav2vec_asr encoder compatible with pyspeech fst decoder
skritika Feb 11, 2021
1d5b075
fix fairseqlm decoder with flashlight changes (#1617)
alexeib Feb 12, 2021
506a8e0
seq2seq autoregressive flag check (#1618)
alexeib Feb 12, 2021
7ffb40d
Fix typo Wav2Vec2 README.md (#3240)
Feb 12, 2021
f3b6f58
Fix w2v readme (#1621)
alexeib Feb 12, 2021
02803a1
broadcast the whole optimizer state to each rank
Feb 12, 2021
09945b4
Fixes bugs of evaluation with BLEU score when training with multi-gpu…
cordercorder Feb 12, 2021
5ac5e8a
fix sharing objects between tasks (#1623)
alexeib Feb 13, 2021
43415b4
Prepend embedding layer when return_all_hiddens=True in TransformerEn…
Feb 16, 2021
54423d3
refactor RobertaEncoder (#1560)
Feb 16, 2021
7096ac3
Make validate.py work with model parallel (#1570)
Feb 16, 2021
e0788f7
fix bart generation bug (#1629)
sshleifer Feb 17, 2021
7040ce7
LASER training code (#1207)
Celebio Feb 18, 2021
3bc43c1
Fix speed regression after RobertaEncoder refactor (#1626)
Feb 18, 2021
da9eaba
Add support for multi-channel audio and example for mTEDx data (#3253)
esalesky Feb 18, 2021
284a86a
remove the missing _device property
Feb 19, 2021
d2ee588
Simultaneous Speech Translation Model (#1607)
xutaima Feb 19, 2021
523fe83
Integrate Simul ST model into pyspeech
sravyapopuri388 Feb 19, 2021
675f608
Fix LibriSpeech data prep script
kahne Feb 19, 2021
2909ee1
Fix bug for issue (#3211) (#3212)
josephsuh357 Feb 19, 2021
3ef1888
Remove extra arg min_length and fix min_sample_size behavior (#3249)
gazay Feb 19, 2021
c6b5c00
fix criterion name check when resuming from checkpoint
Feb 19, 2021
ae22da6
Correct the estimation of cnn output lengths in convtransformer (#1636)
xutaima Feb 20, 2021
61e46bb
Fix attempt to unlink directory copied into source package (Python 3.…
Feb 20, 2021
4cf7d76
Hydra Integration doc should refer to non legacy task (#1619)
Mortimerp9 Feb 20, 2021
38258a7
Update FairseqSimulSTAgent to make it generic and reusable internally
sravyapopuri388 Feb 22, 2021
808b751
Improve torchscript compatibility of transformer and transformer pg (#…
madelagua Feb 22, 2021
89cd70c
Fixed scripts and instructions for reproducing the results. (#3264)
mfomicheva Feb 22, 2021
b9778da
Small fixes for flow-cli usage
Feb 22, 2021
ab56066
Fixes circular import as complained by python (#3257)
freewym Feb 22, 2021
c3d2bee
efficient batch level sampling
Feb 24, 2021
55e48f1
downcast indices in TokenBlockDataset (#1647)
sshleifer Feb 24, 2021
5c008e0
make LanguageModelingTask 1% simpler (#1641)
sshleifer Feb 24, 2021
52daa1b
move code to .py files, document usage (#1637)
sshleifer Feb 24, 2021
fb3fadb
Set DynamicLossScaler class defaults to match CLI defaults (#1649)
Feb 24, 2021
b8651bc
actually checking gradnorm consistency
Feb 24, 2021
d3890e5
Add HiveScorer to read data from hive and EverstoreAudioInstance to l…
sravyapopuri388 Feb 25, 2021
f569c02
Relocate simultaneous translation code (#1639)
xutaima Feb 26, 2021
4f881a7
TokenBlockDataset np type promotion issue (#1658)
sshleifer Feb 27, 2021
5354aa3
github CI install pyarrow
Feb 28, 2021
e5e8b3f
Fix nearly all unit-test warnings (#1652)
sshleifer Feb 28, 2021
39e5513
Fix the order of constraints in LanguagePairDataset (#3280)
hiromu Mar 1, 2021
1c0439b
fixes circular imports incurred by a recent commit (#3286)
freewym Mar 2, 2021
3100d0b
ioPath async - opt-in Fairseq integration (#1635)
EricZLou Mar 2, 2021
12e21b9
Add global cmvn for mustc data preparation (#1660)
xutaima Mar 2, 2021
c58af18
Several update on simultaneous translation inference. (#1655)
xutaima Mar 3, 2021
ddc483f
Streaming models for simul ST (#1552)
xutaima Mar 3, 2021
b8786dc
Integrate Augmented memory transformer and emformer based augmented m…
sravyapopuri388 Mar 3, 2021
0c32e25
Update Simultaneous Translation doc (#1659)
xutaima Mar 3, 2021
7d2394b
ioPath async - Fairseq unittests (#1669)
EricZLou Mar 3, 2021
1fed7a8
add unit test for multi_corpus_dataset
Mar 4, 2021
fc2840d
optimize sampling process of multi_corpus_dataset
Mar 4, 2021
f6d60e2
minor fixes and improvements (#1671)
alexeib Mar 4, 2021
f1c595b
Ability to pass attn_mask to TransformerSentenceEncoder
kaushik88 Mar 4, 2021
6d23cc7
Move checkpoint state_dict creation into Trainer (#1666)
Mar 4, 2021
656d7e5
Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded…
Mar 4, 2021
73886ac
Refactor FairseqSimulSTAgent
cndn Mar 4, 2021
7c95746
fix bug on converting stereo audio in audio_utils.py
kahne Mar 5, 2021
16c1a20
Fix Global CMVN path of MustC data preprocessing (#3307)
sarapapi Mar 8, 2021
00d5b7a
Add README/tutorial for Fully Sharded Data Parallel (#3327)
Mar 9, 2021
c600667
Update README for Fully Sharded Data Parallel (#3331)
Mar 9, 2021
05255f9
update audio_utils and fix mTEDx example
kahne Mar 10, 2021
d031611
Update simul trans doc (#1683)
xutaima Mar 11, 2021
2235f86
PlasmaView: don't materialize array in memory (#1645)
sshleifer Mar 12, 2021
252d5a9
Fix a bug that FairseqSimulSTAgent is not an agent (#1690)
xutaima Mar 13, 2021
965240c
optimize memory when loading large checkpoints by deleting state dict…
Mar 15, 2021
4f83334
Improve tpu related utils.
taylanbil Feb 12, 2021
46773af
Improve train.py and trainer.py's tpu capabilities.
taylanbil Feb 12, 2021
dbddbf7
Adapt necessary fairseq_dataset's to support XLA.
taylanbil Feb 12, 2021
10f8605
Make Wav2vec2 Criterion/Task/Model work well with XLA.
taylanbil Feb 12, 2021
ba7ba39
Pass params to model that pretraining task tries to copy from model.
taylanbil Feb 12, 2021
62a96aa
Add warning if mask_channel_prob is 0 on TPUs.
taylanbil Mar 15, 2021
388f420
Add missing import.
taylanbil Mar 16, 2021
f2baa7e
Added content to README about tpus and examples.
taylanbil Mar 17, 2021
1cee791
Default to mask precomputation in dataset when running on tpus.
taylanbil Mar 17, 2021
42932be
Add working example of hydra + config.
taylanbil Mar 19, 2021
3 changes: 2 additions & 1 deletion .github/workflows/build.yml
@@ -39,7 +39,8 @@ jobs:

- name: Install optional test requirements
run: |
python -m pip install fairscale iopath transformers
python -m pip install iopath transformers pyarrow
python -m pip install git+https://github.com/facebookresearch/fairscale.git@master

- name: Lint with flake8
run: |
11 changes: 8 additions & 3 deletions README.md
@@ -61,21 +61,24 @@ We provide reference implementations of various sequence modeling papers:

### What's New:

* March 2021 [Added full parameter and optimizer state sharding + CPU offloading](examples/fully_sharded_data_parallel/README.md)
* February 2021 [Added LASER training code](examples/laser/README.md)
* December 2020: [Added Adaptive Attention Span code](examples/adaptive_span/README.md)
* December 2020: [GottBERT model and code released](examples/gottbert/README.md)
* November 2020: Adopted the [Hydra](https://github.com/facebookresearch/hydra) configuration framework
* [see documentation explaining how to use it for new and existing projects](docs/hydra_integration.md)
* November 2020: [fairseq 0.10.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.10.0)
* October 2020: [Added R3F/R4F (Better Fine-Tuning) code](examples/rxf/README.md)
* October 2020: [Deep Transformer with Latent Depth code released](examples/latent_depth/README.md)
* October 2020: [Added CRISS models and code](examples/criss/README.md)

<details><summary>Previous updates</summary><p>

* September 2020: [Added Linformer code](examples/linformer/README.md)
* September 2020: [Added pointer-generator networks](examples/pointer_generator/README.md)
* August 2020: [Added lexically constrained decoding](examples/constrained_decoding/README.md)
* August 2020: [wav2vec2 models and code released](examples/wav2vec/README.md)
* July 2020: [Unsupervised Quality Estimation code released](examples/unsupervised_quality_estimation/README.md)

<details><summary>Previous updates</summary><p>

* May 2020: [Follow fairseq on Twitter](https://twitter.com/fairseq)
* April 2020: [Monotonic Multihead Attention code released](examples/simultaneous_translation/README.md)
* April 2020: [Quant-Noise code released](examples/quant_noise/README.md)
@@ -108,6 +111,8 @@ We provide reference implementations of various sequence modeling papers:
* [mixed precision training](https://fairseq.readthedocs.io/en/latest/getting_started.html#training-with-half-precision-floating-point-fp16) (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores))
* [extensible](https://fairseq.readthedocs.io/en/latest/overview.html): easily register new models, criterions, tasks, optimizers and learning rate schedulers
* [flexible configuration](docs/hydra_integration.md) based on [Hydra](https://github.com/facebookresearch/hydra) allowing a combination of code, command-line and file based configuration
* [full parameter and optimizer state sharding](examples/fully_sharded_data_parallel/README.md)
* [offloading parameters to CPU](examples/fully_sharded_data_parallel/README.md)
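The sharding idea behind these two features (enabled in fairseq via `--ddp-backend=fully_sharded`, per the commit list above) can be illustrated with a toy sketch. This is plain Python for intuition only, not the fairscale implementation: each of N workers keeps only 1/N of the flat parameter vector, and the full vector is reassembled (all-gathered) only when needed.

```python
# Toy sketch of parameter sharding across ranks (illustration only,
# not fairscale's FullyShardedDataParallel).
def shard(params, world_size):
    # Pad so the flat parameter list divides evenly, then split per rank.
    n = len(params)
    per_rank = -(-n // world_size)  # ceil division
    padded = params + [0.0] * (per_rank * world_size - n)
    return [padded[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]


def all_gather(shards, orig_len):
    # Reassemble the full parameter vector from every rank's shard,
    # dropping the padding added by shard().
    flat = [p for piece in shards for p in piece]
    return flat[:orig_len]


params = [0.1, 0.2, 0.3, 0.4, 0.5]
shards = shard(params, world_size=2)
assert all(len(s) == 3 for s in shards)           # each rank holds ceil(5/2) values
assert all_gather(shards, len(params)) == params  # full vector is recoverable
```

In the real implementation the per-rank shards (and the corresponding slices of optimizer state) live on different devices, and CPU offloading moves them out of GPU memory between uses; the gather-before-use, discard-after-use cycle is the same.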

We also provide [pre-trained models for translation and language modeling](#pre-trained-models-and-examples)
with a convenient `torch.hub` interface:
2 changes: 1 addition & 1 deletion docs/hydra_integration.md
@@ -120,7 +120,7 @@ class LanguageModelingConfig(FairseqDataclass):
...

@register_task("language_modeling", dataclass=LanguageModelingConfig)
class LanguageModelingTask(LegacyFairseqTask):
class LanguageModelingTask(FairseqTask):
...
@classmethod
def setup_task(cls, cfg: LanguageModelingConfig):
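The pattern this doc change points at — a task class registered together with a dataclass config — can be sketched in a self-contained way. The registries below are a toy mock for illustration, not fairseq's actual `register_task` implementation:

```python
from dataclasses import dataclass, field

# Toy registries standing in for fairseq's real ones (illustration only).
TASK_REGISTRY = {}
TASK_DATACLASS_REGISTRY = {}


def register_task(name, dataclass=None):
    """Decorator mimicking the shape of fairseq's @register_task."""
    def wrapper(cls):
        TASK_REGISTRY[name] = cls
        if dataclass is not None:
            TASK_DATACLASS_REGISTRY[name] = dataclass
        return cls
    return wrapper


@dataclass
class LanguageModelingConfig:
    data: str = field(default="???", metadata={"help": "path to data directory"})


@register_task("language_modeling", dataclass=LanguageModelingConfig)
class LanguageModelingTask:
    def __init__(self, cfg):
        self.cfg = cfg

    @classmethod
    def setup_task(cls, cfg):
        # With Hydra, the task receives a typed config object
        # instead of a flat argparse namespace.
        return cls(cfg)


cfg = LanguageModelingConfig(data="/path/to/data")
task = TASK_REGISTRY["language_modeling"].setup_task(cfg)
```

This is why the doc now subclasses `FairseqTask` rather than `LegacyFairseqTask`: the non-legacy base class is the one wired to dataclass configs.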
47 changes: 16 additions & 31 deletions examples/bart/README.md
@@ -179,38 +179,23 @@ with open('glue_data/MNLI/dev_matched.tsv') as fin:
```

#### Evaluating the `bart.large.cnn` model:
Follow the instructions [here](https://github.com/abisee/cnn-dailymail) to download and process the data into files such that `test.source` and `test.target` have one line for each non-tokenized sample.
- Follow the instructions [here](https://github.com/abisee/cnn-dailymail) to download and process the data into files such that `test.source` and `test.target` have one line for each non-tokenized sample.
- For simpler preprocessing, you can also `wget https://cdn-datasets.huggingface.co/summarization/cnn_dm_v2.tgz`, although there is no guarantee of identical scores.
- `huggingface/transformers` has a simpler interface that supports [single-gpu](https://github.com/huggingface/transformers/blob/master/examples/legacy/seq2seq/run_eval.py) and [multi-gpu](https://github.com/huggingface/transformers/blob/master/examples/legacy/seq2seq/run_distributed_eval.py) beam search.
In `huggingface/transformers`, the BART models' paths are `facebook/bart-large-cnn` and `facebook/bart-large-xsum`.

```python
import torch

bart = torch.hub.load('pytorch/fairseq', 'bart.large.cnn')
bart.cuda()
bart.eval()
bart.half()
count = 1
bsz = 32
with open('test.source') as source, open('test.hypo', 'w') as fout:
    sline = source.readline().strip()
    slines = [sline]
    for sline in source:
        if count % bsz == 0:
            with torch.no_grad():
                hypotheses_batch = bart.sample(slines, beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3)

            for hypothesis in hypotheses_batch:
                fout.write(hypothesis + '\n')
            fout.flush()
            slines = []

        slines.append(sline.strip())
        count += 1
    if slines != []:
        hypotheses_batch = bart.sample(slines, beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3)
        for hypothesis in hypotheses_batch:
            fout.write(hypothesis + '\n')
        fout.flush()
```

Install `files2rouge` from [here](https://github.com/pltrdy/files2rouge).
In `fairseq`, summaries can be generated using:

```bash
cp data-bin/cnn_dm/dict.source.txt checkpoints/
python examples/bart/summarize.py \
--model-dir pytorch/fairseq \
--model-file bart.large.cnn \
--src cnn_dm/test.source \
--out cnn_dm/test.hypo
```

To calculate ROUGE, install `files2rouge` from [here](https://github.com/pltrdy/files2rouge).

```bash
export CLASSPATH=/path/to/stanford-corenlp-full-2016-10-31/stanford-corenlp-3.7.0.jar
55 changes: 18 additions & 37 deletions examples/bart/README.summarization.md
@@ -80,42 +80,23 @@ Expected training time is about `5 hours`. Training time can be reduced with dis
Use TOTAL_NUM_UPDATES=15000 UPDATE_FREQ=2 for Xsum task

### Inference for CNN-DM test data using above trained checkpoint.
After training the model as mentioned in previous step, you can perform inference with checkpoints in `checkpoints/` directory using following python code snippet:
After training the model as described in the previous step, you can run inference with the checkpoints in the `checkpoints/` directory using `summarize.py`, for example:

```python
import torch
from fairseq.models.bart import BARTModel

bart = BARTModel.from_pretrained(
    'checkpoints/',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='cnn_dm-bin'
)

bart.cuda()
bart.eval()
bart.half()
count = 1
bsz = 32
with open('cnn_dm/test.source') as source, open('cnn_dm/test.hypo', 'w') as fout:
    sline = source.readline().strip()
    slines = [sline]
    for sline in source:
        if count % bsz == 0:
            with torch.no_grad():
                hypotheses_batch = bart.sample(slines, beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3)

            for hypothesis in hypotheses_batch:
                fout.write(hypothesis + '\n')
            fout.flush()
            slines = []

        slines.append(sline.strip())
        count += 1
    if slines != []:
        hypotheses_batch = bart.sample(slines, beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3)
        for hypothesis in hypotheses_batch:
            fout.write(hypothesis + '\n')
        fout.flush()
```
```bash
cp data-bin/cnn_dm/dict.source.txt checkpoints/
python examples/bart/summarize.py \
--model-dir checkpoints \
--model-file checkpoint_best.pt \
--src cnn_dm/test.source \
--out cnn_dm/test.hypo
```
For XSUM, which uses beam=6, lenpen=1.0, max_len_b=60, min_len=10:
```bash
cp data-bin/cnn_dm/dict.source.txt checkpoints/
python examples/bart/summarize.py \
--model-dir checkpoints \
--model-file checkpoint_best.pt \
--src cnn_dm/test.source \
--out cnn_dm/test.hypo \
--xsum-kwargs
```
Use beam=6, lenpen=1.0, max_len_b=60, min_len=10 for Xsum Generation
100 changes: 100 additions & 0 deletions examples/bart/summarize.py
@@ -0,0 +1,100 @@
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

import torch
from fairseq.models.bart import BARTModel
import argparse

XSUM_KWARGS = dict(beam=6, lenpen=1.0, max_len_b=60, min_len=10, no_repeat_ngram_size=3)
CNN_KWARGS = dict(beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3)


@torch.no_grad()
def generate(bart, infile, outfile="bart_hypo.txt", bsz=32, n_obs=None, **eval_kwargs):
    count = 1

    # if n_obs is not None: bsz = min(bsz, n_obs)

    with open(infile) as source, open(outfile, "w") as fout:
        sline = source.readline().strip()
        slines = [sline]
        for sline in source:
            if n_obs is not None and count > n_obs:
                break
            if count % bsz == 0:
                hypotheses_batch = bart.sample(slines, **eval_kwargs)
                for hypothesis in hypotheses_batch:
                    fout.write(hypothesis + "\n")
                fout.flush()
                slines = []

            slines.append(sline.strip())
            count += 1

        if slines != []:
            hypotheses_batch = bart.sample(slines, **eval_kwargs)
            for hypothesis in hypotheses_batch:
                fout.write(hypothesis + "\n")
            fout.flush()


def main():
    """
    Usage::

        python examples/bart/summarize.py \
            --model-dir $HOME/bart.large.cnn \
            --model-file model.pt \
            --src $HOME/data-bin/cnn_dm/test.source
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-dir",
        required=True,
        type=str,
        default="bart.large.cnn/",
        help="path containing model file and src_dict.txt",
    )
    parser.add_argument(
        "--model-file",
        default="checkpoint_best.pt",
        help="where in model_dir are weights saved",
    )
    parser.add_argument(
        "--src", default="test.source", help="text to summarize", type=str
    )
    parser.add_argument(
        "--out", default="test.hypo", help="where to save summaries", type=str
    )
    parser.add_argument("--bsz", default=32, help="batch size", type=int)
    parser.add_argument(
        "--n", default=None, help="how many examples to summarize", type=int
    )
    parser.add_argument(
        "--xsum-kwargs",
        action="store_true",
        default=False,
        help="if true use XSUM_KWARGS else CNN_KWARGS",
    )
    args = parser.parse_args()
    eval_kwargs = XSUM_KWARGS if args.xsum_kwargs else CNN_KWARGS
    if args.model_dir == "pytorch/fairseq":
        bart = torch.hub.load("pytorch/fairseq", args.model_file)
    else:
        bart = BARTModel.from_pretrained(
            args.model_dir,
            checkpoint_file=args.model_file,
            data_name_or_path=args.model_dir,
        )
    bart = bart.eval()
    if torch.cuda.is_available():
        bart = bart.cuda().half()
    generate(
        bart, args.src, bsz=args.bsz, n_obs=args.n, outfile=args.out, **eval_kwargs
    )


if __name__ == "__main__":
    main()