Abstract
- proposes a framework for training text generation models in non-monotonic orders
- tokens are generated in a binary-tree structure
- learning is framed as imitation learning
- achieves performance competitive with conventional left-to-right generation
- tasks: language modeling, sentence completion, word reordering, and machine translation
Details
Non-Monotonic Generation as a Binary Tree
- an example generation from the proposed approach:
- generation can start from any token
- the number in the green box is the generation order
- the number in the blue box is the reconstruction order
- conventional left-to-right generation is a special case of the binary tree, a chain of right children (see the sketch below)
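A minimal sketch (not the authors' code) of the binary-tree view: tokens sit at tree nodes, the tree is built in generation order, and an in-order traversal yields the final sentence, i.e. the reconstruction order.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    token: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def in_order(node: Optional[Node]) -> List[str]:
    # reconstruction order: left subtree, own token, right subtree
    if node is None:
        return []
    return in_order(node.left) + [node.token] + in_order(node.right)

# generation can start anywhere, e.g. from the verb:
# generation order: "are" -> "how" -> "you" -> "?"
root = Node("are", left=Node("how"), right=Node("you", right=Node("?")))
print(in_order(root))  # ['how', 'are', 'you', '?']

# left-to-right generation is the degenerate tree of only right children
l2r = Node("how", right=Node("are", right=Node("you", right=Node("?"))))
print(in_order(l2r))   # same sentence, generated strictly left-to-right
```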

Learning for Non-Monotonic Generation
- an imitation learning framework: an oracle policy provides a valid distribution over token choices, and the model parameters are learned against it via a KL-divergence loss

Oracle policy is defined by

π*(a | s) = P_a if a is a valid token at state s, and 0 otherwise (with Σ P_a = 1 over the valid tokens)

- where we have a choice for P_a:
- uniform oracle: P_a = 1/|valid tokens|, i.e. a uniform distribution over the valid tokens (does not lead to optimal quality)
- coaching oracle: multiply the uniform oracle by the current policy and renormalize, so the oracle prefers tokens the learner already assigns high probability

- annealed coaching oracle: a linear interpolation β·uniform + (1−β)·coaching, with β annealed from 1 to 0, to provide variety in learning

- in imitation learning the roll-in policy is usually a stochastic mixture of the learned model and the oracle policy, but for this task simply rolling in with the oracle policy throughout performs better (see the sketch below)
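A hedged sketch of the three oracles and the KL imitation loss (variable names such as `valid_ids` and `beta` are mine; in the paper, `model_logits` would come from the model conditioned on the current tree state):

```python
import torch
import torch.nn.functional as F

def uniform_oracle(valid_ids: torch.Tensor, vocab_size: int) -> torch.Tensor:
    # P_a = 1/|valid| on the valid tokens, 0 everywhere else
    p = torch.zeros(vocab_size)
    p[valid_ids] = 1.0 / len(valid_ids)
    return p

def coaching_oracle(valid_ids: torch.Tensor, model_probs: torch.Tensor) -> torch.Tensor:
    # multiply the uniform oracle by the current policy, then renormalize
    p = uniform_oracle(valid_ids, model_probs.numel()) * model_probs
    return p / p.sum()

def annealed_oracle(valid_ids: torch.Tensor, model_probs: torch.Tensor,
                    beta: float) -> torch.Tensor:
    # beta * uniform + (1 - beta) * coaching, with beta annealed 1 -> 0
    return (beta * uniform_oracle(valid_ids, model_probs.numel())
            + (1.0 - beta) * coaching_oracle(valid_ids, model_probs))

def imitation_loss(model_logits: torch.Tensor, oracle_probs: torch.Tensor) -> torch.Tensor:
    # KL(oracle || model); tokens with zero oracle mass contribute nothing
    log_p = F.log_softmax(model_logits, dim=-1)
    mask = oracle_probs > 0
    return (oracle_probs[mask] * (oracle_probs[mask].log() - log_p[mask])).sum()

# toy usage: 5-token vocabulary, valid next tokens {1, 3}
logits = torch.randn(5)
oracle = annealed_oracle(torch.tensor([1, 3]), logits.softmax(-1), beta=0.5)
loss = imitation_loss(logits, oracle)
```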
Experiments
Language Model
- Dataset: Persona-Chat, with 133k / 16k / 15k train / valid / test splits
- Model : 2-layered uni-directional LSTM
- the non-monotonic (annealed) LM produced more diverse (unique and novel) sentences, with an average span of 1.3~1.4 (span = avg number of child nodes; see the sketch after this list)

- POS-tag analysis leads to interesting insights:
- the non-monotonic (annealed) model produces tokens in the order PUNCT > PNOUN > VERB > NOUN
- the left-to-right model produces tokens in the order PNOUN > VERB > NOUN > PUNCT
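A small sketch of one plausible reading of the span statistic (the paper's exact definition may differ): the average number of children over non-leaf nodes, reusing `Node` from the first sketch. A strict left-to-right chain gives exactly 1 and a perfectly balanced tree approaches 2, so 1.3~1.4 indicates mild branching.

```python
def average_span(root: Node) -> float:
    # average number of child nodes, computed over non-leaf nodes
    internal, children = 0, 0
    stack = [root]
    while stack:
        n = stack.pop()
        kids = [c for c in (n.left, n.right) if c is not None]
        if kids:
            internal += 1
            children += len(kids)
        stack.extend(kids)
    return children / internal if internal else 0.0

print(average_span(root))  # 1.5 for the 4-token example tree above
print(average_span(l2r))   # 1.0 for the left-to-right chain
```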

Sentence Completion
- non-monotonic generation opens up a new spectrum of sentence completion, since generation can take place anywhere in the sentence (toy sketch below)
- left-to-right models can only complete a sentence to the right of a given prefix
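A toy illustration only, not the paper's decoding procedure: seed tokens are fixed anywhere in the tree, and the empty slots on both sides of them get filled in. It reuses `Node` and `in_order` from the first sketch and replaces the learned policy with a coin flip.

```python
import random

def grow(node: Optional[Node], vocab: List[str], depth: int = 0) -> Optional[Node]:
    # stub policy: randomly emit a token into each empty slot;
    # a real model would sample from pi(a | current tree state)
    if node is None:
        if depth < 2 and random.random() < 0.7:
            node = Node(random.choice(vocab))
        else:
            return None
    node.left = grow(node.left, vocab, depth + 1)
    node.right = grow(node.right, vocab, depth + 1)
    return node

# seed words fixed mid-sentence; completion may happen on either side,
# which a left-to-right model cannot do
seed = Node("love", left=Node("i"))
print(in_order(grow(seed, ["really", "you", "so", "much"])))
```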

Machine Translation
- Dataset: IWSLT16 De-En (196k pairs) / TED tst2013 / TED tst2014
- Model : 1-layer bi-LSTM
- end-tuning: since the <end> token is frequent in training, the model over-produces <end> at inference time; its P_a value is tuned down on the validation set (see the sketch after this list)
- 7~8 BLEU points lower than left-to-right, due to a drop in 4-gram precision (1- and 2-gram precisions are higher; 3-gram is comparable)
- the discrepancy is smaller on other metrics, but still below left-to-right
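One plausible implementation of the end-tuning trick (the names `end_id` and `end_scale` are mine): scale down the probability mass on <end> by a factor chosen on the validation set, then renormalize.

```python
import torch

def tune_end(probs: torch.Tensor, end_id: int, end_scale: float) -> torch.Tensor:
    # down-weight <end> (end_scale < 1, chosen on the validation set),
    # then renormalize so the result is still a distribution
    tuned = probs.clone()
    tuned[end_id] = tuned[end_id] * end_scale
    return tuned / tuned.sum()
```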

Personal Thoughts
- left-to-right seems to be a good inductive bias for generation, which is why there is a big gap in the quantitative machine translation results
- generating tokens in non-monotonic order is far from human intuition, but a VERY interesting idea
- what is the potential gain of generating machine translation outputs in non-monotonic order?
- the idea is interesting, but it seems to make the problem harder for the model to learn: the model now has to cover a combinatorial number of generation orders for each sentence
Link : https://arxiv.org/pdf/1902.02192.pdf
Authors : Welleck et al. 2019