Abstract
- propose a novel decoding algorithm, INDIGO, which generates text in an arbitrary order via insertion operations
- achieves competitive or even better machine translation performance than conventional left-to-right generation
- Datasets: WMT16 RoEn, WMT18 EnTr, KFTT EnJa
Details
INDIGO
INsertion based Decoding with Inferred Generation Order
- treats generation orders as latent variables
- use relative position representation to capture generation order
- use Transformer model with relative position
- maximizes the evidence lower bound (ELBO) of the original objective and studies four approximate posterior distributions of generation orders

Neural Autoregressive Decoding
- a neural autoregressive model learns the probability of a sequence Y given X as a product of per-token probabilities, each conditioned on X and the previously generated tokens y_{<t} (see the factorization after this list)
- a common way to decode such sequence models is left-to-right (L2R), as it is the natural reading order for most humans (strong inductive bias)
- however, L2R may not be the optimal option for generating all sequences
- e.g., Japanese translation tends to produce better results with R2L decoding
- code generation benefits from following the abstract syntax tree, etc.
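For reference, the standard left-to-right factorization (writing y_{<t} for the previously generated prefix):

    p(Y \mid X) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, X)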
Ordering as latent variable
- adds an order function π to the conditional probability (see the factorization below)
- L2R can be recovered if z_t = t
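Roughly, following the paper's notation: each token y_t is generated together with its position z_t, and the order π is marginalized out; L2R is the special case z_t = t:

    p_\theta(Y \mid X) = \sum_{\pi} p_\theta(Y_\pi \mid X), \qquad
    p_\theta(Y_\pi \mid X) = \prod_{t} p_\theta(y_{t+1}, z_{t+1} \mid y_{0:t}, z_{0:t}, X)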

Relative Representation of Positions
- relative position representations are essential because absolute positions shift with every insertion and the final sequence length is unknown during generation
- a relation vector encodes, at each timestep, where the new token sits relative to every existing token; accumulating these vectors across timesteps yields a relation matrix (see the sketch below)
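A minimal sketch of how such a relation matrix could be maintained under insertion; the sign convention and function name are my assumptions, not the paper's exact code:

```python
import numpy as np

def update_relation_matrix(R, r_new):
    """Grow the pairwise relation matrix after one insertion.

    R     : (t, t) int matrix, entries in {-1, 0, +1}; sign convention
            assumed here: R[i, j] = -1 if token i sits left of token j,
            0 if i == j, +1 if token i sits right of token j.
    r_new : length-t vector with the new token's relation to each
            existing token (as predicted by the model).
    """
    t = R.shape[0]
    R_next = np.zeros((t + 1, t + 1), dtype=np.int8)
    R_next[:t, :t] = R                   # old relations are unchanged
    R_next[t, :t] = r_new                # new token vs. existing tokens
    R_next[:t, t] = -np.asarray(r_new)   # antisymmetric counterpart
    return R_next                        # diagonal entry stays 0 (self)
```

Since the matrix is antisymmetric, the absolute order can always be read off: a token's position index is simply the number of tokens recorded to its left.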
Insertion based Decoding
- at each timestep, INDIGO predicts the next token and its relative position (Alg. 1 in the paper; a sketch follows below)
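A minimal sketch of the decoding loop, approximating Alg. 1; `model.predict_next`, the `<eod>` symbol, and the stopping rule are my assumptions, not the paper's exact interface (reuses `update_relation_matrix` from the sketch above):

```python
import numpy as np

def indigo_decode(model, src, max_steps=200):
    """Insertion-based decoding sketch; `model.predict_next` is hypothetical."""
    tokens = ["<s>", "</s>"]
    R = np.array([[0, -1],
                  [1,  0]], dtype=np.int8)  # <s> is left of </s>
    for _ in range(max_steps):
        # jointly predict the next token and its relation to existing tokens
        token, r_new = model.predict_next(src, tokens, R)
        if token == "<eod>":                # assumed end-of-decoding symbol
            break
        tokens.append(token)
        R = update_relation_matrix(R, r_new)
    # surface order: a token's index = number of tokens to its left
    lefts = (R == 1).sum(axis=1)
    return [tokens[i] for i in np.argsort(lefts, kind="stable")]
```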


Learning
- maximizing the marginalized likelihood directly is intractable, since it requires summing over all T! permutations of the tokens once their order is free
- instead, maximize the evidence lower bound of the original objective by introducing an approximate posterior distribution over generation orders that can be flexibly controlled (see below)
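The resulting objective is the standard ELBO (notation hedged from the paper; q(π | X, Y) is the approximate posterior over orders and H(q) its entropy):

    \log p_\theta(Y \mid X) \;\ge\; \mathbb{E}_{\pi \sim q(\pi \mid X, Y)}\big[\log p_\theta(Y_\pi \mid X)\big] + \mathcal{H}(q)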


Experiment - Machine Translation
- Datasets (train / dev / test)
  - WMT16 RoEn: 620k / 2k / 2k
  - WMT18 EnTr: 207k / 3k / 3k
  - KFTT EnJa: 405k / 1k / 1k
- Results
  - except for the random order, all pre-defined orders perform comparably, with L2R / R2L being the best among them
  - Adaptive Order with beam size 8 outperforms L2R and R2L on all language pairs

Experiment - Word Order Recovery / Code Generation
- the improvement from INDIGO is more pronounced on the word order recovery and code generation tasks


Personal Thoughts
- the paper was a bit difficult to read
- predicting tokens and their positions autoregressively is an interesting idea
- wish there were more ablations on what kinds of tokens the model predicts first (POS, frequency, etc.)
- interesting that the common-first order is worse than L2R/R2L; surprising how strong the L2R inductive bias is
Link : https://arxiv.org/pdf/1902.01370.pdf
Authors : Gu et al. 2019