
Multi-Agent Dual Learning #113

@kweonwooj

Description


Abstract

  • proposes a multi-agent dual learning framework to boost the performance of neural machine translation
  • dual learning leverages the duality between the primal task (X -> Y) and the dual task (Y -> X)
  • SOTA on WMT 2014 En->De: 30.67 BLEU (+2.2 over Transformer Big)

Details

Introduction

  • Dual Learning
    • formulated as a two-agent system: the primal model learns the mapping f : X -> Y and the dual model learns g : Y -> X
    • given x in X, the reconstruction loss delta(x, g(f(x))) provides the training signal
    • in theory, a monolingual corpus alone is sufficient to train an NMT model in the dual learning framework
    • see the original dual learning paper (Xia et al., 2016), accepted at NIPS 2016, for details
  • Multi-Agent Dual Learning
    • instead of a single f and g, the multi-agent system uses N - 1 additional agents on each side, pre-trained on the parallel corpus with different random seeds. The ensemble effect improves the quality of the feedback signal.
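The reconstruction signal above can be sketched in a few lines. This is my own toy illustration, not the paper's code: `f` and `g` are stand-in callables (a real system would use NMT models), and `delta` is a simple per-position mismatch rate.

```python
# Toy sketch of the dual-learning reconstruction signal.
# f : X -> Y (primal "translator"), g : Y -> X (dual "translator");
# delta(x, g(f(x))) is the training signal -- no parallel data needed.

def delta(x, x_rec):
    """Reconstruction loss: fraction of positions where x and g(f(x)) disagree."""
    mismatches = sum(a != b for a, b in zip(x, x_rec))
    mismatches += abs(len(x) - len(x_rec))  # penalize length mismatch
    return mismatches / max(len(x), len(x_rec), 1)

def dual_reconstruction_loss(f, g, monolingual_x):
    """Average delta(x, g(f(x))) over a monolingual corpus of X-side sentences."""
    return sum(delta(x, g(f(x))) for x in monolingual_x) / len(monolingual_x)

# Hypothetical toy "translators": uppercasing as the primal task,
# lowercasing as the dual task, so reconstruction is perfect.
f = str.upper
g = str.lower
corpus = ["hello world", "dual learning"]
print(dual_reconstruction_loss(f, g, corpus))  # -> 0.0
```

A mismatched pair of models (e.g. `g` that does not invert `f`) would yield a positive loss, which is exactly the signal used to update the models.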

Algorithm

[screenshot: the paper's multi-agent dual learning algorithm]
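The core of the multi-agent feedback can be sketched as an ensemble average of the dual agents' scores. This is a hedged illustration under my own assumptions (agents reduced to callables returning log-probabilities), not the paper's implementation:

```python
import math

# Sketch of the multi-agent feedback signal: instead of a single dual
# model scoring the reconstruction, the feedback is averaged over N dual
# agents g_0..g_{N-1}, giving a smoother, lower-variance training signal.

def ensemble_log_prob(dual_agents, x, y):
    """Average log P_{g_i}(x | y) over all dual agents."""
    return sum(g(x, y) for g in dual_agents) / len(dual_agents)

# Hypothetical toy agents: each returns a made-up log-probability of
# reconstructing x from y (real agents would be NMT models trained
# with different random seeds).
agents = [
    lambda x, y: math.log(0.8),
    lambda x, y: math.log(0.6),
    lambda x, y: math.log(0.7),
]
score = ensemble_log_prob(agents, "x", "y")
```

The design intuition: any single agent's score is noisy, but averaging over independently seeded agents reduces that noise, which is the "ensemble effect" the summary refers to.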

Results

  • Experimental Settings
    • Model : Transformer Big
    • compared against Knowledge Distillation (KD), Back Translation (BT), and two-agent Dual Learning (Dual), each in single- and multi-agent variants
  • IWSLT En <-> De
    • KD yields only a small BLEU gain, BT has no effect, and Dual-5 improves BLEU the most
      [screenshot: IWSLT En <-> De results table]
  • IWSLT Es, Ru, He -> En
    • results are consistent across the various IWSLT language pairs
      [screenshot: IWSLT Es, Ru, He -> En results table]
  • WMT 2014 En <-> De Bilingual
    • KD yields only a small BLEU gain, BT has no effect, and Dual-5 improves BLEU the most (SOTA)
      [screenshot: WMT 2014 En <-> De bilingual results table]
  • WMT 2014 En <-> De Monolingual
    • also performs best in the unsupervised NMT setting (SOTA)
      [screenshot: WMT 2014 En <-> De monolingual results table]

Image Translation

  • compares Multi-Agent Dual Learning (MADL) with CycleGAN on image translation; MADL produces more robust and cleaner translated images

Personal Thoughts

  • multi-agent pre-trained models provide a good initialization and improve the quality of the feedback signal
  • existing dual learning seemed to have mainly theoretical merit and little practical impact, but this paper demonstrates its practical value as well
  • the approach appears to work across a variety of languages

Link : https://openreview.net/pdf?id=HyGhN2A5tm
Authors : Anonymous
