| Trust Region Policy Optimization | ICML2015 | 1502.05477 | policy-based |
| The Option-Critic Architecture | AAAI2017 | 1609.05140 | HRL, option-critic |
| Learning to Act by Predicting the Future | ICLR2017 | 1611.01779 | VizDoom |
| Meta Networks | ICML2017 | 1703.00837 | meta-learning, MetaNet, few-shot classification |
| FeUdal Networks for Hierarchical Reinforcement Learning | ICML2017 | 1703.01161 | FeUdal Networks (FuN), HRL |
| Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks | ICML2017 | 1703.03400 | meta-learning, MAML |
| Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World | IROS2017 | 1703.06907 | sim to real, domain randomization |
| One-Shot Imitation Learning | NIPS2017 | 1703.07326 | imitation, demonstration |
| Multi-Level Discovery of Deep Options | - | 1703.08294 | DDO, HRL |
| DART - Noise Injection for Robust Imitation Learning | CoRL2017 | 1703.09327 | imitation learning, inject noise into the supervisor's demonstrations -> more robust |
| Stochastic Neural Networks for Hierarchical Reinforcement Learning | ICLR2017 | 1704.03012 | HRL, stochastic NN |
| Deep Q-learning from Demonstrations | AAAI2018 | 1704.03732 | DQfD : imitation + RL, discrete actions |
| Parameter Space Noise for Exploration | ICLR2018 | 1706.01905 | OpenAI parameter-space noise (similar idea to NoisyNet) |
| Noisy Networks for Exploration | ICLR2018 | 1706.10295 | DeepMind NoisyNet, part of Rainbow |
| Deep Reinforcement Learning from Human Preferences | NIPS2017 | 1706.03741 | RL + human feedback via preference comparisons (easier to give than demonstrations) |
| Hindsight Experience Replay | NIPS2017 | 1707.01495 | HER, goal-based env, sparse reward, learn from failures |
| Emergence of Locomotion Behaviours in Rich Environments | - | 1707.02286 | DPPO (distributed PPO) |
| Robust Imitation of Diverse Behaviors | NIPS2017 | 1707.02747 | imitation learning : VAE (behavioral cloning) + GAIL |
| Imitation from Observation - Learning to Imitate Behaviors from Raw Video via Context Translation | ICRA2018 | 1707.03374 | imitation learning from observation, context translation |
| Reverse Curriculum Generation for Reinforcement Learning | CoRL2017 | 1707.05300 | reverse curriculum |
| Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards | - | 1707.08817 | DDPGfD : DDPG + DQfD, off-policy imitation, continuous goal-based env |
| When Waiting is not an Option - Learning Options with a Deliberation Cost | AAAI2018 | 1709.04571 | HRL, A2OC : A3C + OC + deliberation cost |
| Autonomous Extracting a Hierarchical Structure of Tasks in Reinforcement Learning and Multi-task Reinforcement Learning | - | 1709.04579 | HRL, association rule |
| One-Shot Visual Imitation Learning via Meta-Learning | CoRL2017 | 1709.04905 | MIL : meta-learning (MAML) + imitation learning (BC) |
| Overcoming Exploration in Reinforcement Learning with Demonstrations | ICRA2018 | 1709.10089 | similar to DDPGfD, imitation + DDPG + HER |