This repository contains several from-scratch Python implementations of algorithms from scientific papers, tested in the OpenAI Gym. Currently included:
- Deep Q Network (DQN)
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (DDPG)
- Twin Delayed DDPG (TD3)
The deep Q network algorithm, as described here, was tested with the cart-pole environment. While the learning curve below shows a relatively stable increase in the reward achieved per episode, the results are highly sensitive to the choice of hyperparameters.
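As a rough illustration of the core update (a minimal sketch, not the exact code in this repository), the snippet below shows one DQN training step in PyTorch: the online network's Q-value for the chosen action is regressed toward a bootstrapped target computed with a separate target network. The names `q_net`, `target_net`, and the batch layout are hypothetical.

```python
import torch
import torch.nn as nn

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    # batch layout (assumed): states [B, obs], actions [B] (long),
    # rewards [B], next_states [B, obs], dones [B] (0.0 or 1.0)
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target r + gamma * max_a' Q_target(s', a'),
    # cut off when the episode has ended.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The target network is what keeps the regression target from chasing its own updates; it is typically synchronized with the online network only every few hundred steps.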
The proximal policy optimization algorithm, as described here, was tested with the cart-pole environment. The learning curve below shows a steadily increasing reward as the algorithm trains, which is precisely the algorithm's aim: small, steady policy updates.
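The clipped surrogate objective is what enforces those small updates. Below is a minimal PyTorch sketch of that loss; the tensor names are assumptions for illustration, not identifiers from this repository.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(new_log_probs - old_log_probs)

    # Clipping keeps each update small: the objective stops improving once
    # the ratio moves more than clip_eps away from 1, which is what produces
    # the steady learning curve described above.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```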
The deep deterministic policy gradient algorithm, as described here, was tested with the inverted pendulum environment. While DDPG was one of the first algorithms to apply deep policy gradient methods to a continuous action space, it is not particularly robust: running the script multiple times, or slightly changing the hyperparameters, can produce learning curves rather different from the one shown below.
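For reference, here is a minimal sketch of one DDPG update step in PyTorch. It illustrates the general recipe (critic regression toward a target-network estimate, a deterministic policy gradient for the actor, and Polyak-averaged target networks) rather than this repository's exact code; all names and the batch layout are assumptions.

```python
import torch
import torch.nn as nn

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    states, actions, rewards, next_states, dones = batch

    # Critic target uses the *target* networks for stability.
    with torch.no_grad():
        next_actions = target_actor(next_states)
        next_q = target_critic(next_states, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q

    critic_loss = nn.functional.mse_loss(
        critic(states, actions).squeeze(1), targets)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Deterministic policy gradient: push the actor's actions
    # toward higher Q-values.
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Polyak averaging slowly tracks the online networks.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)
```

The fragility mentioned above largely comes from the interaction between the actor and a single, possibly overestimating critic, which is exactly what TD3 addresses next.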
The twin delayed DDPG algorithm, as described here, was tested with the inverted pendulum environment. It improves the robustness of DDPG by i) learning two Q functions and using the smaller of the two estimates to reduce value overestimation, ii) delaying updates of the policy and target networks, and iii) adding noise to the target policy's actions to prevent the policy from exploiting possibly incorrect value estimates. A more detailed explanation is provided by OpenAI. A resulting learning curve is shown below.
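The sketch below shows how points i) and iii) enter the critic target, with point ii) noted as a comment. It assumes PyTorch and hypothetical network and argument names; it is an illustration of the technique, not this repository's exact code.

```python
import torch

def td3_targets(target_actor, target_q1, target_q2,
                next_states, rewards, dones,
                gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    with torch.no_grad():
        # iii) Target policy smoothing: clipped noise on the target action
        # stops the critic from exploiting sharp peaks in its own estimate.
        actions = target_actor(next_states)
        noise = (torch.randn_like(actions) * noise_std).clamp(
            -noise_clip, noise_clip)
        smoothed = (actions + noise).clamp(-act_limit, act_limit)

        # i) Twin critics: taking the minimum of the two estimates
        # counteracts value overestimation.
        q1 = target_q1(next_states, smoothed).squeeze(1)
        q2 = target_q2(next_states, smoothed).squeeze(1)
        return rewards + gamma * (1.0 - dones) * torch.min(q1, q2)

# ii) Delayed updates: in the training loop, the actor and the target
# networks would only be updated every few critic steps,
# e.g. `if step % 2 == 0: ...`.
```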