FrozenLake Deep Q-Learning

As an extension to the Q-learning project, this project implements Deep Q-Learning to solve the FrozenLake-v1 environment from Gymnasium (the maintained successor to OpenAI Gym). The goal is to train an agent to cross a 4x4 grid of ice from the start tile to the goal tile without falling into a hole.

Overview

This project employs Deep Q-Learning (DQN) to solve the FrozenLake-v1 environment. DQN extends tabular Q-learning by replacing the Q-table with a neural network that approximates the Q-values, which lets the method scale to state spaces that are too large or too complex for a table. Training uses experience replay and a periodically updated target network to stabilize learning.
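The training target implied by this description is usually written as below. The notation is standard DQN notation assumed here, not taken from the notebook: $\theta$ are the policy network weights, $\theta^{-}$ the target network weights, $\gamma$ the discount factor, $d$ a flag that is 1 when the episode terminated, and $\mathcal{D}$ the replay memory.

```latex
\begin{aligned}
y &= r + \gamma \,(1 - d)\, \max_{a'} Q(s', a'; \theta^{-}) \\
L(\theta) &= \mathbb{E}_{(s, a, r, s', d) \sim \mathcal{D}} \left[ \bigl( y - Q(s, a; \theta) \bigr)^2 \right]
\end{aligned}
```

Because $\theta^{-}$ is only copied from $\theta$ every so often, the target $y$ stays fixed between copies, which is what stabilizes learning.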

Environment: FrozenLake-v1 (4x4 map, deterministic movements)
Algorithm: Deep Q-Learning (DQN) with Experience Replay and Target Networks
Training Output: A neural network that approximates Q-values for each state-action pair
Visualization: The behaviour of the agent after training for 100 episodes is saved in test.png
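To make the environment concrete, here is a minimal hand-written sketch of the deterministic 4x4 dynamics. It is not the Gymnasium implementation and not the notebook's code: the map is Gymnasium's default 4x4 layout, the action indices follow Gymnasium's FrozenLake convention, and the helper names `step` and `one_hot` are hypothetical. One-hot encoding of the state is a common input format for a small DQN, assumed here for illustration.

```python
# Default 4x4 FrozenLake layout: S = start, F = frozen, H = hole, G = goal.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
N = 4

# Action indices follow Gymnasium's FrozenLake convention.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def step(state, action):
    """Return (next_state, reward, terminated) for one deterministic move."""
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    # A move that would leave the grid keeps the agent on the edge tile.
    row = min(max(row + d_row, 0), N - 1)
    col = min(max(col + d_col, 0), N - 1)
    tile = MAP_4X4[row][col]
    next_state = row * N + col
    reward = 1.0 if tile == "G" else 0.0  # reward only for reaching the goal
    terminated = tile in "HG"             # holes and the goal end the episode
    return next_state, reward, terminated


def one_hot(state, n_states=N * N):
    """Encode a state index as a one-hot list (assumed network input format)."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]
```

For example, `step(0, RIGHT)` moves the agent from the start tile to state 1, while `step(1, DOWN)` drops it into the hole at state 5 and ends the episode.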

Code Implementation

The DQN implementation consists of several key components:

Neural Network Model: A neural network approximates the Q-values. It includes an input layer, a hidden layer with ReLU activation, and an output layer with a linear activation function.
Experience Replay: This technique involves storing past experiences in a replay memory and sampling from it to break the correlation between consecutive experiences.
Target Network: A target network provides stable Q-value targets during training. The weights of the target network are periodically updated to match the policy network.
Training Loop: The agent interacts with the environment, stores experiences, and periodically samples from the replay memory to train the neural network.
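The replay memory and target computation listed above can be sketched in a framework-agnostic way. This is an illustration under stated assumptions rather than the notebook's code: the class `ReplayMemory`, the function `td_targets`, the default `gamma`, and the `target_q` callable (standing in for the target network's forward pass) are all hypothetical names.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, terminated) tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest experience once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, terminated):
        self.buffer.append((state, action, reward, next_state, terminated))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.9):
    """Compute r + gamma * max_a' Q_target(s', a') for each sampled experience.

    target_q is any callable mapping a state to a list of per-action Q-values.
    Terminal transitions do not bootstrap from the next state.
    """
    targets = []
    for _state, _action, reward, next_state, terminated in batch:
        bootstrap = 0.0 if terminated else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

The policy network is then fitted so that its Q-value for the taken action moves toward each target. The periodic target update amounts to copying the policy network's weights into the target network every fixed number of steps; in Keras, for example, that is typically `target_model.set_weights(policy_model.get_weights())`, where both model names are placeholders.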

Resources

The implementation and understanding of the algorithms are based on the following resources:

Update

An alternative to value-based approaches such as Q-Learning and Deep Q-Learning (DQN) is the policy-based approach, for example the Policy Gradient method, which directly optimizes the policy by learning a probability distribution over actions rather than estimating action values. Unlike DQN, whose learned policy is deterministic (the greedy action under the Q-values), Policy Gradient methods use a stochastic policy and sample actions from a probability distribution, which inherently supports exploration and adapts well to both discrete and continuous action spaces. While DQN relies on experience replay and target networks for stability, Policy Gradient simplifies training by updating the policy from the rewards collected over each episode, which makes it a good fit for environments with stochastic elements or continuous actions.
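The two ingredients this paragraph mentions, sampling from a stochastic policy and updating from episode rewards, can be sketched as below. This is a minimal illustration of the REINFORCE-style building blocks, not code from either repository; the function names and the default `gamma` are assumptions.

```python
import math
import random


def softmax(logits):
    """Turn raw action preferences into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng=random):
    """Sample an action index from the policy's distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    return returns[::-1]
```

In REINFORCE, each step's return G_t then weights the gradient of the log-probability of the action taken at that step, so actions followed by higher returns become more likely.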

For more details, please visit the Policy Gradient Method (Reinforce Algorithm) repository.
