This project implements a GridWorld environment where two agents learn optimal navigation strategies using the Q-learning algorithm. The environment features walls, kill zones, terminal states, and a graphical user interface (GUI) for real-time visualization of learning and agent movement.
- Develop a grid-based environment for agents to learn reward-maximizing behaviors.
- Implement the Q-learning algorithm to teach agents optimal actions.
- Provide a GUI to visualize learning processes and the agents' paths.
- Dynamic Grid Initialization: Configurable dimensions with random starting positions.
- Kill Zones: Cells penalizing agents upon entry.
- Terminal States: Goal cells with associated rewards.
- Walls: Impassable barriers restricting agent movements.
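The environment features above could be initialized along the following lines. This is a hypothetical sketch, not the project's actual code: the cell codes, function name, and signature are assumptions for illustration.

```python
import numpy as np

# Assumed integer cell codes; the real project may encode cells differently.
EMPTY, WALL, KILL_ZONE, TERMINAL = 0, 1, 2, 3

def make_grid(rows, cols, walls, kill_zones, terminal, rng=None):
    """Build a grid and pick a random starting position on an empty cell."""
    if rng is None:
        rng = np.random.default_rng()
    grid = np.full((rows, cols), EMPTY)
    for r, c in walls:
        grid[r, c] = WALL          # impassable barrier
    for r, c in kill_zones:
        grid[r, c] = KILL_ZONE     # penalizes the agent on entry
    grid[terminal] = TERMINAL      # goal cell with an associated reward
    # Random start anywhere that is not a wall, kill zone, or terminal.
    free = np.argwhere(grid == EMPTY)
    start = tuple(free[rng.integers(len(free))])
    return grid, start
```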
- Epsilon-Greedy Action Selection: Balances exploration and exploitation.
- Action Selection: Chooses actions based on Q-values.
- Q-value Updates: Rewards and maximum future Q-values guide updates.
- Epsilon Decay: Reduces exploration over time, favoring exploitation.
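The three mechanisms above can be sketched as follows. The function names, default hyperparameters (alpha, gamma, decay rate), and the tabular `Q` layout are assumptions, not the project's actual implementation.

```python
import numpy as np

def e_greedy(Q, state, epsilon, rng):
    # Explore with probability epsilon; otherwise exploit the best-known action.
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    # Q-learning update: move Q(s, a) toward the reward plus the discounted
    # maximum future Q-value of the next state.
    target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def decay(epsilon, rate=0.995, min_eps=0.05):
    # Shrink epsilon each episode, favoring exploitation over time,
    # but keep a floor so some exploration always remains.
    return max(min_eps, epsilon * rate)
```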
- Grid Visualization: Displays players, walls, kill zones, and terminal states.
- User Input: Configure grid size and episode count.
- Simulation Control: Start training and observe real-time agent movements.
- Heatmap: Displays state-action Q-values, highlighting optimal strategies.
- Performance Graphs:
- Rewards vs. Episodes: Tracks cumulative reward per episode.
- Steps vs. Episodes: Monitors steps per episode to measure learning efficiency.
- Classes:
  - GridWorld: Main environment management class.
  - GridWorldGUI: GUI implementation using Tkinter.
- Key Methods:
  - __init__: Initializes the grid and player positions.
  - place_killzones: Sets up penalizing cells.
  - set_terminal_state: Configures goal states.
  - action_e_greedy: Implements the epsilon-greedy strategy.
  - q_learning_algorithm: Trains agents over the specified number of episodes.
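A minimal skeleton mirroring the listed methods might look like the following. The signatures, cell markers, and default reward are hypothetical; the real class in dc.py may differ.

```python
import random

class GridWorld:
    def __init__(self, rows, cols):
        # Initializes the grid and a random player position.
        self.rows, self.cols = rows, cols
        self.grid = [["." for _ in range(cols)] for _ in range(rows)]
        self.player = (random.randrange(rows), random.randrange(cols))
        self.Q = {}  # maps (state, action) -> learned value

    def place_killzones(self, cells):
        # Sets up penalizing cells.
        for r, c in cells:
            self.grid[r][c] = "K"

    def set_terminal_state(self, cell, reward=10.0):
        # Configures the goal state and its reward.
        r, c = cell
        self.grid[r][c] = "T"
        self.terminal_reward = reward
```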
- Python 3.8 or higher
- Required libraries:
  numpy, matplotlib, and tkinter (bundled with most Python installations)
- Clone the repository:
  git clone <repository-link>
  cd GridWorld-Q-Learning
- Run the simulation after installing dependencies:
python dc.py
- Follow the GUI prompts to configure and start the simulation.
- Training Visualization: Observe agents navigating the grid, avoiding penalties, and optimizing movements toward terminal states.
- Graphs:
- Cumulative Rewards: Shows performance improvements over episodes.
- Steps per Episode: Demonstrates increasing efficiency as step counts fall during training.
- Dynamic Walls and Kill Zones: Randomize placements for increased challenge.
- Opponent Interaction: Incorporate adversarial players for strategic complexity.
- Additional Algorithms: Compare Q-learning with other reinforcement learning approaches.
This project combines algorithmic reinforcement learning with an interactive GUI, providing an engaging way to understand Q-learning principles in a grid-based environment. The simulation demonstrates the effectiveness of Q-learning in teaching agents to optimize navigation strategies.