This project investigates the application of Reinforcement Learning (RL) to robotic control, with a focus on sim-to-real transfer: training policies in simulation that generalize to real-world conditions. The first phase used the Gym Hopper environment, where foundational (REINFORCE, Actor-Critic) and state-of-the-art (PPO) algorithms were implemented for locomotion. To simulate the reality gap, two custom domains were created: the target with nominal dynamics, and the source with a 30% torso mass reduction. Uniform Domain Randomization (UDR) was applied to the remaining three link masses in the source domain, using manually defined uniform distributions. Mass values were sampled at each episode to promote robustness across dynamic variations. The project then extended to a Webots-based scenario, where a Tesla Model 3 learned obstacle avoidance using LiDAR, GPS, and IMU. PPO and SAC were employed, with UDR applied to obstacle positions and to the vehicle's initial conditions.
This repository contains reinforcement learning agents for training a one-legged Hopper robot in simulation using custom environments and widely used RL algorithms.
The project features two Hopper environment variants:
- Source Domain: torso mass reduced by 30%.
- Target Domain: original environment, serving as a proxy for the real world.
Uniform Domain Randomization (UDR) is applied to all link masses except the torso in the source domain to improve policy robustness for sim-to-real transfer.
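For intuition, the per-episode sampling step looks roughly like the following. This is a minimal sketch assuming mujoco-py-style access to `sim.model.body_mass`; the body indices and uniform bounds are illustrative placeholders, not the values defined in `env/custom_hopper.py`:

```python
import numpy as np

def sample_udr_masses(sim, enable_udr=True):
    """Resample the randomized link masses once per episode (call from reset()).

    Hypothetical uniform bounds in kg; the real ranges are defined manually
    in env/custom_hopper.py. Body 1 (torso) is deliberately excluded.
    """
    udr_ranges = {2: (1.8, 5.4), 3: (1.3, 3.9), 4: (2.6, 7.8)}  # thigh, leg, foot
    if not enable_udr:
        return
    for body_id, (low, high) in udr_ranges.items():
        sim.model.body_mass[body_id] = np.random.uniform(low, high)
```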
- Features
- Installation
- Folder Structure
- How It Works
- Training Instructions
- Evaluation
- Logging and Monitoring
- Troubleshooting
- Custom Gym Hopper environments with source and target domains.
- REINFORCE (with and without baseline) and Actor-Critic algorithms implemented from scratch.
- PPO and SAC agents implemented using Stable Baselines3.
- Domain randomization (UDR) on link masses (torso excluded).
- Integrated logging and monitoring with Weights & Biases (wandb) and TensorBoard.
Follow these steps to set up the environment:
- Clone the repository

  git clone https://github.com/Polixide/RL_final_project.git
  cd RL_final_project/Gym_Hopper

- Create and activate a Python virtual environment

  python3 -m venv mldl_env
  source mldl_env/bin/activate  # On Windows: mldl_env\Scripts\activate

- Upgrade pip

  python -m pip install --upgrade pip

- Install Python dependencies

  pip install -r requirements.txt

- Install MuJoCo

  - Download MuJoCo from https://mujoco.org/
  - Extract it to ~/.mujoco/mujoco210
  - Set environment variables:

    export MUJOCO_PY_MUJOCO_PATH=~/.mujoco/mujoco210
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin

  - Install mujoco-py:

    pip install mujoco-py==2.1

- Configure Weights & Biases (optional)

  pip install wandb
  wandb login

- Verify installation: run the test script to confirm MuJoCo and Gym integration:

  python test_random_policy.py
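The script is roughly equivalent to the following sketch (the actual test_random_policy.py may differ in its details):

```python
import gym
from env.custom_hopper import *  # registers CustomHopper-source-v0 / -target-v0

env = gym.make('CustomHopper-source-v0')
print('State space:', env.observation_space)
print('Action space:', env.action_space)

obs = env.reset()
for _ in range(500):
    action = env.action_space.sample()          # random policy
    obs, reward, done, info = env.step(action)  # old Gym 4-tuple API
    env.render()
    if done:
        obs = env.reset()
env.close()
```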
Gym_Hopper/
├── ACTOR_CRITIC/
│ ├── agent_ACTOR_CRITIC.py # Actor-Critic agent implementation
│ ├── train_ACTOR_CRITIC.py # Training script
│ └── test_ACTOR_CRITIC.py # Evaluation script
│
├── REINFORCE/
│ ├── agent_REINFORCE.py # REINFORCE agent implementation
│ ├── train_REINFORCE.py # Training script
│ └── test_REINFORCE.py # Evaluation script
│
├── PPO/
│ ├── train_PPO.py # PPO training using Stable Baselines3
│ └── test_PPO.py # PPO evaluation script
│
├── SAC/
│ ├── train_SAC.py # SAC training using Stable Baselines3
│ └── test_SAC.py # SAC evaluation script
│
├── env/
│ ├── __init__.py # Package initializer
│ ├── custom_hopper.py # Custom Hopper environment with domain randomization
│ ├── mujoco_env.py # MuJoCo wrapper class
│ └── assets/
│ └── hopper.xml # Hopper robot definition
│
├── mldl_env/ # Python virtual environment (created during installation)
├── models/ # Saved models during training
├── checkpoints/ # Training checkpoints
├── tb_logs/ # TensorBoard logs
├── wandb/ # Weights & Biases logs
├── videos/ # Simulation videos
├── test_random_policy.py # Script for testing random policies
├── requirements.txt # Project dependencies
├── .gitignore # Git ignored files
└── README.md # Project documentation
- env/custom_hopper.py defines two Gym environments:
  - CustomHopper-source-v0: torso mass reduced by 30%.
  - CustomHopper-target-v0: original Hopper model.
- REINFORCE and Actor-Critic are implemented from scratch.
- PPO and SAC are implemented using Stable Baselines3.
- UDR randomizes all link masses except the torso in the source domain.
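As a concrete illustration of the from-scratch agents, here is a minimal sketch of the REINFORCE update with an optional baseline. Names and structure are illustrative, not the exact contents of REINFORCE/agent_REINFORCE.py:

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99, baseline=None):
    """REINFORCE: maximize E[sum_t G_t * log pi(a_t|s_t)].

    log_probs: list of log pi(a_t|s_t) tensors for one episode
    rewards:   list of scalar rewards r_t
    baseline:  optional value estimates b(s_t) subtracted from the returns
    """
    returns, G = [], 0.0
    for r in reversed(rewards):      # discounted return-to-go
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    if baseline is not None:
        returns = returns - baseline  # baseline for variance reduction
    # gradient ascent on expected return == descent on the negative objective
    return -(torch.stack(log_probs) * returns.detach()).sum()
```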
To train an agent, run the corresponding training script for the desired algorithm. Replace ALGO with the algorithm name (e.g., ACTOR_CRITIC, REINFORCE, PPO, SAC):
python ALGO/train_ALGO.py

To enable Uniform Domain Randomization (UDR) during training, set the enable_udr variable to True in env/custom_hopper.py before starting the training script.
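For the Stable Baselines3 agents, the training scripts amount to something like this sketch (hyperparameters and paths are placeholders, not the exact contents of PPO/train_PPO.py; SAC is analogous):

```python
import gym
from stable_baselines3 import PPO
from env.custom_hopper import *  # registers the custom environments

env = gym.make('CustomHopper-source-v0')  # set enable_udr=True in custom_hopper.py first

model = PPO('MlpPolicy', env, learning_rate=3e-4, verbose=1,
            tensorboard_log='./tb_logs/')
model.learn(total_timesteps=1_000_000)
model.save('models/ppo_hopper_source')   # placeholder path
```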
To evaluate a trained agent, run the corresponding evaluation script for the desired algorithm. Replace ALGO with the algorithm name (e.g., ACTOR_CRITIC, REINFORCE, PPO, SAC):
python ALGO/test_ALGO.py
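For PPO and SAC, evaluation essentially loads a checkpoint and measures average return, typically on the target domain to estimate the sim-to-real gap. A sketch (file names are placeholders):

```python
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from env.custom_hopper import *

env = gym.make('CustomHopper-target-v0')      # evaluate on the "real" domain
model = PPO.load('models/ppo_hopper_source')  # policy trained on the source
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=50)
print(f'target return: {mean_reward:.1f} +/- {std_reward:.1f}')
```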
- Authenticate with wandb:

  wandb login
- Confirm MuJoCo is installed and licensed correctly.
- Verify mujoco-py dependencies.
- Run wandb login if authentication is required.
- Features
- Installation
- Folder Structure
- How It Works
- Training Instructions
- Evaluation
- Logging and Monitoring
- Troubleshooting
- Authors
- Webots-based Tesla simulation environment
- Custom Gymnasium-compatible Python environment
- Socket-based communication between Webots and Python
- Reinforcement Learning with Stable-Baselines3 (e.g., PPO, SAC)
- Parallel training via SubprocVecEnv
- Reward shaping for safe driving and obstacle avoidance (see the sketch after this list)
- Model checkpointing and logging (W&B + TensorBoard)
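Reward shaping is the main design lever for safe driving. As an illustration only (the actual terms and weights live in the project's controller and environment code, not here), a shaped reward might combine forward progress, a LiDAR proximity penalty, and a crash penalty:

```python
def shaped_reward(speed, min_lidar_dist, collided,
                  target_speed=10.0, safe_dist=5.0):
    """Illustrative reward: drive forward, keep clear of obstacles."""
    reward = min(speed / target_speed, 1.0)   # progress term in [0, 1]
    if min_lidar_dist < safe_dist:            # penalize close approaches
        reward -= (safe_dist - min_lidar_dist) / safe_dist
    if collided:                              # terminal crash penalty
        reward -= 10.0
    return reward
```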
git clone https://github.com/Polixide/RL_final_project.git
cd RL_final_project/Project_extension
python3 -m venv rl_env
source rl_env/bin/activate
# On Windows: rl_env\Scripts\activate
pip install -r requirements.txt
Go to the official Cyberbotics website (https://cyberbotics.com/#download), or use the direct link:
wget https://github.com/cyberbotics/webots/releases/download/R2025a/webots-R2025a-x86-64.tar.bz2
Extract and install:
tar -xjf webots-R2025a-x86-64.tar.bz2
sudo mv webots /opt/webots
You can now run Webots using:
webots
Project_extension/
├── controllers/ # Webots controllers (e.g., tesla_controller.py)
├── worlds/ # Webots simulation world (.wbt files)
├── model_dir/ # Final saved models (e.g., best_model.zip)
├── checkpoint_dir/ # Intermediate checkpoints (only contents ignored)
│ └── .gitkeep
├── tb_logs/ # TensorBoard logs (only contents ignored)
│ └── .gitkeep
├── wandb/ # Weights & Biases logs (only contents ignored)
│ └── .gitkeep
├── rl_env/ # Python virtual environment with all the dependencies
│
├── webots_remote_env.py # Webots socket communication handler
├── train_PPO.py # PPO training script
├── train_SAC.py # SAC training script
├── test_PPO.py # PPO evaluation script
├── test_SAC.py # SAC evaluation script
├── run_PPO.sh # Shell script to launch PPO training
├── run_SAC.sh # Shell script to launch SAC training
├── requirements.txt # All Python dependencies
└── README.md # Project documentation
- Webots runs a 3D world with a Tesla robot.
- The robot uses tesla_controller.py, which acts as a socket server.
- Python scripts (e.g., train_PPO.py) act as socket clients and communicate with the controller (see the sketch after this list).
- The RL agent sends reset, step, and exit commands.
- Observations are gathered from sensors (e.g., distances, camera).
- Rewards guide learning to avoid obstacles and drive safely.
- Logging is done via W&B and TensorBoard.
Start Training
# For PPO training
./run_PPO.sh
# For SAC training
./run_SAC.sh
Training uses multiple environments in parallel via SubprocVecEnv. You can adjust the number of instances and the hyperparameters inside the training scripts (train_PPO.py / train_SAC.py).
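Wiring up the parallel instances looks roughly like this, assuming each Webots instance listens on its own port and webots_remote_env.py exposes a Gymnasium-compatible class as sketched above (ports and counts are placeholders):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv

N_ENVS = 4  # one Webots instance (and one port) per worker

def make_env(port):
    def _init():
        from webots_remote_env import WebotsRemoteEnv
        return WebotsRemoteEnv(port=port)  # each worker talks to its own simulator
    return _init

if __name__ == '__main__':
    env = SubprocVecEnv([make_env(10000 + i) for i in range(N_ENVS)])
    model = PPO('MlpPolicy', env, verbose=1, tensorboard_log='./tb_logs/')
    model.learn(total_timesteps=500_000)
```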
Step 1 - Open Webots. Launch the simulation:
webots worlds/your_world.wbt
WARNING: Make sure your robot (Tesla) uses tesla_controller.py as the controller.
Step 2 - Launch the evaluation script. Replace ALGO with the algorithm name (PPO or SAC):

python test_ALGO.py

You can watch the car's movements in Webots while the test is running.
Before first use:
wandb login
Training logs and plots are available on https://wandb.ai under your account.
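For SB3 runs, metrics can be streamed to wandb through the official SB3 integration, for example as below (the project name is a placeholder, and the stand-in environment just keeps the snippet self-contained):

```python
import gymnasium as gym
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import PPO

run = wandb.init(project='tesla-obstacle-avoidance',  # placeholder project name
                 sync_tensorboard=True)               # mirror tb_logs/ to wandb

env = gym.make('Pendulum-v1')  # stand-in; use the Webots env in practice
model = PPO('MlpPolicy', env, tensorboard_log='./tb_logs/')
model.learn(total_timesteps=500_000,
            callback=WandbCallback(model_save_path=f'checkpoint_dir/{run.id}'))
run.finish()
```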
- GitHub Password Auth Failed: Use a Personal Access Token (PAT) instead of your GitHub password. → Create Token
- Folders Not Being Pushed to Git: Add a .gitkeep file inside empty folders that are otherwise ignored by .gitignore.
- Socket Connection Error: Ensure Webots is running and listening on the correct port. You can set PORT manually in your environment or use the default (e.g., 10000).
- Daniele Catalano (@Polixide) - Politecnico di Torino, Data Science & Engineering
- Samuele Caruso (@Knightmare2002) - Politecnico di Torino, Data Science & Engineering
- Francesco Dal Cero (@Dalceeee) - Politecnico di Torino, Data Science & Engineering
- Ramadan Mehmetaj (@Danki02) - Politecnico di Torino, Data Science & Engineering