This repository provides a reproducible framework for training reinforcement learning (RL) agents on the Chrome Dino (T-Rex Runner) game using Stable Baselines3. The environment interacts with a live browser instance, introducing real-world challenges like noisy observations and variable rendering delays. We support PPO (on-policy) and DQN (off-policy) algorithms, with configurable ablations for vision preprocessing, temporal handling, and domain randomization.
This project is designed for research-level experiments, emphasizing ablation studies to investigate factors like sample efficiency and robustness in vision-based RL tasks. It serves as a benchmark for sparse-reward, partially observable environments, with analogies to real-world applications like robotic obstacle avoidance.
Key features:
- Custom Environment: `ChromeDinoEnv` captures browser screenshots, processes observations (e.g., grayscale, blurring, stacking), handles actions, and detects termination via template matching (see the usage sketch after this list).
- Algorithms: PPO and DQN from Stable Baselines3.
- Ablations: Systematic sweeps over observation modes, temporal stacking, frame skipping, and domain randomization.
- Reproducibility: Dockerized setup for consistent execution across machines.
- Evaluation: Scripts for model evaluation and learning curve plotting.
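A minimal interaction sketch, assuming `ChromeDinoEnv` follows the Gymnasium API; the import path and constructor call are illustrative, not the actual signature (check `envs/` for that):

```python
# Hypothetical usage sketch; see envs/ for the real constructor and module path.
from envs.chrome_dino_env import ChromeDinoEnv  # module path assumed

env = ChromeDinoEnv()  # attaches to the live browser instance
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```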
Requirements:
- Docker (recommended for isolation and reproducibility).
- Alternatively, native setup: Python 3.10+, Xvfb, Chromium, nginx, and dependencies from `requirements-docker.txt`.
- Hardware: CPU is sufficient; a GPU is recommended for faster training (set `--device cuda`).
Docker setup:
- Build the Docker image:

```bash
docker build -t dino-rl:latest .
```

- Use `docker-compose` for predefined services (e.g., baselines). Edit `docker-compose.yml` as needed.
Native setup:
- Install system dependencies (Ubuntu example):

```bash
sudo apt update && sudo apt install -y xvfb xdotool chromium nginx git \
  libgl1 libglib2.0-0 libgtk-3-0 libnss3 libasound2 fonts-dejavu tzdata \
  ca-certificates curl tesseract-ocr python3-dev gcc
```

- Install Python dependencies:

```bash
pip install -r requirements-docker.txt --extra-index-url https://download.pytorch.org/whl/cpu
```

- Clone and serve the game via nginx (as in the Dockerfile).
- Run the entrypoint script manually: `bash docker/entrypoint.sh python your_script.py`.
Training:
All training uses seeded runs for statistical rigor (results are reported as mean ± 95% CI across seeds). Logs are saved in `logs/{algo}/seed_{seed}/` (`progress.csv` for metrics, plus checkpoints and `final_model.zip`).
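For reference, a sketch of aggregating final performance across seeds into a mean ± 95% CI, assuming each `progress.csv` exposes an episode-reward column (the column name below follows Stable Baselines3's CSV logger; inspect your logs to confirm):

```python
# Sketch: aggregate final performance across seeds as mean ± 95% CI.
# Column name "rollout/ep_rew_mean" is assumed; check progress.csv to confirm.
import glob
import numpy as np
import pandas as pd

finals = []
for path in glob.glob("logs/ppo/seed_*/progress.csv"):
    df = pd.read_csv(path)
    finals.append(df["rollout/ep_rew_mean"].iloc[-1])  # last logged value per seed

finals = np.array(finals)
mean = finals.mean()
ci95 = 1.96 * finals.std(ddof=1) / np.sqrt(len(finals))  # normal-approx 95% CI
print(f"final reward: {mean:.1f} ± {ci95:.1f} (n={len(finals)} seeds)")
```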
Baselines use the default configs (`ppo_baseline.yaml`, `dqn_baseline.yaml`): grayscale observations, sparse rewards, and template-based termination.
Run the PPO baseline:

```bash
docker-compose run --rm ppo_baseline python -m scripts.train_ppo \
  --config configs/ppo_baseline.yaml \
  --seeds 0 1 2 3 4 \
  --total_timesteps 100000 \
  --device cpu \
  --set experiment.debug_dump=true env.input_backend=xdotool env.auto_calibrate=true env.termination_method=template env.template_thr=0.62
```
- Adjust `--total_timesteps` (e.g., 200000) and `--seeds` for longer or additional runs.
- Native: omit `docker-compose run --rm ppo_baseline` and run the command directly.
Run the DQN baseline:

```bash
docker-compose run --rm dqn_baseline python -m scripts.train_dqn \
  --config configs/dqn_baseline.yaml \
  --seeds 0 1 2 3 4 \
  --total_timesteps 100000 \
  --device cpu \
  --set experiment.debug_dump=true env.input_backend=xdotool env.auto_calibrate=true env.termination_method=template env.template_thr=0.62
```
Ablation suites (`ppo_ablation_suite.yaml`, `dqn_ablation_suite.yaml`) define sweeps over the following variants:
- baseline_gray: Basic grayscale, with no extra preprocessing or temporal handling.
- stack4: 4-frame temporal stacking for motion cues.
- frame_skip4: Frame skipping (4) as an alternative to stacking (both temporal strategies are sketched after this list).
- domain_rand: Adds brightness/contrast variation and noise for robustness.
- highres_gray: Higher-resolution observations.
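To make the stack4/frame_skip4 distinction concrete, here is a self-contained sketch of the two temporal strategies as Gymnasium-style wrappers. This is simplified relative to whatever the suite configs actually toggle (e.g., observation-space bookkeeping is omitted):

```python
# Simplified sketch of the two temporal strategies; the real ablation wrappers
# in this repo may differ in detail.
from collections import deque
import numpy as np
import gymnasium as gym

class FrameStack(gym.Wrapper):
    """stack4: keep the last k frames so the policy can see motion."""
    def __init__(self, env, k=4):
        super().__init__(env)
        self.frames = deque(maxlen=k)
        # NOTE: updating self.observation_space is omitted for brevity.

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)
        return np.stack(list(self.frames)), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.frames.append(obs)
        return np.stack(list(self.frames)), reward, terminated, truncated, info

class FrameSkip(gym.Wrapper):
    """frame_skip4: repeat each action k times, return only the last frame."""
    def __init__(self, env, k=4):
        super().__init__(env)
        self.k = k

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.k):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info
```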
Use the provided wrapper script `run_ablations.py`. Run PPO ablations:

```bash
docker-compose run --rm ppo_baseline python scripts/run_ablations.py --algo ppo --suite configs/ppo_ablation_suite.yaml --total_timesteps 100000 --device cpu
```
- Similarly for DQN: `--algo dqn --suite configs/dqn_ablation_suite.yaml`.
- Logs are written to `logs/{algo}_{ablation_name}/`.
Evaluate trained models over multiple episodes (e.g., 20 per seed):

```bash
python scripts/evaluate.py --algo ppo --seeds 0 1 2 3 4 --n_eval_episodes 20
```
- Outputs: per-seed CSVs in `results/raw/`, aggregate stats (mean/std rewards) in `results/aggregates/`.
- Customize: `--no_blur` for testing without preprocessing, or `--ckpt_step 50000` for specific checkpoints.
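Under the hood, evaluation with Stable Baselines3 looks roughly like the following sketch; env construction and the model path are illustrative, and `evaluate.py` is the canonical implementation:

```python
# Sketch of SB3 evaluation; evaluate.py handles env setup, CSV output, etc.
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

from envs.chrome_dino_env import ChromeDinoEnv  # module path assumed

env = ChromeDinoEnv()
model = PPO.load("logs/ppo/seed_0/final_model.zip")  # log layout from above
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
print(f"seed 0: {mean_reward:.1f} ± {std_reward:.1f} over 20 episodes")
env.close()
```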
Generate learning curves comparing algorithms or ablations:

```bash
python scripts/plot_curves.py --algo_dirs logs/ppo logs/dqn --out results/figures/baseline_curves.png
```
- For ablations: `--algo_dirs logs/ppo_baseline_gray logs/ppo_stack4 ...`.
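If `plot_curves.py` doesn't cover a figure you need, a minimal sketch of plotting per-seed curves directly from the logs (column names are assumed from Stable Baselines3's CSV logger; inspect `progress.csv` to confirm):

```python
# Sketch: plot per-seed learning curves from progress.csv files.
import glob
import matplotlib.pyplot as plt
import pandas as pd

for path in sorted(glob.glob("logs/ppo/seed_*/progress.csv")):
    df = pd.read_csv(path)
    # Column names assumed from SB3's CSV logger; adjust to your logs.
    plt.plot(df["time/total_timesteps"], df["rollout/ep_rew_mean"], label=path)

plt.xlabel("timesteps")
plt.ylabel("episode reward (mean)")
plt.legend(fontsize=6)
plt.savefig("custom_curves.png", dpi=150)
```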
Repository structure:
- `envs/`: Custom `ChromeDinoEnv`.
- `scripts/`: Training (`train_ppo.py`, `train_dqn.py`), evaluation (`evaluate.py`), plotting (`plot_curves.py`).
- `configs/`: YAML configs for baselines, ablations, and defaults.
- `docker/`: Entrypoint and Dockerfile.
- `logs/`: Training outputs.
- `results/`: Evaluation CSVs and figures.
- `templates/`: Game-over templates for termination detection.
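Termination detection compares each frame against a game-over template from `templates/`; with `env.termination_method=template`, a match score above `env.template_thr` (0.62 in the commands above) ends the episode. A minimal OpenCV sketch, with the template filename assumed:

```python
# Sketch of template-based game-over detection; the filename is illustrative.
import cv2

template = cv2.imread("templates/game_over.png", cv2.IMREAD_GRAYSCALE)

def is_game_over(frame_gray, thr=0.62):
    """Return True if the game-over template matches the frame above thr.

    frame_gray: uint8 grayscale frame, larger than the template.
    """
    scores = cv2.matchTemplate(frame_gray, template, cv2.TM_CCOEFF_NORMED)
    _, max_score, _, _ = cv2.minMaxLoc(scores)
    return max_score >= thr
```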
Our ablation studies test key hypotheses in vision-based RL:
- Temporal handling: stack4 vs. frame_skip4: does explicit stacking improve over skipping under partial observability?
- Preprocessing: baseline_gray (no blur/hist_eq) vs. defaults: what is the impact on noisy browser images?
- Domain randomization: adds brightness ±0.10, contrast ±0.10, and noise 0.02 (sketched below); does it enhance generalization?
- Resolution: highres_gray probes the trade-off between detail and compute.
- Results: see the generated curves/tables; e.g., domain_rand often boosts eval rewards by 15-25%, indicating better robustness.
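The domain_rand perturbations amount to per-step photometric jitter. A minimal sketch using the ranges quoted above (the exact sampling and clipping scheme in the configs may differ):

```python
# Sketch of the domain_rand perturbations (brightness/contrast ±0.10, noise 0.02).
# The real env wrapper may sample and clip differently.
import numpy as np

def randomize(obs, rng):
    """Apply photometric jitter to a float grayscale observation in [0, 1]."""
    brightness = rng.uniform(-0.10, 0.10)
    contrast = 1.0 + rng.uniform(-0.10, 0.10)
    noise = rng.normal(0.0, 0.02, size=obs.shape)
    out = (obs - 0.5) * contrast + 0.5 + brightness + noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
frame = rng.random((84, 84))  # placeholder grayscale frame
aug = randomize(frame, rng)
```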
Contributing:
- Extend: add new algorithms (e.g., SAC) or env variants (e.g., night mode).
- Issues: report bugs or suggest new ablations.
- Conference prep: focus on scaling seeds/timesteps and statistical analysis.
MIT License. See LICENSE file.
For questions, open an issue or contact alexkagozi@gmail.com. This framework aims to facilitate high-quality RL research. Happy training!