This repository contains the final project for the course Social Robotics – MU5EEH15 (2025/2026).
The goal is to explore imitation learning in the Atari game Space Invaders, starting from Behavioral Cloning (BC) and extending it with Generative Adversarial Imitation Learning (GAIL) for adversarial reward refinement.
All models operate directly on raw RGB frames from ALE/SpaceInvaders-v5.
This project implements the full imitation-learning pipeline:
- Loading Minari expert demonstrations as the only dataset.
- Training Behavioral Cloning (BC) models with three architectures:
  - MLP
  - CNN / DQN-style network
  - Vision Transformer (ViT)
- Logging metrics (loss, accuracy, metadata) into CSV files.
- Evaluating trained agents on multiple Space Invaders episodes.
- Experimenting with GAIL to improve imitation quality beyond pure BC.
Everything is implemented in Python + PyTorch.
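For illustration, a DQN-style CNN policy for BC might look like the following minimal sketch (the class name, layer sizes, and action count are illustrative assumptions, not the repository's exact architecture):

```python
import torch
import torch.nn as nn

class CNNPolicy(nn.Module):
    """DQN-style CNN mapping raw RGB frames to action logits (illustrative sizes)."""
    def __init__(self, in_channels: int = 3, num_actions: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(512)  # infers the flattened size on first call
        self.out = nn.Linear(512, num_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width), pixel values scaled to [0, 1]
        return self.out(torch.relu(self.head(self.features(x))))
```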
main.py is the entry point of the project. It provides a menu to:
- train BC models,
- load Minari data,
- view existing models/metrics,
- evaluate saved policies.
Core BC implementation:
- dataset/dataloader handling,
- three model architectures (MLP, CNN, ViT),
- training loop with validation,
- checkpoint saving,
- CSV logging.
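A minimal sketch of what such a training loop looks like, assuming a classification-style BC objective (the function name, CSV columns, and file paths here are hypothetical):

```python
import csv
import torch
import torch.nn as nn

def train_bc(model, train_loader, val_loader, epochs=50, lr=1e-4,
             ckpt_path="best_model.pt", log_path="metrics.csv"):
    """Cross-entropy behavioral cloning with validation-based checkpointing."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_val = float("inf")
    with open(log_path, "a", newline="") as f:
        writer = csv.writer(f)
        for epoch in range(epochs):
            model.train()
            for frames, actions in train_loader:
                opt.zero_grad()
                loss = loss_fn(model(frames.to(device)), actions.to(device))
                loss.backward()
                opt.step()
            # Validation pass: accumulate loss and accuracy over the held-out set.
            model.eval()
            val_loss, correct, n = 0.0, 0, 0
            with torch.no_grad():
                for frames, actions in val_loader:
                    logits = model(frames.to(device))
                    val_loss += loss_fn(logits, actions.to(device)).item() * len(actions)
                    correct += (logits.argmax(1).cpu() == actions).sum().item()
                    n += len(actions)
            val_loss /= n
            writer.writerow([epoch, val_loss, correct / n])
            if val_loss < best_val:  # keep only the best-performing checkpoint
                best_val = val_loss
                torch.save(model.state_dict(), ckpt_path)
```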
The DataManager handles all data operations:
- loading Minari expert trajectories,
- converting them to the internal .pkl format,
- splitting train/validation sets.
Creates the ALE/SpaceInvaders-v5 environment:
- applies wrappers,
- handles seeding,
- performs preprocessing.
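A minimal sketch of this setup (the make_env name and the wrapper choice are assumptions; the project's actual preprocessing may differ):

```python
import gymnasium as gym
import ale_py  # provides the ALE Atari environments

gym.register_envs(ale_py)  # register ALE environments with Gymnasium

def make_env(seed: int = 0) -> gym.Env:
    """Create a seeded Space Invaders environment with example preprocessing."""
    env = gym.make("ALE/SpaceInvaders-v5")
    # Illustrative preprocessing wrapper; the repository's wrappers may differ.
    env = gym.wrappers.ResizeObservation(env, (84, 84))
    env.reset(seed=seed)  # Gymnasium seeds the environment via reset()
    return env
```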
test_model.py evaluates a trained model on several episodes and updates the metrics CSV.
The gail/ module contains experimental GAIL components:
- discriminator network,
- reward extraction utilities,
- evaluation scripts.
The plot_results/ directory holds scripts and notebooks for plotting:
- learning curves,
- reward distributions,
- architecture/dataset comparisons.
Dependency and environment management via uv.
The project supports two sources of demonstrations:
The code includes a built-in teleoperation tool that allows the user to record their own gameplay using the keyboard.
These demonstrations are saved as .pkl files and follow the same structure used for training:
- raw RGB frames,
- the selected action,
- reward information.
Human demos can be collected at any time through the main script.
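For illustration, one stored transition might look roughly like this (the field names are assumptions; the actual .pkl schema is defined by the project's data code):

```python
import pickle
import numpy as np

# Hypothetical record layout for one transition in a demo .pkl file.
transition = {
    "observation": np.zeros((210, 160, 3), dtype=np.uint8),  # raw RGB frame
    "action": 2,      # discrete action index chosen by the demonstrator
    "reward": 5.0,    # reward returned by the environment for this step
}

with open("demo_episode.pkl", "wb") as f:
    pickle.dump([transition], f)  # an episode is a list of such transitions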
The project can automatically download and convert the Minari Space Invaders expert dataset.
A dedicated script loads the dataset and transforms it into the same .pkl format used for BC training.
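A sketch of what this conversion could look like with the Minari API (the dataset ID below is a placeholder; see load_minari_dataset.py for the real one):

```python
import pickle
import minari

# Placeholder dataset ID; the script targets the actual Space Invaders expert data.
dataset = minari.load_dataset("atari/spaceinvaders/expert-v0", download=True)

episodes = []
for ep in dataset.iterate_episodes():
    # Minari stores T+1 observations for T actions; drop the terminal frame.
    episodes.append([
        {"observation": obs, "action": act, "reward": rew}
        for obs, act, rew in zip(ep.observations[:-1], ep.actions, ep.rewards)
    ])

with open("minari_demos.pkl", "wb") as f:
    pickle.dump(episodes, f)
```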
Using the two data sources, the training pipeline allows:
- human-only training,
- Minari-only training,
- or mixed datasets combining both.
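A mixed dataset can be assembled with torch.utils.data.ConcatDataset, for example (DemoDataset and the file names are hypothetical, assuming the .pkl layout sketched above):

```python
import pickle
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class DemoDataset(Dataset):
    """Hypothetical wrapper exposing the transitions stored in a demo .pkl file."""
    def __init__(self, pkl_path: str):
        with open(pkl_path, "rb") as f:
            episodes = pickle.load(f)
        self.transitions = [t for ep in episodes for t in ep]

    def __len__(self):
        return len(self.transitions)

    def __getitem__(self, i):
        t = self.transitions[i]
        # HWC uint8 frame -> normalized CHW float tensor
        frame = torch.from_numpy(t["observation"]).permute(2, 0, 1).float() / 255
        return frame, t["action"]

mixed = ConcatDataset([DemoDataset("human_demos.pkl"), DemoDataset("minari_demos.pkl")])
loader = DataLoader(mixed, batch_size=64, shuffle=True)
```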
- Load the Minari dataset through DataManager.
- Split into:
  - 80% training,
  - 20% validation.
- Choose a model architecture (MLP, CNN, ViT).
- Train for 50 epochs with a batch size of 32, 64, 128, or 256.
- Track validation loss to save the best-performing checkpoint.
- Log everything to a CSV:
  - losses,
  - accuracies,
  - architecture info,
  - dataset info,
  - timestamps.
This enables systematic comparison across architectures.
test_model.py runs a trained model over multiple episodes and computes:
- mean reward,
- standard deviation,
- episode lengths.
These results are appended to the original metrics CSV to keep all experiment data synchronized.
This allows comparison between:
- MLP vs CNN vs ViT,
- BC vs GAIL-refined models.
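A minimal sketch of such an evaluation loop (the evaluate helper is hypothetical; see test_model.py for the actual logic):

```python
import numpy as np
import torch

def evaluate(model, env, n_episodes: int = 10, device: str = "cpu"):
    """Roll out a trained policy greedily and collect per-episode statistics."""
    rewards, lengths = [], []
    for ep in range(n_episodes):
        obs, info = env.reset(seed=ep)
        done, total, steps = False, 0.0, 0
        while not done:
            # HWC uint8 frame -> normalized CHW float tensor with batch dimension
            x = torch.from_numpy(obs).permute(2, 0, 1).float().div(255).unsqueeze(0)
            with torch.no_grad():
                action = model(x.to(device)).argmax(1).item()
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            total += reward
            steps += 1
        rewards.append(total)
        lengths.append(steps)
    return np.mean(rewards), np.std(rewards), lengths
```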
The gail/ module implements a simplified GAIL pipeline:
- A discriminator distinguishes expert vs. agent (state, action) pairs.
- The discriminator output is transformed into a learned reward.
- The policy is then trained (or fine-tuned) using this adversarial reward.
This is used to explore whether GAIL helps the model:
- recover from underrepresented states,
- improve generalization,
- reduce imitation bias.
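A minimal sketch of the discriminator and the standard GAIL reward transform r(s, a) = -log(1 - D(s, a)) (the network sizes and the flattened state encoding are assumptions; see gail/ for the real components):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Scores (state, action) pairs; a high logit means 'classified as expert'.
    Assumes states are already encoded as flat feature vectors, not raw frames."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 256):
        super().__init__()
        self.num_actions = num_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # One-hot encode the discrete action and concatenate with the state features.
        a = F.one_hot(action, self.num_actions).float()
        return self.net(torch.cat([state, a], dim=1)).squeeze(1)  # logits

def gail_reward(disc: Discriminator, state, action):
    """Standard GAIL reward: r = -log(1 - D(s, a)), with D = sigmoid(logit)."""
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action)).clamp(max=1 - 1e-6)
    return -torch.log(1.0 - d)
```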
This project is configured for Python 3.13 and uv.
Install uv, clone the repository, and sync the environment:

```bash
pip install uv
git clone https://github.com/tetano02/SpaceInvadersBC.git
cd SpaceInvadersBC
uv sync
```

Verify the PyTorch installation:

```bash
uv run python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

Download and convert the Minari dataset, then start the main menu:

```bash
uv run load_minari_dataset.py
uv run main.py
```

Choose:
- dataset: Minari
- architecture: MLP / CNN / ViT
- batch size and number of epochs
Evaluate a trained model:

```bash
uv run test_model.py --model-path path/to/model.pt
```

Generate plots using the scripts in plot_results/.
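For example, a learning curve could be plotted from the metrics CSV roughly as follows (the column names are assumptions; adjust them to the actual CSV header):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed column names matching the logging sketch above; the real CSV may differ.
df = pd.read_csv("metrics.csv", names=["epoch", "val_loss", "val_accuracy"])

fig, ax = plt.subplots()
ax.plot(df["epoch"], df["val_loss"], label="validation loss")
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()
fig.savefig("learning_curve.png", dpi=150)
```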
- Minari provides strong teacher behavior.
- CNN and ViT outperform the MLP because they can exploit the visual structure of the frames.
- GAIL may improve robustness in rarely-seen states.
- Mixed architecture results can highlight generalization limits.
SpaceInvadersBC (formerly CTRL+C – CTRL+PAC)
- Agnelli Stefano
- Cremonesi Andrea
- Mombelli Tommaso
- Sun Wen Wen