Welcome to the official repository of the paper Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes! In this repo, we provide the code and resources to reproduce the experiments and results presented in the paper, and to experiment with our implementations of the novel Extra-Deep Planning (EDP) algorithm and standard AlphaZero (AZ).
- Clone the repository:
git clone https://github.com/TheEmotionalProgrammer/az-generalization.git
cd az-generalization
- Create a virtual environment and activate it:
python3 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install the dependencies:
pip install -r requirements.txt
The source code folder src is organized as follows:
src
├── az
│   ├── nn.py
│   ├── controller.py
│   └── planning.py
├── edp
│   └── planning.py
├── mcts_core
│   ├── policies
│   │   └── ...
│   ├── node.py
│   ├── planning.py
│   ├── runner.py
│   └── utils.py
├── environments
│   ├── gridworld
│   │   └── ...
│   ├── observation_embeddings.py
│   └── register.py
├── experiments
│   ├── evaluation
│   │   ├── evaluate_from_config.py
│   │   └── plotting
│   │       └── ...
│   ├── training
│   │   └── train_from_config.py
│   └── parameters.py
└── utils
    └── ...
where:
- az contains the implementation of the standard AZ framework, including training and planning algorithms.
- edp contains the implementation of the EDP algorithm.
- mcts_core contains the core components of the Monte Carlo Tree Search (MCTS) used in both AZ and EDP.
- environments contains the definitions of the custom grid-world environments used for training and evaluation.
- experiments contains the scripts for training and evaluating the models.
- utils contains plotting and logging utilities.
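To make the role of the shared MCTS components more concrete, here is a minimal sketch of the PUCT child-selection rule from the original AlphaZero formulation: pick the action maximising Q(s, a) + c_puct · P(s, a) · sqrt(N(s)) / (1 + N(s, a)). This is an illustration under stated assumptions, not code taken from mcts_core: the names Node, puct_score and select_action are hypothetical, N(s) is taken as the parent's visit count (some implementations use the sum of child visits instead), and the repository's policies and EDP may use different selection rules.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node; not the repository's mcts_core/node.py."""
    prior: float                 # P(s, a): prior probability from the policy head
    visit_count: int = 0         # N(s, a): number of visits to this edge
    value_sum: float = 0.0       # running sum of backed-up values
    children: dict = field(default_factory=dict)  # action -> Node

    def q_value(self) -> float:
        # Mean action value Q(s, a); unvisited nodes default to 0.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.0) -> float:
    # Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a)),
    # with N(s) assumed to be the parent's visit count.
    exploration = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q_value() + exploration


def select_action(parent: Node, c_puct: float = 1.0):
    # Return the action whose child maximises the PUCT score.
    return max(
        parent.children,
        key=lambda action: puct_score(parent, parent.children[action], c_puct),
    )


# Usage: a root visited 10 times with two children.
root = Node(prior=1.0, visit_count=10)
root.children = {
    "left": Node(prior=0.7, visit_count=6, value_sum=3.0),   # Q = 0.5
    "right": Node(prior=0.3, visit_count=3, value_sum=2.4),  # Q = 0.8
}
print(select_action(root))  # prints "right"
```

Raising c_puct shifts weight from the value estimate Q toward the prior-weighted exploration term: with c_puct=10.0 the same tree selects "left" instead, because its larger prior now dominates.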
The folder weights contains the pre-trained weights of the policy-value neural networks used for EDP and AZ evaluation. You can use these weights to reproduce the results presented in the paper, or train your own models from scratch with the training script in src/experiments/training/train_from_config.py. Evaluation can be run with the script in src/experiments/evaluation/evaluate_from_config.py. Both scripts expose several parameters that can be adjusted directly in the script files. The hyperparameter configurations used in the paper are listed in the paper's appendix.
If you use this code or the results obtained from it in your research, please cite our paper:
@misc{tamassia2025improvingrobustnessalphazeroalgorithms,
title={Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes},
author={Isidoro Tamassia and Wendelin Böhmer},
year={2025},
eprint={2509.04317},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2509.04317},
}


