Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes 🛡️

Welcome to the official repository of the paper Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes! In this repo, we provide the code and resources to reproduce the experiments and results presented in the paper, and to experiment with the implementations of our novel Extra-Deep Planning (EDP) algorithm and standard AlphaZero (AZ).

Local Installation

Clone the repository:

git clone https://github.com/TheEmotionalProgrammer/az-generalization.git
cd az-generalization

Create a virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the dependencies:

pip install -r requirements.txt
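
After installing, it can help to confirm that the virtual environment is actually active and that the dependencies can be imported. The snippet below is a minimal sketch using only the Python standard library; the helper names `in_virtualenv` and `missing_modules` are illustrative and not part of this repository, and the module names in the example call are placeholders that should be replaced with the top-level packages listed in requirements.txt.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside an environment created by `python3 -m venv`."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # Placeholder names (an assumption): replace them with the top-level
    # packages that requirements.txt actually installs.
    print("missing modules:", missing_modules(["numpy", "torch"]))
```

If the first line prints False, re-run the activation command above before installing the dependencies again.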

Repository Structure

The source code folder src is organized as follows:

src
├── az
│   ├── nn.py
│   ├── controller.py
│   └── planning.py
├── edp
│   └── planning.py
├── mcts_core
│   ├── policies
│   │   └── ...
│   ├── node.py
│   ├── planning.py
│   ├── runner.py
│   └── utils.py
├── environments
│   ├── gridworld
│   │   └── ...
│   ├── observation_embeddings.py
│   └── register.py
├── experiments
│   ├── evaluation
│   │   ├── evaluate_from_config.py
│   │   └── plotting
│   │       └── ...
│   ├── training
│   │   └── train_from_config.py
│   └── parameters.py
└── utils
    └── ...

where:

az contains the implementation of the standard AZ framework, including training and planning algorithms.
edp contains the implementation of the EDP algorithm.
mcts_core contains the core components of the Monte Carlo Tree Search (MCTS) used in both AZ and EDP.
environments contains the definitions of the custom grid-world environments used for training and evaluation.
experiments contains the scripts for training and evaluating the models.
utils contains plotting and logging utilities.

Reproducing Results

The weights folder contains the pre-trained weights of the policy-value neural networks used for EDP and AZ evaluation. You can use these weights to reproduce the results presented in the paper, or train your own models from scratch with the training script src/experiments/training/train_from_config.py. To run the evaluation, use the script src/experiments/evaluation/evaluate_from_config.py. Both scripts expose several parameters that can be adjusted directly in the file. The hyperparameter configurations used in the paper are listed in its appendix.
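
As a convenience, the two scripts can also be launched from a small Python wrapper. The following is only a sketch under stated assumptions: it assumes the scripts are run with the active interpreter from the repository root and take no required command-line arguments (parameters are edited in the files, as described above). The names `SCRIPTS`, `build_command`, and `run` are hypothetical helpers, not part of this repository, and the scripts may need additional setup (for example an adjusted import path) that is not covered here.

```python
import subprocess
import sys
from pathlib import Path

# Script locations as stated in this README, relative to the repository root.
SCRIPTS = {
    "train": Path("src/experiments/training/train_from_config.py"),
    "evaluate": Path("src/experiments/evaluation/evaluate_from_config.py"),
}


def build_command(task: str, repo_root: Path = Path(".")) -> list[str]:
    """Return the command that runs the chosen script with the current interpreter."""
    if task not in SCRIPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(SCRIPTS)}")
    # Resolve to an absolute path so the command does not depend on the caller's cwd.
    return [sys.executable, str(repo_root.resolve() / SCRIPTS[task])]


def run(task: str, repo_root: Path = Path(".")) -> int:
    """Run the script from the repository root and return its exit code."""
    command = build_command(task, repo_root)
    script = Path(command[1])
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is repo_root the repository root?")
    # Assumption: no required command-line arguments, since the parameters
    # are adjusted directly inside the scripts.
    return subprocess.run(command, cwd=repo_root.resolve(), check=False).returncode


if __name__ == "__main__":
    # Usage: python run_experiment.py train|evaluate  (the file name is illustrative)
    sys.exit(run(sys.argv[1] if len(sys.argv) > 1 else "evaluate"))
```

Running the wrapper with the virtual environment active ensures the scripts use the same interpreter and installed dependencies.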

Citation

If you use this code or the results obtained from it in your research, please cite our paper:

@misc{tamassia2025improvingrobustnessalphazeroalgorithms,
      title={Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes}, 
      author={Isidoro Tamassia and Wendelin Böhmer},
      year={2025},
      eprint={2509.04317},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.04317}, 
}
