ICML 2025: Diversifying Policy Behaviors with Extrinsic Behavioral Curiosity

Zhenglin Wan, Xingrui Yu, David Mark Bossens, Yueming Lyu, Qing Guo, Flint Xiaofeng Fan, Yew-Soon Ong, Ivor W. Tsang

Centre for Frontier AI Research (CFAR), A*STAR, Singapore

1. Short Introduction

The Extrinsic Behavioral Curiosity (EBC) mechanism enables a robot to learn a broad range of high-performing, behaviorally diverse policies via Quality Diversity (QD) optimization. EBC can be seamlessly integrated into any Reinforcement Learning (RL) or Inverse Reinforcement Learning (IRL) algorithm, providing a generic technique for improving the diversity of learned robot behaviors. This repository contains the implementation of EBC and its QD base algorithm.

2. Quick Start

2.0 Fork and Clone the repository.

[Recommended] Fork this repository to your own GitHub account, then clone your fork locally:

git clone [the address of forked repository]
cd EBC

Or clone this repository directly by running the following commands in your terminal:

git clone https://github.com/vanzll/EBC.git
cd EBC

2.1 Installing key packages.

First, run the following commands to install the required base packages:

conda env create -f ebc.yml
conda activate ebc
pip install -e pyribs/

Then install the specific version of jaxlib with CUDA support:

# for CUDA 11 and cuDNN 8.2 or newer
wget https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.25+cuda11.cudnn82-cp39-cp39-manylinux2014_x86_64.whl
pip install jaxlib-0.3.25+cuda11.cudnn82-cp39-cp39-manylinux2014_x86_64.whl

# OR 

# for CUDA 11 and cuDNN 8.0.5 or newer 
wget https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.25+cuda11.cudnn805-cp39-cp39-manylinux2014_x86_64.whl
pip install jaxlib-0.3.25+cuda11.cudnn805-cp39-cp39-manylinux2014_x86_64.whl
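After installing the wheel, it can help to confirm that JAX actually sees the GPU before launching a long run. The snippet below is a minimal sketch, not part of this repository: describe_jax_backend is a hypothetical helper name, and it only uses jax.devices() and jax.default_backend().

```python
def describe_jax_backend() -> str:
    """Return a one-line summary of the JAX backend, or a hint if JAX is missing."""
    try:
        import jax
    except ImportError:
        # Typically means the conda environment is not active or jax is not installed.
        return "jax is not importable; check that the ebc environment is active"
    backend = jax.default_backend()  # e.g. "gpu" or "cpu"
    devices = jax.devices()          # devices visible to the default backend
    return f"jax {jax.__version__} on backend '{backend}' with devices {devices}"


if __name__ == "__main__":
    print(describe_jax_backend())
```

If the reported backend is cpu on a machine with a GPU, the installed jaxlib wheel probably does not match the local CUDA/cuDNN versions.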

2.2 Reproduce the results.

We provide runner files so that you can reproduce the results conveniently. After a run finishes, the results are saved to: experiments_4_good_and_diverse_elite_with_measures_top500/{env_method}/{seed}/summary.csv
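To compare runs across seeds, the per-seed summary.csv files can be aggregated. The following is a minimal sketch under stated assumptions: it assumes each summary.csv has a header row and that the last row holds the final value of the metric of interest. The column name is passed in by the caller because the actual column names are not documented here, and aggregate_seeds is a hypothetical helper, not a script shipped with this repository.

```python
import csv
import statistics
import sys
from pathlib import Path


def last_row(csv_path: Path) -> dict:
    """Return the final data row of a CSV file as a {column: string} dict."""
    with csv_path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"{csv_path} has no data rows")
    return rows[-1]


def aggregate_seeds(env_method_dir: Path, column: str):
    """Mean, sample std, and count of `column` taken from each seed's last row."""
    values = [
        float(last_row(path)[column])
        for path in sorted(env_method_dir.glob("*/summary.csv"))
    ]
    if not values:
        raise FileNotFoundError(f"no summary.csv found under {env_method_dir}")
    # A single seed has no spread; report 0.0 instead of raising.
    spread = statistics.stdev(values) if len(values) > 1 else 0.0
    return statistics.mean(values), spread, len(values)


# Usage: python aggregate.py <path to one {env_method} directory> <column name>
if __name__ == "__main__" and len(sys.argv) == 3:
    mean, spread, count = aggregate_seeds(Path(sys.argv[1]), sys.argv[2])
    print(f"{sys.argv[2]}: {mean:.4g} ± {spread:.4g} over {count} seeds")
```

Whether the ± values in the paper are sample standard deviations is not stated here, so treat the spread printed by this sketch as one reasonable choice rather than the paper's exact statistic.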

QD-IRL:

source runners/local/train_il_ebc_halfcheetah.sh
source runners/local/train_il_ebc_humanoid.sh
source runners/local/train_il_ebc_walker2d.sh

QD-RL:

source runners/local/train_rl_ebc_humanoid.sh

Each runner file contains three settings that you can modify to obtain results under different configurations:

1. SEED: the random seed. Run the experiments with several seeds to check that the results are robust.
2. intrinsic_module (for IL methods): gail, vail, or diffail, corresponding to the three base IL methods in the paper.
3. archive_bonus: true or false, selecting the EBC-improved version or the baseline version of the IL method.
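To give some intuition for what an archive-based bonus can look like, here is a toy sketch. It is an illustration under loud assumptions, not the code in this repository: it assumes a uniform grid over a bounded measure space and a fixed bonus paid the first time an episode's behavior measure lands in an unoccupied cell. The names GridArchive, episode_reward, and bonus are hypothetical; the repository's implementation and the exact reward formulation in the paper may differ.

```python
import numpy as np


class GridArchive:
    """Toy occupancy grid over a bounded measure space (not the pyribs archive)."""

    def __init__(self, cells_per_dim: int, lower: float, upper: float, dims: int):
        self.cells_per_dim = cells_per_dim
        self.lower = lower
        self.upper = upper
        self.occupied = np.zeros((cells_per_dim,) * dims, dtype=bool)

    def cell_index(self, measure) -> tuple:
        """Map a measure vector to the index of the grid cell that contains it."""
        scaled = (np.asarray(measure, dtype=float) - self.lower) / (self.upper - self.lower)
        idx = np.clip((scaled * self.cells_per_dim).astype(int), 0, self.cells_per_dim - 1)
        return tuple(int(i) for i in idx)


def episode_reward(base_reward, measure, archive, bonus, archive_bonus=True):
    """Add `bonus` when the episode's measure lands in a cell not yet occupied."""
    if not archive_bonus:
        # Baseline behavior: reward is unchanged and the archive is not touched.
        return base_reward
    idx = archive.cell_index(measure)
    is_new = not archive.occupied[idx]
    archive.occupied[idx] = True  # assumption: a visit marks the cell as occupied
    return base_reward + (bonus if is_new else 0.0)


if __name__ == "__main__":
    archive = GridArchive(cells_per_dim=10, lower=0.0, upper=1.0, dims=2)
    print(episode_reward(1.0, [0.25, 0.75], archive, bonus=5.0))  # new cell
    print(episode_reward(1.0, [0.25, 0.75], archive, bonus=5.0))  # same cell again
```

In this toy version the second visit to the same cell earns no bonus, so the policy is only paid extra for reaching behaviors the archive has not yet seen.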

3. Results.

The main results of QD-IRL are shown below. For more results, please refer to our published paper.

3.1 Quantitative results.

Method	Halfcheetah QD-Score	Walker2d QD-Score	Humanoid QD-Score
GAIL-EBC (Ours)	2.64 × 10⁶ ± 9.21 × 10⁴	3.42 × 10⁶ ± 1.36 × 10⁵	5.31 × 10⁶ ± 5.78 × 10⁵
GAIL	2.02 × 10⁶ ± 8.36 × 10⁵	2.47 × 10⁶ ± 2.88 × 10⁵	1.86 × 10⁶ ± 4.51 × 10⁵
VAIL-EBC (Ours)	3.78 × 10⁶ ± 7.69 × 10⁴	3.19 × 10⁶ ± 2.15 × 10⁵	7.26 × 10⁶ ± 3.10 × 10⁵
VAIL	3.62 × 10⁶ ± 4.00 × 10⁴	2.40 × 10⁶ ± 2.13 × 10⁵	5.09 × 10⁶ ± 6.86 × 10⁵
DiffAIL-EBC (Ours)	3.99 × 10⁶ ± 3.11 × 10⁵	2.93 × 10⁶ ± 7.59 × 10⁴	8.92 × 10⁶ ± 6.60 × 10⁵
DiffAIL	4.02 × 10⁶ ± 5.82 × 10⁴	1.71 × 10⁶ ± 4.08 × 10⁵	3.56 × 10⁶ ± 3.71 × 10⁵
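For readers new to QD metrics, QD-Score is conventionally the sum of the objective values of all elites stored in the archive, so it rewards both quality and coverage. The sketch below illustrates that convention on a toy grid in which empty cells are marked with NaN. The offset parameter (sometimes used to keep every term non-negative) and the function names are assumptions made for illustration, and this is not the evaluation code used to produce the table.

```python
import numpy as np


def qd_score(fitness_grid: np.ndarray, offset: float = 0.0) -> float:
    """Sum of (fitness + offset) over occupied cells; NaN marks an empty cell."""
    occupied = ~np.isnan(fitness_grid)
    return float(np.sum(fitness_grid[occupied] + offset))


def coverage(fitness_grid: np.ndarray) -> float:
    """Fraction of archive cells that hold an elite."""
    return float(np.mean(~np.isnan(fitness_grid)))


if __name__ == "__main__":
    # A 2x2 toy archive with three elites and one empty cell.
    grid = np.array([[1.0, np.nan], [3.0, 2.0]])
    print(qd_score(grid), coverage(grid))
```

Because every newly filled cell adds a term to the sum, a method can raise QD-Score either by improving existing elites or by discovering new behaviors.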

3.2 Figures.

The figure (combined__All_metrics.png in the repository root) compares all four metrics across the three environments.

4. Acknowledgements.

The GPU-accelerated rigid body simulator is adapted from Brax.
The key contribution of our work is built on the QD library pyribs.
The vectorized RL training pipeline of this repository is inspired by the PPGA paper.
The RL and IL algorithms in this repository are partly adapted from cleanrl.

5. Cite.

If you find our work helpful, please consider citing it:

@misc{wan2025diversifying,
      title={Diversifying Policy Behaviors with Extrinsic Behavioral Curiosity}, 
      author={Zhenglin Wan and Xingrui Yu and David Mark Bossens and Yueming Lyu and Qing Guo and Flint Xiaofeng Fan and Yew Soon Ong and Ivor Tsang},
      year={2025},
      eprint={2410.06151},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.06151}, 
}
