Yuting Tang, Yivan Zhang, Johannes Ackermann, Yu-Jie Zhang, Soichiro Nishimori, Masashi Sugiyama
Reinforcement Learning Conference 2025
[OpenReview]
You can build and run the Docker container as follows.
```bash
cd docker
docker build -t rra_image . -f Dockerfile
docker run -dit -p 8888:22 --mount type=bind,source=/path/to/your/RRA,destination=/workspace/RRA --name rra_container -m 16g --gpus all rra_image /bin/bash
```

Replace `/path/to/your/RRA` with the absolute path to your local RRA directory.
Once the container is up and running, you can access its shell with:
```bash
docker exec -it rra_container bash
```

To run the Continuous Control and Portfolio experiments inside the Docker container, you will also need to manually install:

```bash
pip install torch stable-baselines3 empyrical tensorboard distrax IPython
```

Make sure to run this inside the container after building it.
If you encounter issues with Cython, try the following:
```bash
pip uninstall Cython
pip install Cython==3.0.0a10
```

This can resolve version conflicts or compatibility issues with certain dependencies.
See this notebook.
See this notebook.
Partially built upon Stable-Baselines3.
You can run the continuous control experiments using the provided shell script.
```bash
cd continuous_control
bash run_td3.sh [SEED] [ENV] [RECURSIVE_TYPE]
```

- `SEED` (optional): Random seed for training. Default: `42`.
- `ENV` (optional): Gymnasium environment name. Default: `Ant-v5`. Available options: `Ant-v5`, `Walker2d-v5`, `LunarLanderContinuous-v3`.
- `RECURSIVE_TYPE` (optional): Type of recursive aggregation to use in training. Default: `dsum`. Available options: `dsum`, `dmax`, `min`, `dsum_dmax`, `dsum_variance`.
  - `dsum`: Discounted Sum, computed with discount factor $\gamma = 0.99$.
  - `dmax`: Discounted Max, computed with discount factor $\gamma = 0.99$.
  - `min`: Minimum reward.
  - `dsum_dmax`: A combination of Discounted Sum and Discounted Max, both using $\gamma = 0.99$.
  - `dsum_variance`: Discounted Sum minus the reward Variance, computed with discount factor $\gamma = 0.99$.
If no arguments are provided, the script will use the default values.
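As a rough illustration (this is not the repository's implementation; the function names, the sample rewards, and the exact definitions sketched for `dmax` and `dsum_variance` are assumptions), each aggregation type above can be computed by a one-step backward recursion over a reward sequence:

```python
GAMMA = 0.99  # default discount factor used by the script


def dsum(rewards, gamma=GAMMA):
    """Discounted Sum: r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total  # backward recursion
    return total


def dmax(rewards, gamma=GAMMA):
    """Discounted Max (assumed form): max_t gamma^t * r_t,
    via the recursion m_t = max(r_t, gamma * m_{t+1})."""
    best = float("-inf")
    for r in reversed(rewards):
        best = max(r, gamma * best)
    return best


def dsum_variance(rewards, gamma=GAMMA):
    """Discounted Sum minus the (population) variance of the rewards
    (assumed form of the dsum_variance objective)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    return dsum(rewards, gamma) - var


rewards = [1.0, 0.5, 2.0]
print(dsum(rewards))  # 1.0 + 0.99*0.5 + 0.99**2*2.0 = 3.4552
print(dmax(rewards))  # max(1.0, 0.99*0.5, 0.99**2*2.0) = 1.9602
print(min(rewards))   # the "min" aggregation is just the minimum reward
```

Because each statistic is maintained by a single-step recursion, these objectives can be updated incrementally during training rather than recomputed from full trajectories.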
This script runs the portfolio experiment using pre-defined market environments and settings.
```bash
cd portfolio
./run_portfolio.sh
```

| dsum | dmax | min | dsum + dmax | dsum - var |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() |
| dsum | dmax | min | dsum + dmax | dsum - var |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() |
| dsum | dmax | min | dsum + dmax | dsum - var |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() |














