- Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), tensorflow (1.14.0), numpy (1.18.2)
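Assuming a matching Python 3.5 environment, the pinned dependencies above could be installed with `pip`, e.g. (exact setup steps are not specified here, so treat this as a sketch):

```
pip install gym==0.10.5 tensorflow==1.14.0 numpy==1.18.2
```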

- `--scenario`: defines which environment in the MPE is to be used (default: `"cn"`)
- `--max-episode-len`: maximum length of each episode for the environment (default: `25`)
- `--num-episodes`: total number of training episodes (default: `60000`)
- `--num-adversaries`: number of adversaries in the environment (default: `0`)

- `--lr`: learning rate (default: `1e-2`)
- `--gamma`: discount factor (default: `0.95`)
- `--batch-size`: batch size (default: `800`)
- `--num-units`: number of units in the MLP (default: `128`)

- `--prior-buffer-size`: prior network training buffer size
- `--prior-num-iter`: number of prior network training iterations
- `--prior-training-rate`: prior network training rate
- `--prior-training-percentile`: percentile threshold on the KL value used to generate labels

- `--exp-name`: name of the experiment, used as the file name to save all results (default: `None`)
- `--save-dir`: directory where intermediate training results and model will be saved (default: `"/tmp/policy/"`)
- `--save-rate`: model is saved every time this many episodes have been completed (default: `1000`)
- `--load-dir`: directory where training state and model are loaded from (default: `""`)
- `--plots-dir`: directory where training curves are saved (default: `"./learning_curves/"`)
- `--restore_all`: whether to restore an existing I2C network
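For reference, the options above could be wired up with Python's `argparse` roughly as follows. This is a minimal sketch using the flag names and defaults listed above; the actual parser in `train.py` may differ, and the prior-network defaults below are placeholders since none are documented:

```python
import argparse

def parse_args():
    # Sketch of the documented options; defaults follow the lists above.
    parser = argparse.ArgumentParser(description="I2C training options (sketch)")
    # Environment
    parser.add_argument("--scenario", type=str, default="cn", help="MPE scenario to use")
    parser.add_argument("--max-episode-len", type=int, default=25, help="max length of each episode")
    parser.add_argument("--num-episodes", type=int, default=60000, help="total training episodes")
    parser.add_argument("--num-adversaries", type=int, default=0, help="number of adversaries")
    # Core training parameters
    parser.add_argument("--lr", type=float, default=1e-2, help="learning rate")
    parser.add_argument("--gamma", type=float, default=0.95, help="discount factor")
    parser.add_argument("--batch-size", type=int, default=800, help="batch size")
    parser.add_argument("--num-units", type=int, default=128, help="number of units in the MLP")
    # Prior network (no documented defaults; None is a placeholder)
    parser.add_argument("--prior-buffer-size", type=int, default=None, help="prior training buffer size")
    parser.add_argument("--prior-num-iter", type=int, default=None, help="prior training iterations")
    parser.add_argument("--prior-training-rate", type=float, default=None, help="prior training rate")
    parser.add_argument("--prior-training-percentile", type=float, default=None,
                        help="percentile threshold on the KL value used to generate labels")
    # Checkpointing
    parser.add_argument("--exp-name", type=str, default=None, help="experiment name for saved results")
    parser.add_argument("--save-dir", type=str, default="/tmp/policy/", help="where models are saved")
    parser.add_argument("--save-rate", type=int, default=1000, help="save every N episodes")
    parser.add_argument("--load-dir", type=str, default="", help="where models are loaded from")
    parser.add_argument("--plots-dir", type=str, default="./learning_curves/", help="where curves are saved")
    parser.add_argument("--restore_all", action="store_true", help="restore an existing I2C network")
    return parser.parse_args()
```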
I2C can be learned either end-to-end or in a two-phase manner. This code implements the end-to-end manner, which can take more training time than the two-phase alternative.
For Cooperative Navigation:

```
python3 train.py --scenario 'cn' --prior-training-percentile 60 --lr 1e-2
```

For Predator Prey:

```
python3 train.py --scenario 'pp' --prior-training-percentile 40 --lr 1e-3
```
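Training can presumably be resumed from a saved model via the checkpointing flags documented above; the invocation below is an assumption based on the flag descriptions, not a command taken from the repository:

```
# Hypothetical: resume Cooperative Navigation training from a saved policy,
# restoring the existing I2C network (--load-dir and --restore_all as documented above)
python3 train.py --scenario 'cn' --load-dir '/tmp/policy/' --restore_all
```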
If you use this code, please cite our paper:
```
@inproceedings{ding2020learning,
  title={Learning Individually Inferred Communication for Multi-Agent Cooperation},
  author={Ding, Ziluo and Huang, Tiejun and Lu, Zongqing},
  booktitle={NeurIPS},
  year={2020}
}
```
This code was developed based on the source code of MADDPG by Ryan Lowe.