PPO-Q: Proximal Policy Optimization with Parametrized Quantum Policies or Values

Setup

# Ensure that the Python version is 3.10
pip install --editable ./third_party/torchquantum
pip install quarkstudio==7.0.5
pip install "gymnasium[box2d]==0.29.1"

Usage

The repository provides a Python script, main.py, and accompanying configuration files for training hybrid quantum-classical models in a range of reinforcement learning environments.

python main.py <config_file_name>

Replace <config_file_name> with the name of a configuration file from the ./config directory, or create a custom configuration of your own.

Description of Configuration Parameters

Parameter	Description	Example Value
`env_name`	Name of the reinforcement learning environment.	`LunarLander-v2`
`n_steps`	Number of steps per environment per update.	1024
`mini_batch_size`	Size of the mini-batch.	64
`max_train_steps`	Maximum number of training steps.	1,750,000
`lr_a`	Learning rate for the actor network.	0.003
`lr_c`	Learning rate for the critic network.	0.0003
`gamma`	Discount factor.	0.999
`lamda`	GAE parameter.	0.98
`epsilon`	PPO clip parameter.	0.2
`K_epochs`	Number of PPO epochs.	4
`entropy_coef`	Entropy coefficient.	0.01
`num_envs`	Number of environments to run in parallel.	16
`n_blocks`	Number of blocks in the quantum reinforcement learning network.	1
`n_wires`	Number of qubits in the quantum circuit.	4
`use_quafu`	Whether to run on Quafu quantum hardware.	True
`key`	Token required for accessing the Quafu cloud quantum hardware.	' '
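To make the interaction between these parameters more concrete, the following is a minimal sketch in plain Python, not the code used in main.py. It assumes that `max_train_steps` counts total environment steps across all parallel environments, and that `gamma`, `lamda`, and `epsilon` enter in the way they do in standard PPO with generalized advantage estimation. The function names `rollout_sizes`, `gae_advantages`, and `clipped_surrogate` are hypothetical and chosen only for illustration.

```python
# Example values copied from the parameter table above.
config = {
    "n_steps": 1024,
    "num_envs": 16,
    "mini_batch_size": 64,
    "max_train_steps": 1_750_000,
    "K_epochs": 4,
    "gamma": 0.999,
    "lamda": 0.98,
    "epsilon": 0.2,
}


def rollout_sizes(cfg):
    """Derive batch sizes (assumption: max_train_steps counts total env steps)."""
    batch_size = cfg["n_steps"] * cfg["num_envs"]            # samples per update
    minibatches = batch_size // cfg["mini_batch_size"]       # per PPO epoch
    gradient_steps = minibatches * cfg["K_epochs"]           # per update
    n_updates = cfg["max_train_steps"] // batch_size         # full updates in a run
    return batch_size, minibatches, gradient_steps, n_updates


def gae_advantages(rewards, values, last_value, dones, gamma, lam):
    """Generalized advantage estimation for one environment's rollout."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0  # do not bootstrap past episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, epsilon):
    """Standard PPO clipped objective for a single sample (to be maximized)."""
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    print(rollout_sizes(config))
    print(gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0, [False, False],
                         config["gamma"], config["lamda"]))
    print(clipped_surrogate(1.5, 1.0, config["epsilon"]))
```

Under these assumptions, the example values give 16,384 samples per update, split into 256 mini-batches per epoch, so changing `n_steps` or `num_envs` also changes how many gradient steps each update performs.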

Training results can be visualized using TensorBoard:

tensorboard --logdir=./runs

Results

Benchmark reinforcement learning environments have been successfully solved using PPO-Q, as illustrated in the following table and figures.

Environment	State Space Dimension	Action Space Dimension
CartPole	4	2
MountainCar	2	3
Acrobot	6	3
LunarLander	8	4
MountainCar(C)	2	1
Pendulum	3	1
LunarLander(C)	8	2
BipedalWalker	24	4

Figures (images not included here): CartPole, Acrobot, LunarLander, MountainCar(C), Pendulum, BipedalWalker.

Citation

An arXiv preprint is coming soon.
