guandan-RL

A reinforcement learning system that plays Guandan, a collaborative card game.

This project is mainly for me to learn the basics of RL and get my hands dirty. Still, feel free to use it if you find it interesting.

My current approach is to use Proximal Policy Optimization (PPO) as the RL algorithm. The policy net first encodes the state (more precisely, the observation, since each agent cannot see the others' cards) and then outputs a valid action. The state consists of 16 historical hands, the player's current hand, and the cards remaining for everyone. Action masking ensures the output is legal under the game rules. The same net also produces a state-value estimate, as is standard for PPO. However, I am not using GAE for advantage estimation, just a simple Monte Carlo (MC) method, since individual trajectories don't get too long. The value head is still used alongside the action head as a baseline, to reduce the variance of the MC estimate. I also did a bit of reward shaping: the step-level reward is the final reward minus a small penalty for the cards remaining, to encourage the agent to empty its hand.
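To make the masking and advantage steps concrete, here is a minimal pure-Python sketch, not the code from this repo. The function names, the `penalty` value, and the choice to use each step's shaped reward directly as that step's return target are all assumptions for illustration.

```python
import math

def masked_softmax(logits, legal_mask):
    """Softmax over legal actions only; illegal actions get probability 0."""
    if not any(legal_mask):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    max_logit = max(l for l, ok in zip(logits, legal_mask) if ok)
    exps = [math.exp(l - max_logit) if ok else 0.0
            for l, ok in zip(logits, legal_mask)]
    total = sum(exps)
    return [e / total for e in exps]

def shaped_step_rewards(final_reward, cards_remaining, penalty=0.01):
    """Per-step reward: final reward minus a small penalty per card still held.

    `penalty` is a hypothetical coefficient, not a value taken from the repo.
    """
    return [final_reward - penalty * n for n in cards_remaining]

def mc_advantages(step_targets, values):
    """MC advantage: return target minus the value head's estimate (baseline)."""
    return [g - v for g, v in zip(step_targets, values)]

# Three actions, the middle one illegal under the game rules.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])

# A won game (final reward 1.0) with 27, 10 and 0 cards left at three steps.
targets = shaped_step_rewards(1.0, [27, 10, 0])
advantages = mc_advantages(targets, [0.5, 0.5, 0.5])
```

Because the illegal action is zeroed out before normalisation, sampling from `probs` can never pick it, and the value baseline only shifts the advantages without changing which steps look better than others.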

I use a Transformer as the model backbone. The state is encoded as 20 tokens. An attention mask pads out the history slots while fewer than 16 hands have been played. The model consists of 2 Transformer layers with 4 heads each. It has only around 450k parameters, so inference takes on the order of milliseconds on typical hardware.
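The padding idea can be sketched in plain Python as follows. This is an illustration rather than the repo's implementation: it assumes the 16 history hands occupy the first 16 of the 20 tokens and are filled from slot 0, which the text does not specify, and the function names are hypothetical. Frameworks such as PyTorch accept a similar boolean mask (for example the `src_key_padding_mask` argument of `nn.TransformerEncoder`, where `True` means "ignore this token").

```python
import math

NUM_TOKENS = 20     # total tokens per state, from the text
HISTORY_SLOTS = 16  # history hands, from the text

def history_padding_mask(num_hands_played):
    """Return a 20-entry mask; True marks an empty history slot to ignore.

    Assumes history tokens come first and are filled starting at index 0.
    """
    filled = min(max(num_hands_played, 0), HISTORY_SLOTS)
    return [filled <= i < HISTORY_SLOTS for i in range(NUM_TOKENS)]

def masked_attention_weights(scores, padding_mask):
    """Softmax over one query's attention scores, skipping padded keys."""
    # The non-history tokens are never padded, so `kept` is never empty.
    kept = [s for s, pad in zip(scores, padding_mask) if not pad]
    max_score = max(kept)
    exps = [0.0 if pad else math.exp(s - max_score)
            for s, pad in zip(scores, padding_mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Early in a game: only 3 hands played, so 13 history slots are padding.
mask = history_padding_mask(3)
weights = masked_attention_weights([0.0] * NUM_TOKENS, mask)
```

With uniform scores, the attention weight is spread evenly over the 7 real tokens and the 13 padded slots receive exactly zero, so empty history cannot influence the encoding.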

Train with `python training.py` and evaluate with `python evaluate.py`. These are my results from training the agent for 500 epochs and playing it against agents from different epochs:

| Opponent epoch | Win Rate (%) | Avg Score |
| --- | --- | --- |
| 0 | 98 | 2.70 |
| 99 | 63 | 0.87 |
| 199 | 59 | 0.49 |
| 299 | 57 | 0.34 |
| 399 | 49 | -0.07 |

Epoch 0 is in fact the random agent.

700 vs 300: Agent won 585 games. Average score: 0.421

1000 vs 300: Agent won 626 games. Average score: 0.697

1000 vs 700: Agent won 570 games. Average score: 0.35

