Add MaskablePPOPlayer #297

Open
zarns wants to merge 1 commit into bcollazo:master from zarns:feature/ppo
zarns commented Nov 16, 2024

Supersedes #287

I added SubprocVecEnv so that multiple games can be played at once, which collects training data about 5x faster. I trained for 4.96 days straight (100,000,000 timesteps) with this configuration, and the resulting model.zip file is 1525 MB (too big to upload to git, unfortunately). After those 5 days of training, PPOPlayer has an 8% win rate against AB-pruning and an 11% win rate against ValueFunctionPlayer. Attached is the wandb graph output. You can see that episode_reward_mean has not plateaued, but training is simply not fast enough on my RTX 4070 to realistically surpass the AB-pruning player. Perhaps the model has too many layers, which slows down training, but I've played around quite a bit with different hyperparameters and model sizes, and this is the best I've come up with.
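For a rough sense of scale, the throughput implied by the numbers above can be worked out directly. This is only a back-of-the-envelope sketch: the timestep count and duration come from this comment, and the single-environment estimate assumes the "about 5x" speedup is exact, which it is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def steps_per_second(total_timesteps: int, days: float) -> float:
    """Average environment timesteps collected per wall-clock second."""
    return total_timesteps / (days * SECONDS_PER_DAY)


total_timesteps = 100_000_000  # from the comment above
training_days = 4.96           # from the comment above
speedup = 5.0                  # "about 5x"; treated as exact here (assumption)

rate = steps_per_second(total_timesteps, training_days)
# Estimated duration of the same run without parallel environments.
single_env_days = training_days * speedup

print(f"{rate:.0f} timesteps/s with parallel environments")
print(f"~{single_env_days:.1f} days for the same run with a single environment")
```

That works out to roughly 233 timesteps per second on average across the whole run.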

The features_extractor CNN doesn't seem to help much on shorter training runs, even with much smaller model sizes. I'm starting to think Stable Baselines isn't the best way to go. AlphaZero combines MCTS with this kind of actor/critic neural net, and maybe we need to pursue recreating that for Catan.
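To make the AlphaZero idea a bit more concrete, here is a minimal sketch of the PUCT rule that AlphaZero-style MCTS uses to pick which action to explore next, combining the value estimate with the policy prior. It is not code from this branch: the `Edge` class, the `c_puct` value, and the action names are all hypothetical, and real implementations differ in details such as how the parent visit count is computed.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one (state, action) pair."""
    prior: float            # P(s, a) from the policy (actor) head
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of values backed up through this edge

    @property
    def q(self) -> float:
        """Mean backed-up value Q(s, a); 0 for unvisited edges."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges: dict, c_puct: float = 1.5) -> str:
    """Return the action maximizing Q + c_puct * P * sqrt(sum N) / (1 + N)."""
    total_visits = sum(edge.visits for edge in edges.values())

    def score(edge: Edge) -> float:
        exploration = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.q + exploration

    return max(edges, key=lambda action: score(edges[action]))


# Hypothetical action names, for illustration only.
edges = {
    "build_road": Edge(prior=0.2, visits=10, value_sum=6.0),
    "end_turn": Edge(prior=0.8, visits=0),
}
chosen = puct_select(edges)
```

In this toy case the unvisited action with the high prior is chosen over the already-explored one, which is the behavior that lets the policy network steer the search.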

Note that if you want to pull the branch and play around with it, you'll have to delete model.zip before each run to reset the architecture.
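If deleting the file by hand gets tedious, something like the following could be run before each training run. This is a sketch only: `reset_checkpoint` is a hypothetical helper, not part of this branch, and the default path assumes model.zip sits in the current working directory.

```python
from pathlib import Path


def reset_checkpoint(path: str = "model.zip") -> bool:
    """Delete a saved checkpoint if present; return True if a file was removed."""
    checkpoint = Path(path)
    if checkpoint.exists():
        checkpoint.unlink()
        return True
    return False
```

Calling `reset_checkpoint()` at the top of the training script would make each run start from a fresh model.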

image

netlify bot commented Nov 16, 2024

‼️ Deploy request for catanatron-staging rejected.

🔨 Latest commit: bc2ec10

zarns commented Nov 16, 2024

Looks like the build fails anyway because the sb3_contrib requirements aren't met. We could just leave this as an open pull request too, I guess.
