
Self-Play Reinforcement Model for Connect 4

A simple CNN that learns Connect-4 through self-play.

The model predicts the probability of winning from the current board state, learning only from previously played games. That's it: no tree search, no policy network.

All training is done through self-play; no external data or outside players are used.

The two key elements that make this work are exploration (driven here by adding noise to the predictions) and sampling to reduce the correlation between training examples.
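At a high level, training alternates between playing games with the current network and refitting it on the outcomes. The sketch below is a rough outline under assumed names (play_one_game is a hypothetical helper, and model.fit assumes a Keras-style interface); the actual loop lives in nnplayer.py.

    import numpy as np

    def train_by_self_play(model, play_one_game, n_iterations=100, games_per_iteration=50):
        # play_one_game(model) -> (board states, win/loss labels) for one game,
        # played with the noisy move selection described below (hypothetical helper).
        for _ in range(n_iterations):
            states, labels = [], []
            for _ in range(games_per_iteration):
                game_states, game_labels = play_one_game(model)
                states.extend(game_states)
                labels.extend(game_labels)
            # Refit the win-probability network on the positions gathered this round.
            model.fit(np.array(states), np.array(labels), epochs=1, verbose=0)
        return model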

Exploration:

Exploration comes from adding Gaussian noise to the neural net's predictions:

preds = preds + np.random.normal(0, std_noise, len(preds))

That's it. The noise keeps the model from replaying the same games over and over while still letting it play relatively strong moves.
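As a concrete illustration, move selection with exploration noise could look like the sketch below; the argument names and the default noise scale are assumptions for illustration, and model.predict assumes a Keras-style interface.

    import numpy as np

    def pick_move(model, candidate_states, legal_moves, std_noise=0.1):
        # candidate_states: one board per legal move, each the result of playing that column
        # legal_moves: the corresponding column indices
        preds = model.predict(candidate_states).ravel()
        # Exploration: perturb the win probabilities with Gaussian noise.
        preds = preds + np.random.normal(0, std_noise, len(preds))
        # Greedy choice on the noisy scores.
        return legal_moves[int(np.argmax(preds))]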

Sampling:

Positions from a single game are highly correlated with one another. To reduce this correlation, training examples are drawn by randomly sampling positions across many self-play games rather than feeding each game in sequentially.
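A minimal sketch of what such sampling could look like; the array layout and field names are assumptions for illustration.

    import numpy as np

    def sample_batch(states, outcomes, batch_size=512, rng=None):
        # states: board positions collected from many self-play games
        # outcomes: the final win/loss label recorded for each position
        # A random subset mixes positions from different games, so a training
        # batch is not dominated by one highly correlated game.
        rng = rng or np.random.default_rng()
        idx = rng.choice(len(states), size=min(batch_size, len(states)), replace=False)
        return states[idx], outcomes[idx]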

Model Architecture:

A simple 3-layer convolutional neural network does the trick here.
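For reference, a network along these lines might look like the Keras sketch below; the filter counts, kernel sizes, and board encoding are illustrative assumptions, not necessarily the exact layers in nnplayer.py.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Board assumed to be encoded as a 6x7 grid with a single channel.
    model = keras.Sequential([
        layers.Conv2D(32, 3, padding="same", activation="relu", input_shape=(6, 7, 1)),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),  # probability of winning from this state
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")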

Evaluation:

The model is evaluated against a simple negamax player that:

  • checks for a winning move and takes it
  • blocks the opponent's winning move
  • avoids moves that let the opponent win on the next turn

So essentially the only way to win against the negamax player is to force a win, for example by creating two winning threats at once so that only one can be blocked. A sketch of such an opponent follows below.
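This is a rough sketch of such an opponent, written against hypothetical board helpers (legal_moves, play, is_win) that a Connect-4 implementation would provide; it illustrates the rules above rather than the repository's evaluation code.

    import random

    def negamax_style_move(board, player, opponent, legal_moves, play, is_win):
        # legal_moves(board) -> playable columns          (hypothetical helper)
        # play(board, col, who) -> board after that move  (hypothetical helper)
        # is_win(board, who) -> True if `who` has connected four (hypothetical helper)
        moves = legal_moves(board)

        # 1. Take an immediate winning move if one exists.
        for col in moves:
            if is_win(play(board, col, player), player):
                return col

        # 2. Block the opponent's immediate winning move.
        for col in moves:
            if is_win(play(board, col, opponent), opponent):
                return col

        # 3. Avoid moves that hand the opponent a win on the next turn.
        safe = []
        for col in moves:
            nxt = play(board, col, player)
            if not any(is_win(play(nxt, c, opponent), opponent) for c in legal_moves(nxt)):
                safe.append(col)

        return random.choice(safe if safe else moves)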

Setup:

Using uv (Recommended)

  1. Install uv if you haven't already:

    curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Create and activate a virtual environment with dependencies:

    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv sync
  3. Run the training:

    uv run python nnplayer.py

Alternative Setup (pip)

If you prefer using pip:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python nnplayer.py

Play against the model

Gradio Setup

Speeding up the model

