This repository accompanies a blog post that I wrote.
I recreated the AlphaGo / AlphaZero learning paradigm in a tiny Tic-Tac-Toe environment with no human data and no labels. It's powered by raw self-play, Monte Carlo Tree Search, and a neural network learning from scratch how to play the game.
This project was inspired by the beauty of agents learning through trial and error (and because I had never built a solid RL project before).
- Built a self-play RL agent that learns to play Tic-Tac-Toe from scratch
- Combined Monte Carlo Tree Search with a policy + value network
- Trained it in a loop of self-play → replay buffer → network updates (sketched right after this list)
- Evaluated it against random agents, minimax bots, and humans
- Built a CLI game so you can play against it yourself
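Here's a minimal sketch of that outer loop. Everything in it is illustrative: `play_one_game` and `train_step` are hypothetical stand-ins for the project's actual self-play and update code, and the buffer size, game counts, and batch size are placeholders, not the repo's real hyperparameters.

```python
import random
from collections import deque

def play_one_game():
    # Hypothetical stand-in: the real version would play one MCTS-guided
    # self-play game and return (state, policy_target, outcome) tuples.
    return []

def train_step(batch):
    # Hypothetical stand-in: one mini-batch gradient update on the
    # policy + value network.
    pass

buffer = deque(maxlen=10_000)   # replay buffer of training examples

for iteration in range(100):    # outer training loop
    for _ in range(25):         # 1) self-play: generate fresh games
        buffer.extend(play_one_game())
    if buffer:                  # 2) sample a mini-batch and update the net
        train_step(random.sample(list(buffer), min(64, len(buffer))))
```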
| Component | Description |
|---|---|
| MCTS | At every move, the agent runs Monte Carlo Tree Search guided by its neural net (see the selection-rule sketch below the table). |
| Policy + Value Network | Two PyTorch models output move probabilities and a board-value estimate (sketched below the table). |
| Self-Play Loop | The agent plays against itself using MCTS, collects training data, and improves over time. |
| No Human Supervision | Like AlphaZero, it learns only from its own games. |
| Training Signal | Targets come from final outcomes of self-play games and MCTS stats. |
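For concreteness, this is the child-selection rule (PUCT) that AlphaZero-style MCTS implementations typically use; a generic sketch, not necessarily the exact formula or constants in `mcts.py`:

```python
import math

def puct_score(total_value, visits, parent_visits, prior, c_puct=1.5):
    # Exploitation term: mean value of this child so far.
    q = total_value / visits if visits > 0 else 0.0
    # Exploration term: scaled by the network's prior for the move and
    # shrinking as the child is visited. c_puct=1.5 is an illustrative
    # constant, not taken from this repo.
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u
```

Each simulation descends the tree by picking the child with the highest score, expands a leaf using the network's policy as priors, and backs the network's value estimate up the visited path.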
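And here's a minimal sketch of what a policy + value network over a 3×3 board can look like. The table above says the project uses two separate models; for brevity this sketch uses one shared trunk with two heads, so treat the architecture and layer sizes as assumptions:

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    # Illustrative only: a tiny MLP over the 9 board cells with a policy
    # head (one logit per square) and a value head (scalar in [-1, 1]).
    def __init__(self, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(9, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, 9)
        self.value_head = nn.Sequential(nn.Linear(hidden, 1), nn.Tanh())

    def forward(self, board):  # board: (batch, 9) tensor of {-1, 0, +1}
        h = self.trunk(board)
        return self.policy_head(h), self.value_head(h)

net = PolicyValueNet()
logits, value = net(torch.zeros(1, 9))  # smoke test on an empty board
```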
Here's the first game I played against the bot:

I won this one, but let's trace the steps the model took. I got lucky by going first and taking the strongest square (the center), but on every move after that, the agent knew to block whenever I was one move away from completing a line, and to mirror my move otherwise. This was really cool to see.
Here's the agent after I decided not to start in the middle:

The agent was smart enough to counter every move I made.
```
AlphaTicTacToe/
│
├── README.md
├── requirements.txt      # dependencies (torch, numpy, wandb, etc.)
├── .gitignore            # .pt files, logs, etc.
│
├── alphattt/             # core logic package
│   ├── __init__.py
│   ├── config.py         # hyperparameters and constants
│   ├── tictactoe.py      # environment (board, rules, win logic)
│   ├── mcts.py           # Monte Carlo Tree Search algorithm
│   ├── network.py        # policy + value PyTorch model
│   ├── replay_buffer.py  # replay buffer to store game trajectories for training
│   ├── self_play.py      # self-play loop and data collection
│   ├── trainer.py        # training logic using replay buffer
│   ├── evaluate.py       # evaluate model vs random/minimax/human
│   └── utils.py          # Elo rating, misc helpers
│
├── scripts/
│   ├── train.sh          # train from scratch
│   └── evaluate.sh       # evaluate trained agent
│
├── play_vs_agent.py      # launch CLI game
├── web_demo.py           # (coming soon) Gradio/Streamlit frontend
│
├── notebooks/
│   └── analysis.ipynb    # training curves, MCTS stats, etc.
│
├── wandb/                # W&B logs (auto-generated)
│
└── assets/
    ├── demo.gif
    └── architecture.png
```

Training setup:

- 1000s of self-play games
- MCTS with 25–100 simulations per move
- Replay buffer of game history
- Mini-batch training using PyTorch
- Tracked using wandb
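Assuming the standard AlphaZero recipe (which the "Training Signal" row above suggests), each update minimizes a value-regression term plus a policy cross-entropy term against the MCTS visit distribution. A sketch of that loss, where `mcts_policy` is the normalized visit counts π and `outcome` is the final result z:

```python
import torch
import torch.nn.functional as F

def alphazero_loss(policy_logits, value_pred, mcts_policy, outcome):
    # Value head: regress the predicted value toward the game outcome
    # z in {-1, 0, +1} from the current player's perspective.
    value_loss = F.mse_loss(value_pred.squeeze(-1), outcome)
    # Policy head: cross-entropy against pi, the normalized MCTS
    # visit counts collected during self-play.
    log_probs = F.log_softmax(policy_logits, dim=-1)
    policy_loss = -(mcts_policy * log_probs).sum(dim=-1).mean()
    return value_loss + policy_loss
```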
Installing it is pretty straightforward:
```bash
git clone https://github.com/akhilvreddy/AlphaTicTacToe.git
cd AlphaTicTacToe
pip install -r requirements.txt
```

You can play against the agent I've trained through the CLI. (Hint: you will probably draw or lose if you decide to play second.)

```bash
python play_vs_agent.py
```

Inspired by:

- AlphaGo paper
- AlphaGo Zero blog post
- Countless threads on RL that I've seen on X

