This repository accompanies this blog post that I wrote.


AlphaTicTacToe (AlphaGo for Tic-Tac-Toe)

I recreated the AlphaGo / AlphaZero learning paradigm in a tiny Tic-Tac-Toe environment with no human data and no labels. It's powered by raw self-play, Monte Carlo Tree Search, and a neural net that learns from scratch how to play the game.

This project was inspired by the beauty of agents learning through trial and error (and because I had never built a solid RL project before).


What I Built

  • A self-play RL agent that learns to play Tic-Tac-Toe from scratch
  • Combined Monte Carlo Tree Search with a policy + value network
  • Trained using a loop of self-play → replay buffer → network updates
  • Evaluated against random agents, minimax bots, and humans
  • Built a CLI game so you can play against it yourself
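One piece of the loop above is worth making concrete: how final game outcomes become training targets. In AlphaZero-style training, every position recorded during a self-play game gets the game's final result as its value target, sign-flipped to the perspective of the player to move. A minimal sketch of that idea (the function name and conventions are mine, not necessarily how `self_play.py` does it):

```python
def make_value_targets(num_states: int, final_outcome: int) -> list:
    """Assign the game's final outcome to every recorded state,
    flipping the sign so each target is from the perspective of
    the player to move at that state.

    final_outcome: +1 if the first player won, -1 if they lost, 0 for a draw.
    num_states: number of positions recorded (players alternate each ply).
    """
    targets = []
    for ply in range(num_states):
        # Player 1 moves on even plies, player 2 on odd plies.
        sign = 1 if ply % 2 == 0 else -1
        targets.append(sign * final_outcome)
    return targets

# Example: a 5-move game that the first player wins.
print(make_value_targets(5, +1))  # → [1, -1, 1, -1, 1]
```

The policy targets, by contrast, come from the MCTS visit counts at each position rather than from the final outcome.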

Core Concepts (all explained better in the blog post)

| Component | Description |
| --- | --- |
| MCTS | At every move, the agent runs Monte Carlo Tree Search guided by its neural net. |
| Policy + Value Network | Two PyTorch models output move probabilities and a board value. |
| Self-Play Loop | The agent plays against itself using MCTS, collects training data, and improves over time. |
| No Human Supervision | Like AlphaZero, it learns only from its own games. |
| Training Signal | Targets come from the final outcomes of self-play games and MCTS visit statistics. |
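To make the MCTS row concrete: in AlphaZero-style search, each simulation walks down the tree by picking the child with the highest PUCT score, which balances the child's running value estimate against the network's prior, scaled by visit counts. A minimal sketch (the constant and names here are illustrative, not necessarily what `mcts.py` uses):

```python
import math

def puct_score(q: float, prior: float, parent_visits: int,
               child_visits: int, c_puct: float = 1.5) -> float:
    """AlphaZero-style PUCT: exploitation term q plus an exploration
    bonus proportional to the network prior, decaying with visits."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# An unvisited move with a strong prior outranks a well-visited,
# mediocre one, so the search keeps probing moves the net likes.
fresh = puct_score(q=0.0, prior=0.5, parent_visits=100, child_visits=0)
stale = puct_score(q=0.1, prior=0.1, parent_visits=100, child_visits=50)
assert fresh > stale
```

This is exactly where the net guides the search: a good prior steers simulations toward promising moves long before their Q values are reliable.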

Demo

Here's the first game I played against the bot:

I won here, but let's trace the steps the model took. I got lucky by going first and taking the best position on the board (the center), but on every move after that the agent knew to block me whenever I was close to completing a row, and to mirror my move otherwise. This was really cool to see.

Here's the agent after I decided not to start in the middle:

The agent was smart enough to counter every move I made.


Repo Structure (GPT is clutch at making these)

```text
AlphaTicTacToe/
│
├── README.md
├── requirements.txt          # dependencies (torch, numpy, wandb, etc.)
├── .gitignore                # .pt files, logs, etc.
│
├── alphattt/                 # core logic package
│   ├── __init__.py
│   ├── config.py             # hyperparameters and constants
│   ├── tictactoe.py          # environment (board, rules, win logic)
│   ├── mcts.py               # Monte Carlo Tree Search algorithm
│   ├── network.py            # policy + value PyTorch model
│   ├── replay_buffer.py      # replay buffer storing game trajectories
│   ├── self_play.py          # self-play loop and data collection
│   ├── trainer.py            # training logic using the replay buffer
│   ├── evaluate.py           # evaluate model vs random/minimax/human
│   └── utils.py              # Elo rating and misc helpers
│
├── scripts/
│   ├── train.sh              # train from scratch
│   └── evaluate.sh           # evaluate trained agent
│
├── play_vs_agent.py          # launch CLI game
├── web_demo.py               # (coming soon) Gradio/Streamlit frontend
│
├── notebooks/
│   └── analysis.ipynb        # training curves, MCTS stats, etc.
│
├── wandb/                    # W&B logs (auto-generated)
│
└── assets/
    ├── demo.gif
    └── architecture.png
```
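For a sense of how small the environment in `tictactoe.py` can be, the entire win check fits in a few lines. This is a generic sketch, not the repo's actual implementation:

```python
# The eight winning lines on a 3x3 board, as flat indices 0-8.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]

def winner(board):
    """board: 9 cells, each 'X', 'O', or None. Returns the winner or None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None
```

Because the state space is so tiny, almost all of the project's complexity lives in the search and training code rather than in the game itself.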

Training Setup

  • 1000s of self-play games
  • MCTS with 25–100 simulations per move
  • Replay buffer of game history
  • Mini-batch training using PyTorch
  • Tracked using wandb
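These knobs presumably live in `config.py`. Here's a hypothetical sketch of what such a config could look like, with values picked to match the ranges above (the actual file may differ):

```python
# Hypothetical hyperparameters mirroring the training setup described above.
NUM_SELF_PLAY_GAMES = 5000         # "1000s of self-play games"
MCTS_SIMULATIONS = 50              # 25-100 simulations per move
REPLAY_BUFFER_SIZE = 20000         # recent positions kept for training
BATCH_SIZE = 64                    # mini-batch size for PyTorch updates
LEARNING_RATE = 1e-3
WANDB_PROJECT = "alpha-tictactoe"  # W&B experiment tracking
```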

Install & Play Against It

Installing it is pretty straightforward:

```shell
git clone https://github.com/akhilvreddy/AlphaTicTacToe.git
cd AlphaTicTacToe
pip install -r requirements.txt
```

You can play against the agent I've trained through the CLI. (Hint: you will probably draw or lose if you decide to play second.)

```shell
python play_vs_agent.py
```

References

  1. AlphaGo paper
  2. AlphaGo Zero blog post
  3. Countless threads on RL that I've seen on X
