$ git clone https://github.com/jaybutera/tetrisRL
$ cd tetrisRL
$ uv sync

- src/dqn_agent.py - DQN reinforcement learning agent trains on tetris
- src/supervised_agent.py - The same convolutional model as DQN trains on a dataset of user playthroughs
- src/user_engine.py - Play tetris and accumulate information as a training set
- src/run_model.py - Evaluate a saved agent model on a visual game of tetris, for example:
$ uv run python src/run_model.py checkpoint.pth.tar

The interface is similar to an OpenAI Gym environment.
Initialize the Tetris RL environment
from src.engine import TetrisEngine
width, height = 10, 20
env = TetrisEngine(width, height)

Simulation loop
# Reset the environment
obs = env.clear()

while True:
    # Get an action from a theoretical AI agent
    action = agent(obs)

    # Sim step takes action and returns results
    obs, reward, done = env.step(action)

    # Done when game is lost
    if done:
        break

Play games and accumulate a data set for a supervised learning algorithm to train on. Each element of the data set stores a (state, reward, done, action) tuple for one frame of the game.
You may notice the rules are slightly different from normal Tetris. Specifically, each action you take is followed by a corresponding soft drop. This is how the AI will play, and therefore how the training data must be collected.
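The per-frame recording described above can be sketched roughly as follows. This is only an illustration, not the code in src/user_engine.py: ToyEnv is a hypothetical stand-in that mimics the clear()/step() interface without any real Tetris logic, and the random agent with six integer actions is an assumption.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for TetrisEngine: same clear()/step() shape, no game logic."""

    def __init__(self, width, height, max_steps=5):
        self.width, self.height = width, height
        self.max_steps = max_steps
        self.steps = 0

    def clear(self):
        # Reset and return an empty board as the first observation
        self.steps = 0
        return [[0] * self.width for _ in range(self.height)]

    def step(self, action):
        # Pretend the game is lost after max_steps frames
        self.steps += 1
        obs = [[0] * self.width for _ in range(self.height)]
        reward = 0.0
        done = self.steps >= self.max_steps
        return obs, reward, done


def record_game(env, agent):
    """Play one game; return one (state, reward, done, action) tuple per frame."""
    frames = []
    state = env.clear()
    done = False
    while not done:
        action = agent(state)
        next_state, reward, done = env.step(action)
        # Store the state the action was chosen in, plus the outcome of that action
        frames.append((state, reward, done, action))
        state = next_state
    return frames


def random_agent(state, num_actions=6):
    # Assumption: actions are integers 0..5, one per control key
    return random.randrange(num_actions)


game = record_game(ToyEnv(10, 20), random_agent)
print(len(game))  # number of recorded frames
```

With the real engine, the same record_game function should work unchanged as long as step() returns (obs, reward, done) as shown in the simulation loop.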
To play Tetris:
$ uv run python src/user_engine.py

Controls:
W: Hard drop (piece falls to the bottom)
A: Shift left
S: Soft drop (piece falls one tile)
D: Shift right
Q: Rotate left
E: Rotate right
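To make the interaction between these controls and the automatic soft drop concrete, here is a toy sketch. The action names, the coordinate convention (y grows downward), and the floor_y parameter are all assumptions for illustration; collisions with walls and other pieces are ignored, and the engine's real action encoding may differ.

```python
# Hypothetical mapping from keys to action names (not the engine's real encoding)
KEY_TO_ACTION = {
    "w": "hard_drop",
    "a": "left",
    "s": "soft_drop",
    "d": "right",
    "q": "rotate_left",
    "e": "rotate_right",
}


def apply_key(x, y, rotation, key, floor_y=19):
    """Apply one key press, then the automatic one-tile soft drop.

    Toy model: y grows downward, floor_y is the lowest row, and
    collisions with walls or other pieces are ignored.
    """
    action = KEY_TO_ACTION[key]
    if action == "hard_drop":
        y = floor_y
    elif action == "left":
        x -= 1
    elif action == "right":
        x += 1
    elif action == "soft_drop":
        y = min(y + 1, floor_y)
    elif action == "rotate_left":
        rotation = (rotation - 1) % 4
    elif action == "rotate_right":
        rotation = (rotation + 1) % 4
    # Every action is followed by a soft drop, as in the modified rules
    y = min(y + 1, floor_y)
    return x, y, rotation


print(apply_key(4, 0, 0, "a"))  # shift left, then fall one tile
```

In this toy model, pressing S moves the piece down two tiles in total: one for the action itself and one for the automatic drop.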
At the end of each game, choose whether you want to store the information of that game in the data set. Data accumulates in a local file called 'training_data.npy'.
Run the supervised agent file and specify the standard training data file generated in the previous step as a command line argument.
$ uv run python src/supervised_agent.py training_data.npy

# Start from a new randomized dqn agent
$ uv run python src/dqn_agent.py
# Start from the last recorded dqn checkpoint
$ uv run python src/dqn_agent.py resume
# Specify a custom checkpoint
$ uv run python src/dqn_agent.py resume supervised_checkpoint.pth.tar

The DQN agent currently optimizes on a metric of freedom of action. In essence, the agent should learn to maximize the entropy of the board. A player in Tetris has the most freedom of action when the area is clear of pieces.
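One simple way to picture a "freedom of action" signal is to score a board by how much of it is still empty. The function below is a hypothetical illustration only: it is not the reward used in src/dqn_agent.py, and the board encoding (a list of rows, with 0 marking an empty cell) is an assumption.

```python
def free_fraction(board):
    """Return the fraction of empty cells on the board (1.0 means completely clear)."""
    total = sum(len(row) for row in board)
    empty = sum(cell == 0 for row in board for cell in row)
    return empty / total


# A 10x20 board with the bottom row half filled (assumed encoding: 0 = empty)
board = [[0] * 10 for _ in range(20)]
board[-1][:5] = [1] * 5

print(free_fraction(board))
```

A score like this is highest on a clear board and falls as pieces stack up, which matches the intuition that a clear play area leaves the most options open.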
To watch a saved checkpoint play a visual game of Tetris:
$ uv run python src/run_model.py checkpoint.pth.tar