Chess AI Tutor

A chess coaching system built on a multimodal encoder+LLM architecture. A ResNet CNN board encoder is CLIP-aligned to Qwen3.5-4B's embedding space, then fine-tuned jointly to annotate engine key lines and produce coaching comments.

Current Training Pipeline

encoder-phase0 (CLIP alignment, InfoNCE) ✅ checkpoint-9000
        ↓
phase1-alignment (proj+LoRA, 20 board-reading tasks) ← ACTIVE (step ~9100/32079)
        ↓
phase2-sft  (joint task: annotated lines + coaching comment)
        ↓
phase3-grpo (RL on SF15 annotation accuracy + comment quality)

Architecture

The model injects a flat block of 65 <|vision_pad|> sentinel tokens (64 per-square + 1 global) into the LLM prompt. The CNN encoder maps the board tensor to (65, 2560) embeddings which replace the sentinel token embeddings before the LLM forward pass.

Board (FEN) ──→ ChessEncoder (ResNet, 10 blocks, 256 filters)
                    ├─ 64 per-square tokens ──→ proj MLP (256→2560) ──→ grid embeddings
                    └─ 1 global token ─────→ cross-attn + MLP ────→ global embedding
                                                        ↓
[system] [65 × <|vision_pad|>] [question]  ←── injected into LLM prompt
                                                        ↓
                                              Qwen3.5-4B + LoRA r=64

Why CLIP first: The CNN was initially pretrained on SF15 regression (no language supervision), leaving its features in an arbitrary space unrelated to the LLM's embedding manifold. Direct projector alignment failed — the LLM ignores CNN tokens and solves tasks from text instead (bootstrap deadlock). encoder-phase0 fixes this by contrastively aligning CNN features to Qwen3.5-4B text embeddings via InfoNCE before any projector training.

Installation

uv sync --extra training
export STOCKFISH_PATH="$HOME/.local/bin/stockfish"
export SF15_PATH="$HOME/.local/bin/stockfish-15"
./scripts/test.sh -v

Training

# Phase 1 alignment (active)
./recipes-train/qwen3.5-4b-encoder-phase1-alignment/start.sh
./recipes-train/qwen3.5-4b-encoder-phase1-alignment/start.sh --resume   # resume last checkpoint
./recipes-train/qwen3.5-4b-encoder-phase1-alignment/start.sh --resume checkpoints/.../checkpoint-N

# Monitor
tail -f /tmp/phase1-alignment.log

# encoder-phase0 CLIP training (complete)
./recipes-train/encoder-phase0/start.sh

Project Structure

`src/`

encoder/ — Board tensor builder, BOARD_TOKEN / BOARD_TOKEN_ID constants
verification/ — Move legality validation + GRPO reward functions (rewards.py)
chess_mcp/ — MCP server + async Stockfish wrapper
tutor/ — FastAPI web server, chess.com client, prompt templates

`training/`

encoder_model.py — ChessLMWithEncoder: wraps Qwen3.5-4B + ChessEncoder CNN; handles sentinel injection
encoder_collator.py — Builds board tensors from FEN; injects at <|vision_pad|> positions
lib.py — Shared training utilities

`recipes-train/`

Recipe	Purpose	Status
`encoder-phase0/`	CLIP-style InfoNCE alignment of CNN to Qwen3.5-4B	✅ checkpoint-9000
`qwen3.5-4b-encoder-phase1-alignment/`	proj+LoRA alignment, 20 board-reading tasks	🔄 step ~9100/32079
`qwen3.5-4b-encoder-phase2-sft/`	SFT: annotated key lines + coaching comment	⏳ pending
`qwen3.5-4b-encoder-phase3-grpo/`	GRPO: SF15 annotation + comment quality rewards	⏳ pending

`data/pipeline/`

generate_alignment_board_description.py — 5M board positions × 20 tasks → alignment_board_description.jsonl
generate_encoder_pretrain_sf15.py — 656M positions with SF15 term labels → encoder_pretrain_sf15.jsonl
generate_phase2_data.py — Textbook positions + Stockfish depth-18 → lines_joint_sft.jsonl
generate_grpo_joint_prompts.py — Lichess positions + SF15 annotations → grpo_joint_prompts_sf15.jsonl
sf15_eval.py — Stockfish 15 classical eval term wrapper

`data/processed/` (active)

File	Records	Used by
`alignment_board_description.jsonl`	4.58M	phase1-alignment train
`alignment_board_description_eval.jsonl`	103k	phase1-alignment eval
`encoder_pretrain_sf15.jsonl`	656M	encoder-phase0 CLIP training, phase1-alignment SF15 tasks
`encoder_pretrain_1b.jsonl`	~1B	phase1-alignment main tasks (source)

Phase 1 Alignment — 20 Tasks

The alignment dataset trains the model to answer direct board-reading questions using only the 65 CNN sentinel tokens — no text anchors. Tasks are split by what information they probe:

Per-square (probe grid tokens): piece_abbr_at, piece_name_at, rank_contents, file_contents, count_piece_type, count_total_material, find_piece_type, is_square_occupied, attackers_at, is_pinned, mobility_at

Global (probe summary token): side_to_move, castling_rights, en_passant, move_number, is_check, material_balance, who_is_better

SF15-dependent (from encoder_pretrain_sf15.jsonl): eval_tier, sf15_dominant

Current eval accuracy: ~85–90% at step 9100/32079.

Phase 3 GRPO Rewards

Reinforcement learning from verifiable rewards — no LLM judge needed:

Reward	Weight	Signal
R0 format	0.10	`<line>` tags present
R1 legality	hard gate	All line moves legal (python-chess)
R_think	0.15	Non-empty think block with ≥2 moves
R3b sf15	0.35	Think-block SF15 term interpretations match prompt diffs
RC_tone	0.20	Chess concept vocabulary in coaching comment
RC_educ	0.20	Grounded moves + causal reasoning in comment

Tests

./scripts/test.sh -v

Experiments Log

See EXPERIMENTS.md for a full history of tried approaches, what failed, and why.

Name		Name	Last commit message	Last commit date
Latest commit History 163 Commits
data/pipeline		data/pipeline
docker		docker
recipes-inference		recipes-inference
recipes-train		recipes-train
scripts		scripts
sglang		sglang
src		src
static		static
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
CLAUDE.md		CLAUDE.md
EXPERIMENTS.md		EXPERIMENTS.md
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Chess AI Tutor

Current Training Pipeline

Architecture

Installation

Training

Project Structure

`src/`

`training/`

`recipes-train/`

`data/pipeline/`

`data/processed/` (active)

Phase 1 Alignment — 20 Tasks

Phase 3 GRPO Rewards

Tests

Experiments Log

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Chess AI Tutor

Current Training Pipeline

Architecture

Installation

Training

Project Structure

src/

training/

recipes-train/

data/pipeline/

data/processed/ (active)

Phase 1 Alignment — 20 Tasks

Phase 3 GRPO Rewards

Tests

Experiments Log

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages

`src/`

`training/`

`recipes-train/`

`data/pipeline/`

`data/processed/` (active)