Aditya-C-S/hangman-solver

Hangman Solver: Statistical HMM + Reinforcement Learning

A Hangman solver that combines a statistical Hidden Markov Model (HMM) with a Reinforcement Learning (RL) agent to guess letters intelligently.

📊 Project Overview

This project implements an AI agent that plays Hangman by learning from a corpus of 50,000 training words and adapting its strategy through reinforcement learning. The system achieves a 26.80% win rate with 50.61% guess accuracy on unseen test data.

Key Features

  • Statistical HMM: Analyzes letter frequencies, positional patterns, and bigram relationships
  • Reinforcement Learning Agent: Learns adaptive guessing strategies through trial and error
  • Multi-Level Probability Model: Combines global, length-specific, positional, and contextual information
  • Exploratory Data Analysis: Comprehensive analysis of the training corpus
  • Performance Tracking: Detailed metrics across 2,000 training episodes

🏗️ Architecture

1. Statistical HMM Component

  • Global Frequency Analysis: Letter occurrence probabilities (e.g., 'e': 10.37%)
  • Length-Specific Patterns: Different distributions for different word lengths
  • Position-Based Predictions: First/last letter probabilities and relative positioning
  • Context-Aware Predictions: Conditional probabilities based on revealed letters
  • Vowel-Consonant Balancing: Adaptive boosting based on revealed letter ratios
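The multi-level blending above can be sketched as a weighted mix of normalized frequency tables. This is an illustrative reconstruction, not the repository's code: function names (`build_model`, `score_letters`) and the fixed weights are assumptions.

```python
from collections import Counter, defaultdict

def build_model(words):
    """Precompute the frequency tables described above:
    global, length-specific, and first/last-position counts."""
    model = {
        "global": Counter(c for w in words for c in w),
        "by_length": defaultdict(Counter),
        "first": Counter(w[0] for w in words),
        "last": Counter(w[-1] for w in words),
    }
    for w in words:
        model["by_length"][len(w)].update(w)
    return model

def score_letters(model, pattern, guessed, weights=(0.3, 0.4, 0.15, 0.15)):
    """Blend the probability levels into one score per unguessed letter.
    Positional tables only contribute while that position is hidden."""
    def norm(counter):
        total = sum(counter.values()) or 1
        return {c: n / total for c, n in counter.items()}
    p_global = norm(model["global"])
    p_len = norm(model["by_length"].get(len(pattern), model["global"]))
    p_first = norm(model["first"]) if pattern[0] == "_" else {}
    p_last = norm(model["last"]) if pattern[-1] == "_" else {}
    wg, wl, wf, we = weights
    return {c: wg * p_global.get(c, 0) + wl * p_len.get(c, 0)
               + wf * p_first.get(c, 0) + we * p_last.get(c, 0)
            for c in "abcdefghijklmnopqrstuvwxyz" if c not in guessed}

model = build_model(["banana", "cabana", "bandit", "linear"])
scores = score_letters(model, "_a_a_a", {"a"})
best = max(scores, key=scores.get)
```

With this toy corpus the first-letter table dominates, so the sketch picks 'b'; at corpus scale the global and length-specific terms carry most of the weight.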

2. Reinforcement Learning Agent

  • State Representation: Discretized features (length, progress, lives remaining)
  • Action Space: 26 possible letter guesses
  • Reward System:
    • Correct guess: +10 per newly revealed letter, plus a +2 bonus
    • Wrong guess: -10 to -28 (escalating with lives lost)
    • Win: +100 | Loss: -100
  • Exploration Strategy: ε-greedy with decay (0.20 → 0.05)
  • Learning Mechanism: Letter bias adjustment based on outcomes
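A minimal sketch of the agent loop described above. Class and function names are illustrative, and the wrong-guess penalty schedule is one way to hit the stated -10 to -28 range; the repository's exact update rule may differ.

```python
import random

class LetterBiasAgent:
    """epsilon-greedy chooser with a learned per-letter bias."""
    def __init__(self, epsilon=0.20, eps_min=0.05, eps_decay=0.999, lr=0.01):
        self.bias = {c: 0.0 for c in "abcdefghijklmnopqrstuvwxyz"}
        self.epsilon, self.eps_min, self.eps_decay, self.lr = (
            epsilon, eps_min, eps_decay, lr)

    def choose(self, hmm_scores, guessed):
        """Explore a random unguessed letter with probability epsilon;
        otherwise exploit HMM score plus learned bias."""
        options = [c for c in self.bias if c not in guessed]
        if random.random() < self.epsilon:
            return random.choice(options)
        return max(options, key=lambda c: hmm_scores.get(c, 0.0) + self.bias[c])

    def update(self, letter, reward):
        """Nudge the guessed letter's bias toward the outcome."""
        self.bias[letter] += self.lr * reward

    def end_episode(self):
        self.epsilon = max(self.eps_min, self.epsilon * self.eps_decay)

def reward_for(revealed, lives_lost, won=None):
    """Reward shaping per the README: +10 per revealed letter +2,
    escalating wrong-guess penalties, +/-100 terminal."""
    if won is True:
        return 100
    if won is False:
        return -100
    if revealed:
        return 10 * revealed + 2
    return -10 - 3 * lives_lost  # -10 down to -28 as lives are lost

agent = LetterBiasAgent()
guess = agent.choose({"e": 0.12, "a": 0.10}, guessed={"s"})
agent.update(guess, reward_for(revealed=2, lives_lost=0))
agent.end_episode()
```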

📁 Project Structure

hangman-solver/
│
├── Data/
│   └── Data/
│       ├── corpus.txt          # Training data (50,000 words)
│       └── test.txt            # Test data (2,000 words)
│
├── hangman.ipynb               # Main Jupyter notebook
│
├── Generated Files/
│   ├── corpus_eda_summary.csv      # Training corpus statistics
│   ├── statistical_hmm.pkl         # Trained HMM model (~5-10 MB)
│   ├── rl_agent_2000.pkl           # Trained RL agent (~10-50 KB)
│   └── training_metrics_2000.csv   # Training performance metrics
│
├── Analysis_Report.pdf         # Detailed analysis and insights
├── README.md                   # This file
└── requirements.txt            # Python dependencies

🚀 Getting Started

Prerequisites

  • Python 3.8 or higher
  • Jupyter Notebook or JupyterLab

Required Packages

pandas>=1.5.0
numpy>=1.23.0
matplotlib>=3.6.0
seaborn>=0.12.0
jupyter>=1.0.0

💻 Usage

Running the Complete Pipeline

Open and run the Jupyter notebook:

jupyter notebook hangman.ipynb

The notebook contains:

  1. Exploratory Data Analysis - Analyzes the training corpus
  2. HMM Training - Builds statistical probability models
  3. RL Agent Training - Trains the reinforcement learning agent (2,000 episodes)
  4. Evaluation - Tests performance on held-out test set
  5. Demo - Shows step-by-step gameplay examples

📈 Performance Metrics

Test Set Results (2,000 games)

Metric              Value
------------------  ---------------------------------
Win Rate            26.80% (536 / 2,000 games)
Guess Accuracy      50.61% (11,609 / 22,936 guesses)
Avg Wrong Guesses   4.66 per game
Repeated Guesses    0 (perfect game-state tracking)
Final Score         -46,099

Training Performance

  • Training Episodes: 2,000
  • Words per Episode: 100
  • Total Training Games: 200,000
  • Final Epsilon: 0.0499
  • Training Win Rate: 24.63% (last 100 episodes)
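The final ε of 0.0499 is consistent with multiplicative per-episode decay from 0.20 toward the 0.05 floor; a quick back-of-envelope check (the exact schedule is an assumption, not stated here):

```python
# Back out the per-episode multiplicative decay that takes epsilon
# from 0.20 to 0.05 over 2,000 episodes.
start, end, episodes = 0.20, 0.05, 2000
decay = (end / start) ** (1 / episodes)  # roughly 0.9993 per episode
eps = start * decay ** episodes
```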

🔍 Key Insights

What Works Well

✅ Statistical HMM provides strong baseline predictions
✅ Letter bias learning successfully adapts to experience
✅ Multi-level probability weighting balances different information sources
✅ No repeated guesses (perfect game state tracking)

Current Limitations

⚠️ Struggles with uncommon consonants (g, h, m, p, f)
⚠️ Premature convergence around episode 200
⚠️ Difficulty with middle letters in partially revealed words
⚠️ No look-ahead planning beyond immediate guess

🔧 Future Improvements

Short-term (1 week)

  • Adaptive epsilon decay based on performance
  • Experience replay buffer
  • Enhanced reward shaping
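One of the short-term items above, an experience replay buffer, can be sketched in a few lines (illustrative only; the class is not part of the repository):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward) transitions.
    Sampling uniformly breaks the correlation between consecutive
    guesses when the agent updates from past experience."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop first

    def push(self, state, action, reward):
        self.buffer.append((state, action, reward))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.push(("state", i), "e", 10)
batch = buf.sample(2)
```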

Medium-term (1-2 months)

  • Deep Q-Network (DQN) architecture
  • Monte Carlo Tree Search (MCTS)
  • Ensemble of specialist agents

Long-term (3+ months)

  • Transformer-based architecture with attention
  • Transfer learning from language models (BERT/GPT)
  • Curriculum learning strategy

Expected impact: +20-50% win rate improvement

📊 Data Analysis Highlights

Corpus Statistics (50,000 words)

  • Average Length: 9.50 letters
  • Most Common Length: 9 letters (13.62%)
  • Letter Distribution: e (10.37%), a (8.87%), i (8.86%)
  • Common Bigrams: 'er' (2.08%), 'in' (1.68%), 'ti' (1.56%)
  • Words with Repeated Letters: 21.24%
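The statistics above reduce to a handful of counting passes; here is the same kind of computation on a toy corpus (the repository runs it over the 50,000-word corpus.txt):

```python
from collections import Counter

words = ["banana", "linear", "bandit", "letter"]  # toy stand-in corpus
avg_len = sum(map(len, words)) / len(words)
letters = Counter(c for w in words for c in w)
bigrams = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
# Share of words containing at least one repeated letter.
repeated_share = sum(len(set(w)) < len(w) for w in words) / len(words)
```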

Position Analysis

  • Top Starting Letters: p (10.40%), s (10.37%), c (8.26%)
  • Top Ending Letters: e (18.61%), y (11.51%), s (10.80%)
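The positional tables follow the same pattern, counting only the first and last characters (again on an illustrative toy corpus):

```python
from collections import Counter

words = ["python", "statistic", "solver", "sample"]
first = Counter(w[0] for w in words)   # starting-letter distribution
last = Counter(w[-1] for w in words)   # ending-letter distribution
top_first = first.most_common(1)[0][0]
```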

👥 Authors

  • Aditya CS
  • Aadarsh Koushik
  • Aayush Jha
  • Advaith Nambiar

Note: This is an academic/learning project demonstrating AI techniques in game playing. The performance metrics reflect the complexity of word prediction under constrained conditions.
