A sophisticated Hangman game solver that combines Hidden Markov Models (HMM) with Reinforcement Learning (RL) to achieve intelligent letter guessing strategies.
This project implements an AI agent that plays Hangman by learning from a corpus of 50,000 training words and adapting its strategy through reinforcement learning. The system achieves a 26.80% win rate with 50.61% guess accuracy on unseen test data.
- Statistical HMM: Analyzes letter frequencies, positional patterns, and bigram relationships
- Reinforcement Learning Agent: Learns adaptive guessing strategies through trial and error
- Multi-Level Probability Model: Combines global, length-specific, positional, and contextual information
- Exploratory Data Analysis: Comprehensive analysis of the training corpus
- Performance Tracking: Detailed metrics across 2,000 training episodes
- Global Frequency Analysis: Letter occurrence probabilities (e.g., 'e': 10.37%)
- Length-Specific Patterns: Different distributions for different word lengths
- Position-Based Predictions: First/last letter probabilities and relative positioning
- Context-Aware Predictions: Conditional probabilities based on revealed letters
- Vowel-Consonant Balancing: Adaptive boosting based on revealed letter ratios
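The multi-level weighting described above can be sketched as a simple blend of frequency tables. The weights, table contents, and function name below are illustrative assumptions, not the notebook's actual values; the real model also folds in bigram context and vowel-consonant boosting.

```python
from collections import defaultdict

def blended_letter_probs(global_freq, length_freq, positional_freq, pattern,
                         weights=(0.3, 0.3, 0.4)):
    """Blend global, length-specific, and positional letter probabilities.

    `pattern` is the current board, e.g. "_ppl_" for "apple" with p/l revealed.
    """
    wg, wl, wp = weights
    scores = defaultdict(float)
    # Global letter frequencies over the whole corpus.
    for letter, p in global_freq.items():
        scores[letter] += wg * p
    # Distribution specific to words of this length.
    for letter, p in length_freq.get(len(pattern), {}).items():
        scores[letter] += wl * p
    # Positional evidence: only unknown slots contribute.
    for i, ch in enumerate(pattern):
        if ch == "_":
            for letter, p in positional_freq.get(i, {}).items():
                scores[letter] += wp * p / pattern.count("_")
    # Never re-suggest letters already on the board.
    for ch in set(pattern) - {"_"}:
        scores.pop(ch, None)
    total = sum(scores.values()) or 1.0
    return {letter: s / total for letter, s in scores.items()}
```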
- State Representation: Discretized features (length, progress, lives remaining)
- Action Space: 26 possible letter guesses
- Reward System:
  - Correct guess: +10 per revealed letter, plus +2
  - Wrong guess: -10 to -28, escalating as lives are lost
  - Win: +100 | Loss: -100
- Exploration Strategy: ε-greedy with decay (0.20 → 0.05)
- Learning Mechanism: Letter bias adjustment based on outcomes
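A minimal sketch of the ε-greedy choice and letter-bias update described above. The bias-update rule, learning rate, and exact penalty escalation (-3 per wrong guess, which spans the -10 to -28 range over six lost lives) are assumptions consistent with the numbers listed, not the notebook's exact code.

```python
import random

EPS_START, EPS_END = 0.20, 0.05

class LetterBiasAgent:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # One learnable bias per letter in the 26-action space.
        self.bias = {chr(c): 0.0 for c in range(ord("a"), ord("z") + 1)}
        self.epsilon = EPS_START

    def choose(self, hmm_scores, guessed):
        """ε-greedy: explore a random unguessed letter, else exploit HMM + bias."""
        candidates = [l for l in self.bias if l not in guessed]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)
        return max(candidates, key=lambda l: hmm_scores.get(l, 0.0) + self.bias[l])

    def update(self, letter, revealed, wrong_so_far):
        """Apply the reward scheme and nudge the letter's bias accordingly."""
        reward = 10 * revealed + 2 if revealed else -(10 + 3 * wrong_so_far)
        self.bias[letter] += 0.01 * reward  # small learning rate (assumed)
        return reward
```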
```
hangman-solver/
│
├── Data/
│   └── Data/
│       ├── corpus.txt              # Training data (50,000 words)
│       └── test.txt                # Test data (2,000 words)
│
├── hangman.ipynb                   # Main Jupyter notebook
│
├── Generated Files/
│   ├── corpus_eda_summary.csv      # Training corpus statistics
│   ├── statistical_hmm.pkl         # Trained HMM model (~5-10 MB)
│   ├── rl_agent_2000.pkl           # Trained RL agent (~10-50 KB)
│   └── training_metrics_2000.csv   # Training performance metrics
│
├── Analysis_Report.pdf             # Detailed analysis and insights
├── README.md                       # This file
└── requirements.txt                # Python dependencies
```
- Python 3.8 or higher
- Jupyter Notebook or JupyterLab
```
pandas>=1.5.0
numpy>=1.23.0
matplotlib>=3.6.0
seaborn>=0.12.0
jupyter>=1.0.0
```
Open and run the Jupyter notebook:

```bash
jupyter notebook hangman.ipynb
```

The notebook contains:
- Exploratory Data Analysis - Analyzes the training corpus
- HMM Training - Builds statistical probability models
- RL Agent Training - Trains the reinforcement learning agent (2,000 episodes)
- Evaluation - Tests performance on held-out test set
- Demo - Shows step-by-step gameplay examples
| Metric | Value |
|---|---|
| Win Rate | 26.80% (536 wins) |
| Guess Accuracy | 50.61% (11,609/22,936) |
| Avg Wrong Guesses | 4.66 per game |
| Repeated Guesses | 0 (perfect tracking) |
| Final Score | -46,099 |
- Training Episodes: 2,000
- Words per Episode: 100
- Total Training Games: 200,000
- Final Epsilon: 0.0499
- Training Win Rate: 24.63% (last 100 episodes)
✅ Statistical HMM provides strong baseline predictions
✅ Letter bias learning successfully adapts to experience
✅ Multi-level probability weighting balances different information sources
✅ No repeated guesses (perfect game state tracking)
- Adaptive epsilon decay based on performance
- Experience replay buffer
- Enhanced reward shaping
- Deep Q-Network (DQN) architecture
- Monte Carlo Tree Search (MCTS)
- Ensemble of specialist agents
- Transformer-based architecture with attention
- Transfer learning from language models (BERT/GPT)
- Curriculum learning strategy
Expected impact: +20-50% win rate improvement
- Average Length: 9.50 letters
- Most Common Length: 9 letters (13.62%)
- Letter Distribution: e (10.37%), a (8.87%), i (8.86%)
- Common Bigrams: 'er' (2.08%), 'in' (1.68%), 'ti' (1.56%)
- Words with Repeated Letters: 21.24%
- Top Starting Letters: p (10.40%), s (10.37%), c (8.26%)
- Top Ending Letters: e (18.61%), y (11.51%), s (10.80%)
ADITYA CS · AADARSH KOUSHIK · AAYUSH JHA · ADVAITH NAMBIAR
Note: This is an academic/learning project demonstrating AI techniques in game playing. The performance metrics reflect the complexity of word prediction under constrained conditions.