A sophisticated NFL analytics platform that uses advanced machine learning to identify profitable betting opportunities. The system analyzes historical NFL data with 50+ enhanced features across multiple time scales to predict game outcomes. It provides actionable betting insights backed by a 60.9% ROI in historical testing of data-leakage-free models.
Now featuring a multi-page Streamlit app with a dedicated Historical Data page for advanced filtering and analysis of 196k+ play-by-play records.
🆕 PLAYER PROPS SYSTEM: Individual player performance predictions (Passing Yards, Rushing Yards, Receiving Yards, TDs) for DraftKings Pick 6 and similar prop betting markets. See Player Props Roadmap for details.
A new interactive DraftKings Pick 6 line comparison tool is available in the Player Props page. Enter a player, choose a stat category (e.g., Passing Yards), and paste the DraftKings Pick 6 over/under line (e.g., 242.5). The tool will show:
- A data-driven recommendation (OVER / UNDER)
- Confidence and tier (🔥 ELITE, 💪 STRONG, ✅ GOOD, ⚠️ LEAN)
- Prediction source: either the trained XGBoost model ("🤖 ML Model") or a historical hit-rate fallback ("📊 Historical")
- Recent-game averages (L3, L5, L10), season average, and a recent game log showing OVER/UNDER hits
Notes for developers:
- The feature uses the project models in `player_props/models` and the prediction pipeline in `player_props/predict.py`.
- Models are loaded with caching to keep the UI responsive. If a model for the selected stat/tier is missing, the code falls back to a simple Laplace-smoothed historical hit rate.
- For full production predictions, regenerate player props with `python player_props/predict.py`.
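The historical fallback and the recent-game averages described above can be sketched in a few lines. This is a minimal illustration and not the code in `player_props/predict.py`. The function names are hypothetical, and the game log is assumed to be a list of stat values ordered from oldest to newest. A game counts as an OVER hit only when the stat strictly exceeds the line. Laplace smoothing adds one pseudo-hit and one pseudo-miss, so a player with no history starts at 50%.

```python
from statistics import mean


def recent_averages(game_log: list[float], windows=(3, 5, 10)) -> dict[str, float]:
    """Average of the last N games for each window (log ordered oldest -> newest)."""
    out = {}
    for n in windows:
        recent = game_log[-n:]
        if recent:
            out[f"L{n}"] = mean(recent)
    return out


def laplace_hit_rate(game_log: list[float], line: float) -> float:
    """P(OVER) estimated as (hits + 1) / (games + 2); 0.5 when there is no history."""
    hits = sum(1 for value in game_log if value > line)
    return (hits + 1) / (len(game_log) + 2)


def historical_recommendation(game_log: list[float], line: float) -> tuple[str, float]:
    """Return ("OVER" or "UNDER", smoothed probability of that side)."""
    p_over = laplace_hit_rate(game_log, line)
    return ("OVER", p_over) if p_over >= 0.5 else ("UNDER", 1 - p_over)
```

For example, a log of 250, 231, 260, 198, 245 passing yards against a 242.5 line has 3 hits in 5 games, which gives (3 + 1) / (5 + 2) ≈ 57.1% for the OVER.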
- 🎯 Key Features
- 📈 Model Performance
- 📊 Data Sources
- 🎮 How to Use
- 📁 Enhanced Features
- 🔧 Technical Architecture
- 📁 Recent Updates
- 🎯 Getting Started
- 🔧 Troubleshooting
- ⚠️ Responsible Gambling Notice
- 🤝 Contributing
- 🔥 Next 10 Underdog Bets: Actionable moneyline betting opportunities with complete payout calculations
- 🎯 Next 10 Spread Bets: NEW high-performance spread betting section with 91.9% historical win rate
- Confidence Tiers: Visual indicators (🔥 Elite, ⭐ Strong, 📈 Good) for bet prioritization
- Live Game Predictions: Real-time probability calculations for upcoming NFL games
- Enhanced Betting Signals: Dual-strategy recommendations with proper spread/moneyline logic
- Performance Tracking: Historical betting performance with ROI calculations and survivorship bias awareness
- Edge Analysis: Compare model predictions against sportsbook odds to find value bets
- Monte Carlo Feature Selection: Interactive experimentation with feature combinations for model optimization
- Enhanced Reliability: Robust error handling and graceful fallbacks for uninterrupted analysis
- Triple Strategy Success: Moneyline (65.4% ROI) + Spread (75.5% ROI) + Over/Under betting
- Elite Spread Performance: 91.9% win rate on high-confidence spread bets (≥54% threshold)
- Moneyline Strategy: 59.5% win rate on underdog picks with 28% F1-optimized threshold
- Over/Under Model: NEW totals betting with F1-optimized thresholds and value edge calculations
- Professional-Grade Validation: Data leakage eliminated, realistic performance metrics
- Selective Betting: High-confidence filtering for maximum profitability
- Three Specialized Models: Separate XGBoost models for spread, moneyline, and over/under predictions
- F1-Score Optimization: All models use F1-score maximization to find optimal betting thresholds
- Optimized XGBoost Models with production-ready hyperparameters and probability calibration
- Enhanced Monte Carlo Feature Selection testing 200 iterations with 15-feature subsets
- Data Leakage Prevention: Strict temporal boundaries ensuring only pre-game information
- Class Balancing with computed scale weights for imbalanced datasets
- Multi-Target Prediction: Spread (56.3%), moneyline (64.2%), totals (56.2%) accuracy
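An F1-optimized threshold can sit well below 50%, as with the 28% moneyline threshold. The plain-Python grid search below is a minimal sketch that shows why. The helper names and the 0.05 to 0.95 grid are assumptions for illustration. The project's actual search may use scikit-learn utilities or a different grid.

```python
def f1_at_threshold(y_true: list[int], probs: list[float], threshold: float) -> float:
    """F1 score when predicting 1 for every probability >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, probs):
        predicted = prob >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif label == 1:
            fn += 1
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no true positives
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_f1_threshold(y_true, probs, grid=None) -> tuple[float, float]:
    """Scan candidate thresholds and keep the first one with the highest F1."""
    if grid is None:
        grid = [i / 100 for i in range(5, 96)]  # 0.05, 0.06, ..., 0.95
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        score = f1_at_threshold(y_true, probs, t)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1
```

Underdogs win only about 30% of games, so the labels are imbalanced. Lowering the threshold trades some precision for recall, and F1 rewards that trade until false positives start to dominate.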
| Prediction Type | Cross-Val Accuracy | Betting Performance | ROI | Key Insight |
|---|---|---|---|---|
| Spread | 58.9% | 91.9% win rate (≥54% threshold) | 75.5% | Elite performance through selective betting |
| Moneyline | 64.2% | 59.5% win rate (≥28% threshold) | 65.4% | Strong underdog value identification |
| Over/Under | 56.2% | F1-optimized thresholds | TBD | NEW - Totals betting with value edge analysis |
- Spread Model Fixed: Corrected inverted predictions from 3.6% to 91.9% win rate
- Dual Strategy Success: Both spread and moneyline betting now highly profitable
- Data Leakage Free: Strict temporal boundaries ensure production reliability
- Survivorship Bias Awareness: The 91.9% rate applies to a selective 33% of games, not to overall performance
- Confidence Calibration: Model knows when it's likely to be right vs wrong
- Source: NFLverse Project - The most comprehensive NFL dataset available
- Coverage: Play-by-play data from 2020-2024 seasons
- Data Quality: Professional-grade data used by NFL analysts and researchers
- Update Frequency: Updated weekly during NFL season
- Source: ESPN NFL scores and betting data
- Includes: Point spreads, moneylines, over/under totals, and odds
- Coverage: Historical betting lines for model training and backtesting
- All-Time Rolling Statistics: Win percentages, point differentials, scoring averages
- Current Season Performance: Weekly updated team form and scoring trends
- Historical Season Context: Prior season records for baseline team quality
- Head-to-Head Matchups: Team-specific historical performance data
- Situational Data: Home/away performance, division games, weather conditions
- Advanced Metrics: Blowout rates, close game performance, coaching records
```bash
# Build historical data and train models (single step):
python build_and_train_pipeline.py

# Or run the steps separately:
# Train the models (build features and train)
python nfl-gather-data.py

# Launch the dashboard
streamlit run predictions.py
```
- Place any new helper or diagnostic Python scripts in the `scripts/` folder. Examples: `scripts/check_moneyline_calibration.py`, `scripts/analyze_underdog_impact.py`.
- Scripts should be import-safe (no heavy data loads at module import time). They should also include a short header comment describing their purpose and provide an `if __name__ == '__main__':` entrypoint so they can be run from CI or manually.
The repository includes a GitHub Actions workflow (.github/workflows/nightly-update.yml) that automatically updates predictions during football season:
- Schedule: Runs nightly at 3:00 AM UTC (Sept 1 - Feb 15)
- Season Detection: Automatically skips runs outside football season (March-August)
- Update Process:
- Fetches latest ESPN scores and betting lines
- Smart update of NFLverse play-by-play data (only when new games detected)
- Runs the complete prediction pipeline (`build_and_train_pipeline.py`)
- Uploads updated predictions as artifacts
- Optionally commits changes back to the repository
Smart PBP Updates: The system uses intelligent detection to only download play-by-play data when new games are likely available, avoiding unnecessary bandwidth usage on non-game days.
Manual Trigger: You can manually run the workflow from the Actions tab in GitHub.
Data Sources Update Cadence:
- ESPN API (scores/odds): Real-time source, polled nightly at 3 AM UTC
- NFLverse PBP (play-by-play): Smart updates nightly - only downloads when new games detected
- Predictions CSV: Regenerated after each pipeline run (~5 minutes)
- The dashboard now includes a "🔄 Generate Predictions" button in the "Upcoming Games Schedule" expander when the app detects scheduled games without model outputs. Clicking this button runs `python build_and_train_pipeline.py` locally, shows a spinner, displays success/error output, and refreshes the page when finished. This is intended as a convenient local fallback to the nightly GitHub Actions workflow.
Local Alternative: For local development, run the update manually:
```bash
# Fetch latest ESPN data
python fetch_espn_weekly_scores.py

# Rebuild predictions
python build_and_train_pipeline.py
```
- The app reads environment variables for optional features such as emailing and the RSS feed URL. For local development you can create a `.env` file in the project root, and the app will load it automatically. `python-dotenv` is recommended and included in `requirements.txt`. If `python-dotenv` is not installed, the app falls back to a minimal `.env` parser that still reads basic `KEY=VALUE` lines.
Example: copy `.env.example` to `.env` and update values:
```ini
EMAIL_FROM=you@example.com
EMAIL_TO=recipient@example.com
EMAIL_PASSWORD=your_app_password_here
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
ALERTS_SITE_URL=http://localhost:8501/
```
- PowerShell example to set environment variables for a single session (optional):
```powershell
$env:EMAIL_FROM = 'you@example.com'
$env:EMAIL_TO = 'recipient@example.com'
$env:EMAIL_PASSWORD = 'your_app_password_here'
```
- Security note: Do NOT commit real secrets. Add `.env` to your `.gitignore` (the repo currently provides `.env.example` as a template).
- Deployment tip: For Streamlit Cloud or other hosting, prefer the platform's secrets manager (e.g., `st.secrets` on Streamlit Cloud) rather than a `.env` file.
The dashboard uses modern multi-page navigation with a main predictions page and dedicated historical data page. Each section displays data with professional formatting including percentages, proper date formats, and descriptive column labels.
The primary predictions interface with tab-based navigation for betting analysis.
- Model Predictions vs Actual Results: Historical game outcomes with checkbox indicators
- Formatted columns: Game Date (MM/DD/YYYY), Team names, Scores, Spread/O/U lines
- Checkbox columns show predicted vs actual spread coverage and totals
- Displays the 50 most recent completed games with proper date formatting
- Upcoming Game Probabilities: Shows betting opportunities with model confidence
- Percentage Display: All probabilities shown as percentages (e.g., "45.6%" instead of "0.456")
- Spread Probabilities: Model confidence underdog will cover spread
- Moneyline Probabilities: Model confidence underdog wins outright
- Over/Under Probabilities: Model confidence for totals betting
- Edge Calculations: Value identification (model % - implied %)
- Compact Labels: "Spread Prob", "ML Prob", "Over Edge" with helpful tooltips
- Key Metrics Table: Clean, organized display of model performance
- Spread, Moneyline, and Totals accuracy (formatted to 3 decimals)
- Mean Absolute Error for each model
- Optimal thresholds (28% for ML, 54% for spread)
- Compact 600px width for easy scanning
- Performance Tracking: Historical betting performance with ROI calculations
- Survivorship Bias Awareness: Transparent about selective betting strategy
- Next 10 Underdog Betting Opportunities: Moneyline strategy recommendations
- Chronological order: Upcoming games where the model has ≥28% confidence
- Complete betting info: Favored team, underdog, spread, model confidence, expected payout
- Real payout calculations: Exact profit amounts for $100 bets using live moneyline odds
- Example: "Vikings (H) +180 ($180 profit on $100)" when Chargers are favored
- Next 15 Spread Betting Opportunities: High-confidence spread recommendations
- Confidence Tiers: 🔥 Elite (75%+), ⭐ Strong (65-74%), 📈 Good (54-64%)
- Historical Performance: 91.9% win rate on high-confidence bets, 75.5% ROI
- Smart Sorting: Ordered by confidence level for optimal bet selection
- Spread Explanation: Team can lose game but still "cover" (e.g., lose by less than spread)
- Totals Betting Opportunities: Model predictions for over/under betting
- Confidence Tiers: Elite (≥65%), Strong (60-65%), Good (55-60%), Standard (<55%)
- Value Edge Calculation: Expected profit percentage based on model probability vs odds
- Smart Bet Selection: Recommendations sorted by value edge for optimal selection
- Complete Payout Info: Expected returns on $100 bets for both over and under options
- Automated Bet Tracking: Logs all betting recommendations with timestamps
- Results Integration: Automatically updates with game outcomes
- Performance Analysis: Win/loss tracking for accountability
- Dedicated Page: Separate navigation page for historical data analysis
- 196k+ Play Records: Complete play-by-play data from 2020-2024 seasons
- Single Authoritative Table: The page now displays one filter-driven table (the top snapshot table was removed). Use the sidebar filters to refine results. The table always applies a calendar-date guard (only games on or before today) and sorts results by `game_date` descending.
- Quick Presets: One-click filter shortcuts at the top of the sidebar:
- Red Zone: Automatically sets the yardline filter to 0-20 yards from the opponent's end zone
- 3rd & Short: Sets down filter to 3rd down and yards-to-go to 0-3 yards
- Pass Attempts Only: Checks the pass-only filter for pass play analysis
- Advanced Filtering: 12+ filter controls including:
- Team filters (offense/defense)
- Game context (down, quarter, play type)
- Field position sliders (yards to go, yardline)
- Score filters (differential, team scores)
- Advanced metrics (EPA, win probability)
- Reset Functionality: One-click filter reset with session state management
- Paginated Display: 50-500 rows per page with navigation controls
- Rich Column Configuration: Formatted dates, percentages, checkboxes for play outcomes
- Back to Predictions: Easy navigation button to return to main page
- Feature Importances: Top model features with mean/std importance (3 decimals)
- Monte Carlo Results: Feature selection testing with formatted metrics
- Filtered Historical Data: Play-by-play data with percentage formatting for win probabilities
- Error Metrics: Model accuracy and MAE displayed with consistent 3-decimal formatting
Traditional betting advice says "always bet favorites" (70% win rate), but this system finds value in selective underdog betting:
- Model identifies underdogs with higher win probability than betting odds suggest
- 28% threshold optimization maximizes F1-score for better long-term profitability
- Risk management through selective betting (only ~72% of games get betting signals)
```text
Game: Chiefs @ Raiders
Model Probability (Underdog Win): 30%
Sportsbook Implied Probability: 22%
Betting Signal: 1 (BET ON RAIDERS)
Threshold: ≥28% (F1-optimized, leak-free)
Expected Value: Positive due to 8% edge
ROI Expectation: 60.9% based on historical performance
```
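The worked example can be reproduced with two small helpers, whose names are hypothetical:

- `implied_prob` converts American odds to the break-even probability without removing the bookmaker's vig.
- `underdog_signal` reports the edge and compares the model probability with the 28% threshold.

A 22% implied probability corresponds to odds of roughly +355 (100 / 455 ≈ 0.22).

```python
def implied_prob(american_odds: int) -> float:
    """Break-even win probability implied by American odds (vig not removed)."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return abs(american_odds) / (abs(american_odds) + 100)


def underdog_signal(model_prob: float, market_prob: float, threshold: float = 0.28) -> dict:
    """Edge = model - market; bet signal = 1 when model_prob clears the threshold."""
    return {"edge": model_prob - market_prob, "signal": int(model_prob >= threshold)}
```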
- Optimized XGBoost Classifiers with production-tuned hyperparameters:
  - `n_estimators=300`, `learning_rate=0.05`, `max_depth=6`
  - L1/L2 regularization (`reg_alpha=0.1`, `reg_lambda=1.0`)
  - Subsampling (`subsample=0.8`, `colsample_bytree=0.8`)
- Calibrated Probability Outputs using sigmoid/isotonic scaling
- Enhanced Feature Engineering: 50+ leak-free temporal features
- Time-Series Aware Cross-Validation preventing future data leakage
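The configuration above can be sketched as follows. The sketch assumes `xgboost` and scikit-learn are installed and the training rows are sorted chronologically. It wraps XGBoost in a sigmoid `CalibratedClassifierCV` over `TimeSeriesSplit` folds, with `scale_pos_weight` set to the negative/positive ratio. That wiring is an assumption consistent with these bullets, not necessarily the exact code in `nfl-gather-data.py`. Imports are local to `build_calibrated_model`, so the rest of the snippet runs without those libraries.

```python
XGB_PARAMS = {
    "n_estimators": 300,
    "learning_rate": 0.05,
    "max_depth": 6,
    "reg_alpha": 0.1,
    "reg_lambda": 1.0,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}


def balanced_pos_weight(y) -> float:
    """Negative/positive label ratio, a common value for XGBoost's scale_pos_weight."""
    positives = sum(1 for label in y if label == 1)
    negatives = len(y) - positives
    return negatives / positives if positives else 1.0


def build_calibrated_model(y_train, n_splits: int = 5):
    """Return an unfitted, sigmoid-calibrated XGBoost classifier (rows sorted by date)."""
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.model_selection import TimeSeriesSplit
    from xgboost import XGBClassifier

    base = XGBClassifier(**XGB_PARAMS, scale_pos_weight=balanced_pos_weight(y_train))
    # Each TimeSeriesSplit calibration fold comes after its training fold in row order.
    return CalibratedClassifierCV(
        estimator=base, method="sigmoid", cv=TimeSeriesSplit(n_splits=n_splits)
    )
```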
- Real-time Team Form: `homeTeamCurrentSeasonWinPct`, `awayTeamCurrentSeasonWinPct`
- Scoring Trends: `homeTeamCurrentSeasonAvgScore`, `awayTeamCurrentSeasonAvgScore`
- Defensive Performance: `homeTeamCurrentSeasonAvgScoreAllowed`, `awayTeamCurrentSeasonAvgScoreAllowed`
- Prior Season Records: `homeTeamPriorSeasonRecord`, `awayTeamPriorSeasonRecord`
- Annual Stability Metrics: Complete previous-season win percentages for baseline team quality
- Team-Specific Advantages: `headToHeadHomeTeamWinPct`
- Historical Matchup Performance: How teams have performed against each other over time
- Temporal Integrity: All features maintain strict data leakage prevention
- Multi-Scale Analysis: Current season trends + historical season records + specific matchup history
- Automated Integration: Features automatically included in Monte Carlo feature selection
- Automated Feature Creation: Rolling averages, win percentages, trends with strict temporal controls
- Data Leakage Prevention: All statistics use only prior games (`season < current`, or `week < current` within the same season)
- Enhanced Validation: Real-time feature availability and XGBoost compatibility checks
- Quality Assurance: Removes games with missing betting lines, validates data integrity
- Threshold Optimization: F1-score maximization finds optimal decision boundary (28%)
- Robust Error Handling: Graceful fallbacks for missing data and feature inconsistencies
- Production Ready: No future information leakage, realistic performance expectations
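Leak-free rolling statistics depend on ordering: read each team's running record before folding in the current game's result. The pure-Python sketch below shows this for current-season win percentage. It rests on three assumptions:

- The input field names (`season`, `week`, `home`, `away`, `home_win`) are hypothetical.
- Each game carries only a `home_win` flag, so ties are not modeled.
- Each team plays at most once per week.

```python
from collections import defaultdict


def pregame_win_pct(games: list[dict]) -> list[tuple]:
    """Return (home_pct, away_pct) per game, using only earlier games that season.

    A value is None when the team has not played yet in that season.
    """
    order = sorted(range(len(games)), key=lambda i: (games[i]["season"], games[i]["week"]))
    record = defaultdict(lambda: [0, 0])  # (season, team) -> [wins, games played]
    features: list[tuple] = [()] * len(games)

    def win_pct(season, team):
        wins, played = record[(season, team)]
        return wins / played if played else None

    for i in order:
        game = games[i]
        season = game["season"]
        # 1) Read features first: this game's result is not known yet.
        features[i] = (win_pct(season, game["home"]), win_pct(season, game["away"]))
        # 2) Only then fold the result into the running records.
        for team, won in ((game["home"], game["home_win"]), (game["away"], not game["home_win"])):
            record[(season, team)][0] += int(won)
            record[(season, team)][1] += 1
    return features
```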
```text
nfl-predictions/
├── nfl-gather-data.py    # Main model training script
├── predictions.py        # Streamlit dashboard
├── data_files/           # Data storage directory
```
- Primary Algorithm: XGBoost with production-optimized parameters
- Training Parameters: 300 estimators, 0.05 learning rate, max depth 6
- Regularization: L1=1, L2=1 for overfitting prevention
- Monte Carlo Parameters: 100 estimators, 0.1 learning rate for feature selection
- Model Calibration: CalibratedClassifierCV with sigmoid method for accurate probabilities
- Feature Selection: Monte Carlo optimization (200 iterations, 15-feature subsets)
- Cross-Validation: Time-series aware splits preventing data leakage
- Class Handling: Weighted models addressing favorite/underdog imbalance (~70/30 split)
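Monte Carlo feature selection can be sketched as repeated random subsets, each scored by a caller-supplied function. In this hypothetical sketch, `score_fn` stands in for training the lighter model (100 estimators) and returning a time-series cross-validated score. The 200 iterations and 15-feature subsets mirror the parameters above, and the fixed seed makes runs reproducible.

```python
import random
from typing import Callable, Sequence


def monte_carlo_feature_search(
    features: Sequence[str],
    score_fn: Callable[[list[str]], float],
    n_iter: int = 200,
    subset_size: int = 15,
    seed: int = 42,
) -> tuple[list[str], float]:
    """Score n_iter random feature subsets and return the best (subset, score)."""
    rng = random.Random(seed)
    k = min(subset_size, len(features))
    best_subset: list[str] = []
    best_score = float("-inf")
    for _ in range(n_iter):
        subset = sorted(rng.sample(list(features), k))
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```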
- Primary Source: NFLverse (nfl_data_py) - official NFL play-by-play data
- Betting Data: ESPN odds and lines (2020-2024 seasons)
- Feature Types: Rolling statistics, head-to-head records, seasonal performance
- Data Integrity: Strict temporal boundaries prevent future data leakage
- Data Quality: ~4,000+ games with complete betting line information
- Storage: Git LFS integration for large datasets (>50MB)
- Threshold Optimization: F1-score maximization finds 28% probability threshold (not 50%)
- Selective Betting: ~72% of games generate betting signals (selective strategy)
- Edge Calculation: `model_prob - implied_odds_prob` for value identification
- ROI Focus: Optimizes for profit margin, not raw accuracy percentage
- Critical Model Fix: Resolved inverted spread predictions caused by a mislabeled target in the training pipeline. The correction `prob_underdogCovered = 1 - prob_underdogCovered` is applied immediately after model prediction in the data pipeline.
- Impact: Model betting ROI improved dramatically (from -90% → +60%), 62 of 63 remaining games were flagged as profitable, maximum model confidence increased to 89.5%, and calibration error improved from 45% → 28%.
- New Features (18 total): Added momentum (8), rest-advantage (5), and weather-impact (3) features. All features were engineered to avoid data leakage; see `NEW_FEATURES_DEC13.md` for details.
- UI & Workflow Improvements: Added an EV explanation expander, changed spread-bet sorting to date-ascending, fixed Unicode/icon issues, and improved PDF/CSV export UX.
- Documentation: Full technical analysis and remediation plan available in `MODEL_FIX_PLAN.md`. The copilot instructions and README have been updated to reflect the fix and next steps.
- Next Steps: Optional hyperparameter tuning and ensemble approaches are documented for incremental gains; momentum features will strengthen as the 2025 season progresses.
- Problem Solved: Users miss high-value betting opportunities when they open the app.
- Solution: Automatic, actionable in-app notifications that highlight Elite and Strong opportunities:
- Elite notifications (🔥): bets with ≥65% model confidence
- Strong notifications (⭐): bets with 60–65% model confidence
- Notifications are shown as toasts and are deduplicated using `st.session_state`, so the same game doesn't re-notify in the same session.
- Each alert links to a per-alert page (query param `?alert=<guid>`). That page shows friendly bet details, logos, bet type, gameday/time, and a small recommendation table.
- The app also persists a detected public base URL (when available) into `data_files/app_config.json`, so external tools (and the RSS generator) can create working per-alert links.
- Impact: Improves engagement by surfacing high-value bets and providing one-click access to full alert details.
- Technical: Uses `st.toast()` for toasts, session-state deduplication, per-alert pages rendered from `data_files/betting_recommendations_log.csv`, and a persisted base URL for external link generation.
- Purpose: Provide external notifications via RSS with per-alert links pointing back to the app.
- Implementation:
  - `scripts/generate_rss.py` builds `data_files/alerts_feed.xml`. It prefers the persisted `app_base_url` in `data_files/app_config.json` (if present). Otherwise it falls back to the `ALERTS_SITE_URL` environment variable or `http://localhost:8501/`.
  - The running app exposes a sidebar "🔁 Rebuild RSS" button that runs the generator in place and reports success/errors.
- Files:
  - Persisted config: `data_files/app_config.json` (contains `{"app_base_url": "https://..."}` when detected)
  - RSS output: `data_files/alerts_feed.xml`
- Problem Solved: Excessive vertical space at the top of the main page reduced available screen real estate
- Solutions Implemented:
- Compact Header Layout: Logo and title now arranged in columns [1, 4] instead of stacked vertically
- Logo Size Reduction: Logo reduced from 250px to 150px width for better proportions
- Logo Positioning: Logo positioned at top-left of its column with minimal spacing
- Loading Progress Optimization: Reduced verbose loading messages and progress indicators
- Debug Message Cleanup: Removed unnecessary debug print statements from UI
- Impact: Significantly more content visible above the fold, improved user experience
- Technical: Uses `st.columns()` for responsive layout and optimized spacing
- Problem Solved: App exceeded Streamlit Cloud resource limits due to high memory usage (1.5GB+)
- Solutions Implemented:
  - Data Type Optimization: `float32` for numeric columns (50% memory reduction), `Int8` for boolean columns
  - DataFrame Views: Replaced `.copy()` operations with views to eliminate memory duplication
  - Lazy Loading: All data loading uses `@st.cache_data` decorators with lazy initialization
  - Pagination: Added pagination for large datasets (>10k rows) with user warnings
  - Spinner Configuration: Suppress cache messages on technical pages, show on main UI
  - Variable Cleanup: Added `del` statements to clean up progress bars and temporary variables
- Impact: Reduced memory usage while maintaining full functionality, enabled Streamlit Cloud deployment
- Technical: Memory-efficient patterns prevent resource limit violations
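A minimal sketch of the dtype optimization with pandas follows, and the function name is hypothetical. It assumes binary flags arrive as `float64` 0/1 columns with possible NaNs, which is common in play-by-play exports. Those flags become nullable `Int8`, and all other `float64` columns become `float32`. The project may choose columns differently.

```python
import pandas as pd


def optimize_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast float64 columns in place: 0/1 flags -> Int8, everything else -> float32."""
    for col in df.select_dtypes(include="float64").columns:
        non_null = df[col].dropna()
        if non_null.isin([0, 1]).all():
            # Nullable Int8 keeps NaN as <NA>: 1 byte of data + 1 byte of mask per row.
            df[col] = df[col].astype("Int8")
        else:
            df[col] = df[col].astype("float32")  # 4 bytes per row instead of 8
    return df
```

Converting a genuine `bool` column to `Int8` would not save memory, because `bool` already uses one byte per value and the nullable type adds a mask. Wrapping the loader in `@st.cache_data` (for example, returning `optimize_dtypes(pd.read_csv(path))`) runs the conversion once per cached input.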
- Startup Time: Reduced cold-start time by avoiding heavy import-time loads and using lazy, chunked CSV scanning; typical cold-starts are dramatically faster for end users.
- Memory Footprint: Numeric dtype and view optimizations (e.g., `float32` and `Int8`) cut memory usage substantially, enabling deployment on Streamlit Cloud where memory was previously constrained.
- Rendering Responsiveness: Pagination, reduced preview sizes, and DataFrame view usage reduce UI thread lag when interacting with large tables and filters.
- Deterministic Exports: On-demand PDF/CSV generation prevents pre-building large assets at startup, keeping initial memory and CPU usage low.
- Validation & Monitoring: Added a lightweight `smoke_test.py` and CI-friendly lazy-loading checks to detect import-time data loading and preserve startup performance.
- Problem Solved: Users couldn't manually refresh cached data or check cache status
- Solution: Added Settings panel in sidebar with:
- "🔄 Refresh Data" button to clear all cached data and reload
- Helpful tooltips explaining functionality
- Impact: Gives users control over data freshness
- Technical: Uses `st.cache_data.clear()` for cache management
- Problem Solved: Users didn't know how much to bet on each recommendation for optimal risk management
- Solution: Added dedicated "💰 Bankroll Management" tab with:
- Bankroll input field with configurable amounts ($100-$1M+)
- Risk tolerance selector (Conservative 1%, Moderate 2%, Aggressive 3%, Very Aggressive 5%)
- Elite bet identification (≥65% confidence threshold)
- Position sizing calculator showing recommended bet amounts
- Expected payout calculations for each bet
- Expected value analysis (positive EV bets only)
- Bankroll impact tracking (total exposure percentage)
- Impact: Enables responsible betting with Kelly Criterion-inspired position sizing
- Technical: Filters predictions for elite opportunities and calculates optimal bet sizes
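The sizing rules above can be sketched as a flat percentage-of-bankroll stake, applied only to Elite bets with positive expected value. In this hypothetical sketch, the full-Kelly fraction f* = (b*p - q) / b serves only as the EV filter, not to size the bet. Because b > 0, f* is positive exactly when expected value is positive. The bet field names are assumptions.

```python
RISK_LEVELS = {"Conservative": 0.01, "Moderate": 0.02, "Aggressive": 0.03, "Very Aggressive": 0.05}
ELITE_THRESHOLD = 0.65


def net_odds(american_odds: int) -> float:
    """Profit per $1 staked (b in the Kelly formula)."""
    return american_odds / 100 if american_odds > 0 else 100 / abs(american_odds)


def kelly_fraction(p: float, american_odds: int) -> float:
    """Full-Kelly fraction f* = (b*p - (1 - p)) / b; negative means negative EV."""
    b = net_odds(american_odds)
    return (b * p - (1 - p)) / b


def size_bets(bankroll: float, risk: str, bets: list[dict]) -> tuple[list[dict], float]:
    """Flat stake per qualifying bet, plus total exposure as a % of bankroll."""
    stake = round(bankroll * RISK_LEVELS[risk], 2)
    sized = [
        {**bet, "stake": stake, "profit_if_win": round(stake * net_odds(bet["odds"]), 2)}
        for bet in bets
        if bet["confidence"] >= ELITE_THRESHOLD and kelly_fraction(bet["confidence"], bet["odds"]) > 0
    ]
    exposure_pct = 100 * stake * len(sized) / bankroll if bankroll > 0 else 0.0
    return sized, exposure_pct
```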
- Problem Solved: Users couldn't see if betting recommendations were actually accurate
- Solution: Added dedicated "📈 Model Performance" tab with:
- Overall metrics: Total bets, win rate, ROI, units won
- Performance breakdown by confidence tier (Elite/Strong/Good/Standard)
- Weekly performance tracking with line charts
- Best performing bet types analysis
- Impact: Builds trust and transparency in model performance
- Technical: Reads from `betting_recommendations_log.csv` and calculates ROI metrics
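The headline metrics can be computed from the bet log with a few lines of plain Python. This sketch assumes one unit staked per bet and hypothetical field names (`tier`, `won`, `odds`). The real log's column names may differ. Rows read from `betting_recommendations_log.csv` with `csv.DictReader` would also need their values converted from strings first.

```python
from collections import defaultdict


def net_odds(american_odds: int) -> float:
    """Profit in units for a 1-unit winning bet at American odds."""
    return american_odds / 100 if american_odds > 0 else 100 / abs(american_odds)


def _stats(rows: list[dict]) -> dict:
    n = len(rows)
    wins = sum(1 for r in rows if r["won"])
    units = sum(net_odds(r["odds"]) if r["won"] else -1.0 for r in rows)
    return {
        "bets": n,
        "win_rate": wins / n if n else 0.0,
        "units": units,
        "roi_pct": 100 * units / n if n else 0.0,  # units won per unit staked
    }


def summarize(bets: list[dict]) -> tuple[dict, dict]:
    """Overall metrics plus a per-tier breakdown."""
    by_tier = defaultdict(list)
    for bet in bets:
        by_tier[bet["tier"]].append(bet)
    return _stats(bets), {tier: _stats(rows) for tier, rows in by_tier.items()}
```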
- Problem Solved: Users experienced 5-10 second load times with no feedback on progress
- Solution: Added detailed progress bar showing specific loading steps:
- 25%: "Loading historical games..." - loads game-level data with predictions
- 50%: "Loading model predictions..." - loads betting predictions CSV
- 75%: "Loading play-by-play data..." - loads historical play-by-play records
- 100%: "Ready!" - shows completion with brief pause
- Impact: Significantly improved perceived performance and user experience
- Technical: Uses Streamlit progress bar with text updates and automatic cleanup
- Memory-Efficient Data Types: Reduced memory usage by ~50% using float32 and Int8 instead of float64/int64
- Eliminated Data Copying: Removed unnecessary `.copy()` operations on the 196k-row dataset
- DataFrame Views: Filter operations now use views instead of copies for instant performance
- Streamlit Cloud Config: Added `.streamlit/config.toml` with increased message size limits (500MB) and optimizations
- Smart Warnings: Alert users when displaying large result sets to encourage filtering
- Result: Historical Data page now loads significantly faster on Streamlit Cloud, avoiding timeout issues
- Three-Model System: Added dedicated over/under (totals) betting predictions alongside spread and moneyline
- F1-Score Optimization: Optimal threshold calculation for over/under predictions using F1-score maximization
- Value Edge Analysis: Calculates expected profit percentage based on model probability vs betting odds
- Confidence Tiers: Elite/Strong/Good/Standard classification for bet prioritization
- Complete Payout Calculations: Shows expected returns for both over and under bets on each game
- Integrated Dashboard: New "🎯 Over/Under Bets" tab with top 15 opportunities sorted by value edge
- Dedicated Historical Data Page: Separate navigation page for 196k+ play-by-play records with advanced filtering
- Clean Interface: Main predictions page focuses on betting analysis, historical data on separate page
- Easy Navigation: "🏈 Back to Predictions" button for quick return to main page
- 12+ Filter Controls: Team filters, game context, field position, scores, and advanced metrics
- Session State Reset: Reliable filter reset functionality using flag pattern
- Pagination: Display 50-500 rows per page with navigation controls
- Fixed Over/Under Column Names: Corrected KeyError for `pred_totalsProb` → `prob_overHit`
- Added Moneyline Return Calculation: Implemented missing `moneyline_bet_return` column computation
- Fixed Indentation Issues: Resolved Python syntax errors in complex nested tab structures
- Improved Error Handling: Better validation for column existence before accessing dataframe columns
- Type Safety: Enhanced Pylance compatibility with proper variable extraction patterns
- Issue Discovered: Historical statistics were using ALL-TIME data (including future games during training)
- Impact: Models appeared to have 70%+ accuracy but would fail in production
- Solution: Implemented strict temporal boundaries - only prior games used for all statistics
- Result: More realistic 56-64% accuracy BUT 60.9% ROI (up from 27.8%) due to higher quality signals
- Production Ready: Models now perform consistently with live data
- Upgraded: From basic `eval_metric='logloss'` to production-tuned hyperparameters
- Parameters: 300 estimators, 0.05 learning rate, depth 6, with L1/L2 regularization
- Benefits: Better generalization, reduced overfitting, more stable predictions
- Monte Carlo: Separate lighter parameters for faster feature selection (100 estimators, depth 4)
- Upgraded: From 8-feature subsets to 15-feature subsets for better coverage
- Iterations: Increased from 100 to 200 iterations for more thorough optimization
- Results: Found optimal 7-8 feature models through comprehensive search space
- Next 10 Underdog Bets: New prominent section showing actionable betting opportunities
- Chronological order of next 10 recommended underdog bets
- Complete betting info: favored team, underdog, spread, model confidence
- Real payout calculations with exact profit amounts for $100 bets
- Enhanced Recent Bets Display: Added "Favored" column showing who sportsbooks favored
- Corrected Spread Logic: Fixed favorite/underdog identification (spread_line interpretation)
- Improved User Experience: Clear explanations, better formatting, actionable insights
- Quality: Better feature combinations leading to improved model stability
- Discovered: Dashboard was showing inconsistent threshold information
- Fixed: Updated to reflect actual F1-optimized threshold (now 28% after leakage fixes)
- Impact: Users now see accurate betting strategy information
- Fixed: Deprecated `use_container_width` parameters (replaced with `width='stretch'`)
- Updated: All dataframe displays now use modern Streamlit best practices
- Result: Dashboard works with latest Streamlit versions without errors
- Issue: "Next 10 Underdog Bets" and "Next 10 Spread Bets" sections showing 2020 games instead of upcoming games
- Root Cause: Earlier dashboard sections modified the `predictions_df` variable (converting `gameday` to strings and filtering for past dates)
- Solution: Each betting section now reloads fresh data from CSV and properly filters for future games only (`gameday > today`)
- Impact: Betting recommendations now correctly display only upcoming games in chronological order
- Technical: Fixed variable mutation issues across Streamlit sections through data isolation
- Added: Large file support for `nfl_play_by_play_historical.csv.gz` (73.95MB)
- Benefit: Enables deployment to Streamlit Cloud with access to large datasets
- Setup: Properly configured `.gitignore` and `.gitattributes` for optimal repo management
- Current Season Performance: Added real-time team form tracking within current season
- Prior Season Records: Incorporated previous season's final win percentages for baseline metrics
- Head-to-Head History: Added historical matchup performance between specific teams
- Technical Benefits: Multi-scale temporal analysis with strict data leakage prevention
- Model Impact: Richer context for predictions across multiple time horizons
- Fixed All KeyError Issues: Resolved feature list mismatches between training and dashboard
- Synchronized Feature Sets: Ensured consistency across all 50+ features in both `nfl-gather-data.py` and `predictions.py`
- Enhanced Monte Carlo Selection: Fixed feature sampling to only use available numeric features
- Robust Model Retraining: Updated dashboard to handle dynamic feature selection correctly
- Data Pipeline Validation: Added comprehensive checks for feature availability and data integrity
- Graceful Error Handling: System now handles missing features and data inconsistencies smoothly
- Optimized Feature Loading: Streamlined data processing for faster dashboard startup times
- Memory Management: Improved handling of large datasets with better resource utilization
- Cross-Platform Compatibility: Enhanced PowerShell and terminal command support
- Real-Time Validation: Added live feature availability checking during Monte Carlo experiments
- Automated Fallbacks: System degrades gracefully when optional features are unavailable
- Install Dependencies
  ```bash
  pip install streamlit pandas numpy xgboost scikit-learn
  ```
- Train Initial Models
  ```bash
  # Single-step: build historical data and train models
  python build_and_train_pipeline.py
  ```
- Launch Dashboard
  ```bash
  streamlit run predictions.py
  ```
- Deploy to Streamlit Cloud (Recommended)
  - All data files are committed to the repository
  - Memory optimizations ensure compatibility with Streamlit Cloud resource limits
  - Python 3.12 required for deployment
- Start Analyzing
  - Check "Show Betting Analysis & Performance" for ROI metrics
  - Look for `pred_underdogWon_optimal = 1` in upcoming games
  - Use the edge calculations to find the best value bets
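The last step above refers to edge calculations. One common definition of edge is the model's win probability minus the break-even probability implied by the American moneyline. A minimal sketch, assuming hypothetical columns `underdog_moneyline` and `prob_underdogWon` next to the documented `pred_underdogWon_optimal` flag; the dashboard's own column names and formula may differ:

```python
import pandas as pd

def implied_probability(american_odds: float) -> float:
    """Break-even win probability for American moneyline odds."""
    if american_odds > 0:
        return 100.0 / (american_odds + 100.0)
    return -american_odds / (-american_odds + 100.0)

def payout_on_win(stake: float, american_odds: float) -> float:
    """Profit (excluding the returned stake) from a winning bet."""
    if american_odds > 0:
        return stake * american_odds / 100.0
    return stake * 100.0 / -american_odds

def underdog_value_bets(df: pd.DataFrame) -> pd.DataFrame:
    """Keep flagged underdog bets and rank them by edge (model prob - implied prob)."""
    bets = df[df["pred_underdogWon_optimal"] == 1].copy()
    bets["implied_prob"] = bets["underdog_moneyline"].apply(implied_probability)
    bets["edge"] = bets["prob_underdogWon"] - bets["implied_prob"]
    bets["profit_per_100"] = bets["underdog_moneyline"].apply(
        lambda odds: payout_on_win(100.0, odds)
    )
    return bets.sort_values("edge", ascending=False)
```

A positive edge means the model rates the underdog above the price the book is offering; for example, +150 implies 40%, so a 45% model probability is a 5-point edge.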
"TypeError: 'str' object cannot be interpreted as an integer"
- Cause: Old Streamlit version compatibility issue
- Fix: Resolved in a recent version; run `git pull` to get the latest fixes
"Missing file: nfl_play_by_play_historical.csv.gz"
- Cause: Large file not available locally
- Fix: Dashboard includes error handling and fallback data display
"Betting signals don't match 30% threshold"
- Cause: Documentation was incorrect (now fixed)
- Reality: Model uses 24% F1-optimized threshold, not 30%
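For context on the 24% figure: an F1-optimized threshold is typically found by scanning candidate cutoffs over held-out predictions and keeping the one with the highest F1. A minimal numpy sketch of that search (the function name and the 0.05 to 0.95 grid are assumptions, not the project's code):

```python
import numpy as np

def best_f1_threshold(y_true, y_prob, grid=None):
    """Return (threshold, f1) for the cutoff that maximizes F1 on the given data."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if grid is None:
        # Candidate cutoffs 0.05, 0.06, ..., 0.95
        grid = np.round(np.linspace(0.05, 0.95, 91), 2)
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = y_prob >= t
        tp = np.sum(pred & y_true)
        fp = np.sum(pred & ~y_true)
        fn = np.sum(~pred & y_true)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        if f1 > best_f1:  # strict ">" keeps the lowest cutoff on ties
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1
```

Because underdog wins are the minority class, an F1-maximizing cutoff often lands well below 0.5, which is why a data-driven value can differ from a round number like 30%.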
- Training time: ~5-10 minutes with optimized parameters (300 estimators)
- Monte Carlo time: ~3-5 minutes for feature selection (200 iterations, 15-feature subsets)
- Dashboard load time: ~10-30 seconds for full data processing
- Memory usage: ~1.5GB during data loading (optimized with float32/Int8 dtypes and DataFrame views)
- Streamlit Cloud: Fully compatible with memory optimizations for cloud deployment
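The memory figure above depends on narrower dtypes. A minimal sketch of that kind of downcasting for a generic pandas DataFrame (float64 to `float32`, small-range integer columns to nullable `Int8`); which columns the project actually converts is an assumption here:

```python
import pandas as pd

def downcast_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with floats as float32 and small-range integers as nullable Int8."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_float_dtype(s):
            out[col] = s.astype("float32")  # halves storage; fine for model features
        elif pd.api.types.is_integer_dtype(s) and s.notna().any():
            # Int8 holds -128..127 (e.g. down, quarter) and supports missing values
            if s.min() >= -128 and s.max() <= 127:
                out[col] = s.astype("Int8")
    return out
```

Text columns are left untouched, and integer columns whose range exceeds Int8 (such as play IDs) keep their original dtype.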
#### KeyError: Features not in index
- Cause: Feature list mismatch between training and dashboard files
- Solution: Run `python build_and_train_pipeline.py` (or `python nfl-gather-data.py` to run only the training step) to regenerate feature files and ensure synchronization
- Prevention: Both files now auto-sync feature lists to prevent future mismatches
- Cause: Old Streamlit version compatibility issue
- Solution: The system now includes modern Streamlit compatibility; run `git pull` for the latest fixes
- Cause: Large file not available locally (Git LFS)
- Solution: Dashboard includes graceful error handling and fallback data display
- Alternative: Download the file manually or fetch it with Git LFS: `git lfs pull`
- Cause: Trying to sample features not available in processed dataset
- Solution: Fixed - system now validates feature availability before sampling
- Features: Enhanced error handling with automatic feature filtering
- Cause: XGBoost receiving non-numeric columns (object data types like team names, coaches)
- Solution: Fixed - automatic filtering to numeric/boolean/categorical data types only
- Prevention: All model inputs now validated for XGBoost compatibility before training
- Solution: Use a different port: `streamlit run predictions.py --server.port=XXXX`
- Default ports: Try 8501, 8502, 8503, etc.
- Terminal: On Windows, check for running Streamlit processes with `tasklist | findstr streamlit`
- Cause: Variable mutation across Streamlit sections; `predictions_df` was modified by earlier filters
- Solution: Fixed in the latest version; sections now reload fresh data and filter properly
- Verification: Betting recommendations should show games with future dates only
- Technical: Each section uses isolated dataframe copy to prevent cross-section contamination
- Faster Loading: Use the `@st.cache_data` decorator for expensive operations (already implemented)
- Memory Management: Dashboard automatically handles large datasets with chunking
- Feature Selection: Start with smaller Monte Carlo iterations (50-100) for faster testing
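The fixes above (sample only features that exist, pass only numeric, boolean, or categorical columns to XGBoost) and the tip to start with fewer Monte Carlo iterations fit into one loop. A minimal sketch, assuming a caller-supplied `score_fn` (for example, cross-validated accuracy of a model trained on the subset); the function names and defaults are illustrative, not the project's API:

```python
import random
import pandas as pd

def usable_features(df: pd.DataFrame, candidates: list[str]) -> list[str]:
    """Keep candidates that exist in df and have XGBoost-compatible dtypes."""
    ok = []
    for col in candidates:
        if col not in df.columns:
            continue  # missing feature: skip it instead of raising KeyError
        dtype = df[col].dtype
        # is_numeric_dtype also accepts bool; categoricals need enable_categorical=True
        if pd.api.types.is_numeric_dtype(dtype) or isinstance(dtype, pd.CategoricalDtype):
            ok.append(col)
    return ok

def monte_carlo_select(df, candidates, score_fn, n_iter=50, subset_size=15, seed=42):
    """Randomly sample feature subsets and return (best_subset, best_score)."""
    features = usable_features(df, candidates)
    k = min(subset_size, len(features))
    rng = random.Random(seed)
    best_subset, best_score = [], float("-inf")
    for _ in range(n_iter):
        subset = rng.sample(features, k)
        score = score_fn(df[subset])
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

Validating once before the loop means a missing or text column (such as a coach name) is dropped up front rather than failing partway through an experiment.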
This tool is for educational and analytical purposes. While our backtesting shows strong historical performance:
- Past performance doesn't guarantee future results
- Only bet what you can afford to lose
- Consider this one factor in your betting decisions
- Gambling involves risk - bet responsibly
This project welcomes contributions! Areas for improvement:
- Additional data sources (weather, injuries, etc.)
- Enhanced feature engineering
- Alternative modeling approaches
- UI/UX improvements
Built with: Python • Streamlit • XGBoost • Scikit-learn • NFLverse Data
- Where to find exports: The app exposes always-visible download controls in the sidebar for:
  - Predictions CSV: current predictions shown in the app (with embedded CSV icon)
  - Betting Log: the `betting_recommendations_log.csv` file (with embedded CSV icon)
  - Generate Predictions PDF: on-demand PDF generation (with embedded PDF icon)
- Why you might not see them: The Streamlit sidebar can be collapsed by default. If you do not see the download buttons, click the small chevron in the top-left of the page to expand the sidebar.
- Behavior: Download buttons are rendered after the app finishes loading data (the app reserves lightweight placeholders while data loads and populates the real download controls once `predictions_df` and the betting log are available).
- Icon consistency: All download buttons and the PDF generate button use embedded icons (`csv_icon.png`, `pdf_icon.png`) with fallback to `favicon.ico` for a professional appearance.
Automated HTML email notifications with clear, actionable betting recommendations. Emails show:
Enhanced Format Features:
- Clear bet recommendations: "TEN +2.5 to cover (69.1%)" instead of cryptic "SP: 69.06%"
- Individual confidence badges: Each bet shows its tier (🔥 ELITE ≥65%, ⭐ STRONG 60-65%, 📈 GOOD 55-60%)
- Full bet names: "Money Line" instead of "ML" for clarity
- Smart filtering: Only shows bets above thresholds (Spread ≥50%, Moneyline ≥28%, Totals ≥50%)
- Team colors: Visual team markers for quick identification
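The formatting and filtering rules listed above are simple enough to express directly. A minimal sketch, assuming win probabilities as floats in [0, 1], no badge below 55%, and hypothetical helper names; the bracketed badge layout is illustrative, since the real emails are HTML:

```python
# Tier cutoffs and per-market minimums taken from the list above
TIERS = [(0.65, "🔥 ELITE"), (0.60, "⭐ STRONG"), (0.55, "📈 GOOD")]
MIN_PROB = {"spread": 0.50, "moneyline": 0.28, "totals": 0.50}

def confidence_tier(prob):
    """Return the badge for a probability, or None below the lowest tier."""
    for cutoff, label in TIERS:
        if prob >= cutoff:
            return label
    return None

def format_bet(bet_type, side, line, prob):
    """Build a readable bet line, or None if the bet is below its market threshold."""
    if prob < MIN_PROB[bet_type]:
        return None
    if bet_type == "spread":
        text = f"{side} {line:+g} to cover"  # e.g. "TEN +2.5 to cover"
    elif bet_type == "moneyline":
        text = f"{side} Money Line"
    else:
        text = f"{side} {line:g}"  # totals: side is "Over" or "Under"
    tier = confidence_tier(prob)
    badge = f" [{tier}]" if tier else ""
    return f"{text} ({prob:.1%}){badge}"
```

For example, `format_bet("spread", "TEN", 2.5, 0.6906)` produces `TEN +2.5 to cover (69.1%) [🔥 ELITE]`, matching the style quoted above.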
Setup:
Environment variables (recommended):
- `EMAIL_FROM`: sender email address (e.g. `youremail@gmail.com`)
- `EMAIL_TO`: comma-separated recipient addresses (e.g. `alice@example.com,bob@example.com`)
- `EMAIL_PASSWORD`: Gmail App Password (create one in your Google Account's security settings)
- `SMTP_SERVER`: SMTP server (default `smtp.gmail.com`)
- `SMTP_PORT`: SMTP port (default `587`)
Testing:
```bash
# Preview email format in browser without sending
python scripts/preview_email.py
# Send test email
python scripts/send_rich_email_now.py
```
Example PowerShell export (set for current session):
```powershell
$env:EMAIL_FROM = "youremail@gmail.com"
$env:EMAIL_TO = "recipient@example.com"
$env:EMAIL_PASSWORD = "<your-app-password>"
$env:SMTP_SERVER = "smtp.gmail.com"
$env:SMTP_PORT = "587"
```
Notes:
- Do NOT commit credentials to the repository. Use environment variables or a secure secrets store.
- For production use, consider the Gmail API (OAuth2) or a transactional email provider (SendGrid, Mailgun) for better deliverability and key rotation.
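A minimal sketch of sending an HTML email with these environment variables using the standard library's `smtplib` and STARTTLS. The function names are hypothetical and this is not the code in `scripts/send_rich_email_now.py`:

```python
import os
import smtplib
from email.message import EmailMessage

def load_email_config(env=os.environ):
    """Read the documented variables, applying the documented defaults."""
    return {
        "sender": env["EMAIL_FROM"],
        "recipients": [a.strip() for a in env["EMAIL_TO"].split(",") if a.strip()],
        "password": env["EMAIL_PASSWORD"],
        "server": env.get("SMTP_SERVER", "smtp.gmail.com"),
        "port": int(env.get("SMTP_PORT", "587")),
    }

def build_message(cfg, subject, html_body):
    """Create a multipart message with a plain-text fallback and an HTML part."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = cfg["sender"]
    msg["To"] = ", ".join(cfg["recipients"])
    msg.set_content("This email contains HTML betting recommendations.")
    msg.add_alternative(html_body, subtype="html")
    return msg

def send(cfg, msg):
    """Upgrade the connection with STARTTLS, log in, and send."""
    with smtplib.SMTP(cfg["server"], cfg["port"]) as smtp:
        smtp.starttls()
        smtp.login(cfg["sender"], cfg["password"])
        smtp.send_message(msg)
```

Keeping configuration, message building, and sending separate makes it possible to preview or test the message without ever opening a network connection.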
To validate the core imports and data-loading behavior without running the full Streamlit UI, use the included smoke test:
```bash
# Run a lightweight import-and-load smoke test
python smoke_test.py
```
This prints basic progress and a final `SMOKE OK` line when successful. For full interactive use:
```bash
python build_and_train_pipeline.py  # builds predictions/data if needed (single-step)
streamlit run predictions.py
```
- Icon Consistency: All download buttons and the PDF generate button now use embedded icons (`csv_icon.png`, `pdf_icon.png`) with proper fallback to `favicon.ico` for a professional appearance. Fixed an incorrect fallback icon path from `favicon.png` to `favicon.ico`.
- PDF Generate Button: Updated to use HTML with an embedded PDF icon instead of an emoji, matching the visual style of the download buttons.
- Download UX: Sidebar download controls are rendered from placeholders and populated only after `predictions_df` and the betting log finish loading, to avoid rendering heavy widgets during initial load.
- Per-Game Detail Page: Added a dedicated per-game view reachable with `?game=<game_id>`, showing the matchup summary, model predictions, and a shareable link. The per-game page uses lazy loading and intentionally avoids loading the full play-by-play dataset by default to reduce memory pressure.
- Underdog Labeling: The per-game header now highlights the underdog in bold, using spread-first logic with a moneyline fallback when spreads are unavailable.
- Schedule Links: Schedule and table links now use path-relative `?game=` query parameters and `target="_self"`, so links open in the same tab and remain compatible with subpath deployments.
- Season-aware Matching: Schedule→prediction matching was tightened to prefer predictions from the same season, preventing accidental linking to historical game IDs.
- Download UX: Sidebar download buttons are rendered from lightweight placeholders and populate once `predictions_df` and related data finish loading, to avoid early widget creation and reduce perceived load-time issues.
- QB Names & Header Polish: Away/home QB names are included in the per-game header; full team names now show before logos, and gameday times of `00:00:00` are hidden.
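The spread-first underdog rule with a moneyline fallback can be sketched as a small function. This assumes the nflverse convention that a positive `spread_line` means the home team is favored, plus hypothetical parameter names; it is an illustration, not the dashboard's code:

```python
import math

def pick_underdog(away, home, spread_line=None, away_ml=None, home_ml=None):
    """Return the underdog's team abbreviation, or None if it cannot be determined.

    Assumes nflverse-style spread_line: positive means the home team is favored.
    """
    def known(x):
        return x is not None and not (isinstance(x, float) and math.isnan(x))

    # Spread first: a non-zero spread identifies the favorite directly.
    if known(spread_line) and spread_line != 0:
        return away if spread_line > 0 else home
    # Moneyline fallback: the higher American price (e.g. +150 vs -180) is the underdog.
    if known(away_ml) and known(home_ml) and away_ml != home_ml:
        return away if away_ml > home_ml else home
    return None
```

Returning `None` for a pick'em or missing odds lets the header skip the bold highlight instead of guessing.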
- Per-Game UI polish & layout fixes: The per-game detail page (`?game=<game_id>`) was updated to improve readability and alignment:
  - Metrics and info now start at the left edge (removed the centered spacer that pushed content right).
  - Spread/Total and probability metrics were re-aligned for clearer visual grouping under the `@` marker.
- Team & QB presentation: Team names use a larger, bold display (approx. 30px) with responsive CSS; QB lines include extra vertical spacing to separate them from the first-level metrics.
- Responsive CSS: Inline CSS classes (`.team-name`, `.team-qb`) and a mobile media query were added to keep the per-game header tidy on small screens.
- Memory & loading behavior: The per-game page no longer loads the large play-by-play dataset by default and avoids reading the betting log CSV during initial per-game rendering to reduce memory pressure.
- Betting log removed from per‑game view: The betting-log table and per‑game CSV download were removed from the per‑game page UI by design; the Betting Performance / Performance dashboard still uses the centralized betting log for analytics.
- Bug fix: Fixed a `NameError` by ensuring UI columns are always created before use (prevents occasional crashes when conditions changed).
