🛡️ SprintGuard

Predictive Sprint Planning & Risk Mitigation Platform

SprintGuard uses machine learning to predict risk levels of user stories, helping Agile teams avoid estimation failure and scope creep.

🚀 Quick Start

Prerequisites

Python 3.9 or higher
pip (Python package manager)

Installation

Clone the repository, or navigate to an existing checkout (adjust the path below to your environment):

cd /home/jovyan/SprintGuard

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies incrementally:

# Core web application (required)
pip install -r requirements.txt

# Data augmentation pipeline (required for first-time setup)
pip install -r requirements-augmentation.txt

# ML model training and inference (required for risk prediction)
pip install -r requirements-ml.txt
python -m spacy download en_core_web_sm

# Development tools (optional)
pip install -r requirements-dev.txt

First-Time Setup: Data Augmentation

Before running the application, augment the NeoDataset (~20K user stories from Hugging Face) with risk labels:

# Downloads NeoDataset and applies the weak supervision pipeline
# Takes ~15-30 minutes
python scripts/augment_neodataset.py

This creates:

data/neodataset_augmented.csv - Full augmented dataset
data/neodataset_augmented_high_confidence.csv - High-confidence subset
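
As a quick sanity check after the script finishes, the sketch below confirms that both output files exist and counts their data rows. The helper name `summarize_augmented_outputs` is hypothetical and not part of the SprintGuard codebase; it assumes only the two file names listed above and that each CSV starts with a header row.

```python
import csv
from pathlib import Path

# File names taken from the list above; the helper itself is illustrative.
EXPECTED_FILES = (
    "neodataset_augmented.csv",
    "neodataset_augmented_high_confidence.csv",
)


def summarize_augmented_outputs(data_dir="data"):
    """Return {file name: data row count, or None if the file is missing}."""
    summary = {}
    for name in EXPECTED_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            summary[name] = None
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row (assumed present)
            summary[name] = sum(1 for _ in reader)
    return summary


if __name__ == "__main__":
    for name, rows in summarize_augmented_outputs().items():
        status = "missing" if rows is None else f"{rows} rows"
        print(f"{name}: {status}")
```

A file reported as missing usually means the augmentation script has not completed yet.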

Train the ML Model

After augmentation, train the DistilBERT-XGBoost risk model:

./scripts/train_ml_model.sh

This script will:

Check for the augmented dataset and create it if missing
Download the spaCy model if missing
Train the model with the correct PYTHONPATH
Run validation tests

Model artifacts are saved to the models/ directory.

Start the Application

python app.py

Then open http://localhost:5001 in your browser.

📊 Features

1. Data Health Check

Assesses the quality and quantity of your historical data to set realistic expectations about prediction accuracy.
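
To make "quality and quantity" concrete, here is a minimal sketch of one possible check: count the stories and measure how many have every required field filled in. The function name and the field names `description` and `story_points` are assumptions for illustration; the README does not describe the actual checks SprintGuard performs.

```python
def data_health_summary(stories, required_fields=("description", "story_points")):
    """Summarize quantity and completeness of historical stories.

    `stories` is a list of dicts; the field names are assumptions.
    """
    total = len(stories)
    # A story counts as complete when no required field is None or empty.
    complete = sum(
        1
        for story in stories
        if all(story.get(field) not in (None, "") for field in required_fields)
    )
    ratio = complete / total if total else 0.0
    return {"total": total, "complete": complete, "completeness": ratio}


sample_stories = [
    {"description": "As a user, I can reset my password", "story_points": 3},
    {"description": "", "story_points": 5},
    {"description": "Export report", "story_points": None},
]
print(data_health_summary(sample_stories))
```

In this sample only the first story is complete, so one of three stories would count toward a usable history.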

2. Probabilistic Story Assessor (PSA)

Analyzes new user stories and assigns risk levels (Low/Medium/High) based on ML models trained on real-world data.
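
One common way to turn classifier output into a Low/Medium/High label is to pick the most probable class and report its probability as a confidence score. The sketch below shows that step only; `pick_risk_level` is a hypothetical helper, and how SprintGuard's model actually maps its outputs to levels is not stated here.

```python
RISK_LEVELS = ("Low", "Medium", "High")


def pick_risk_level(probabilities):
    """Return (level, confidence) for the most probable risk level.

    `probabilities` holds one score per level, in RISK_LEVELS order.
    """
    if len(probabilities) != len(RISK_LEVELS):
        raise ValueError("expected one probability per risk level")
    best_index = max(range(len(RISK_LEVELS)), key=lambda i: probabilities[i])
    return RISK_LEVELS[best_index], probabilities[best_index]


level, confidence = pick_risk_level([0.1, 0.2, 0.7])
print(level, confidence)
```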

3. Scope Impact Simulator (SIS)

Models the timeline impact of adding new work to a sprint, making scope creep costs tangible.
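
The arithmetic behind such a simulation can be illustrated with a deliberately simple linear model: divide the added story points by the team's daily throughput. This is a sketch under stated assumptions (constant velocity, a 10-working-day sprint, a hypothetical function name), not the simulator's actual algorithm.

```python
import math


def simulate_scope_impact(added_points, velocity_points_per_sprint, sprint_length_days=10):
    """Estimate the extra working days needed to absorb added story points.

    Assumes throughput is constant across the sprint (a simplification).
    """
    if velocity_points_per_sprint <= 0:
        raise ValueError("velocity must be positive")
    points_per_day = velocity_points_per_sprint / sprint_length_days
    return math.ceil(added_points / points_per_day)


# 8 extra points at 20 points per 10-day sprint is 2 points per day.
print(simulate_scope_impact(8, 20))  # 4
```

Rounding up reflects that a partially used day is still a day of delay.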

🏗️ Architecture

Backend: Python 3.9+ with Flask 3.0
Data Source: Augmented NeoDataset (~20K real user stories)
ML Pipeline: Snorkel (weak supervision) + Cleanlab (noise filtering)
Risk Model: DistilBERT-XGBoost with SHAP explainability
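
To illustrate the weak supervision idea, the sketch below combines a few heuristic labeling functions by majority vote. It is a simplified stand-in written in plain Python: Snorkel's label model learns how much to trust each function rather than counting votes equally, and the three heuristics here are invented for illustration, not taken from the SprintGuard pipeline.

```python
import re
from collections import Counter

ABSTAIN, LOW, HIGH = -1, 0, 1


def _words(text):
    """Lowercase alphabetic tokens, so 'fetch' does not match 'etc'."""
    return set(re.findall(r"[a-z]+", text.lower()))


def lf_vague_wording(text):
    """Vote HIGH when the story contains vague terms (illustrative list)."""
    vague_terms = {"etc", "somehow", "tbd", "various"}
    return HIGH if _words(text) & vague_terms else ABSTAIN


def lf_has_acceptance_criteria(text):
    """Vote LOW when the story mentions acceptance criteria."""
    return LOW if "acceptance criteria" in text.lower() else ABSTAIN


def lf_very_long(text):
    """Vote HIGH for unusually long stories (the threshold is arbitrary)."""
    return HIGH if len(text.split()) > 60 else ABSTAIN


LABELING_FUNCTIONS = (lf_vague_wording, lf_has_acceptance_criteria, lf_very_long)


def majority_vote(text):
    """Combine non-abstaining votes; return ABSTAIN on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    counts = Counter(vote for vote in votes if vote != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


print(majority_vote("Improve performance somehow, details TBD"))
```

Stories on which the functions abstain or disagree receive no label, which is one reason a noise-filtering step is useful afterwards.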

📚 Documentation

Comprehensive documentation is available in the docs/ directory:

SETUP.md - Detailed installation and configuration guide
AUGMENTATION_STATUS.md - NeoDataset augmentation pipeline details
ML_MODEL_GUIDE.md - ML model training and usage
ML_ARCHITECTURE.md - Technical architecture of ML components
IMPLEMENTATION_SUMMARY.md - Full implementation overview
research/ - Research notes on ML techniques

📡 API Endpoints

GET /api/health-check - Data quality assessment
POST /api/assess-risk - Story risk prediction
POST /api/simulate-scope - Timeline impact simulation
GET /api/stories - Historical stories retrieval
GET /api/info - System information
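
The following sketch shows how a client might call the risk prediction endpoint using only the Python standard library. The JSON field name `description` is an assumption, since the request and response schemas are not documented here; check docs/ or app.py for the real payload format before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # port from the Quick Start section


def build_assess_risk_request(description, base_url=BASE_URL):
    """Build a POST request for /api/assess-risk.

    The "description" field name is an assumption, not a documented schema.
    """
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/assess-risk",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def assess_risk(description, base_url=BASE_URL, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_assess_risk_request(description, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the application running, `assess_risk("As a user, I want to export reports")` would return whatever JSON the server sends back.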

🧪 Running Tests

pip install -r requirements-dev.txt
pytest
pytest --cov=src --cov-report=html  # With coverage

📁 Project Structure

SprintGuard/
├── app.py                          # Flask application
├── config.py                       # Configuration
├── requirements*.txt               # Dependencies (core, augmentation, ML, dev)
├── src/
│   ├── analyzers/                  # Risk assessment, health check, scope simulation
│   ├── ml/                         # ML pipeline (augmentation, training, inference)
│   ├── models/                     # Data models (Story)
│   └── utils/                      # Utilities
├── scripts/
│   ├── augment_neodataset.py       # Main augmentation script
│   ├── train_ml_model.sh           # Model training script
│   └── explore_neodataset.py       # Data exploration tool
├── tests/                          # Unit tests
├── docs/                           # Documentation
└── data/                           # Data files (generated)

🔮 Future Enhancements

Dynamic Resource Forecaster - Skill-based bottleneck detection
Jira Cloud Integration - Real-time API connection
Team Calibration Tool - Improve estimation consistency
Advanced ML Models - Deep learning for pattern recognition
Custom Dashboards - Exportable reports for stakeholders

🐛 Troubleshooting

Augmented dataset not found

python scripts/augment_neodataset.py

Port 5001 already in use

Edit config.py and change PORT = 5001 to another value.
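
Before editing the configuration, it can help to confirm that the port really is occupied. The sketch below tries a TCP connection to the port on the local machine; `port_in_use` is a hypothetical helper, not part of SprintGuard.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("Port 5001 in use:", port_in_use(5001))
```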

ModuleNotFoundError

Use the training script, which sets PYTHONPATH automatically:

./scripts/train_ml_model.sh

Or, if running Python directly, set PYTHONPATH first:

export PYTHONPATH="$(pwd):$PYTHONPATH"
python src/ml/train_risk_model.py

spaCy model not found

python -m spacy download en_core_web_sm
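
spaCy pipelines installed this way are regular Python packages, so one way to check whether the model is present in the active environment is to look the package up without importing it. The helper name below is hypothetical.

```python
import importlib.util


def model_installed(package_name="en_core_web_sm"):
    """Return True if the model package can be found without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("en_core_web_sm installed:", model_installed())
```

If this prints False inside the virtual environment, rerun the download command above with that environment activated.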

📄 License

This Proof of Concept is provided as-is for educational purposes.

Built with ❤️ to help Agile teams break the cycle of estimation failure and scope creep.

