A production-ready machine learning framework for building, training, deploying, and monitoring ML models at scale.
- Modular Architecture: Clean separation of concerns with data layer, ML components, and applications
- Multiple Data Sources: Built-in connectors for PostgreSQL, MongoDB, AWS S3, Azure Blob Storage
- ML Models: Support for Random Forest, XGBoost, and easy extensibility for custom models
- Data Processing: Complete preprocessing pipeline with feature engineering and scaling
- Cross-Validation: K-Fold and Stratified K-Fold validation with comprehensive metrics
- Experiment Tracking: Integration with MLflow for experiment management
- Configuration Management: Pydantic-based configuration with validation
- Testing: Comprehensive test suite with pytest
- Monitoring: Model performance tracking and data drift detection
- API Server: FastAPI-based REST API for model serving
- Production Ready: Docker support, CI/CD pipelines, and best practices
- Installation
- Quick Start
- Project Structure
- Usage Examples
- Configuration
- Testing
- API Documentation
- Contributing
- License
```bash
pip install ml-service-framework
```

Note: The GitHub repository is named MLOps-Boilerplate, but the PyPI package is ml-service-framework.
After installation, create a new ML project using the template:
```bash
# Create a new project
ml-create-project my-ml-project

# Navigate to your project
cd my-ml-project

# Set up virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your configuration
```

Or install from source:

```bash
# Clone the repository
git clone https://github.com/kython220282/MLOps-Boilerplate.git
cd MLOps-Boilerplate

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e .
```

Or run with Docker:

```bash
docker build -t ml-service-framework .
docker run -p 8000:8000 ml-service-framework
```

Configure your environment:

```bash
# Copy environment template
cp .env.example .env
# Edit .env with your configuration
```

Train a model:

```bash
# Using CLI
ml-train --config config/training_config.json

# Or with Python
python -m ml_service.applications.training --config config/training_config.json
```

Run batch inference:

```bash
ml-inference --model-path models/model.joblib \
  --input-path data/test.csv \
  --output-path predictions.csv
```

Serve the model over HTTP:

```bash
ml-serve --model-path models/model.joblib --port 8000
```

```
machine_learning_service/
├── ml_service/                  # Main package
│   ├── applications/            # Application entry points
│   │   ├── training.py          # Training CLI application
│   │   └── inference.py         # Inference CLI application
│   ├── data_layer/              # Data connectors
│   │   ├── data_connector.py    # Database connectors
│   │   └── object_connector.py  # Cloud storage connectors
│   ├── machine_learning/        # ML components
│   │   ├── data_processor.py    # Data preprocessing
│   │   ├── model.py             # Model definitions
│   │   ├── training_pipeline.py # Training orchestration
│   │   └── cross_validator.py   # Model validation
│   └── config.py                # Configuration management
├── config/                      # Configuration files
│   ├── training_config.json
│   └── training_config.yaml
├── tests/                       # Test suite
├── docs/                        # Documentation
├── requirements.txt             # Dependencies
├── setup.py                     # Package setup
├── pyproject.toml               # Project configuration
├── .env.example                 # Environment template
└── README.md                    # This file
```
From CSV File:

```python
from ml_service.machine_learning.training_pipeline import TrainingPipeline

config = {
    "data_source": {"type": "file", "path": "data/train.csv"},
    "target_column": "target",
    "model": {"type": "random_forest", "n_estimators": 100},
    "task_type": "classification"
}

pipeline = TrainingPipeline(config)
metrics = pipeline.run_pipeline()
```

From Database:

```python
config = {
    "data_source": {
        "type": "database",
        "connector_type": "postgresql",
        "connection_config": {
            "host": "localhost",
            "database": "ml_db"
        },
        "query": "SELECT * FROM training_data"
    },
    "target_column": "target"
}
```

Custom Model:

```python
from ml_service.machine_learning.model import BaseModel

class CustomModel(BaseModel):
    def build_model(self):
        # Your model architecture
        pass

    def train(self, X_train, y_train):
        # Training logic
        pass

    def predict(self, X):
        # Prediction logic
        pass
```

Data Preprocessing:

```python
from ml_service.machine_learning.data_processor import DataProcessor

processor = DataProcessor()
df = processor.load_data("data/train.csv")
X_train, X_test, y_train, y_test = processor.preprocess_pipeline(
    df, target_column="target"
)
```
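To make the `CustomModel` template above concrete, here is a toy implementation: a majority-class classifier that fills in the three hooks. The `BaseModel` class below is a stand-in for `ml_service.machine_learning.model.BaseModel` (whose exact interface may differ), so the sketch runs standalone:

```python
from collections import Counter

class BaseModel:
    """Stand-in for ml_service.machine_learning.model.BaseModel."""
    pass

class MajorityClassModel(BaseModel):
    """Toy custom model: always predicts the most frequent training label."""

    def build_model(self):
        # No real architecture; just initialize state
        self.majority_label = None

    def train(self, X_train, y_train):
        # Remember the most common label seen during training
        self.majority_label = Counter(y_train).most_common(1)[0][0]

    def predict(self, X):
        # Predict the majority label for every input row
        return [self.majority_label for _ in X]

model = MajorityClassModel()
model.build_model()
model.train([[1], [2], [3]], ["a", "b", "a"])
print(model.predict([[4], [5]]))  # ['a', 'a']
```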
Create a JSON or YAML configuration file:

```json
{
  "data_source": {
    "type": "file",
    "path": "data/train.csv"
  },
  "target_column": "target",
  "model": {
    "type": "random_forest",
    "n_estimators": 100,
    "max_depth": 10
  },
  "data_processing": {
    "missing_value_strategy": "mean",
    "scaling_method": "standard"
  },
  "cross_validation": {
    "type": "kfold",
    "n_splits": 5
  }
}
```

Set environment variables in the .env file:

```bash
# Database
DB_TYPE=postgresql
DB_HOST=localhost
DB_PORT=5432

# MLflow
MLFLOW_TRACKING_URI=http://localhost:5000

# API
API_HOST=0.0.0.0
API_PORT=8000
```
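The framework's `config.py` is described as Pydantic-based; if you just need a few of the variables above in a script, the same pattern is easy to mimic with the standard library. A minimal sketch (variable names and defaults taken from the `.env` template above; this is not the framework's actual settings class):

```python
import os
from dataclasses import dataclass

@dataclass
class DBSettings:
    """Illustrative settings holder for the database variables shown above."""
    db_type: str
    host: str
    port: int

def load_db_settings() -> DBSettings:
    # Fall back to the documented defaults when a variable is unset
    return DBSettings(
        db_type=os.getenv("DB_TYPE", "postgresql"),
        host=os.getenv("DB_HOST", "localhost"),
        port=int(os.getenv("DB_PORT", "5432")),
    )

settings = load_db_settings()
```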
Run the test suite:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=ml_service --cov-report=html

# Run specific test file
pytest tests/test_model.py

# Run with markers
pytest -m unit
pytest -m integration
```
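The `unit` and `integration` selectors above are ordinary pytest markers. A test module that opts into them might look like this (illustrative; it assumes the marker names are registered in the project's pytest configuration):

```python
import pytest

@pytest.mark.unit
def test_feature_vector_length():
    # Fast, dependency-free check: selected by `pytest -m unit`
    features = [1.5, 2.3, 3.1, 4.2, 5.0]
    assert len(features) == 5

@pytest.mark.integration
def test_end_to_end_placeholder():
    # Slower end-to-end checks go here: selected by `pytest -m integration`
    assert True
```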
Start the API server and visit http://localhost:8000/docs for interactive Swagger documentation.

Health Check:

```bash
curl http://localhost:8000/health
```

Make a Prediction:

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [1.5, 2.3, 3.1, 4.2, 5.0]}'
```
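The same prediction call can be built from Python with only the standard library. This helper just mirrors the curl request above (endpoint path and payload shape as shown; send it with `urllib.request.urlopen(req)` once the server is running):

```python
import json
import urllib.request

def build_predict_request(features, host="http://localhost:8000"):
    # POST /predict with a JSON body, matching the curl example above
    payload = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_predict_request([1.5, 2.3, 3.1, 4.2, 5.0])
print(req.get_method(), req.full_url)  # POST http://localhost:8000/predict
# resp = urllib.request.urlopen(req)   # requires a running server
```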
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Install dev dependencies
pip install -r requirements.txt
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run code formatting
black ml_service tests
isort ml_service tests

# Run linting
flake8 ml_service tests
mypy ml_service
```

This project is licensed under the MIT License - see the LICENSE file for details.
- scikit-learn for ML algorithms
- MLflow for experiment tracking
- FastAPI for API framework
- Pydantic for configuration management
Created by: Karan Raj Sharma
GitHub: @kython220282
Repository: MLOps-Boilerplate
If you use this framework in your projects, please consider:
- Star this repository on GitHub
- Add credits in your project documentation:
  Built with [MLOps-Boilerplate](https://github.com/kython220282/MLOps-Boilerplate) by Karan Raj Sharma
- Link back to this repository
- Share your project - open an issue to showcase what you've built!

Your support helps maintain and improve this framework for everyone. Thank you!
For questions and support:
- Create an issue on GitHub
- Email: karan.rajsharma@yahoo.com
- Repository: https://github.com/kython220282/MLOps-Boilerplate
Happy Model Building!