MLOps-Boilerplate

A production-ready machine learning framework for building, training, deploying, and monitoring ML models at scale.


πŸš€ Features

  • Modular Architecture: Clean separation of concerns with data layer, ML components, and applications
  • Multiple Data Sources: Built-in connectors for PostgreSQL, MongoDB, AWS S3, Azure Blob Storage
  • ML Models: Support for Random Forest, XGBoost, and easy extensibility for custom models
  • Data Processing: Complete preprocessing pipeline with feature engineering and scaling
  • Cross-Validation: K-Fold and Stratified K-Fold validation with comprehensive metrics
  • Experiment Tracking: Integration with MLflow for experiment management
  • Configuration Management: Pydantic-based configuration with validation
  • Testing: Comprehensive test suite with pytest
  • Monitoring: Model performance tracking and data drift detection
  • API Server: FastAPI-based REST API for model serving
  • Production Ready: Docker support, CI/CD pipelines, and best practices

πŸ“‹ Table of Contents

  • πŸš€ Features
  • πŸ”§ Installation
  • πŸƒ Quick Start
  • πŸ“ Project Structure
  • πŸ’‘ Usage Examples
  • βš™οΈ Configuration
  • πŸ§ͺ Testing
  • πŸ“š API Documentation
  • 🀝 Contributing
  • πŸ“„ License

πŸ”§ Installation

Install from PyPI (Recommended)

pip install ml-service-framework

Note: The GitHub repository is named MLOps-Boilerplate, but the PyPI package is ml-service-framework.

Create a New Project

After installation, create a new ML project using the template:

# Create a new project
ml-create-project my-ml-project

# Navigate to your project
cd my-ml-project

# Set up virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your configuration

Install from Source (For Development)

# Clone the repository
git clone https://github.com/kython220282/MLOps-Boilerplate.git
cd MLOps-Boilerplate

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e .

Install with Docker

docker build -t ml-service-framework .
docker run -p 8000:8000 ml-service-framework

πŸƒ Quick Start

1. Setup Environment

# Copy environment template
cp .env.example .env

# Edit .env with your configuration

2. Train a Model

# Using CLI
ml-train --config config/training_config.json

# Or with Python
python -m ml_service.applications.training --config config/training_config.json

3. Run Inference

ml-inference --model-path models/model.joblib \
             --input-path data/test.csv \
             --output-path predictions.csv

4. Start API Server

ml-serve --model-path models/model.joblib --port 8000

πŸ“ Project Structure

machine_learning_service/
β”œβ”€β”€ ml_service/                 # Main package
β”‚   β”œβ”€β”€ applications/          # Application entry points
β”‚   β”‚   β”œβ”€β”€ training.py       # Training CLI application
β”‚   β”‚   └── inference.py      # Inference CLI application
β”‚   β”œβ”€β”€ data_layer/           # Data connectors
β”‚   β”‚   β”œβ”€β”€ data_connector.py # Database connectors
β”‚   β”‚   └── object_connector.py # Cloud storage connectors
β”‚   β”œβ”€β”€ machine_learning/     # ML components
β”‚   β”‚   β”œβ”€β”€ data_processor.py # Data preprocessing
β”‚   β”‚   β”œβ”€β”€ model.py          # Model definitions
β”‚   β”‚   β”œβ”€β”€ training_pipeline.py # Training orchestration
β”‚   β”‚   └── cross_validator.py # Model validation
β”‚   └── config.py             # Configuration management
β”œβ”€β”€ config/                    # Configuration files
β”‚   β”œβ”€β”€ training_config.json
β”‚   └── training_config.yaml
β”œβ”€β”€ tests/                     # Test suite
β”œβ”€β”€ docs/                      # Documentation
β”œβ”€β”€ requirements.txt          # Dependencies
β”œβ”€β”€ setup.py                  # Package setup
β”œβ”€β”€ pyproject.toml           # Project configuration
β”œβ”€β”€ .env.example             # Environment template
└── README.md                # This file

πŸ’‘ Usage Examples

Training with Different Data Sources

From CSV File:

from ml_service.machine_learning.training_pipeline import TrainingPipeline

config = {
    "data_source": {"type": "file", "path": "data/train.csv"},
    "target_column": "target",
    "model": {"type": "random_forest", "n_estimators": 100},
    "task_type": "classification"
}

pipeline = TrainingPipeline(config)
metrics = pipeline.run_pipeline()

From Database:

config = {
    "data_source": {
        "type": "database",
        "connector_type": "postgresql",
        "connection_config": {
            "host": "localhost",
            "database": "ml_db"
        },
        "query": "SELECT * FROM training_data"
    },
    "target_column": "target"
}

Custom Model Development

from ml_service.machine_learning.model import BaseModel

class CustomModel(BaseModel):
    def build_model(self):
        # Your model architecture
        pass
    
    def train(self, X_train, y_train):
        # Training logic
        pass
    
    def predict(self, X):
        # Prediction logic
        pass
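To make the interface concrete, here is a self-contained sketch of a minimal subclass. Note that the BaseModel below is a stand-in with the same three methods (the real class lives in ml_service.machine_learning.model and may require more), and the majority-class "model" is purely illustrative:

```python
from collections import Counter


class BaseModel:
    """Stand-in for ml_service.machine_learning.model.BaseModel."""

    def build_model(self):
        raise NotImplementedError

    def train(self, X_train, y_train):
        raise NotImplementedError

    def predict(self, X):
        raise NotImplementedError


class MajorityClassModel(BaseModel):
    """Toy classifier: always predicts the most common training label."""

    def build_model(self):
        self.majority_label = None

    def train(self, X_train, y_train):
        self.majority_label = Counter(y_train).most_common(1)[0][0]

    def predict(self, X):
        return [self.majority_label for _ in X]


model = MajorityClassModel()
model.build_model()
model.train([[1.0], [2.0], [3.0]], ["a", "b", "a"])
print(model.predict([[4.0], [5.0]]))  # ['a', 'a']
```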

Data Processing Pipeline

from ml_service.machine_learning.data_processor import DataProcessor

processor = DataProcessor()
df = processor.load_data("data/train.csv")
X_train, X_test, y_train, y_test = processor.preprocess_pipeline(
    df, target_column="target"
)
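A pipeline of this shape typically imputes missing values, scales features, and splits the data. The pure-Python sketch below illustrates the "mean" imputation and "standard" scaling strategies referenced in the configuration section; it is an illustration of the idea, not the framework's actual implementation:

```python
import math


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def standard_scale(column):
    """Scale to zero mean and unit variance (population std)."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]


raw = [1.0, None, 3.0]
filled = impute_mean(raw)  # [1.0, 2.0, 3.0]
print([round(v, 3) for v in standard_scale(filled)])
```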

βš™οΈ Configuration

Training Configuration

Create a JSON or YAML configuration file:

{
  "data_source": {
    "type": "file",
    "path": "data/train.csv"
  },
  "target_column": "target",
  "model": {
    "type": "random_forest",
    "n_estimators": 100,
    "max_depth": 10
  },
  "data_processing": {
    "missing_value_strategy": "mean",
    "scaling_method": "standard"
  },
  "cross_validation": {
    "type": "kfold",
    "n_splits": 5
  }
}
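A minimal loader for a JSON config of this shape might check required top-level keys before training begins. The stdlib-only sketch below illustrates the config contract; the framework itself validates with Pydantic, so treat this as an approximation rather than its actual code:

```python
import json
import tempfile

REQUIRED_KEYS = {"data_source", "target_column", "model"}


def load_training_config(path):
    """Load a JSON training config and check required top-level keys."""
    with open(path) as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"config missing keys: {sorted(missing)}")
    return config


# Demo: write a minimal config to a temp file and load it back.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"data_source": {"type": "file", "path": "data/train.csv"},
               "target_column": "target",
               "model": {"type": "random_forest", "n_estimators": 100}}, f)
    config_path = f.name

config = load_training_config(config_path)
print(config["model"]["type"])  # random_forest
```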

Environment Variables

Set in .env file:

# Database
DB_TYPE=postgresql
DB_HOST=localhost
DB_PORT=5432

# MLflow
MLFLOW_TRACKING_URI=http://localhost:5000

# API
API_HOST=0.0.0.0
API_PORT=8000
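Inside the service, these variables are typically read with sensible fallbacks. A stdlib sketch (variable names match the .env template above; the defaults shown are assumptions):

```python
import os


def get_db_settings(env=os.environ):
    """Read database settings from the environment, with defaults."""
    return {
        "type": env.get("DB_TYPE", "postgresql"),
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),
    }


print(get_db_settings({}))  # all defaults
```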

πŸ§ͺ Testing

Run the test suite:

# Run all tests
pytest

# Run with coverage
pytest --cov=ml_service --cov-report=html

# Run specific test file
pytest tests/test_model.py

# Run with markers
pytest -m unit
pytest -m integration

πŸ“š API Documentation

Start the API server and visit http://localhost:8000/docs for interactive Swagger documentation.

Example API Endpoints

Health Check:

curl http://localhost:8000/health

Make Prediction:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [1.5, 2.3, 3.1, 4.2, 5.0]}'
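On the server side, a /predict handler needs to validate a payload of this shape. To keep the sketch dependency-free, the check below is plain Python rather than the framework's actual FastAPI/Pydantic request model:

```python
def validate_predict_payload(payload):
    """Check that the payload has a non-empty numeric 'features' list."""
    features = payload.get("features")
    if not isinstance(features, list) or not features:
        raise ValueError("'features' must be a non-empty list")
    if not all(isinstance(v, (int, float)) for v in features):
        raise ValueError("'features' must contain only numbers")
    return features


print(validate_predict_payload({"features": [1.5, 2.3, 3.1, 4.2, 5.0]}))
```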

🀝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

# Install dev dependencies
pip install -r requirements.txt
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run code formatting
black ml_service tests
isort ml_service tests

# Run linting
flake8 ml_service tests
mypy ml_service

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • scikit-learn for ML algorithms
  • MLflow for experiment tracking
  • FastAPI for API framework
  • Pydantic for configuration management

οΏ½β€πŸ’» Credits

Created by: Karan Raj Sharma
GitHub: @kython220282
Repository: MLOps-Boilerplate

🌟 If You Use This Framework

If you use this framework in your projects, please consider:

  • ⭐ Star this repository on GitHub
  • πŸ“ Add credits in your project documentation:
    Built with [MLOps-Boilerplate](https://github.com/kython220282/MLOps-Boilerplate) by Karan Raj Sharma
  • πŸ”— Link back to this repository
  • πŸ’¬ Share your project - Open an issue to showcase what you've built!

Your support helps maintain and improve this framework for everyone. Thank you! πŸ™

πŸ“ž Support

For questions and support:


Happy Model Building! πŸŽ‰
