# Semantic-Powered Machine Learning Execution System (MLES)

A production-ready ML platform for the entire model lifecycle.

Python 3.8+ · MLflow · Sematic · License: MIT

## Overview

Semantic-MLES is an end-to-end machine learning platform that enables teams to prototype ML pipelines locally and run the same pipeline on Kubernetes without code changes ("laptop-to-cloud parity"). It covers the full ML lifecycle: data ingestion, validation, feature engineering, training, evaluation, model serving, monitoring, and deployment.


## Key Features

| Feature | Description |
| --- | --- |
| End-to-End ML Pipeline | Data ingestion → preprocessing → training → evaluation → deployment in a single Sematic DAG |
| Laptop-to-Cloud Parity | Identical DAG definition and container images across dev and prod environments |
| Multi-Strategy Deployment | Blue-green, canary, and rolling Kubernetes deployments with automatic rollback |
| Model Monitoring | Real-time drift detection (KS test, chi-squared, PSI), performance tracking, and Prometheus metrics |
| Feature Store | Feast integration for both historical and online feature retrieval |
| Data Validation | Great Expectations and Pandera schema validation at every pipeline stage |
| Explainability & Fairness | SHAP-based feature attribution and Fairlearn fairness auditing |
| Experiment Tracking | MLflow and Weights & Biases (W&B) for parameter, metric, and artifact logging |
| Configurable Resources | Per-stage CPU/GPU/memory allocation via `ResourceSpec` |

## Quickstart: MLflow Demo

The fastest way to see the platform in action is the standalone MLflow demo, which trains 10 RandomForest experiments and logs them to the MLflow tracking server.

### 1. Set up the environment

```bash
python -m venv mles_env
source mles_env/bin/activate
pip install -r requirements.txt
```

`seaborn`, `deepmerge`, and `tf-keras` are now included in `requirements.txt`.

### 2. Run the demo

```bash
python simple_mlflow_example.py
```

Expected output:

```text
Starting experiment: mles_demo - baseline_model
Completed baseline_model - R2: 0.9732, RMSE: 0.4176
...
Generated 10 runs in MLflow!
```

### 3. Explore in the MLflow UI

```bash
mlflow ui --port 5000
```

Open http://localhost:5000 to see the `mles_demo` experiment with 10 completed runs. Each run contains logged parameters, metrics (R², RMSE, MSE), a trained model artifact, and a feature importance plot.

| Run name | Description |
| --- | --- |
| baseline_model | Default parameters |
| optimized_model | Tuned hyperparameters |
| deep_model | Deeper architecture |
| wide_model | Wider architecture |
| balanced_model | Balanced performance |
| fast_model | Optimised for inference speed |
| accurate_model | Maximum accuracy |
| efficient_model | Resource efficiency |
| robust_model | Improved outlier robustness |
| final_model | Production-ready best tradeoff |

## CLI Usage

After `pip install -e .`, the `mles` command is available with six subcommands.

### Train a model

```bash
mles train \
  --data-source s3://my-bucket/data.csv \
  --target-column price \
  --model-type xgboost \
  --tune-hyperparameters \
  --cluster   # omit --cluster to run locally
```

Supported model types: `xgboost`, `random_forest`, `logistic_regression`, `neural_network`.
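A minimal sketch of how such a model-type flag might dispatch to a trainer class. This is hypothetical (the real lookup lives in the platform's `ModelRegistry`), and only two of the four types are shown, using plain scikit-learn stand-ins:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Map CLI names to model constructors. "xgboost" and "neural_network"
# would register their own wrapper classes here in the same way.
MODEL_REGISTRY = {
    "random_forest": RandomForestClassifier,
    "logistic_regression": LogisticRegression,
}


def build_model(model_type, **params):
    """Look up a model by its CLI name and instantiate it with params."""
    try:
        cls = MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(
            f"Unsupported model type: {model_type!r}. "
            f"Choose from {sorted(MODEL_REGISTRY)}"
        )
    return cls(**params)
```

A registry dict like this keeps the CLI decoupled from concrete model classes: adding a new `--model-type` is one new entry, with no change to the dispatch code.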

### Run batch inference

```bash
mles predict \
  --model-name xgboost_price \
  --model-version latest \
  --data-source s3://my-bucket/inference.csv \
  --output-path s3://my-bucket/predictions.parquet
```

### Serve a model

```bash
mles serve \
  --model-name xgboost_price \
  --model-version 3 \
  --host 0.0.0.0 \
  --port 8000
```

This starts a FastAPI server with `/predict`, `/batch_predict`, `/health`, and `/metrics` endpoints.

### Deploy to Kubernetes

```bash
mles deploy \
  --model-name xgboost_price \
  --model-version 3 \
  --environment prod \
  --namespace mles
```

The deployment strategy is set in `config/production.yaml` (`blue_green`, `canary`, or `rolling`). Canary releases automatically shift traffic in steps (10% → 25% → 50% → 75% → 100%) and roll back if the error rate or P99 latency exceeds the configured thresholds.
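The canary rollback logic described above can be sketched as a simple control loop. This is an illustrative toy with metric collection and traffic routing stubbed out as callables, not the platform's actual `DeploymentManager`:

```python
# Traffic percentages from the canary schedule described above.
TRAFFIC_STEPS = [10, 25, 50, 75, 100]


def run_canary(get_metrics, set_traffic, max_error_rate=0.05, max_p99_ms=1000):
    """Shift traffic to the canary in steps, rolling back on bad metrics.

    get_metrics() -> dict with "error_rate" and "p99_latency_ms" keys.
    set_traffic(pct) routes pct% of traffic to the canary version.
    Returns "promoted" or "rolled_back".
    """
    for pct in TRAFFIC_STEPS:
        set_traffic(pct)
        m = get_metrics()
        if m["error_rate"] > max_error_rate or m["p99_latency_ms"] > max_p99_ms:
            set_traffic(0)  # roll back: send all traffic to the stable version
            return "rolled_back"
    return "promoted"
```

The default thresholds mirror the `rollback_error_threshold: 0.05` and `rollback_latency_p99_ms: 1000` values shown in the configuration example later in this README.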

### Monitor a deployed model

```bash
mles monitor \
  --model-name xgboost_price \
  --model-version 3 \
  --interval 300   # check every 5 minutes
```
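One of the drift statistics the monitor uses, the Population Stability Index (PSI), is simple enough to sketch from scratch. This stand-alone toy (not the platform's `DriftDetector`) bins the reference sample at its own quantiles and compares bin proportions:

```python
import math


def psi(reference, current, bins=10, eps=1e-4):
    """Population Stability Index between two 1-D samples.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift.
    """
    ref_sorted = sorted(reference)
    # Bin edges at the reference sample's quantiles.
    edges = [ref_sorted[int(i * (len(ref_sorted) - 1) / bins)]
             for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        n = len(sample)
        # Clamp away from zero so the log below is always defined.
        return [max(c / n, eps) for c in counts]

    p, q = proportions(reference), proportions(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

In production the same comparison runs per feature on each monitoring interval, with the alert threshold taken from the configured `drift_threshold`.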

### Initialise a new project

```bash
mles init --config-path ./config
```

## Local Development Stack

`docker-compose.yml` starts all supporting services locally:

```bash
docker-compose up -d
```

| Service | Port | Purpose |
| --- | --- | --- |
| MLflow | 5000 | Experiment tracking and model registry |
| Sematic | 8001 | Pipeline DAG orchestration |
| MLES Model Server | 8000 | FastAPI model serving |
| Prometheus | 9090 | Metrics scraping |
| Grafana | 3000 | Metrics dashboards |
| AlertManager | 9093 | Alerting |
| Redis | 6379 | Feature store backend and caching |
| PostgreSQL | 5432 | Metadata storage |
| Kafka | 9092 | Streaming data ingestion |
| Zookeeper | 2181 | Kafka coordination |

## Project Structure

```text
├── simple_mlflow_example.py    # Standalone MLflow demo
├── src/                        # Core platform code
│   ├── cli.py                  # mles CLI entry point
│   ├── config.py               # Configuration dataclasses
│   ├── models/                 # BaseModel, ModelRegistry, XGBoostModel, etc.
│   ├── pipelines/              # TrainingPipeline, InferencePipeline (Sematic DAGs)
│   ├── preprocessing/          # DataLoader, DataValidator, FeatureEngineer
│   ├── monitoring/             # ModelMonitor, DriftDetector, MetricsCollector
│   └── deployment/             # DeploymentManager, ModelServer, InferenceService
├── production/                 # Kubernetes manifests, Terraform, prod Docker config
│   ├── kubernetes/
│   │   ├── base/               # Base deployment and ConfigMap
│   │   ├── overlays/           # prod and staging Kustomize overlays
│   │   └── monitoring/         # Prometheus and Grafana configs
│   ├── docker/                 # Dockerfile.model-server
│   └── scripts/                # deploy.sh, health-check.sh
├── development/                # Local dev configs and scripts
│   ├── config/                 # local.yaml, development.yaml
│   ├── docker/                 # docker-compose.dev.yml
│   └── scripts/                # setup_dev.sh, run_tests.sh
├── ci-cd/                      # GitHub Actions workflows
├── config/                     # Config templates (example_config.yaml)
├── samples/                    # Example scripts and tutorials
├── scripts/                    # Utility scripts
├── tests/                      # Test suite
├── docs/                       # Documentation
├── docker-compose.yml          # Full local service stack
├── requirements.txt
└── setup.py
```

## Configuration

Copy `env.example` to `.env` and set the relevant variables:

```bash
cp env.example .env
```

Key environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| `MLFLOW_TRACKING_URI` | `http://localhost:5000` | MLflow server URL |
| `S3_BUCKET` | `mles-data` | Default data bucket |
| `KAFKA_BROKERS` | `localhost:9092` | Kafka broker addresses |
| `FEATURE_STORE_URI` | `redis://localhost:6379` | Feast online store backend |
| `PROMETHEUS_PUSHGATEWAY` | — | Prometheus Pushgateway URL |
| `CLUSTER_MODE` | `false` | Set `true` to submit to Kubernetes |
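Resolving these variables against their defaults can be sketched as below. This is hypothetical (the platform's real loader in `src/config.py` may differ), and `PROMETHEUS_PUSHGATEWAY` is omitted because it has no default:

```python
import os

# Defaults from the table above.
DEFAULTS = {
    "MLFLOW_TRACKING_URI": "http://localhost:5000",
    "S3_BUCKET": "mles-data",
    "KAFKA_BROKERS": "localhost:9092",
    "FEATURE_STORE_URI": "redis://localhost:6379",
    "CLUSTER_MODE": "false",
}


def load_config(env=os.environ):
    """Resolve each known variable from the environment, falling back to defaults."""
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # CLUSTER_MODE is a boolean flag; coerce the string form.
    cfg["CLUSTER_MODE"] = cfg["CLUSTER_MODE"].lower() in ("1", "true", "yes")
    return cfg
```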

For full configuration options see Configuration Guide.

Pipeline behaviour, deployment thresholds, and resource allocation are controlled via YAML:

```yaml
# config/example_config.yaml
monitoring:
  drift_threshold: 0.15
  accuracy_threshold: 0.8
  evaluation_interval_minutes: 15

deployment:
  canary_percentage: 10
  rollback_error_threshold: 0.05
  rollback_latency_p99_ms: 1000

training_resources:
  cpu: "4"
  memory: "8Gi"
  gpu: "1"
```

## Architecture

The platform is layered as follows:

```text
CLI (src/cli.py)
├── Pipelines (Sematic DAGs)
│   ├── TrainingPipeline  — 8 stages: load → validate → split → engineer → validate → train → evaluate → register
│   └── InferencePipeline — 6 stages: load model → load data → engineer → predict → monitor → save
└── Components
    ├── DataLoader        — S3, local, Kafka, HTTP
    ├── DataValidator     — Great Expectations, Pandera, built-in quality checks
    ├── FeatureEngineer   — scaling, encoding, selection, Feast integration
    ├── ModelRegistry     — XGBoost, RandomForest, LogisticRegression
    ├── ModelMonitor      — Prometheus metrics, drift detection, SHAP, Fairlearn
    └── DeploymentManager — Blue-green, canary, rolling via Kubernetes API
```

For more detail see Architecture Overview.


## Model Serving API

When running `mles serve`, the following endpoints are available:

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/health` | Model and server health check |
| POST | `/predict` | Single prediction |
| POST | `/batch_predict` | Batch predictions |
| GET | `/metrics` | Latency and throughput metrics |
| POST | `/reload` | Hot-reload the model from a new path |

Single prediction request:

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": {"feature_1": 1.5, "feature_2": "category_a"}}'
```
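The same call from Python, using only the standard library. This is a sketch: the request body mirrors the curl example, but no assumptions are made about the response schema beyond it being JSON:

```python
import json
import urllib.request


def build_payload(features):
    """Encode the request body exactly as in the curl example above."""
    return json.dumps({"features": features}).encode("utf-8")


def predict(features, url="http://localhost:8000/predict", timeout=5):
    """POST a single prediction request and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=build_payload(features),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```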

See API Documentation for full schema.


## Running Tests

```bash
source mles_env/bin/activate
python -m pytest tests/test_pipeline.py -v
```

Expected result: `15 passed, 2 skipped`.

The 2 skipped tests cover optional integrations that require specific environment setup:

| Skipped test | Reason | Fix |
| --- | --- | --- |
| `TestBentoMLServing` | `deepmerge` missing from the active environment | `pip install deepmerge` |
| `TestExplainabilityFairness::test_shap_explainability` | Keras 3 installed without `tf-keras` | `pip install tf-keras` |

Both packages are listed in `requirements.txt` and will be present in a clean install.


## Troubleshooting

| Issue | Solution |
| --- | --- |
| `ModuleNotFoundError: No module named 'src'` | Run commands from the project root or install with `pip install -e .` |
| MLflow UI not starting | Ensure the virtual environment is active: `source mles_env/bin/activate` |
| Missing dependencies | Run `pip install -r requirements.txt` |
| Port conflict on 5000 | Use `mlflow ui --port 5001` |
| Kubernetes config not found | Ensure `~/.kube/config` exists or run inside a cluster pod |
| W&B init failure | Set `WANDB_MODE=disabled` to skip W&B without affecting other tracking |

For more see Troubleshooting Guide.


## Documentation

Full guides live in the `docs/` directory.


## License

MIT License — see the LICENSE file for details.
