A production-ready ML platform for the entire model lifecycle
Semantic-MLES is an end-to-end machine learning platform that enables teams to prototype ML pipelines locally and run the same pipeline on Kubernetes without code changes ("laptop-to-cloud parity"). It covers the full ML lifecycle: data ingestion, validation, feature engineering, training, evaluation, model serving, monitoring, and deployment.
| Feature | Description |
|---|---|
| End-to-End ML Pipeline | Data ingestion → preprocessing → training → evaluation → deployment in a single Sematic DAG |
| Laptop-to-Cloud Parity | Identical DAG definition and container images across dev and prod environments |
| Multi-Strategy Deployment | Blue-green, canary, and rolling Kubernetes deployments with automatic rollback |
| Model Monitoring | Real-time drift detection (KS-test, Chi-squared, PSI), performance tracking, and Prometheus metrics |
| Feature Store | Feast integration for both historical and online feature retrieval |
| Data Validation | Great Expectations and Pandera schema validation at every pipeline stage |
| Explainability & Fairness | SHAP-based feature attribution and Fairlearn fairness auditing |
| Experiment Tracking | MLflow and Weights & Biases (W&B) for parameter, metric, and artifact logging |
| Configurable Resources | Per-stage CPU/GPU/memory allocation via ResourceSpec |
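To make the drift-detection feature concrete, here is a small self-contained sketch of the Population Stability Index (PSI), one of the three drift metrics listed above. This is illustrative pseudologic only, not the platform's DriftDetector implementation; the bin count and epsilon are arbitrary choices.

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    PSI = sum((a_i - e_i) * ln(a_i / e_i)) over shared bins, where e_i and
    a_i are the fractions of each sample falling in bin i.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            # clamp out-of-range values into the edge bins
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # small epsilon avoids log(0) for empty bins
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
baseline = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [random.gauss(0.5, 1.0) for _ in range(5000)]

print(f"PSI, same distribution:    {psi(baseline, baseline):.4f}")  # 0.0000
print(f"PSI, shifted distribution: {psi(baseline, shifted):.4f}")   # > 0.1
```

A common rule of thumb reads PSI below 0.1 as no drift, 0.1–0.25 as moderate drift, and above 0.25 as significant drift.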
The fastest way to see the platform in action is the standalone MLflow demo, which trains 10 RandomForest models and logs them as separate runs to the MLflow tracking server.
```bash
python -m venv mles_env
source mles_env/bin/activate
pip install -r requirements.txt
```
`seaborn`, `deepmerge`, and `tf-keras` are now included in `requirements.txt`.
```bash
python simple_mlflow_example.py
```

Expected output:
```
Starting experiment: mles_demo - baseline_model
Completed baseline_model - R2: 0.9732, RMSE: 0.4176
...
Generated 10 runs in MLflow!
```
```bash
mlflow ui --port 5000
```

Open http://localhost:5000 to see the mles_demo experiment with 10 completed runs. Each run contains logged parameters, metrics (R², RMSE, MSE), a trained model artifact, and a feature importance plot.
| Run name | Description |
|---|---|
| baseline_model | Default parameters |
| optimized_model | Tuned hyperparameters |
| deep_model | Deeper architecture |
| wide_model | Wider architecture |
| balanced_model | Balanced performance |
| fast_model | Optimised for inference speed |
| accurate_model | Maximum accuracy |
| efficient_model | Resource efficiency |
| robust_model | Improved outlier robustness |
| final_model | Production-ready best tradeoff |
After `pip install -e .`, the `mles` command is available with six subcommands.
```bash
mles train \
  --data-source s3://my-bucket/data.csv \
  --target-column price \
  --model-type xgboost \
  --tune-hyperparameters \
  --cluster  # omit --cluster to run locally
```

Supported model types: `xgboost`, `random_forest`, `logistic_regression`, `neural_network`.
```bash
mles predict \
  --model-name xgboost_price \
  --model-version latest \
  --data-source s3://my-bucket/inference.csv \
  --output-path s3://my-bucket/predictions.parquet
```

```bash
mles serve \
  --model-name xgboost_price \
  --model-version 3 \
  --host 0.0.0.0 \
  --port 8000
```

This starts a FastAPI server with /predict, /batch_predict, /health, and /metrics endpoints.
```bash
mles deploy \
  --model-name xgboost_price \
  --model-version 3 \
  --environment prod \
  --namespace mles
```

Deployment strategy is set in `config/production.yaml` (`blue_green`, `canary`, or `rolling`). Canary releases automatically shift traffic in steps (10% → 25% → 50% → 75% → 100%) and roll back if the error rate or P99 latency exceeds the configured thresholds.
```bash
mles monitor \
  --model-name xgboost_price \
  --model-version 3 \
  --interval 300  # check every 5 minutes
```

```bash
mles init --config-path ./config
```

`docker-compose.yml` starts all supporting services locally:
```bash
docker-compose up -d
```

| Service | Port | Purpose |
|---|---|---|
| MLflow | 5000 | Experiment tracking and model registry |
| Sematic | 8001 | Pipeline DAG orchestration |
| MLES Model Server | 8000 | FastAPI model serving |
| Prometheus | 9090 | Metrics scraping |
| Grafana | 3000 | Metrics dashboards |
| AlertManager | 9093 | Alerting |
| Redis | 6379 | Feature store backend and caching |
| PostgreSQL | 5432 | Metadata storage |
| Kafka | 9092 | Streaming data ingestion |
| Zookeeper | 2181 | Kafka coordination |
```
├── simple_mlflow_example.py   # Standalone MLflow demo
├── src/                       # Core platform code
│   ├── cli.py                 # mles CLI entry point
│   ├── config.py              # Configuration dataclasses
│   ├── models/                # BaseModel, ModelRegistry, XGBoostModel, etc.
│   ├── pipelines/             # TrainingPipeline, InferencePipeline (Sematic DAGs)
│   ├── preprocessing/         # DataLoader, DataValidator, FeatureEngineer
│   ├── monitoring/            # ModelMonitor, DriftDetector, MetricsCollector
│   └── deployment/            # DeploymentManager, ModelServer, InferenceService
├── production/                # Kubernetes manifests, Terraform, prod Docker config
│   ├── kubernetes/
│   │   ├── base/              # Base deployment and ConfigMap
│   │   ├── overlays/          # prod and staging Kustomize overlays
│   │   └── monitoring/        # Prometheus and Grafana configs
│   ├── docker/                # Dockerfile.model-server
│   └── scripts/               # deploy.sh, health-check.sh
├── development/               # Local dev configs and scripts
│   ├── config/                # local.yaml, development.yaml
│   ├── docker/                # docker-compose.dev.yml
│   └── scripts/               # setup_dev.sh, run_tests.sh
├── ci-cd/                     # GitHub Actions workflows
├── config/                    # Config templates (example_config.yaml)
├── samples/                   # Example scripts and tutorials
├── scripts/                   # Utility scripts
├── tests/                     # Test suite
├── docs/                      # Documentation
├── docker-compose.yml         # Full local service stack
├── requirements.txt
└── setup.py
```
Copy env.example to .env and set the relevant variables:
```bash
cp env.example .env
```

Key environment variables:
| Variable | Default | Description |
|---|---|---|
| `MLFLOW_TRACKING_URI` | `http://localhost:5000` | MLflow server URL |
| `S3_BUCKET` | `mles-data` | Default data bucket |
| `KAFKA_BROKERS` | `localhost:9092` | Kafka broker addresses |
| `FEATURE_STORE_URI` | `redis://localhost:6379` | Feast online store backend |
| `PROMETHEUS_PUSHGATEWAY` | — | Prometheus Pushgateway URL |
| `CLUSTER_MODE` | `false` | Set `true` to submit to Kubernetes |
For full configuration options see Configuration Guide.
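A configuration loader might resolve these variables with their defaults along the following lines. This is a stdlib sketch, not the platform's `config.py`; only the variable names and defaults come from the table above.

```python
import os

# Defaults mirror the environment-variable table; os.environ overrides them.
DEFAULTS = {
    "MLFLOW_TRACKING_URI": "http://localhost:5000",
    "S3_BUCKET": "mles-data",
    "KAFKA_BROKERS": "localhost:9092",
    "FEATURE_STORE_URI": "redis://localhost:6379",
    "PROMETHEUS_PUSHGATEWAY": None,  # no default; must be set explicitly
    "CLUSTER_MODE": "false",
}

def load_settings(env=os.environ):
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Environment values arrive as strings; normalise the boolean flag.
    settings["CLUSTER_MODE"] = str(settings["CLUSTER_MODE"]).lower() == "true"
    return settings

print(load_settings({})["MLFLOW_TRACKING_URI"])          # http://localhost:5000
print(load_settings({"CLUSTER_MODE": "true"})["CLUSTER_MODE"])  # True
```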
Pipeline behaviour, deployment thresholds, and resource allocation are controlled via YAML:
```yaml
# config/example_config.yaml
monitoring:
  drift_threshold: 0.15
  accuracy_threshold: 0.8
  evaluation_interval_minutes: 15

deployment:
  canary_percentage: 10
  rollback_error_threshold: 0.05
  rollback_latency_p99_ms: 1000

training_resources:
  cpu: "4"
  memory: "8Gi"
  gpu: "1"
```

The platform is built around four layers:
```
CLI (src/cli.py)
└── Pipelines (Sematic DAGs)
    ├── TrainingPipeline — 8 stages: load → validate → split → engineer → validate → train → evaluate → register
    ├── InferencePipeline — 6 stages: load model → load data → engineer → predict → monitor → save
    └── Components
        ├── DataLoader — S3, local, Kafka, HTTP
        ├── DataValidator — Great Expectations, Pandera, built-in quality checks
        ├── FeatureEngineer — scaling, encoding, selection, Feast integration
        ├── ModelRegistry — XGBoost, RandomForest, LogisticRegression
        ├── ModelMonitor — Prometheus metrics, drift detection, SHAP, Fairlearn
        └── DeploymentManager — Blue-green, canary, rolling via Kubernetes API
```
For more detail see Architecture Overview.
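The TrainingPipeline's eight-stage sequence can be approximated as plain functions composed in order. In the real platform each stage would be a Sematic function in a DAG; this stdlib sketch only shows the data flow, and every stage body is a stand-in.

```python
# Stand-in stages for the 8-step TrainingPipeline; only the wiring mirrors
# the DAG above, the bodies are placeholders.
def load(source):             return {"rows": list(range(100)), "source": source}
def validate(data):           assert data["rows"], "empty dataset"; return data
def split(data):              n = len(data["rows"]) * 8 // 10; return data["rows"][:n], data["rows"][n:]
def engineer(train, test):    return [r * 2 for r in train], [r * 2 for r in test]
def train_model(features):    return {"weights": sum(features) / len(features)}
def evaluate(model, test):    return {"score": 0.97}  # placeholder metric
def register(model, metrics): return f"model:v1 score={metrics['score']}"

def training_pipeline(source):
    data = validate(load(source))                      # load -> validate
    train_rows, test_rows = split(data)                # split
    train_feats, test_feats = engineer(train_rows, test_rows)
    validate({"rows": train_feats})                    # second validation pass
    model = train_model(train_feats)                   # train
    metrics = evaluate(model, test_feats)              # evaluate
    return register(model, metrics)                    # register

print(training_pipeline("s3://my-bucket/data.csv"))
```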
When running `mles serve`, the following endpoints are available:
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Model and server health check |
| POST | `/predict` | Single prediction |
| POST | `/batch_predict` | Batch predictions |
| GET | `/metrics` | Latency and throughput metrics |
| POST | `/reload` | Hot-reload model from new path |
Single prediction request:
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": {"feature_1": 1.5, "feature_2": "category_a"}}'
```

See API Documentation for full schema.
```bash
source mles_env/bin/activate
python -m pytest tests/test_pipeline.py -v
```

Expected result: 15 passed, 2 skipped.
The 2 skipped tests are for optional integrations that require specific environment setup:
| Skipped test | Reason | Fix |
|---|---|---|
| `TestBentoMLServing` | `deepmerge` missing from the active environment | `pip install deepmerge` |
| `TestExplainabilityFairness::test_shap_explainability` | Keras 3 installed without `tf-keras` | `pip install tf-keras` |
Both packages are listed in requirements.txt and will be present in a clean install.
| Issue | Solution |
|---|---|
| `ModuleNotFoundError: No module named 'src'` | Run commands from the project root or install with `pip install -e .` |
| MLflow UI not starting | Ensure the virtual environment is active: `source mles_env/bin/activate` |
| Missing dependencies | Run `pip install -r requirements.txt` |
| Port conflict on 5000 | Use `mlflow ui --port 5001` |
| Kubernetes config not found | Ensure `~/.kube/config` exists or run inside a cluster pod |
| W&B init failure | Set `WANDB_MODE=disabled` to skip W&B without affecting other tracking |
For more see Troubleshooting Guide.
- Architecture Overview
- Component Reference
- Data Flow
- Installation Guide
- Configuration Guide
- Deployment Guide
- Troubleshooting
- API Reference
- CLI Reference
- Model Reference
- Best Practices
- Use Cases
- Full Deployment Guide
MIT License — see the LICENSE file for details.