This project demonstrates a complete Machine Learning Operations (MLOps) pipeline for predicting Boston Housing prices using a Random Forest Regressor. The pipeline showcases industry best practices for ML workflow automation, version control, and continuous integration.

- **Task**: Regression, predicting median house values in Boston
- **Model**: Random Forest Regressor (100 estimators)
- **Evaluation**: R² score
- **Dataset**: Boston Housing Dataset

## Technology Stack

- Kubeflow Pipelines (KFP): ML workflow orchestration on Kubernetes
- DVC (Data Version Control): Dataset versioning and management
- Minikube: Local Kubernetes cluster
- GitHub Actions: Automated CI/CD testing
- Python 3.9: Core language
- scikit-learn: ML library

## Pipeline Steps

- Load Data: Fetch dataset from remote URL
- Preprocess: Clean data, split train/test (80/20)
- Train Model: Train Random Forest on training data
- Evaluate: Calculate R² score on test data
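
The 80/20 split and the R² metric used by the Preprocess and Evaluate steps can be illustrated without any dependencies. The following is a minimal sketch in plain Python, not the project's code: the pipeline itself relies on scikit-learn, and the helper names `split_train_test` and `r2` are hypothetical stand-ins introduced here for illustration.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (SS_tot would be zero otherwise).
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# 100 dummy rows split 80/20, as in the Preprocess step.
train, test = split_train_test(range(100))
print(len(train), len(test))  # 80 20

# R² for slightly-off predictions, as in the Evaluate step.
print(r2([3.0, 5.0, 7.0], [2.5, 5.0, 7.5]))  # 0.9375
```

An R² of 1.0 means the predictions match the targets exactly, while a model that always predicts the mean of the targets scores 0.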

## Project Structure

```
mlops_kubeflow/
├── .github/workflows/
│   └── ci.yml                    # GitHub Actions CI/CD
├── components/                   # Compiled Kubeflow components
│   ├── load_data.yaml
│   ├── preprocess_data.yaml
│   ├── train_model.yaml
│   └── evaluate_model.yaml
├── data/                         # Dataset (DVC tracked)
├── src/
│   └── pipeline_components.py    # Component definitions
├── pipeline.py                   # Main pipeline definition
├── pipeline.yaml                 # Compiled pipeline
├── requirements.txt              # Python dependencies
├── Jenkinsfile                   # Jenkins CI/CD
├── test-ci-locally.bat           # Local CI test script
└── README.md                     # This file
```

---
## Setup Instructions
### Prerequisites
- Python 3.9+
- Docker
- kubectl
- Git
### 1. Install Minikube
**Windows**:
```powershell
choco install minikube
```

**macOS**:

```bash
brew install minikube
```

**Linux**:

```bash
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
```

### 2. Start Minikube

```bash
minikube start --cpus=4 --memory=8192 --disk-size=20g
minikube status
```

### 3. Deploy Kubeflow Pipelines

```bash
export PIPELINE_VERSION=1.8.5
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"

# Wait for pods to be ready
kubectl get pods -n kubeflow -w
```

### 4. Fix the Executor

```bash
kubectl patch configmap workflow-controller-configmap -n kubeflow --type merge -p '{"data":{"containerRuntimeExecutor":"emissary"}}'
kubectl rollout restart deployment workflow-controller -n kubeflow
```

### 5. Install Dependencies

```bash
pip install -r requirements.txt
```

### 6. Set Up DVC

```bash
dvc init
dvc remote add -d local_storage ../dvc_storage_simulation
dvc add data/
dvc push
```

## Pipeline Walkthrough

### Step 1: Compile Components

```bash
python src/pipeline_components.py
```

This generates YAML files in `components/`:

- `load_data.yaml`
- `preprocess_data.yaml`
- `train_model.yaml`
- `evaluate_model.yaml`

### Step 2: Compile the Pipeline

```bash
python pipeline.py
```

This generates `pipeline.yaml`, the complete workflow.
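
The compiled workflow takes a `url` parameter, which the submit command in this README fills in with an inline `python -c` snippet. As a more readable sketch of that same step, the function below sets `spec.arguments` on a workflow dictionary. The function name `with_url_parameter` and the trimmed `example` manifest are hypothetical and only illustrate the shape of the data. The real input is whatever `kubectl create --dry-run=client -o json` emits for `pipeline.yaml`.

```python
import json

# Dataset URL taken from the submit command in this README.
DATA_URL = "https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv"


def with_url_parameter(workflow: dict, url: str) -> dict:
    """Return a copy of the workflow whose spec.arguments holds one `url` parameter."""
    updated = json.loads(json.dumps(workflow))  # deep copy via a JSON round-trip
    updated["spec"]["arguments"] = {
        "parameters": [{"name": "url", "value": url}]
    }
    return updated


# Hypothetical, heavily trimmed stand-in for the dry-run JSON output.
example = {
    "kind": "Workflow",
    "metadata": {"generateName": "boston-housing-"},
    "spec": {"entrypoint": "main"},
}

print(json.dumps(with_url_parameter(example, DATA_URL), indent=2))
```

Working on a copy leaves the original manifest untouched, so the same compiled workflow can be reused with a different dataset URL.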

### Step 3: Submit the Pipeline

```bash
kubectl -n kubeflow create -f pipeline.yaml --dry-run=client -o json | python -c "import sys, json; d=json.load(sys.stdin); d['spec']['arguments']={'parameters':[{'name':'url','value':'https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv'}]}; print(json.dumps(d))" | kubectl create -f - -n kubeflow
```

### Step 4: Monitor the Run

```bash
# Check workflow status
kubectl get workflow -n kubeflow

# Watch pods
kubectl get pods -n kubeflow | grep boston-housing

# View logs
kubectl logs <pod-name> -n kubeflow

# Get workflow details
kubectl describe workflow <workflow-name> -n kubeflow
```

## GitHub Actions CI/CD

File: `.github/workflows/ci.yml`

**Stages**:

- Environment Setup: Install Python and dependencies
- Pipeline Compilation: Compile components and pipeline
- Verification: Verify that the YAML files were generated

**Triggers**: Push to `main`, pull requests, manual
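
The Verification stage can be approximated by a small script that checks the compiled artifacts exist. This is a sketch under assumptions, not the contents of `ci.yml`: the file list is taken from the project layout in this README, and the helper name `missing_artifacts` is hypothetical.

```python
from pathlib import Path

# Files the two compile steps are expected to produce (per this README).
EXPECTED_FILES = [
    "pipeline.yaml",
    "components/load_data.yaml",
    "components/preprocess_data.yaml",
    "components/train_model.yaml",
    "components/evaluate_model.yaml",
]


def missing_artifacts(root: str = ".") -> list:
    """Return the expected files that are absent or empty under root."""
    base = Path(root)
    missing = []
    for rel in EXPECTED_FILES:
        path = base / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


# Report against the current directory; an empty list means all files exist.
print("missing:", missing_artifacts())
```

A CI step could exit with a non-zero status whenever the returned list is non-empty, which fails the build if compilation silently produced nothing.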

### Local CI Test

```bash
# Windows
test-ci-locally.bat

# Linux/macOS
python src/pipeline_components.py
python pipeline.py
ls -l pipeline.yaml components/
```

## Useful Commands

**Minikube**:

```bash
minikube start
minikube stop
minikube status
minikube delete
```

**kubectl**:

```bash
kubectl get pods -n kubeflow
kubectl get workflow -n kubeflow
kubectl logs <pod-name> -n kubeflow
kubectl delete workflow <workflow-name> -n kubeflow
```

**Pipeline**:

```bash
python src/pipeline_components.py             # Compile components
python pipeline.py                            # Compile pipeline
kubectl create -f pipeline.yaml -n kubeflow   # Submit
```

**DVC**:

```bash
dvc add data/
dvc push
dvc pull
dvc status
```

## Troubleshooting

**Solution**: Switch to the emissary executor

```bash
kubectl patch configmap workflow-controller-configmap -n kubeflow --type merge -p '{"data":{"containerRuntimeExecutor":"emissary"}}'
kubectl rollout restart deployment workflow-controller -n kubeflow
```

**Solution**: Submit with parameters (see Step 3 above)

**Solution**: Check events

```bash
kubectl describe pod <pod-name> -n kubeflow
```

**Solution**:

```bash
minikube delete
minikube start --driver=docker --cpus=4 --memory=8192
```

```bash
python src/pipeline_components.py
echo "Components compiled successfully!"
```

```bash
test-ci-locally.bat
```

```bash
python pipeline.py
ls -l pipeline.yaml
```

- Data versioning with DVC
- Remote storage configuration
- `.dvc` files
- 4 components (Load, Preprocess, Train, Evaluate)
- Component YAML files
- Function-based components
- Pipeline definition
- Minikube deployment
- Pipeline execution
- GitHub Actions workflow
- Jenkinsfile
- 3-stage pipeline
- Comprehensive README
- Setup instructions
- Pipeline walkthrough

## Requirements

```text
kfp==1.8.22
dvc
pandas
scikit-learn
```

## Quick Start

```bash
# 1. Start Minikube
minikube start --cpus=4 --memory=8192

# 2. Deploy Kubeflow
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=1.8.5"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=1.8.5"

# 3. Fix executor
kubectl patch configmap workflow-controller-configmap -n kubeflow --type merge -p '{"data":{"containerRuntimeExecutor":"emissary"}}'
kubectl rollout restart deployment workflow-controller -n kubeflow

# 4. Install dependencies
pip install -r requirements.txt

# 5. Compile and run
python src/pipeline_components.py
python pipeline.py

# 6. Submit pipeline
kubectl -n kubeflow create -f pipeline.yaml --dry-run=client -o json | python -c "import sys, json; d=json.load(sys.stdin); d['spec']['arguments']={'parameters':[{'name':'url','value':'https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv'}]}; print(json.dumps(d))" | kubectl create -f - -n kubeflow

# 7. Monitor
kubectl get workflow -n kubeflow -w
```

---

- **Student ID**: i222141
- **Course**: MLOps
- **Assignment**: 4
- **Semester**: 7

Educational project for MLOps coursework.

Last Updated: November 29, 2025