I'm a Data Scientist and Machine Learning Engineer focused on turning complex datasets into decisions that drive measurable business value. I build end-to-end ML systems — from raw data pipelines to deployed models — with a strong emphasis on analytical rigour and practical impact.
Currently deepening my expertise in data engineering, MLOps, and scalable data pipelines.
| Area | Tools & Technologies |
|---|---|
| Languages | Python, SQL |
| Machine Learning | scikit-learn, XGBoost, K-Means, feature engineering, model evaluation |
| Deep Learning | TensorFlow / Keras, Neural Networks, NLP |
| Data Engineering | PySpark, Apache Airflow, ETL pipelines, Docker |
| Data & Analytics | Pandas, NumPy, statistical modelling, EDA |
| Visualisation | Matplotlib, Seaborn, Plotly, Metabase |
| Cloud & Infra | Azure, PostgreSQL, Docker Compose |
| Explainability | SHAP, model interpretability |
PySpark · Apache Airflow · PostgreSQL · Docker · Python
Built a production-grade, end-to-end data engineering pipeline processing 33.8 million MovieLens ratings. Orchestrated a 15-task Airflow DAG with MD5 hash-based change detection that cut weekly run time from 90 minutes to 2 minutes when the source data is unchanged. The pipeline loads a PostgreSQL star schema that powers Metabase dashboards, and it runs on a fully automated weekly schedule at zero cloud cost.
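The hash-based change detection can be illustrated with a minimal sketch. This is not the pipeline's actual code: the function names `file_md5` and `has_changed`, the JSON state file, and the file layout are all assumptions made for illustration. The idea is to store the MD5 digest of each input after a run and to skip the expensive tasks when the digest has not changed.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(data_path: Path, state_path: Path) -> bool:
    """Compare the file's current digest with the one recorded by the last run.

    Returns True (and records the new digest) when the file is new or modified,
    and False when the stored digest matches. The state file is a hypothetical
    JSON mapping of file name to digest.
    """
    current = file_md5(data_path)
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    if state.get(data_path.name) == current:
        return False
    state[data_path.name] = current
    state_path.write_text(json.dumps(state, indent=2))
    return True
```

In an Airflow DAG, a check like this could plausibly run in an early task that short-circuits the downstream Spark and load tasks when it returns `False`. A more careful variant would record the new digest only after the downstream tasks succeed, so that a failed run is retried rather than skipped.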
TensorFlow · Keras · NLP · Python
Developed an end-to-end deep learning classifier that categorises e-commerce customer reviews as positive or negative. Achieved ~91% validation accuracy using a TextVectorization + Embedding + Dense architecture trained on 11,158 labelled reviews. Demonstrated the full ML lifecycle from preprocessing through inference, with a foundation suitable for real-time customer feedback monitoring.
scikit-learn · K-Means · Python · EDA
Applied unsupervised machine learning to segment retail customers by purchasing behaviour and demographics. Used the elbow method and silhouette analysis (silhouette score of 0.55) to identify 5 well-defined customer clusters, delivering actionable targeting strategies for high-income/low-spend vs. low-income/high-spend segments.
scikit-learn · SHAP · Linear Regression · Python
Modelled heating and cooling energy demand from architectural design parameters. Achieved R² ~0.92 for heating load and ~0.89 for cooling load using interpretable linear regression. Applied SHAP analysis to confirm physical drivers (compactness, glazing area), producing stakeholder-ready, explainable outputs for early-stage design decisions.
I'm actively looking for opportunities in:
- Data Science — predictive modelling, experimentation, statistical analysis
- Machine Learning Engineering — model development and deployment
- Data Engineering — pipeline design, orchestration, scalable data systems
- Analytics & Decision Science — business intelligence, insight generation
If you're building data-driven products and need someone who bridges strong analytical thinking with hands-on implementation, let's talk.
