I am a Data Scientist with 5+ years of experience building production-grade machine learning and analytics systems across healthcare, IT services, and consulting environments.
I specialize in end-to-end ML ownership, from problem framing and hypothesis-driven analysis to model development, evaluation, deployment, and stakeholder adoption. My work focuses on converting complex, high-dimensional data (EHRs, behavioral signals, operational logs) into trustworthy, decision-ready insights with measurable business and clinical impact.
- Predictive and causal modeling on large-scale longitudinal healthcare data
- Designing statistically rigorous ML pipelines aligned with real-world workflows
- Building explainable and monitored ML systems for daily decision-making
- Translating models into executive dashboards and narratives
- Advanced statistical modeling and quasi-experimental methods
- Time-series modeling and forecasting for operational systems
- Model calibration, diagnostics, and evaluation in high-stakes settings
- Scalable Python + SQL pipelines on cloud infrastructure
Data Scientist | Dec 2024 - Nov 2025
- Built and deployed statistical, causal, and ML models over 50K+ longitudinal EHR records, achieving ~20% PR-AUC improvement
- Led hypothesis-driven analyses and quasi-experiments using cohort analysis and EDA
- Owned pipelines end-to-end (SQL → Python → production) and presented insights to senior leadership
Machine Learning Engineer / Data Scientist | Sep 2021 - Jul 2023
- Developed risk prediction and prioritization models for high-volume operational workflows
- Applied forecasting, regression, and feature ablation to evaluate system changes
- Productionized ML solutions, reducing reactive handling and improving proactive outcomes by ~30%
Junior Data Scientist / Data Science Intern | 2019 - 2021
- Applied ML and statistical techniques to customer and system-level data
- Built scalable SQL pipelines and analysis-ready datasets
- Delivered insights via dashboards and data stories influencing planning decisions
Machine Learning & Statistics
- Classification, Regression, Tree-Based Models
- Time Series Modeling & Forecasting
- Model Calibration, Diagnostics, Statistical Analysis
Cloud & Distributed Systems
- AWS (EC2, S3), Spark, Distributed Data Processing
Visualization & Reporting
- Tableau, Executive Dashboards, Stakeholder Reporting
Engineering & Tooling
- REST APIs, CI/CD, Docker, Git, Agile/Scrum
This GitHub contains real-world, end-to-end ML projects focused on:
- Clear problem framing
- Reproducible data pipelines
- Interpretable model evaluation
- Business-aligned metrics and outcomes
- data/: Sample or synthetic datasets
- notebooks/: EDA and modeling
- src/: Reusable pipeline code
- metrics/: Evaluation and diagnostics
- README.md: Problem, approach, results