ayusyagol11/README.md

Hi, I'm Aayush Yagol 👋

Insurance Data Analyst based in Canberra, Australia.

I build predictive models and analytical pipelines for insurance and financial services, and I work inside the industry I model. Currently a Claims Advisor at Suncorp Group, I bring frontline domain knowledge of claims operations, reserve setting, and regulatory compliance directly into my data work.


πŸ› οΈ Tech Stack

Languages & Libraries

Python SQL Pandas NumPy Scikit-learn Plotly

BI & Visualisation

Tableau Power BI Streamlit

Data Engineering

SQL Server Databricks Git


📌 Featured Projects

Can Australian macroeconomic indicators (CPI, WPI, PPI, RBA cash rate) predict insurance claim severity with a 2–6 quarter lead time?

Built a full analytics pipeline ingesting ABS, RBA, and APRA data → OLS/Ridge/Lasso regression modelling → four-page Streamlit dashboard with lag correlation heatmaps and a stagflation stress test scenario.

Python Scikit-learn Streamlit Plotly SQL
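
A minimal sketch of the lag-correlation idea behind the heatmaps, in plain Python. The data is synthetic (not the ABS, RBA, or APRA series), and `pearson` and `lag_correlations` are illustrative names, not functions from the repository. Each lag of k quarters pairs the indicator at quarter t with claim severity at quarter t + k.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lag_correlations(indicator, target, lags):
    """Correlate indicator[t] with target[t + k] for each lag k >= 1."""
    return {k: pearson(indicator[:-k], target[k:]) for k in lags}


# Synthetic quarterly series (assumption): severity follows the
# indicator with a built-in 4-quarter delay.
indicator = [math.sin(0.7 * t) + 0.1 * t for t in range(40)]
severity = [0.0] * 4 + [2.0 * v + 1.0 for v in indicator[:-4]]

corr = lag_correlations(indicator, severity, range(2, 7))  # lags 2..6
best_lag = max(corr, key=lambda k: abs(corr[k]))
print(best_lag)
```

On this synthetic series the strongest correlation appears at the lag that was built into the data, which is the pattern one row of a lag heatmap would be expected to surface.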


Forecasting Pure Premium across 677,991 motor insurance policies.

Implemented a Tweedie Regressor (compound Poisson-Gamma) to model zero-inflated insurance claims data. Deployed as a live Streamlit dashboard with a full Scikit-learn pipeline including exposure weighting and one-hot encoding (OHE).

Python Tweedie Regression Scikit-learn Streamlit Pandas
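
To illustrate the target and the loss involved, here is a small standard-library sketch. It assumes pure premium is defined as claim amount divided by exposure and uses the standard Tweedie unit deviance for a power between 1 and 2. The four policies are made-up values, and the helper names are hypothetical rather than taken from the project.

```python
def pure_premium(claim_amount, exposure):
    """Claim cost per unit of exposure (e.g. per policy-year)."""
    return claim_amount / exposure


def tweedie_unit_deviance(y, mu, power=1.5):
    """Tweedie unit deviance for 1 < power < 2 (compound Poisson-Gamma).

    Defined for y >= 0 and mu > 0, so zero-claim policies are valid.
    """
    if not 1 < power < 2:
        raise ValueError("power must lie strictly between 1 and 2")
    if y < 0 or mu <= 0:
        raise ValueError("requires y >= 0 and mu > 0")
    p = power
    return 2 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical policies: (claim_amount, exposure in years).
policies = [(0.0, 1.0), (0.0, 0.5), (1200.0, 1.0), (300.0, 0.25)]
y = [pure_premium(c, e) for c, e in policies]
exposure = [e for _, e in policies]

# Exposure weighting makes the mean target equal total claims / total exposure.
portfolio_rate = weighted_mean(y, exposure)

# Score a constant prediction with the exposure-weighted mean deviance.
mu = portfolio_rate
mean_deviance = weighted_mean(
    [tweedie_unit_deviance(v, mu) for v in y], exposure
)
```

The deviance stays finite at y = 0, which is why this family suits data where most policies have no claim. Weighting by exposure keeps short-duration policies from dominating the fit.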


End-to-end Medallion Architecture data warehouse in SQL Server.

Bronze → Silver → Gold ETL pipeline, star schema design, analytics-ready Gold layer views, full data catalog, naming conventions, and SQL validation tests.

T-SQL SQL Server Medallion Architecture ETL Draw.io
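
The layering can be sketched in a few lines. This example uses Python's built-in `sqlite3` module as a stand-in for SQL Server, so the SQL dialect differs from the project's T-SQL. The table names, columns, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Bronze: raw data loaded as-is (all text, duplicates and bad values kept).
    CREATE TABLE bronze_sales (order_id TEXT, customer TEXT, amount TEXT);
    INSERT INTO bronze_sales VALUES
        ('1', ' alice ', '10.50'),
        ('1', ' alice ', '10.50'),
        ('2', 'BOB', '4.00'),
        ('3', 'alice', 'n/a');

    -- Silver: typed, trimmed, de-duplicated; non-numeric amounts dropped.
    CREATE TABLE silver_sales AS
    SELECT DISTINCT
        CAST(order_id AS INTEGER) AS order_id,
        LOWER(TRIM(customer))     AS customer,
        CAST(amount AS REAL)      AS amount
    FROM bronze_sales
    WHERE amount GLOB '[0-9]*';

    -- Gold: analytics-ready view built on the Silver layer.
    CREATE VIEW gold_customer_sales AS
    SELECT customer, COUNT(*) AS orders, SUM(amount) AS total_amount
    FROM silver_sales
    GROUP BY customer;
    """
)

gold_rows = conn.execute(
    "SELECT customer, orders, total_amount "
    "FROM gold_customer_sales ORDER BY customer"
).fetchall()

# A simple validation check in the spirit of SQL tests: no duplicate keys in Silver.
duplicate_keys = conn.execute(
    "SELECT order_id FROM silver_sales GROUP BY order_id HAVING COUNT(*) > 1"
).fetchall()
```

Keeping Gold as a view over Silver means the reporting layer never reads raw data, and validation queries can target each layer separately.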


Selecting the optimal classifier across five ML algorithms.

Evaluated Logistic Regression, SVM, KNN, Random Forest, and Gradient Boosting on 10,000 bank customer records. Selected GBM based on recall (catch rate), which is more informative than accuracy on imbalanced churn datasets.

Python Scikit-learn GBM Imbalanced Classification
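
A toy illustration of why recall and accuracy can disagree on imbalanced data. The labels and the three sets of predictions are invented for this sketch and are not results from the project.

```python
def recall(y_true, y_pred):
    """Share of actual churners (label 1) that the model flags."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical data: 2 churners out of 10 customers (20% positive class).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predictions = {
    "always_stay": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # never predicts churn
    "model_a":     [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # cautious
    "model_b":     [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # aggressive
}

scores = {
    name: {"recall": recall(y_true, pred), "accuracy": accuracy(y_true, pred)}
    for name, pred in predictions.items()
}

best_by_recall = max(scores, key=lambda m: scores[m]["recall"])
best_by_accuracy = max(scores, key=lambda m: scores[m]["accuracy"])
```

Here the never-churn baseline reaches 80% accuracy while catching no churners, and the two metrics pick different winners. Selecting on recall alone tolerates more false alarms, so precision is usually worth checking alongside it.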


📈 GitHub Stats

Aayush's GitHub Stats

Top Languages


🎓 Certifications

  • 🟠 Databricks Fundamentals, Databricks (2026)
  • 🔵 IBM Data Analyst Professional Certificate, IBM / Coursera (2025)
  • 🟢 Deloitte Australia Data Analytics Job Simulation, Deloitte (2025)

📫 Let's Connect

Portfolio LinkedIn

Pinned

  1. macroeconomic-impact-insurance-claims Public

    Jupyter Notebook

  2. claims-liability-predictor Public

    Tweedie regression pipeline predicting Pure Premium (Expected Annual Liability) across 677k+ French Motor TPL insurance policies, with an interactive Streamlit dashboard.

    Jupyter Notebook

  3. sql-data-warehouse-project Public

    Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

    TSQL

  4. multimodal_churn_prediction Public

    Jupyter Notebook

  5. Australia_Wildfire_Dashboard Public

    Australian Wildfire 2005 - 2020

    Python

  6. 2024_StackOverflow_Developer_Survey Public

    Jupyter Notebook