📊 E-commerce Return Rate Reduction Analysis

📌 Project Overview

This project analyzes e-commerce product returns to identify the key drivers of high return rates and provide actionable insights for reduction.

Objectives:

Understand why customers return products
Explore how return rates vary by category, geography, and marketing channel
Build a predictive model for return probability
Create interactive dashboards in Power BI for visualization

Tools & Tech:

SQL → Data Cleaning & Preprocessing
Python (Pandas, Scikit-learn, Matplotlib, Seaborn) → Modeling & Analysis
Power BI → Dashboards & KPI Visuals

🛠️ Dataset

Synthetic dataset: ecommerce_returns_synthetic_data

After Cleaning in SQL → `updated_ecommerce_returns`

Columns used:

Order_ID, Product_ID, User_ID, Order_Date
Product_Category, Product_Price, Order_Quantity
Return_Reason, Return_Status
User_Age, User_Gender, User_Location
Payment_Method, Shipping_Method, Discount_Applied

Calculated fields:
- overall_return_rate
- category_return_rate
- product_return_rate
- geography_return_rate
- reason_pct_of_returns

🧹 SQL Data Preparation

Checked for missing values
Dropped irrelevant columns (Return_Date, Days_to_Return)
Imputed missing Return_Reason → "Not Mentioned"
Calculated:
- Overall return rate
- Category-level return rate
- Product-level return rate
- Geography-level return rate
- Return reason % contribution
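
For readers who want to prototype the same preparation steps outside SQL, here is a minimal pandas sketch. It is not the project's SQL script. The tiny `raw` frame and the `clean_returns` helper are hypothetical stand-ins, and the sample values are made up; only the column names and the "Not Mentioned" label come from the steps above.

```python
import pandas as pd

# Tiny stand-in for ecommerce_returns_synthetic_data (values are made up).
raw = pd.DataFrame({
    "Order_ID": ["O1", "O2", "O3", "O4"],
    "Product_Category": ["Clothing", "Toys", "Clothing", "Books"],
    "Return_Status": ["Returned", "Not Returned", "Returned", "Not Returned"],
    "Return_Reason": ["Wrong item", None, None, None],
    "Return_Date": ["2023-01-10", None, "2023-02-03", None],
    "Days_to_Return": [5.0, None, 12.0, None],
})


def clean_returns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the unused return-timing columns and label missing reasons."""
    # drop() returns a new frame, so the input is left untouched.
    cleaned = df.drop(columns=["Return_Date", "Days_to_Return"], errors="ignore")
    cleaned["Return_Reason"] = cleaned["Return_Reason"].fillna("Not Mentioned")
    return cleaned


# Count missing values per column before cleaning.
missing_before = raw.isna().sum()
cleaned = clean_returns(raw)

print(missing_before)
print(cleaned)
```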

-- Example: Return % by Category
SELECT 
    Product_Category,
    COUNT(*) AS total_orders,
    SUM(CASE WHEN Return_Status = 'Returned' THEN 1 ELSE 0 END) AS returned_orders,
    ROUND(SUM(CASE WHEN Return_Status = 'Returned' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS return_rate_pct
FROM ecommerce_returns_synthetic_data 
GROUP BY Product_Category 
ORDER BY return_rate_pct DESC;
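
The same aggregation can be cross-checked in pandas. The sketch below mirrors the query above on a made-up six-row frame; `return_rate_by` is a hypothetical helper name, not something from the repository.

```python
import pandas as pd

# Made-up orders, only to exercise the aggregation.
orders = pd.DataFrame({
    "Product_Category": ["Clothing", "Clothing", "Clothing", "Toys", "Toys", "Books"],
    "Return_Status": ["Returned", "Returned", "Not Returned",
                      "Returned", "Not Returned", "Not Returned"],
})


def return_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Total orders, returned orders and return rate (%) per group."""
    summary = (
        df.assign(returned=df["Return_Status"].eq("Returned"))
          .groupby(column)
          .agg(total_orders=("returned", "size"),
               returned_orders=("returned", "sum"))
    )
    summary["return_rate_pct"] = (
        summary["returned_orders"] * 100.0 / summary["total_orders"]
    ).round(2)
    # Highest return rate first, matching ORDER BY ... DESC.
    return summary.sort_values("return_rate_pct", ascending=False).reset_index()


category_rates = return_rate_by(orders, "Product_Category")
print(category_rates)
```

Passing a different column name (for example `User_Location`) would give the geography-level rate in the same shape.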

🤖 Machine Learning (Python - Logistic Regression)

Steps

Target Variable:
- Return_Flag → (1 = Returned, 0 = Not Returned), derived from Return_Status
Features:
- Categorical: Product_Category, Return_Reason, User_Gender, User_Location, Payment_Method, Shipping_Method
- Numerical: Product_Price, Order_Quantity, User_Age, Discount_Applied
Preprocessing:
- One-Hot Encoding for categorical columns
- Standardization for numerical columns
Model: Logistic Regression (max_iter=1000)
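
The steps above can be sketched as a single scikit-learn pipeline. This is an illustration under assumptions, not the notebook's actual code: the data comes from a hypothetical `make_toy_orders` generator with an invented discount-to-return relationship, and the category values, split ratio and random seeds are arbitrary. The feature lists and `max_iter=1000` follow the description above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CATEGORICAL = ["Product_Category", "Return_Reason", "User_Gender",
               "User_Location", "Payment_Method", "Shipping_Method"]
NUMERICAL = ["Product_Price", "Order_Quantity", "User_Age", "Discount_Applied"]


def make_toy_orders(n_rows: int = 400, seed: int = 0) -> pd.DataFrame:
    """Generate made-up orders; replace with the cleaned CSV in practice."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Product_Category": rng.choice(["Clothing", "Electronics", "Books"], n_rows),
        "Return_Reason": rng.choice(["Not Mentioned", "Defective", "Wrong item"], n_rows),
        "User_Gender": rng.choice(["Male", "Female"], n_rows),
        "User_Location": rng.choice(["City1", "City2", "City3"], n_rows),
        "Payment_Method": rng.choice(["Credit Card", "Debit Card", "PayPal"], n_rows),
        "Shipping_Method": rng.choice(["Standard", "Express"], n_rows),
        "Product_Price": rng.uniform(5, 500, n_rows).round(2),
        "Order_Quantity": rng.integers(1, 6, n_rows),
        "User_Age": rng.integers(18, 70, n_rows),
        "Discount_Applied": rng.uniform(0, 50, n_rows).round(2),
    })
    # Invented relationship so the toy target is learnable.
    p_return = 1 / (1 + np.exp(-(df["Discount_Applied"] - 25) / 8))
    df["Return_Status"] = np.where(rng.random(n_rows) < p_return,
                                   "Returned", "Not Returned")
    return df


def build_model() -> Pipeline:
    """One-hot encode categoricals, standardize numericals, then fit LR."""
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), NUMERICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


orders = make_toy_orders()
X = orders[CATEGORICAL + NUMERICAL]
y = (orders["Return_Status"] == "Returned").astype(int)  # Return_Flag

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
model = build_model().fit(X_train, y_train)
return_probability = model.predict_proba(X_test)[:, 1]
print(return_probability[:5])
```

Keeping the encoders inside the pipeline means they are fitted on the training split only, which avoids leaking test-set statistics into preprocessing.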

Results

ROC-AUC Score: 0.84 (after removing leaked engineered features)
Classification Report: Balanced precision & recall
Feature Importance:
- Discounts and shipping method drive returns the most
- Younger users show higher return likelihood

📊 Power BI Dashboard Visuals

KPI Cards

Total Orders
Returned Orders
Overall Return Rate %
Average Discount Applied
Top Returning Category

Charts

Area Chart → Impact of Discounts on Return Rate
- X-axis: Discount_Applied (binned)
- Y-axis: Return Rate %
Line Chart → Returns Over Time (Yearly trend)
- X-axis: Order_Date (Year)
- Y-axis: Total Orders
- Legend: Return_Status
Pie Chart → Return Reasons Breakdown
- Values: Count of Return_Reason
- Legend: Return_Reason
Bar Chart → Return % by Product Category
- X-axis: Product_Category
- Y-axis: Return Rate %
Stacked Bar Chart → Return Rate by Payment Method + Shipping Method
- X-axis: Payment_Method
- Y-axis: Return Rate %
- Legend: Shipping_Method
Table Chart → Category, Returned Orders, Total Orders, Return %
- Columns: Product_Category | Returned Orders | Total Orders | Return Rate %
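
The binned discount axis used in the area chart can be prototyped in pandas before building the visual. This is a rough sketch: the bin edges, the `Discount_Bin` column name and the sample rows are assumptions, since the dashboard's actual bin size is not stated.

```python
import pandas as pd

# Made-up orders covering a range of discounts.
orders = pd.DataFrame({
    "Discount_Applied": [0, 5, 12, 18, 25, 33, 41, 49],
    "Return_Status": ["Not Returned", "Not Returned", "Returned", "Not Returned",
                      "Returned", "Returned", "Not Returned", "Returned"],
})

# Assumed bin edges; adjust to match the dashboard's bin size.
bin_edges = [0, 10, 20, 30, 40, 50]
bin_labels = ["0-10", "10-20", "20-30", "30-40", "40-50"]
orders["Discount_Bin"] = pd.cut(
    orders["Discount_Applied"],
    bins=bin_edges,
    labels=bin_labels,
    include_lowest=True,  # keep a discount of exactly 0 in the first bin
)

# Return rate (%) per discount bin.
rate_by_bin = (
    orders.assign(returned=orders["Return_Status"].eq("Returned"))
          .groupby("Discount_Bin", observed=False)["returned"]
          .mean()
          .mul(100)
          .round(2)
)
print(rate_by_bin)
```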

📂 Deliverables

SQL scripts → Data cleaning, aggregations
Python notebook → Logistic regression, feature importance, predictions export
Power BI dashboard → Interactive return rate analysis
CSV export → Predicted return probabilities
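
A possible shape for the predictions export is sketched below. The output column names, the 0.5 threshold and the `build_prediction_export` helper are hypothetical; the repository's actual export format is not shown here.

```python
import io

import numpy as np
import pandas as pd


def build_prediction_export(order_ids, probabilities, threshold=0.5):
    """Pair each order with its predicted return probability and a flag."""
    probabilities = np.asarray(probabilities, dtype=float)
    return pd.DataFrame({
        "Order_ID": list(order_ids),
        "Predicted_Return_Probability": probabilities.round(4),
        "Predicted_Return_Flag": (probabilities >= threshold).astype(int),
    })


# Made-up IDs and probabilities; in practice these come from predict_proba.
export = build_prediction_export(["O1", "O2", "O3"], [0.12, 0.81, 0.5])

# Written to an in-memory buffer here; pass a file path to write to disk.
buffer = io.StringIO()
export.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```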

🚀 Insights

Discounts strongly influence return likelihood.
Specific product categories have disproportionately high return rates.
Return reasons are concentrated around 3–4 major issues.
Payment + Shipping combinations reveal behavioral return patterns.
The logistic regression model (ROC-AUC 0.84) can rank orders by how likely they are to be returned.

