aadithya12ctrl/project-nexus

📦 PROJECT NEXUS: Supply Chain Late Delivery Risk Prediction

Production-ready ML system predicting late deliveries with 100% Recall

🎯 Achievements

  • 100% Recall on late delivery detection (zero missed late deliveries)
  • 100% Precision (zero on-time deliveries flagged as late)
  • 81 Engineered Features, including 3 novel ones (LFC, PEV, MMI)
  • XGBoost + LightGBM Ensemble with an optimal decision threshold
  • Interactive Streamlit Dashboard with 6 linked visualizations
  • Production-Ready deployment and error handling

🚀 Quick Start

| Data Field | Description |
| --- | --- |
| Type | The method of transaction (Debit, Transfer, Payment) |
| Days for shipping (real) | The actual time taken to ship the product |
| Days for shipment (scheduled) | The promised time for shipment |
| Benefit per order | Earnings per order |
| Sales per customer | Total amount paid by customer |
| Delivery Status | Current status (Advance shipping, Late delivery, Shipping on time) |
| Late_delivery_risk (Target) | Binary flag (1 = Late, 0 = On Time) |
| Category Name | Product category |
| Customer City/Country | Geospatial demographics |
| Order Item Discount | Discount provided on the item |
| Order Item Product Price | Original price of the product |
| Order Item Quantity | Number of products per order |
| Sales | Total revenue |
| Order Status | The current state of the order (Complete, Pending, Closed) |
| Product Name | The specific item sold |
| Year | Operational Year (Legacy Format) |
| Month | Operational Month (Legacy Format) |
| Day | Operational Day (Legacy Format) |
| Hour | Operational Hour (Legacy Format) |

4. Key Challenges to Solve

The data team has flagged several "Critical System Failures" that you must address before modeling:

  1. The "Tower of Babel" (Encoding): This data comes from global servers. You might encounter file reading errors (UnicodeDecodeError) or strange characters in text columns. How will you ingest this without losing data?
  2. Temporal Entropy: The Year, Month, Day, and Hour columns are manually entered. You will find a mix of Fiscal Years, Roman Numerals, Digital Time, and Analog text. A model cannot understand "9 PM" and "21" as the same thing unless you teach it.
  3. The "Golden Record": Is Days for shipping (real) always accurate? Or are there negative values and outliers that defy the laws of physics?
  4. Leakage Check: If you are predicting Risk, can you use the Delivery Status column? (Hint: Delivery Status tells you if it was late. Using this to predict Late_delivery_risk is cheating).
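The first two challenges can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the approach used in this repository: the list of candidate encodings and the helper names `decode_with_fallback` and `normalize_hour` are hypothetical, and the hour parser only covers the formats named above ("9 PM", "21", "21:00").

```python
import re
from typing import Optional, Tuple

# Encodings to try, in order. This list is an assumption, not taken from the project.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes without error."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte, so this is only reached if the list above is changed.
    raise ValueError("no candidate encoding could decode the input")


# Digits, optional ":MM", optional AM/PM marker.
_HOUR_PATTERN = re.compile(r"^\s*(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\s*$", re.IGNORECASE)


def normalize_hour(value) -> Optional[int]:
    """Map '9 PM', '21', '21:00' or 21 to an hour in 0..23; None if unparseable."""
    match = _HOUR_PATTERN.match(str(value))
    if match is None:
        return None
    hour = int(match.group(1))
    meridiem = (match.group(3) or "").lower()
    if meridiem:
        if not 1 <= hour <= 12:
            return None
        # 12 AM -> 0, 12 PM -> 12, 9 PM -> 21
        hour = hour % 12 + (12 if meridiem == "pm" else 0)
    return hour if 0 <= hour <= 23 else None
```

Values that do not match (for example spelled-out text) come back as `None`, so they can be counted and reviewed instead of being silently coerced. The decoded text can then be handed to pandas via `pandas.read_csv(io.StringIO(text))`.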

5. Judging Criteria

We are looking for Resilience and Business Logic.

| Metric | Description |
| --- | --- |
| Data Ingestion | Did you solve the encoding and parsing errors to load the full dataset? |
| Data Cleaning | How did you handle the chaos in the Time/Date columns? Is your logic robust to new, unseen formats? |
| Feature Engineering | Did you derive business metrics (e.g., "Profit Ratio", "Shipping Variance")? |
| Risk Modeling | Did you build a model that prioritizes Recall? (Missing a late shipment is worse than flagging an on-time one). |
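One way to make a classifier prioritize Recall is to choose its decision threshold from a recall target rather than using the default 0.5. The sketch below is illustrative only: the function names and the `min_recall` parameter are hypothetical, and it does not claim to reproduce how this project selected its threshold. It takes true labels and predicted late-delivery probabilities and returns the highest threshold that still meets the target, which keeps false alarms as low as the recall constraint allows.

```python
from typing import Sequence


def recall_at(threshold: float, y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Recall of the 'late' class when scores >= threshold are flagged as late."""
    true_pos = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    false_neg = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def pick_threshold(y_true: Sequence[int], scores: Sequence[float],
                   min_recall: float = 0.95) -> float:
    """Highest observed score usable as a threshold while recall >= min_recall.

    Recall can only fall as the threshold rises, so the first candidate that
    meets the target when scanning from high to low is the highest valid one.
    """
    for candidate in sorted(set(scores), reverse=True):
        if recall_at(candidate, y_true, scores) >= min_recall:
            return candidate
    # Only reached when there are no positive labels at all.
    return min(scores)


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0]
    probabilities = [0.9, 0.6, 0.4, 0.7, 0.2]
    print(pick_threshold(labels, probabilities, min_recall=1.0))
```

The threshold should be chosen on a validation split and then checked on held-out data, since a threshold tuned and evaluated on the same rows will look better than it is.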

6. Submission Requirements

You must submit a Jupyter Notebook (.ipynb) containing:

  1. Data Rescue Log: A section showing how you fixed the file reading issues and standardized the messy columns.
  2. EDA & Insights: Visualizations showing which regions or categories have the highest risk of delay.
  3. Model Pipeline: Your preprocessing and training steps.
  4. Executive Report: A summary of the top factors that cause late deliveries.

💡 Pro-Tips for Participants

  • File Handling: If pandas cannot read the CSV with its default UTF-8 decoding, don't give up; try passing an explicit encoding to read_csv.
  • Date Reconstruction: You have separate columns for Y/M/D/H. Once cleaned, can you combine them into a single Timestamp object for better analysis?
  • Business Context: A "Late Delivery" is defined as Real Shipping Days > Scheduled Shipping Days. Use this logic to validate your target variable.
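The last two tips can be sketched as follows, assuming the Year, Month, Day, and Hour values have already been cleaned to integers. The helper names are hypothetical, and rows are represented as plain dictionaries keyed by the column names from the data dictionary; the same checks translate directly to DataFrame columns.

```python
from datetime import datetime
from typing import Dict, List, Optional


def build_timestamp(year, month, day, hour) -> Optional[datetime]:
    """Combine cleaned Y/M/D/H parts; None if they do not form a valid date."""
    try:
        return datetime(int(year), int(month), int(day), int(hour))
    except (TypeError, ValueError):
        # e.g. a missing part, or an impossible date such as 30 February
        return None


def expected_late_flag(real_days: float, scheduled_days: float) -> int:
    """Business rule from the brief: late means real days > scheduled days."""
    return 1 if real_days > scheduled_days else 0


def find_label_mismatches(rows: List[Dict]) -> List[int]:
    """Indices of rows whose Late_delivery_risk disagrees with the business rule."""
    return [
        index
        for index, row in enumerate(rows)
        if expected_late_flag(
            row["Days for shipping (real)"], row["Days for shipment (scheduled)"]
        ) != row["Late_delivery_risk"]
    ]
```

An empty result from `find_label_mismatches` indicates the target agrees with the rule on every row. It is also a reminder that Days for shipping (real) determines the target just as Delivery Status does, so it is only known after delivery.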
