JayeshJadhav28/zomathon

🍽️ Zomathon 2026: PS1 - KPT Prediction Improvement

Improving Kitchen Prep Time (KPT) Prediction Through Signal Quality & System Design



| 💰 Rs 620 Cr | ⏱️ 55% | 🚨 83% |
|---|---|---|
| Annual Cost Saving | Rider Wait Reduced | Severe Cases Eliminated |

🚀 Open in Google Colab

| Notebook | Description | Open |
|---|---|---|
| Day1_Problem_Understanding.ipynb | Root cause analysis of KPT failures | Open In Colab |
| Day2_Generate_Data.ipynb | Synthetic dataset generation (500 restaurants, 50K orders) | Open In Colab |
| Day3_Analysis_and_Charts.ipynb | Data analysis + 5 evidence charts | Open In Colab |
| Day4_Simulation.ipynb | Baseline vs improved system simulation | Open In Colab |
| Day5_PDF_Numbers_Reference.ipynb | All final numbers & calculations reference | Open In Colab |

📎 Quick Links

| Resource | Link |
|---|---|
| 📄 Full 25-Page Report (PDF) | View Report |
| 📦 Dataset (Google Drive) | merchants.csv + orders.csv |
| 🐙 GitHub Repository | github.com/JayeshJadhav28/zomathon |

⚠️ Data Disclaimer: All data in this repository is synthetically generated using numpy and pandas. No proprietary Zomato data is included or referenced.




🎯 Problem Statement

Kitchen Prep Time (KPT) is the elapsed time between order confirmation and food being genuinely ready for pickup. Zomato must predict this before the food is even made so riders are dispatched at exactly the right moment.

The current system relies on a single signal: the Food Order Ready (FOR) button pressed by restaurant merchants in the Mx app. Our analysis of 50,000 synthetic orders shows this signal is corrupted (label error above 5 minutes) in 27% of cases, causing systematic prediction failures across the entire delivery pipeline.

ORDER PLACED → KITCHEN STARTS → FOOD READY ✗ → RIDER ARRIVES → CUSTOMER RECEIVES
                    ◄──────────── KPT ────────────►
                                      ↑
                         FOR button pressed here
                         (often WRONG timing: 58% of restaurants)

🔍 The Core Problem

We identified 3 root causes behind KPT prediction failure:

| # | Problem | Scale |
|---|---|---|
| 01 | Rider-Influenced FOR Signal: Merchants press FOR when the rider arrives, not when food is ready, corrupting the KPT label by the full rider travel time | 58% of restaurants |
| 02 | No Kitchen-Wide Visibility: Zomato sees only its own orders; Swiggy, dine-in, and takeaway orders are completely invisible yet share the same kitchen | 59% of kitchen load hidden |
| 03 | Human Inconsistency in Labeling: 300,000+ restaurants press FOR at different times, creating a training dataset with inconsistently generated labels | 27% of orders with >5 min error |

Downstream Impact

KPT too HIGH → Rider dispatched LATE → food gets cold → bad ratings
KPT too LOW  → Rider dispatched EARLY → waits 7.7 min avg → Rs 620 Cr/year in avoidable rider wait

🏗️ Our Solution: 4-Layer Architecture

Our solution does not change the KPT model. It gives the existing model better, cleaner, more complete inputs.

┌──────────────────────────────────────────────────────────┐
│                  KPT PREDICTION MODEL                    │
│              (existing model, unchanged)                 │
└───────────────────────┬──────────────────────────────────┘
                        │ Better inputs
        ┌───────────────┼───────────────┬───────────────┐
        ▼               ▼               ▼               ▼
  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
  │ LAYER 1  │   │ LAYER 2  │   │ LAYER 3  │   │ LAYER 4  │
  │ Fix FOR  │   │New Signals│  │ Kitchen  │   │Merchant  │
  │ Signal   │   │          │   │Load Index│   │Workflow  │
  └──────────┘   └──────────┘   └──────────┘   └──────────┘
  ₹0 cost        App-first      Zero hardware   Mx App update
  Deploy now     300K+ merchants Covers 59%      Optional stages

Layer 1: De-Noising the FOR Signal (FREE TO DEPLOY)

Idea 1.1: FOR Trustworthiness Score

A per-merchant reliability metric computed from existing data at zero infrastructure cost:

FOR_Reliability_Score = (
    orders_where_FOR_pressed_BEFORE_rider_arrived_by_gte_2min
    / total_orders_past_30_days
)

| Score | Tier | % of Restaurants | Action |
|---|---|---|---|
| < 0.3 | UNRELIABLE | 58% | Discount FOR; use rider pickup + historical median |
| 0.3–0.7 | AVERAGE | 39% | Moderate trust; cross-check against rider pickup |
| > 0.7 | RELIABLE | 3% | Trust FOR fully as primary label |
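
The score and tier lookup can be sketched in a few lines of dependency-free Python. This is only an illustration under assumptions: the `OrderTimes` record, its field names, and the choice to treat a score of exactly 0.3 or 0.7 as AVERAGE are hypothetical and not taken from the notebooks.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class OrderTimes:
    """Timestamps for one order, in minutes after order confirmation (hypothetical schema)."""
    for_pressed_at: float
    rider_arrived_at: float


def for_reliability_score(orders: Iterable[OrderTimes], lead_min: float = 2.0) -> float:
    """Share of orders where FOR was pressed at least `lead_min` minutes before the rider arrived."""
    orders = list(orders)
    if not orders:
        raise ValueError("need at least one order in the 30-day window")
    early = sum(1 for o in orders if o.rider_arrived_at - o.for_pressed_at >= lead_min)
    return early / len(orders)


def reliability_tier(score: float) -> str:
    """Map a score to the tiers in the table (boundary handling is an assumption)."""
    if score < 0.3:
        return "UNRELIABLE"
    if score <= 0.7:
        return "AVERAGE"
    return "RELIABLE"


history = [
    OrderTimes(for_pressed_at=12.0, rider_arrived_at=15.0),  # 3 min before arrival: counts
    OrderTimes(for_pressed_at=18.0, rider_arrived_at=18.5),  # pressed as the rider arrives
    OrderTimes(for_pressed_at=20.0, rider_arrived_at=19.0),  # pressed after arrival
    OrderTimes(for_pressed_at=10.0, rider_arrived_at=14.0),  # 4 min before arrival: counts
]
score = for_reliability_score(history)
print(score, reliability_tier(score))  # 0.5 AVERAGE
```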

Idea 1.2: Statistical Outlier Detection

Per-merchant IQR-based outlier filter applied before any label reaches the training pipeline:

lower_bound = median_kpt - 1.5 * IQR
upper_bound = median_kpt + 1.5 * IQR

if reported_kpt < lower_bound or reported_kpt > upper_bound:
    replace_with = historical_median_kpt  # clean label
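
A runnable version of this filter might look like the sketch below. It keeps the median-centred bounds shown above (rather than the more common Q1/Q3 fences) and uses `statistics.quantiles` from the standard library; the function name and the sample history are illustrative assumptions.

```python
import statistics
from typing import List


def clean_kpt_labels(reported_kpts: List[float]) -> List[float]:
    """Replace labels outside median +/- 1.5 * IQR with the merchant's median KPT."""
    q1, median, q3 = statistics.quantiles(reported_kpts, n=4)
    iqr = q3 - q1
    lower_bound = median - 1.5 * iqr
    upper_bound = median + 1.5 * iqr
    return [k if lower_bound <= k <= upper_bound else median for k in reported_kpts]


# One merchant's reported KPTs in minutes; 38.0 is an obvious outlier.
merchant_history = [14.0, 15.0, 16.0, 15.5, 14.5, 15.0, 38.0, 16.5]
cleaned = clean_kpt_labels(merchant_history)
print(cleaned)
```

Only the 38-minute label falls outside the bounds here, so it is the only value replaced by the median.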

Idea 1.3: Multi-Signal Fusion

Instead of one corrupted signal, fuse four signals with adaptive weights:

True_KPT = w₁×FOR_KPT + w₂×Rider_Pickup_KPT + w₃×Historical_KPT + w₄×App_Activity_KPT

| Signal | Unreliable (<0.3) | Average (0.3–0.7) | Reliable (>0.7) |
|---|---|---|---|
| FOR Timestamp (w₁) | 0.10 | 0.35 | 0.60 |
| Rider Pickup (w₂) | 0.40 | 0.30 | 0.20 |
| Historical Median (w₃) | 0.30 | 0.20 | 0.10 |
| App Activity (w₄) | 0.20 | 0.15 | 0.10 |
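
As a rough sketch, the weighted sum can be written as a table lookup plus a dot product. The weights are copied from the table; the function name, argument names, and the example signal values are made up for illustration.

```python
# Weights per reliability tier: (w1 FOR, w2 rider pickup, w3 historical median, w4 app activity)
FUSION_WEIGHTS = {
    "UNRELIABLE": (0.10, 0.40, 0.30, 0.20),
    "AVERAGE": (0.35, 0.30, 0.20, 0.15),
    "RELIABLE": (0.60, 0.20, 0.10, 0.10),
}


def fused_kpt(tier, for_kpt, rider_pickup_kpt, historical_kpt, app_activity_kpt):
    """Weighted average of the four KPT estimates (all in minutes)."""
    weights = FUSION_WEIGHTS[tier]
    signals = (for_kpt, rider_pickup_kpt, historical_kpt, app_activity_kpt)
    return sum(w * s for w, s in zip(weights, signals))


# Sanity check: every tier's weights sum to 1, so the result stays in minutes.
for tier_name, tier_weights in FUSION_WEIGHTS.items():
    assert abs(sum(tier_weights) - 1.0) < 1e-9, tier_name

estimate = fused_kpt(
    "UNRELIABLE",
    for_kpt=25.0,            # inflated by rider travel time
    rider_pickup_kpt=18.0,
    historical_kpt=16.0,
    app_activity_kpt=17.0,
)
print(round(estimate, 1))  # 17.9
```

For an unreliable merchant the inflated FOR value contributes only 10% of the estimate, which pulls the fused label toward the other three signals.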

Layer 2: Introducing New Signals (APP-FIRST)

Idea 2.1: App-Based Kitchen Sensing (Zero Hardware)

Uses the existing Mx app device as a passive kitchen sensor. All signals captured without any new hardware:

| Signal | Source | Indicates |
|---|---|---|
| Ambient Noise Level | Microphone (on-device ML) | Kitchen activity level |
| Screen Interaction Patterns | App telemetry | Merchant attentiveness |
| Order Acknowledgment Delay | Notification → accept timestamp | Kitchen busyness |
| App Background Switches | App focus telemetry | Swiggy/competitor usage → hidden load |

🔒 Privacy: Audio processed entirely on-device. Only a numerical score (0–10) is transmitted. No audio is recorded or stored.

Idea 2.2: IoT Instrumentation (Top 10% Merchants)

For ~30,000 high-volume merchants handling 45% of all orders:

Sensor → Edge Device → Zomato API → KPT Model Input

| Sensor | Cost | Signal |
|---|---|---|
| Thermal/IR Sensor | ₹800/unit | Cooking activity detection |
| Weight Sensor | ₹800/unit | Food packaging detection |
| BLE Beacon | ₹100/unit | True rider pickup timestamp |
| Smart Label Printer | Already exists | Packaging start signal |

ROI: ₹39 Cr investment → ₹620 Cr annual saving = 15× ROI in Year 1

Idea 2.3: Computer Vision on Existing CCTV

CCTV Feed → Edge Device (Jetson Nano ₹12K) → Kitchen Metrics API → KPT Model

4 CV models run on-device: People counter · Activity classifier · Packaging detector · Queue counter

🔒 Privacy: Raw video never transmitted. Zomato receives only: {staff_count: 3, activity: "high", bags_waiting: 2}


Layer 3: Kitchen Load Index (ZERO HARDWARE)

The Visibility Gap:

What Zomato Sees:        Zomato orders + history + FOR button  =  41% of kitchen load
What Actually Exists:    + Swiggy + Dine-in + Takeaway         = 100% of kitchen load

KLI Formula:

KLI = f(
    zomato_active_orders,           # 100% known, free, exact
    estimated_competitor_orders,    # pattern anomaly detection
    google_maps_busyness_score,     # free API, foot traffic proxy
    time_of_day_multiplier,         # 1.3× during 12-2pm, 7-10pm
    day_of_week_multiplier,         # 1.2× weekends
    special_event_flag,             # IPL/rain/festivals → 2-3× spike
    merchant_self_reported_busyness # 1-5 slider in Mx app
)
# Updated every 5 minutes per restaurant
# KLI is NOT a model change; it is a new input feature
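
The README does not specify `f`, so the sketch below covers only the two inputs whose values are stated in the comments: the time-of-day and day-of-week multipliers. The baseline of 1.0 outside those windows and the half-open hour ranges are assumptions.

```python
from datetime import datetime


def time_of_day_multiplier(ts: datetime) -> float:
    """1.3 during the 12-2pm and 7-10pm peaks; 1.0 otherwise (baseline is an assumption)."""
    in_lunch_peak = 12 <= ts.hour < 14
    in_dinner_peak = 19 <= ts.hour < 22
    return 1.3 if (in_lunch_peak or in_dinner_peak) else 1.0


def day_of_week_multiplier(ts: datetime) -> float:
    """1.2 on Saturday and Sunday; 1.0 on weekdays (baseline is an assumption)."""
    return 1.2 if ts.weekday() >= 5 else 1.0


saturday_lunch = datetime(2025, 1, 4, 13, 0)    # a Saturday, 1pm
monday_afternoon = datetime(2025, 1, 6, 16, 0)  # a Monday, 4pm

print(time_of_day_multiplier(saturday_lunch), day_of_week_multiplier(saturday_lunch))      # 1.3 1.2
print(time_of_day_multiplier(monday_afternoon), day_of_week_multiplier(monday_afternoon))  # 1.0 1.0
```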

Merchant Incentive System to improve self-reporting accuracy:

| Tier | Threshold | Reward |
|---|---|---|
| Accuracy Bonus | >80% accuracy on >80% of orders | ₹500–₹2,000/month |
| Visibility Boost | >80% for 2 months | Better Zomato search ranking |
| Gold Kitchen Badge | >90% for 3 months | Badge displayed on listing |
| Warning | <30% accuracy consistently | Reduced order allocation |

Layer 4: Merchant Workflow Redesign (MX APP UPDATE)

Idea 4.1: Granular Kitchen Progress Tracking

Replaces the single FOR button with an optional 5-stage tracker:

BEFORE:  [Order Received] → [Order Accepted] → [Food Ready ✗]
                                                 (single button, often wrong)

AFTER:   [Order Received] → [Prep Started] → [Cooking] → [Packaging] → [Ready ✓]
          AUTO              OPTIONAL          OPTIONAL     OPTIONAL      REQUIRED

| Stage Pressed | Signal to Model | Benefit |
|---|---|---|
| Prep Started | Start timer from real zero | Accurate KPT baseline |
| Cooking | Kitchen is active | Confirm not idle |
| Packaging | Food 1–3 min from ready | Pre-dispatch rider earlier |
| Ready ✓ | Ground truth KPT label | Clean training data |
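
One way to hold these stage presses is a small record where only `ready` is mandatory. This is a hypothetical data structure, not the Mx app's schema: field names, the minutes-since-confirmation convention, and the derived quantities are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageTimestamps:
    """Stage presses for one order, in minutes after order confirmation."""
    ready: float                          # REQUIRED stage
    prep_started: Optional[float] = None  # OPTIONAL stages may be missing
    cooking: Optional[float] = None
    packaging: Optional[float] = None

    def kpt_label(self) -> float:
        """KPT label: confirmation to food ready."""
        return self.ready

    def hands_on_time(self) -> Optional[float]:
        """Prep start to ready, or None if the merchant skipped 'Prep Started'."""
        if self.prep_started is None:
            return None
        return self.ready - self.prep_started


full = StageTimestamps(ready=18.0, prep_started=4.0, cooking=6.0, packaging=16.0)
minimal = StageTimestamps(ready=12.0)

print(full.kpt_label(), full.hands_on_time())        # 18.0 14.0
print(minimal.kpt_label(), minimal.hands_on_time())  # 12.0 None
```

Keeping the optional stages nullable means a merchant who presses only Ready still yields a usable label.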

Idea 4.2: Item-Level Prep Time Configuration

Bottom-up KPT calculation from menu item config:

Order_KPT = (
    max(prep_time for each item in order)
    + packaging_time_fixed        # 1.5 min
    + concurrent_order_penalty    # KLI × 0.3
)

# Example: Butter Chicken (20) + Naan (5) + Raita (2)
# = max(20, 5, 2) + 1.5 + KLI_penalty = 21.5 min + load adjustment

Solves the cold-start problem for new restaurants with zero historical data.
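
The formula translates directly into a short function. The sketch below reuses the Butter Chicken example; the function and constant names are illustrative, and `kli=5` is an arbitrary value because the README does not define the KLI scale.

```python
from typing import Dict

PACKAGING_TIME_MIN = 1.5    # fixed packaging time from the formula
LOAD_PENALTY_PER_KLI = 0.3  # concurrent-order penalty = KLI * 0.3


def order_kpt(item_prep_times: Dict[str, float], kli: float = 0.0) -> float:
    """Bottom-up KPT: slowest item + packaging + load penalty (minutes)."""
    if not item_prep_times:
        raise ValueError("order must contain at least one item")
    slowest_item = max(item_prep_times.values())  # items are assumed to cook in parallel
    return slowest_item + PACKAGING_TIME_MIN + LOAD_PENALTY_PER_KLI * kli


order = {"Butter Chicken": 20, "Naan": 5, "Raita": 2}
print(order_kpt(order))         # 21.5
print(order_kpt(order, kli=5))  # 23.0
```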


📊 Key Results (Simulation: 10,000 Orders)

| Metric | Baseline | Improved | Reduction |
|---|---|---|---|
| Avg Rider Wait | 7.7 min | 3.4 min | ▼ 55% |
| Rider Wait P50 | 6.2 min | 2.6 min | ▼ 58% |
| Rider Wait P90 | 17.9 min | 8.2 min | ▼ 54% |
| ETA Error P50 | 7.0 min | 3.1 min | ▼ 56% |
| ETA Error P90 | 17.9 min | 8.2 min | ▼ 54% |
| Orders with wait > 5 min | 56.0% | 27.9% | ▼ 50% |
| Orders with wait > 10 min | 32.0% | 5.6% | ▼ 83% |

Annual Cost Saving Calculation

Daily Zomato orders:     2,000,000
Rider wait reduced by:   × 4.3 min/order
Daily minutes saved:     = 8,600,000 min/day
Rider cost per minute:   × ₹2
Daily cost saving:       = ₹1.72 Cr/day
Annual cost saving:      × 365 days ≈ ₹628 Cr/year (quoted as ₹620 Cr throughout this README)
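
The same arithmetic in Python, as a quick check of the figures above. All inputs come from this README; only the variable names are new. The exact product is about ₹628 Cr, slightly above the ₹620 Cr headline figure.

```python
DAILY_ORDERS = 2_000_000
WAIT_REDUCTION_MIN = 7.7 - 3.4   # avg rider wait, baseline minus improved
RIDER_COST_PER_MIN_RS = 2
RUPEES_PER_CRORE = 10_000_000

daily_minutes_saved = DAILY_ORDERS * WAIT_REDUCTION_MIN
daily_saving_cr = daily_minutes_saved * RIDER_COST_PER_MIN_RS / RUPEES_PER_CRORE
annual_saving_cr = daily_saving_cr * 365

print(round(daily_minutes_saved))  # 8600000
print(round(daily_saving_cr, 2))   # 1.72
print(round(annual_saving_cr, 1))  # 627.8
```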

📁 Repository Structure

zomathon/
│
├── 📓 Day1_Problem_Understanding.ipynb    # Root cause analysis & pipeline mapping
├── 📓 Day2_Generate_Data.ipynb            # Synthetic dataset generation
├── 📓 Day3_Analysis_and_Charts.ipynb      # EDA + 5 evidence charts
├── 📓 Day4_Simulation.ipynb               # Discrete event simulation
├── 📓 Day5_PDF_Numbers_Reference.ipynb    # Final numbers & calculations
│
├── 📊 merchants.csv                       # 500 synthetic restaurants
├── 📊 orders.csv                          # 50,000 synthetic orders
│
├── 🖼️ chart1_kpt_distribution.png         # Actual vs FOR-based KPT histogram
├── 🖼️ chart2_error_by_reliability.png     # Label error by reliability tier
├── 🖼️ chart3_rider_wait_heatmap.png       # Rider wait by restaurant size & time
├── 🖼️ chart4_reliability_distribution.png # FOR reliability score distribution
├── 🖼️ chart5_kitchen_visibility.png       # Kitchen load visibility pie chart
├── 🖼️ simulation_results.png              # 4-chart simulation comparison grid
│
└── .gitignore

🗃️ Dataset Schema

merchants.csv: 500 Synthetic Restaurants

| Column | Type | Description |
|---|---|---|
| merchant_id | string | Unique restaurant identifier (M0001–M0500) |
| restaurant_type | string | small / medium / large |
| true_reliability_score | float | Ground truth FOR reliability (0–1) |
| avg_actual_kpt | float | True average kitchen prep time (minutes) |
| has_swiggy | bool | Whether restaurant is on Swiggy |
| has_dine_in | bool | Whether restaurant has dine-in |
| zomato_visibility_fraction | float | What fraction of kitchen load Zomato sees |

orders.csv: 50,000 Synthetic Orders

| Column | Type | Description |
|---|---|---|
| order_id | string | Unique order identifier |
| merchant_id | string | Foreign key to merchants.csv |
| actual_kpt | float | Ground truth prep time (minutes); never observable in reality |
| for_kpt | float | FOR-button-reported KPT (noisy label used in training) |
| label_error | float | for_kpt - actual_kpt; measures corruption |
| is_peak_hour | bool | Whether order was during 12–2pm or 7–10pm |
| rider_travel_time | float | Time for rider to reach restaurant (minutes) |
| is_label_corrupted | bool | Whether ` |
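
A small sanity check on the orders schema, using only the standard library, could look like the following. The three sample rows, the `O1`-style order IDs, and the `True`/`False` boolean encoding are made up for the example; point `load_orders` at an open `orders.csv` handle to run it on the real file.

```python
import csv
import io

# Illustrative rows only; not taken from orders.csv.
SAMPLE_ORDERS_CSV = """order_id,merchant_id,actual_kpt,for_kpt,label_error,is_peak_hour,rider_travel_time
O1,M0001,14.0,21.5,7.5,True,8.0
O2,M0001,12.0,12.5,0.5,False,6.0
O3,M0002,18.0,17.0,-1.0,True,9.5
"""

FLOAT_COLUMNS = ("actual_kpt", "for_kpt", "label_error", "rider_travel_time")


def load_orders(handle):
    """Parse orders into dicts with typed columns."""
    rows = []
    for row in csv.DictReader(handle):
        for col in FLOAT_COLUMNS:
            row[col] = float(row[col])
        row["is_peak_hour"] = row["is_peak_hour"] == "True"  # assumed encoding
        rows.append(row)
    return rows


def label_error_is_consistent(row, tol=1e-6):
    """Check the documented identity: label_error = for_kpt - actual_kpt."""
    return abs(row["label_error"] - (row["for_kpt"] - row["actual_kpt"])) <= tol


orders = load_orders(io.StringIO(SAMPLE_ORDERS_CSV))
mean_abs_error = sum(abs(r["label_error"]) for r in orders) / len(orders)
print(all(label_error_is_consistent(r) for r in orders), mean_abs_error)  # True 3.0
```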

⚙️ How to Run Locally

Prerequisites

Python 3.8+
pip install pandas numpy matplotlib seaborn scikit-learn lightgbm jupyter

Clone & Run

git clone https://github.com/JayeshJadhav28/zomathon.git
cd zomathon
jupyter notebook

Run in Order

1. Day2_Generate_Data.ipynb          # Creates merchants.csv and orders.csv
2. Day3_Analysis_and_Charts.ipynb    # Requires merchants.csv + orders.csv
3. Day4_Simulation.ipynb             # Requires merchants.csv + orders.csv
4. Day5_PDF_Numbers_Reference.ipynb  # Aggregates all final numbers

💡 Running in Colab? Each notebook has a setup cell at the top that downloads the CSVs directly from this GitHub repo, so no manual uploads are needed.


🔬 Simulation Methodology

We built a discrete event simulation that runs both systems (baseline and improved) across 10,000 orders under identical conditions.

Simulation Parameters

| Parameter | Value | Rationale |
|---|---|---|
| Total orders | 10,000 | Statistically significant sample |
| Restaurant types | Small / Medium / Large | Reflects real Zomato distribution |
| Peak hour ratio | 40% of orders | Matches actual Zomato peak data |
| FOR reliability distribution | Beta(2, 5) | Fitted to our 500-restaurant dataset |
| Rider travel time | 4–14 minutes | Urban delivery range estimate |
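
To make these parameters concrete, here is a minimal sampling sketch using the standard library's `random` module. It is not the Day4 notebook's code: the uniform travel-time distribution, the equal mix of restaurant types, and the seed are assumptions. Beta(2, 5) puts about 58% of its mass below 0.3, which lines up with the UNRELIABLE share in the reliability tier table.

```python
import random

N_ORDERS = 10_000
PEAK_HOUR_RATIO = 0.40
RESTAURANT_TYPES = ["small", "medium", "large"]


def sample_order(rng: random.Random) -> dict:
    """Draw one synthetic order from the simulation parameters."""
    return {
        "for_reliability": rng.betavariate(2, 5),        # Beta(2, 5)
        "rider_travel_time": rng.uniform(4, 14),         # minutes; uniform is an assumption
        "is_peak_hour": rng.random() < PEAK_HOUR_RATIO,
        "restaurant_type": rng.choice(RESTAURANT_TYPES),  # equal mix is an assumption
    }


rng = random.Random(42)  # fixed seed so both systems can replay identical orders
sim_orders = [sample_order(rng) for _ in range(N_ORDERS)]

share_unreliable = sum(o["for_reliability"] < 0.3 for o in sim_orders) / N_ORDERS
share_peak = sum(o["is_peak_hour"] for o in sim_orders) / N_ORDERS
print(f"unreliable: {share_unreliable:.2f}, peak: {share_peak:.2f}")
```

Seeding the generator is what lets the baseline and improved systems be compared under identical conditions.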

Baseline vs Improved System

BASELINE SYSTEM                    IMPROVED SYSTEM (OUR SOLUTION)
─────────────────────────          ──────────────────────────────────
✗ FOR timestamp as only label  →   ✓ FOR Trustworthiness Score applied
✗ No reliability filtering     →   ✓ Statistical outlier filtering
✗ No outlier detection         →   ✓ Multi-signal fusion (4 signals)
✗ No Kitchen Load Index        →   ✓ Kitchen Load Index integrated
✗ No peak hour adjustment      →   ✓ Peak hour multiplier factored in
✗ No multi-signal fusion       →   ✓ Item-level prep time baseline
= High noise, high rider wait  =   Low noise, low rider wait

📈 Scalability Strategy

A 3-tier deployment strategy for 300,000+ merchants:

| Tier | Target | Solutions | Cost | Timeline | Coverage |
|---|---|---|---|---|---|
| Tier 1 | ALL 300,000+ merchants | FOR reliability score, outlier detection, multi-signal fusion, KLI v1, Google Maps API | ₹0 | 4–6 weeks | 100% merchants |
| Tier 2 | Top 30% by volume (~90,000) | Mx app tracker, item-level config, accuracy bonus, app sensing | ~₹15 Cr | 3–6 months | ~65% of orders |
| Tier 3 | Top 10% by volume (~30,000) | IoT sensors, edge CV on CCTV, full multi-signal fusion | ₹39 Cr | 6–12 months | ~45% of orders |

Total: ₹54 Cr investment → ₹620 Cr annual saving = 11× ROI in Year 1
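
The ROI figures follow from simple division, sketched below with the tier costs and the ₹620 Cr saving stated in this README (variable names are illustrative). The ratios come out at roughly 11.5× overall and 15.9× for Tier 3 alone, which the README quotes rounded down as 11× and 15×.

```python
TIER_COST_CR = {"Tier 1": 0, "Tier 2": 15, "Tier 3": 39}  # investment in Rs crore
ANNUAL_SAVING_CR = 620

total_investment_cr = sum(TIER_COST_CR.values())
overall_roi = ANNUAL_SAVING_CR / total_investment_cr
tier3_only_roi = ANNUAL_SAVING_CR / TIER_COST_CR["Tier 3"]

print(total_investment_cr)       # 54
print(round(overall_roi, 1))     # 11.5
print(round(tier3_only_roi, 1))  # 15.9
```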


🗓️ Implementation Roadmap

M1──────M3──────────────M6──────────────────────────M12
│  Phase 1  │      Phase 2      │          Phase 3          │
│  Software │  App + Incentives │   IoT + AI Vision         │
│   ₹0      │     ~₹15 Cr       │        ₹39 Cr             │
│  100% cvg │    65% orders     │       45% orders          │
│ 30% label↑│   45% wait↓       │      55% wait↓            │

Phase 1: Months 0–3 (Software Foundation, Zero Cost)

  • ✅ FOR Reliability Score for all 300,000+ merchants
  • ✅ Statistical outlier detection pipeline
  • ✅ Kitchen Load Index v1 (Zomato orders + time factors)
  • ✅ Google Maps busyness API as KPT feature

Phase 2: Months 3–6 (App Redesign & Incentives)

  • ✅ Mx app progress tracker (optional stages)
  • ✅ Item-level prep time config in menu onboarding
  • ✅ Accuracy bonus programme for Tier 2 merchants
  • ✅ App-based kitchen sensing rollout (opt-in)

Phase 3: Months 6–12 (IoT + Full Multi-Signal Fusion)

  • ✅ IoT sensor kit shipped to top 30,000 merchants
  • ✅ Edge CV pilot at 100 volunteer restaurants
  • ✅ Full multi-signal fusion model in production
  • ✅ Self-calibrating item prep times across all menus

🛠️ Tech Stack

| Category | Tools |
|---|---|
| Language | Python 3.8+ |
| Data Generation | numpy, pandas, scipy |
| Analysis & Visualization | matplotlib, seaborn |
| Machine Learning | scikit-learn, lightgbm |
| Simulation | Custom discrete event simulation (pure Python) |
| Notebooks | Jupyter Notebook / Google Colab |
| Distribution Fitting | scipy.stats.beta (Beta(2,5) for FOR reliability) |

👥 Team

Team ByteWise, Dnyanshree Institute of Engineering & Technology

| Name | Role |
|---|---|
| Jayesh Jadhav | Data Science & System Design |
| Omkar Khade | Data Science & Analysis |
| Vinayak Kharade | System Design & Simulation |

⚠️ Limitations

| Limitation | Mitigation |
|---|---|
| Synthetic dataset; real distributions may differ | Validate the methodology on real data before production |
| Competitor order estimation is inferred, not direct | Used as 1 of 7 KLI inputs, not primary signal |
| Merchant behaviour change requires habit adoption | All intermediate stages optional; Tier 1 needs zero merchant action |
| IoT sensors need field maintenance | Tier 3 targets only highest-ROI merchants |

🔮 Future Work

  • LLM Anomaly Detection: Fine-tuned model to detect unusual FOR patterns from merchant chat logs
  • Federated Learning: Train KPT models locally on restaurant edge devices, privacy-preserving
  • Real-Time ETA Correction: Update customer ETA dynamically mid-cook using live kitchen signals

📜 License & Data Notice

This repository was created for Zomathon 2025, organized by Coding Ninjas in collaboration with Eternal Limited (Zomato). All ideas, designs, and materials created during the competition are subject to the competition's Terms & Conditions regarding confidentiality and intellectual property.

All dataset files (merchants.csv, orders.csv) are 100% synthetically generated. No proprietary or real Zomato data is used, stored, or referenced anywhere in this repository.


Built in 5 days · 500 restaurants · 50,000 orders · 4 Python notebooks · 25-page report
