| 💰 Rs 620 Cr | ⏱️ 55% | 🚨 83% |
|---|---|---|
| Annual Cost Saving | Rider Wait Reduced | Severe Cases Eliminated |

| Resource | Link |
|---|---|
| Full 25-Page Report (PDF) | View Report |
| 📦 Dataset (Google Drive) | merchants.csv + orders.csv |
| GitHub Repository | github.com/JayeshJadhav28/zomathon |

⚠️ Data Disclaimer: All data in this repository is synthetically generated using `numpy` and `pandas`. No proprietary Zomato data is included or referenced.
- Problem Statement
- The Core Problem
- Our Solution: 4-Layer Architecture
- Key Results
- Repository Structure
- Dataset Schema
- How to Run Locally
- Simulation Methodology
- Scalability Strategy
- Implementation Roadmap
- Tech Stack
- Team
Kitchen Prep Time (KPT) is the elapsed time between order confirmation and food being genuinely ready for pickup. Zomato must predict this before the food is even made so riders are dispatched at exactly the right moment.
The current system relies on a single signal: the Food Order Ready (FOR) button pressed by restaurant merchants in the Mx app. Our analysis of 50,000 synthetic orders shows this signal is corrupted (label error above 5 minutes) in 27% of cases, causing systematic prediction failures across the entire delivery pipeline.
ORDER PLACED → KITCHEN STARTS → FOOD READY ✓ → RIDER ARRIVES → CUSTOMER RECEIVES
◄─────────────────── KPT ──────────────────►
FOR button press: meant to mark FOOD READY, but often WRONG timing (58% of restaurants)
We identified 3 root causes behind KPT prediction failure:
| # | Problem | Scale |
|---|---|---|
| 01 | Rider-Influenced FOR Signal: Merchants press FOR when the rider arrives, not when food is ready, corrupting the KPT label by the full rider travel time | 58% of restaurants |
| 02 | No Kitchen-Wide Visibility: Zomato sees only its own orders; Swiggy, dine-in, and takeaway orders are completely invisible yet share the same kitchen | 59% of kitchen load hidden |
| 03 | Human Inconsistency in Labeling: 300,000+ restaurants press FOR at different times, creating a training dataset with inconsistently generated labels | 27% of orders with >5 min error |

KPT too HIGH → Rider dispatched LATE → food gets cold → bad ratings
KPT too LOW → Rider dispatched EARLY → waits 7.7 min avg → Rs 620 Cr wasted/year
Our solution does not change the KPT model. It gives the existing model better, cleaner, more complete inputs.
┌────────────────────────────────────────────────────────────┐
│                    KPT PREDICTION MODEL                    │
│                (existing model, unchanged)                 │
└─────────────────────────────┬──────────────────────────────┘
                              │ Better inputs
      ┌───────────────┬───────┴───────┬───────────────┐
      ▼               ▼               ▼               ▼
┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐
│  LAYER 1   │  │  LAYER 2   │  │  LAYER 3   │  │  LAYER 4   │
│  Fix FOR   │  │ New Signals│  │  Kitchen   │  │  Merchant  │
│   Signal   │  │            │  │ Load Index │  │  Workflow  │
└────────────┘  └────────────┘  └────────────┘  └────────────┘
₹0 cost         App-first       Zero hardware   Mx App update
Deploy now      300K+ merchants Covers 59%      Optional stages
Idea 1.1: FOR Trustworthiness Score
A per-merchant reliability metric computed from existing data at zero infrastructure cost:
FOR_Reliability_Score = (
    orders_where_FOR_pressed_BEFORE_rider_arrived_by_gte_2min
    / total_orders_past_30_days
)

| Score | Tier | % of Restaurants | Action |
|---|---|---|---|
| < 0.3 | UNRELIABLE | 58% | Discount FOR; use rider pickup + historical median |
| 0.3–0.7 | AVERAGE | 39% | Moderate trust; cross-check against rider pickup |
| > 0.7 | RELIABLE | 3% | Trust FOR fully as primary label |
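
A minimal sketch of how the score and tier above could be computed. It assumes each order is available as a `(for_pressed_at, rider_arrived_at)` pair in minutes; that layout and the function names are illustrative and do not come from the repository.

```python
from typing import Iterable, Tuple

# Tier cut-offs taken from the table above.
UNRELIABLE_BELOW = 0.3
RELIABLE_ABOVE = 0.7
MIN_LEAD_MINUTES = 2.0  # FOR must precede rider arrival by at least this much


def for_reliability_score(orders: Iterable[Tuple[float, float]]) -> float:
    """Fraction of orders where FOR was pressed >= 2 min before rider arrival.

    Each order is a (for_pressed_at, rider_arrived_at) pair in minutes
    (hypothetical layout). Pass only the merchant's last 30 days of orders.
    """
    orders = list(orders)
    if not orders:
        raise ValueError("need at least one order to score a merchant")
    trusted = sum(
        1 for for_at, rider_at in orders if rider_at - for_at >= MIN_LEAD_MINUTES
    )
    return trusted / len(orders)


def reliability_tier(score: float) -> str:
    """Map a score in [0, 1] to the tier names used in the table."""
    if score < UNRELIABLE_BELOW:
        return "UNRELIABLE"
    if score > RELIABLE_ABOVE:
        return "RELIABLE"
    return "AVERAGE"


# Four made-up orders: only the first has FOR pressed well before the rider.
sample = [(10.0, 15.0), (12.0, 12.5), (20.0, 20.0), (18.0, 19.0)]
score = for_reliability_score(sample)
tier = reliability_tier(score)
print(score, tier)
```

Because the score only needs timestamps Zomato already logs, it can be recomputed per merchant on a rolling 30-day window without any merchant action.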
Idea 1.2: Statistical Outlier Detection
Per-merchant IQR-based outlier filter applied before any label reaches the training pipeline:
lower_bound = median_kpt - 1.5 * IQR
upper_bound = median_kpt + 1.5 * IQR
if reported_kpt < lower_bound or reported_kpt > upper_bound:
    reported_kpt = historical_median_kpt  # replace the outlier with a clean label

Idea 1.3: Multi-Signal Fusion
Instead of one corrupted signal, fuse four signals with adaptive weights:
True_KPT = w₁×FOR_KPT + w₂×Rider_Pickup_KPT + w₃×Historical_KPT + w₄×App_Activity_KPT
| Signal | Unreliable (<0.3) | Average (0.3–0.7) | Reliable (>0.7) |
|---|---|---|---|
| FOR Timestamp (w₁) | 0.10 | 0.35 | 0.60 |
| Rider Pickup (w₂) | 0.40 | 0.30 | 0.20 |
| Historical Median (w₃) | 0.30 | 0.20 | 0.10 |
| App Activity (w₄) | 0.20 | 0.15 | 0.10 |
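
Ideas 1.2 and 1.3 can be sketched together: first clean the FOR-based label with the per-merchant filter, then fuse it with the other three signals using the tier weights from the table. This is an illustration under assumptions, not the repository's code: the function names are hypothetical, and quartiles come from `statistics.quantiles` with its default method.

```python
import statistics

# Weights (FOR, rider pickup, historical median, app activity) per tier,
# copied from the table above. Each row sums to 1.0.
FUSION_WEIGHTS = {
    "UNRELIABLE": (0.10, 0.40, 0.30, 0.20),
    "AVERAGE": (0.35, 0.30, 0.20, 0.15),
    "RELIABLE": (0.60, 0.20, 0.10, 0.10),
}


def clean_label(reported_kpt, history):
    """Idea 1.2: replace a label outside median +/- 1.5*IQR with the median."""
    q1, _, q3 = statistics.quantiles(history, n=4)
    median_kpt = statistics.median(history)
    iqr = q3 - q1
    lower_bound = median_kpt - 1.5 * iqr
    upper_bound = median_kpt + 1.5 * iqr
    if reported_kpt < lower_bound or reported_kpt > upper_bound:
        return median_kpt
    return reported_kpt


def fuse_kpt(tier, for_kpt, rider_pickup_kpt, historical_kpt, app_activity_kpt):
    """Idea 1.3: weighted sum of the four KPT estimates for a given tier."""
    w1, w2, w3, w4 = FUSION_WEIGHTS[tier]
    return (
        w1 * for_kpt
        + w2 * rider_pickup_kpt
        + w3 * historical_kpt
        + w4 * app_activity_kpt
    )


# Made-up history for one merchant, in minutes.
history = [10, 12, 14, 16, 18, 20, 22]
cleaned = clean_label(45, history)   # 45 min is far outside the merchant's range
kept = clean_label(18, history)      # 18 min is within bounds and is kept
fused = fuse_kpt("UNRELIABLE", cleaned, 20.0, 16.0, 18.0)
print(cleaned, kept, fused)
```

For an UNRELIABLE merchant the FOR-based estimate contributes only 10% of the fused value, so even an uncorrected label has limited influence.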
Idea 2.1: App-Based Kitchen Sensing (Zero Hardware)
Uses the existing Mx app device as a passive kitchen sensor. All signals captured without any new hardware:
| Signal | Source | Indicates |
|---|---|---|
| Ambient Noise Level | Microphone (on-device ML) | Kitchen activity level |
| Screen Interaction Patterns | App telemetry | Merchant attentiveness |
| Order Acknowledgment Delay | Notification → accept timestamp | Kitchen busyness |
| App Background Switches | App focus telemetry | Swiggy/competitor usage → hidden load |

Privacy: Audio is processed entirely on-device. Only a numerical score (0–10) is transmitted. No audio is recorded or stored.
Idea 2.2: IoT Instrumentation (Top 10% Merchants)
For ~30,000 high-volume merchants handling 45% of all orders:
Sensor → Edge Device → Zomato API → KPT Model Input
| Sensor | Cost | Signal |
|---|---|---|
| Thermal/IR Sensor | ₹800/unit | Cooking activity detection |
| Weight Sensor | ₹800/unit | Food packaging detection |
| BLE Beacon | ₹100/unit | True rider pickup timestamp |
| Smart Label Printer | Already exists | Packaging start signal |

ROI: ₹39 Cr investment → ₹620 Cr annual saving = 15× ROI in Year 1
Idea 2.3: Computer Vision on Existing CCTV
CCTV Feed → Edge Device (Jetson Nano, ₹12K) → Kitchen Metrics API → KPT Model
4 CV models run on-device: People counter · Activity classifier · Packaging detector · Queue counter
Privacy: Raw video is never transmitted. Zomato receives only:
{"staff_count": 3, "activity": "high", "bags_waiting": 2}
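
As an illustration of the metrics-only payload above, the sketch below serialises the three fields to JSON on the edge device. The class name and method are hypothetical; only the field names and example values come from the payload shown.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class KitchenMetrics:
    """Aggregate kitchen metrics; no image or video data is included."""

    staff_count: int
    activity: str
    bags_waiting: int

    def to_payload(self) -> str:
        # Only these three derived fields leave the device.
        return json.dumps(asdict(self), sort_keys=True)


metrics = KitchenMetrics(staff_count=3, activity="high", bags_waiting=2)
payload = metrics.to_payload()
print(payload)
```

Keeping the payload to a fixed, typed schema makes it easy to audit that nothing beyond these derived counts is ever transmitted.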
The Visibility Gap:
What Zomato Sees: Zomato orders + history + FOR button = 41% of kitchen load
What Actually Exists: + Swiggy + Dine-in + Takeaway = 100% of kitchen load
KLI Formula:
KLI = f(
    zomato_active_orders,             # 100% known, free, exact
    estimated_competitor_orders,      # pattern anomaly detection
    google_maps_busyness_score,       # free API, foot traffic proxy
    time_of_day_multiplier,           # 1.3× during 12–2pm, 7–10pm
    day_of_week_multiplier,           # 1.2× weekends
    special_event_flag,               # IPL/rain/festivals → 2–3× spike
    merchant_self_reported_busyness,  # 1–5 slider in Mx app
)
# Updated every 5 minutes per restaurant
# KLI is NOT a model change; it is a new input feature

Merchant Incentive System to improve self-reporting accuracy:
| Tier | Threshold | Reward |
|---|---|---|
| Accuracy Bonus | >80% accuracy on >80% of orders | ₹500–₹2,000/month |
| Visibility Boost | >80% for 2 months | Better Zomato search ranking |
| Gold Kitchen Badge | >90% for 3 months | Badge displayed on listing |
| Warning | <30% accuracy consistently | Reduced order allocation |
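
The incentive tiers above can be read as rules over a merchant's monthly accuracy history. The sketch below is one possible interpretation, not a specification: it assumes accuracy is tracked per month as a fraction, that the thresholds must hold for the most recent consecutive months, and that "consistently" means three months in a row (the `warning_months` parameter is hypothetical).

```python
def _recent_streak(monthly_accuracy, predicate):
    """Number of most recent consecutive months that satisfy predicate."""
    count = 0
    for value in reversed(monthly_accuracy):
        if not predicate(value):
            break
        count += 1
    return count


def incentive_outcomes(monthly_accuracy, warning_months=3):
    """Return the incentive tiers a merchant currently qualifies for.

    monthly_accuracy: fractions in [0, 1], oldest first, newest last.
    warning_months: assumed meaning of "consistently" (not from the source).
    """
    above_80 = _recent_streak(monthly_accuracy, lambda a: a > 0.80)
    above_90 = _recent_streak(monthly_accuracy, lambda a: a > 0.90)
    below_30 = _recent_streak(monthly_accuracy, lambda a: a < 0.30)

    outcomes = []
    if above_80 >= 1:
        outcomes.append("Accuracy Bonus")
    if above_80 >= 2:
        outcomes.append("Visibility Boost")
    if above_90 >= 3:
        outcomes.append("Gold Kitchen Badge")
    if below_30 >= warning_months:
        outcomes.append("Warning")
    return outcomes


top = incentive_outcomes([0.85, 0.92, 0.93, 0.95])
new = incentive_outcomes([0.50, 0.85])
poor = incentive_outcomes([0.20, 0.25, 0.10])
print(top, new, poor)
```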
Idea 4.1: Granular Kitchen Progress Tracking
Replaces the single FOR button with an optional 5-stage tracker:
BEFORE: [Order Received] → [Order Accepted] → [Food Ready ✓]
        (single button, often wrong)
AFTER:  [Order Received] → [Prep Started] → [Cooking] → [Packaging] → [Ready ✓]
        AUTO               OPTIONAL         OPTIONAL    OPTIONAL      REQUIRED
| Stage Pressed | Signal to Model | Benefit |
|---|---|---|
| Prep Started | Start timer from real zero | Accurate KPT baseline |
| Cooking | Kitchen is active | Confirm not idle |
| Packaging | Food 1–3 min from ready | Pre-dispatch rider earlier |
| Ready ✓ | Ground truth KPT label | Clean training data |
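
A minimal sketch of how the tracker's timestamps could be turned into model signals, assuming each pressed stage is logged in minutes on a shared clock. The stage identifiers, function name, and output keys are hypothetical; only the five stages and their optional/required status come from the tracker above.

```python
from typing import Dict, Optional

# Hypothetical identifiers for the five tracker stages, in order.
STAGES = ("order_received", "prep_started", "cooking", "packaging", "ready")
REQUIRED_STAGES = ("order_received", "ready")


def kpt_from_stages(timestamps: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Derive KPT signals from stage timestamps given in minutes."""
    missing = [s for s in REQUIRED_STAGES if s not in timestamps]
    if missing:
        raise ValueError(f"missing required stages: {missing}")
    # Optional stages may be skipped, but pressed stages must be in order.
    pressed = [timestamps[s] for s in STAGES if s in timestamps]
    if pressed != sorted(pressed):
        raise ValueError("stage timestamps are out of order")

    received, ready = timestamps["order_received"], timestamps["ready"]
    prep = timestamps.get("prep_started")
    return {
        "kpt": ready - received,  # label: order received to ready
        "queue_delay": None if prep is None else prep - received,
        "active_prep": None if prep is None else ready - prep,
    }


full = kpt_from_stages(
    {"order_received": 0.0, "prep_started": 4.0, "packaging": 17.0, "ready": 19.0}
)
minimal = kpt_from_stages({"order_received": 0.0, "ready": 12.0})
print(full, minimal)
```

Because the intermediate stages are optional, the function degrades gracefully: a merchant who only presses Ready still yields a KPT label, just without the queue/active split.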
Idea 4.2: Item-Level Prep Time Configuration
Bottom-up KPT calculation from menu item config:
Order_KPT = (
    max(prep_time for each item in order)
    + packaging_time_fixed        # 1.5 min
    + concurrent_order_penalty    # KLI × 0.3
)
# Example: Butter Chicken (20) + Naan (5) + Raita (2)
#        = max(20, 5, 2) + 1.5 + KLI_penalty = 21.5 min + load adjustment

This solves the cold-start problem for new restaurants with zero historical data.
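
The bottom-up calculation can be sketched as a small function. The constants mirror the formula (1.5 min packaging, KLI × 0.3 penalty); the function name is hypothetical, and the KLI value of 2.0 in the example is a made-up input, since the document does not fix the scale of KLI.

```python
PACKAGING_TIME_MIN = 1.5    # fixed packaging time from the formula
LOAD_PENALTY_PER_KLI = 0.3  # concurrent-order penalty = KLI x 0.3


def order_kpt(item_prep_times, kli=0.0):
    """Estimate KPT in minutes from per-item prep times.

    Items are assumed to cook in parallel, so the slowest item dominates.
    """
    if not item_prep_times:
        raise ValueError("order must contain at least one item")
    slowest_item = max(item_prep_times.values())
    return slowest_item + PACKAGING_TIME_MIN + LOAD_PENALTY_PER_KLI * kli


order = {"Butter Chicken": 20, "Naan": 5, "Raita": 2}
base = order_kpt(order)             # no load adjustment
loaded = order_kpt(order, kli=2.0)  # hypothetical KLI value
print(base, loaded)
```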
| Metric | Baseline | Improved | Reduction |
|---|---|---|---|
| Avg Rider Wait | 7.7 min | 3.4 min | ▼ 55% |
| Rider Wait P50 | 6.2 min | 2.6 min | ▼ 58% |
| Rider Wait P90 | 17.9 min | 8.2 min | ▼ 54% |
| ETA Error P50 | 7.0 min | 3.1 min | ▼ 56% |
| ETA Error P90 | 17.9 min | 8.2 min | ▼ 54% |
| Orders with wait > 5 min | 56.0% | 27.9% | ▼ 50% |
| Orders with wait > 10 min | 32.0% | 5.6% | ▼ 83% |

Daily Zomato orders:       2,000,000
Rider wait reduced by:     × 4.3 min/order
Daily minutes saved:       = 8,600,000 min/day
Rider cost per minute:     × ₹2
Daily cost saving:         = ₹1.72 Cr/day
Annual cost saving:        × 365 days ≈ ₹620 Cr/year
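
The arithmetic above can be reproduced in a few lines. All inputs are the figures stated in this section; note that the exact product comes to roughly ₹628 Cr, which the document rounds to ₹620 Cr.

```python
DAILY_ORDERS = 2_000_000
BASELINE_WAIT_MIN = 7.7
IMPROVED_WAIT_MIN = 3.4
RIDER_COST_PER_MIN_RS = 2
RS_PER_CRORE = 10_000_000  # 1 crore = 10 million rupees

wait_saved_min = BASELINE_WAIT_MIN - IMPROVED_WAIT_MIN  # 4.3 min per order
daily_minutes_saved = DAILY_ORDERS * wait_saved_min
daily_saving_cr = daily_minutes_saved * RIDER_COST_PER_MIN_RS / RS_PER_CRORE
annual_saving_cr = daily_saving_cr * 365

print(f"Daily saving:  Rs {daily_saving_cr:.2f} Cr")
print(f"Annual saving: Rs {annual_saving_cr:.0f} Cr")
```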
zomathon/
│
├── Day1_Problem_Understanding.ipynb     # Root cause analysis & pipeline mapping
├── Day2_Generate_Data.ipynb             # Synthetic dataset generation
├── Day3_Analysis_and_Charts.ipynb       # EDA + 5 evidence charts
├── Day4_Simulation.ipynb                # Discrete event simulation
├── Day5_PDF_Numbers_Reference.ipynb     # Final numbers & calculations
│
├── merchants.csv                        # 500 synthetic restaurants
├── orders.csv                           # 50,000 synthetic orders
│
├── chart1_kpt_distribution.png          # Actual vs FOR-based KPT histogram
├── chart2_error_by_reliability.png      # Label error by reliability tier
├── chart3_rider_wait_heatmap.png        # Rider wait by restaurant size & time
├── chart4_reliability_distribution.png  # FOR reliability score distribution
├── chart5_kitchen_visibility.png        # Kitchen load visibility pie chart
├── simulation_results.png               # 4-chart simulation comparison grid
│
└── .gitignore

merchants.csv:

| Column | Type | Description |
|---|---|---|
| `merchant_id` | string | Unique restaurant identifier (M0001–M0500) |
| `restaurant_type` | string | small / medium / large |
| `true_reliability_score` | float | Ground truth FOR reliability (0–1) |
| `avg_actual_kpt` | float | True average kitchen prep time (minutes) |
| `has_swiggy` | bool | Whether restaurant is on Swiggy |
| `has_dine_in` | bool | Whether restaurant has dine-in |
| `zomato_visibility_fraction` | float | What fraction of kitchen load Zomato sees |

orders.csv:

| Column | Type | Description |
|---|---|---|
| `order_id` | string | Unique order identifier |
| `merchant_id` | string | Foreign key to merchants.csv |
| `actual_kpt` | float | Ground truth prep time (minutes); never observable in reality |
| `for_kpt` | float | FOR-button-reported KPT (noisy label used in training) |
| `label_error` | float | `for_kpt - actual_kpt`; measures corruption |
| `is_peak_hour` | bool | Whether order was during 12–2pm or 7–10pm |
| `rider_travel_time` | float | Time for rider to reach restaurant (minutes) |
| `is_label_corrupted` | bool | Whether ` |
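
To show how the schema supports the headline "label error above 5 minutes" figure, here is a small sketch that computes the share of such orders from `for_kpt` and `actual_kpt`. The four inline rows and their `order_id` values are made up for illustration; with the real file you would pass `open("orders.csv", newline="")` instead of the in-memory sample.

```python
import csv
import io

# Made-up rows using column names from the orders.csv schema.
SAMPLE_ORDERS_CSV = """order_id,merchant_id,actual_kpt,for_kpt
O1,M0001,12.0,12.5
O2,M0001,15.0,23.0
O3,M0002,9.0,8.0
O4,M0003,20.0,27.5
"""


def corrupted_share(csv_file, threshold_min=5.0):
    """Fraction of orders whose |for_kpt - actual_kpt| exceeds the threshold."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("no orders found")
    errors = [float(r["for_kpt"]) - float(r["actual_kpt"]) for r in rows]
    corrupted = sum(1 for e in errors if abs(e) > threshold_min)
    return corrupted / len(errors)


share = corrupted_share(io.StringIO(SAMPLE_ORDERS_CSV))
print(f"{share:.0%} of sample orders have a label error above 5 minutes")
```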
Python 3.8+
pip install pandas numpy scipy matplotlib seaborn scikit-learn lightgbm jupyter
git clone https://github.com/JayeshJadhav28/zomathon.git
cd zomathon
jupyter notebook
Run the notebooks in this order:
1. Day2_Generate_Data.ipynb # Creates merchants.csv and orders.csv
2. Day3_Analysis_and_Charts.ipynb # Requires merchants.csv + orders.csv
3. Day4_Simulation.ipynb # Requires merchants.csv + orders.csv
4. Day5_PDF_Numbers_Reference.ipynb # Aggregates all final numbers
💡 Running in Colab? Each notebook has a setup cell at the top that downloads the CSVs directly from this GitHub repo, so no manual uploads are needed.
We built a discrete event simulation that runs both systems (baseline and improved) across 10,000 orders under identical conditions.
| Parameter | Value | Rationale |
|---|---|---|
| Total orders | 10,000 | Statistically significant sample |
| Restaurant types | Small / Medium / Large | Assumed size mix (synthetic; not calibrated on real Zomato data) |
| Peak hour ratio | 40% of orders | Assumed share of orders in the 12–2pm and 7–10pm peak windows |
| FOR reliability distribution | Beta(2, 5) | Fitted to our 500-restaurant dataset |
| Rider travel time | 4–14 minutes | Urban delivery range estimate |
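
As a quick check on the Beta(2, 5) parameter, the sketch below samples reliability scores with the standard library and compares the share below 0.3 against the closed-form CDF. The seed and sample size are arbitrary choices; the closed form follows from the identity between the Beta CDF with integer parameters and a binomial tail.

```python
import random


def beta_2_5_cdf(x):
    """Closed-form CDF of Beta(2, 5): P(Binomial(6, x) >= 2)."""
    return 1 - (1 - x) ** 6 - 6 * x * (1 - x) ** 5


rng = random.Random(42)  # arbitrary seed for reproducibility
scores = [rng.betavariate(2, 5) for _ in range(10_000)]

sampled_unreliable = sum(s < 0.3 for s in scores) / len(scores)
exact_unreliable = beta_2_5_cdf(0.3)    # about 0.58
exact_reliable = 1 - beta_2_5_cdf(0.7)  # about 0.01

print(f"sampled share < 0.3: {sampled_unreliable:.3f}")
print(f"exact share   < 0.3: {exact_unreliable:.3f}")
print(f"exact share   > 0.7: {exact_reliable:.3f}")
```

The exact share below 0.3 is about 58%, in line with the UNRELIABLE tier reported earlier. The exact share above 0.7 is about 1%, so a finite 500-restaurant sample can differ somewhat in the RELIABLE tier.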

| Baseline system | Improved system (our solution) |
|---|---|
| ✗ FOR timestamp as only label | ✓ FOR Trustworthiness Score applied |
| ✗ No reliability filtering | ✓ Statistical outlier filtering |
| ✗ No outlier detection | ✓ Multi-signal fusion (4 signals) |
| ✗ No Kitchen Load Index | ✓ Kitchen Load Index integrated |
| ✗ No peak hour adjustment | ✓ Peak hour multiplier factored in |
| ✗ No multi-signal fusion | ✓ Item-level prep time baseline |
| = High noise, high rider wait | = Low noise, low rider wait |

A 3-tier deployment strategy for 300,000+ merchants:
| Tier | Target | Solutions | Cost | Timeline | Coverage |
|---|---|---|---|---|---|
| Tier 1 | ALL 300,000+ merchants | FOR reliability score, outlier detection, multi-signal fusion, KLI v1, Google Maps API | ₹0 | 4–6 weeks | 100% merchants |
| Tier 2 | Top 30% by volume (~90,000) | Mx app tracker, item-level config, accuracy bonus, app sensing | ~₹15 Cr | 3–6 months | ~65% of orders |
| Tier 3 | Top 10% by volume (~30,000) | IoT sensors, edge CV on CCTV, full multi-signal fusion | ₹39 Cr | 6–12 months | ~45% of orders |

Total: ₹54 Cr investment → ₹620 Cr annual saving = 11× ROI in Year 1
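
The ROI figures follow directly from the tier costs and the projected saving; a short check, using only numbers stated in this document (Tier 2 is given as approximately ₹15 Cr):

```python
ANNUAL_SAVING_CR = 620
TIER_COST_CR = {"Tier 1": 0, "Tier 2": 15, "Tier 3": 39}

total_cost_cr = sum(TIER_COST_CR.values())
roi_total = ANNUAL_SAVING_CR / total_cost_cr
roi_tier3_only = ANNUAL_SAVING_CR / TIER_COST_CR["Tier 3"]

print(f"Total investment: Rs {total_cost_cr} Cr")
print(f"ROI on total investment: {roi_total:.1f}x")
print(f"ROI on Tier 3 cost alone: {roi_tier3_only:.1f}x")
```

Both ratios are quoted rounded down in the text (11× and 15×).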

| | Phase 1 (M1–M3) | Phase 2 (M3–M6) | Phase 3 (M6–M12) |
|---|---|---|---|
| Focus | Software | App + Incentives | IoT + AI Vision |
| Cost | ₹0 | ~₹15 Cr | ₹39 Cr |
| Coverage | 100% of merchants | 65% of orders | 45% of orders |
| Impact | 30% label ↓ | 45% wait ↓ | 55% wait ↓ |

Phase 1 (Software):
- FOR Reliability Score for all 300,000+ merchants
- Statistical outlier detection pipeline
- Kitchen Load Index v1 (Zomato orders + time factors)
- Google Maps busyness API as KPT feature

Phase 2 (App + Incentives):
- Mx app progress tracker (optional stages)
- Item-level prep time config in menu onboarding
- Accuracy bonus programme for Tier 2 merchants
- App-based kitchen sensing rollout (opt-in)

Phase 3 (IoT + AI Vision):
- IoT sensor kit shipped to top 30,000 merchants
- Edge CV pilot at 100 volunteer restaurants
- Full multi-signal fusion model in production
- Self-calibrating item prep times across all menus

| Category | Tools |
|---|---|
| Language | Python 3.8+ |
| Data Generation | numpy, pandas, scipy |
| Analysis & Visualization | matplotlib, seaborn |
| Machine Learning | scikit-learn, lightgbm |
| Simulation | Custom discrete event simulation (pure Python) |
| Notebooks | Jupyter Notebook / Google Colab |
| Distribution Fitting | scipy.stats.beta (Beta(2,5) for FOR reliability) |
Team ByteWise, Dnyanshree Institute of Engineering & Technology
| Name | Role |
|---|---|
| Jayesh Jadhav | Data Science & System Design |
| Omkar Khade | Data Science & Analysis |
| Vinayak Kharade | System Design & Simulation |
| Limitation | Mitigation |
|---|---|
| Synthetic dataset: real distributions may differ | Methodology to be validated on real data before production |
| Competitor order estimation is inferred, not direct | Used as 1 of 7 KLI inputs, not the primary signal |
| Merchant behaviour change requires habit adoption | All intermediate stages optional; Tier 1 needs zero merchant action |
| IoT sensors need field maintenance | Tier 3 targets only highest-ROI merchants |
- LLM Anomaly Detection: Fine-tuned model to detect unusual FOR patterns from merchant chat logs
- Federated Learning: Train KPT models locally on restaurant edge devices, preserving privacy
- Real-Time ETA Correction: Update customer ETA dynamically mid-cook using live kitchen signals
This repository was created for Zomathon 2025, organized by Coding Ninjas in collaboration with Eternal Limited (Zomato). All ideas, designs, and materials created during the competition are subject to the competition's Terms & Conditions regarding confidentiality and intellectual property.
All dataset files (merchants.csv, orders.csv) are 100% synthetically generated. No proprietary or real Zomato data is used, stored, or referenced anywhere in this repository.