MSc Business Analytics Dissertation Project
A three-notebook market intelligence pipeline that uses Mexican open business registry data (DENUE) to identify customer segments, map the competitive landscape, and produce a data-driven market entry framework for IMADATA — a Mexico-based data analytics company.
| Notebook | Purpose | Key outputs |
|---|---|---|
| NB01_customer_segmentation.ipynb | K-Means segmentation of 43,330 wholesale-trade firms; state-level opportunity scoring | state_opportunity_scores.csv |
| NB02_competitive_intelligence.ipynb | Competitive intensity analysis across 20,874 PST firms; HHI, Gini, CAGR, CII composite | state_competitive_intensity.csv, municipality_hotspots.csv |
| NB03_market_entry_framework.ipynb | 2×2 Market Opportunity Matrix; entry sequencing; municipality drill-down; strategic recommendations | market_entry_matrix.csv, priority_markets.csv |

Source: DENUE — Directorio Estadístico Nacional de Unidades Económicas, INEGI (open data, public domain).
| File | Description |
|---|---|
| Potential_customers_raw_data.csv | Sector 43 (Wholesale Trade): potential customer universe |
| Competitors_raw_data.csv | Sector 541 (Professional, Scientific & Technical Services): competitor universe |
| Competitors_activities.csv | Activity-code lookup table for sector 541 |
| General_state_muncipality.csv | State and municipality name lookup (INEGI geo codes) |
| activity_type_overrides.csv | Auditable competitor classification corrections (Direct / Indirect / Exclude) |

Note: Raw data files are included in this repository. They are sourced from DENUE and are in the public domain.
Small result files are committed to outputs/:
| File | Description |
|---|---|
| state_opportunity_scores.csv | Normalised opportunity score (0–1) for all 32 Mexican states |
| state_competitive_intensity.csv | Competitive Intensity Index (CII) per state |
| market_entry_matrix.csv | Full 32-state matrix with quadrant, entry score, and ranking |
| priority_markets.csv | Priority-quadrant states with top municipality drill-down |
| municipality_hotspots.csv | Municipality-level competitor hotspot flags |
| market_opportunity_matrix.png | Static 2×2 market opportunity chart |

Large generated files (cleaned_customers.csv, customer_segments.csv, competitors_filtered.csv) are excluded via .gitignore — they are regenerated by running the notebooks.
`pip install -r requirements.txt`

Run notebooks in order: NB01 → NB02 → NB03. NB03 reads outputs from NB01 and NB02.
The choropleth maps in NB01 and NB03 fetch a Mexico GeoJSON from GitHub at runtime. Those cells need an internet connection to render the maps and fall back gracefully when the file cannot be fetched.
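
One way such a fetch-with-fallback could look is sketched below, using only the standard library. The function name `load_geojson` and the default timeout are assumptions made for illustration; the notebooks may fetch the file and handle failures differently.

```python
import json
import urllib.request
from urllib.error import URLError


def load_geojson(url, timeout=10):
    """Return the parsed GeoJSON as a dict, or None if it cannot be fetched or parsed.

    Hypothetical helper: the name and the timeout default are illustrative only.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (URLError, OSError, ValueError):
        # URLError / OSError: no connection, DNS failure, HTTP error, timeout.
        # ValueError: malformed URL or a response body that is not valid JSON.
        return None
```

A map cell could then call `load_geojson(...)` with the GeoJSON URL it uses and skip the choropleth (for example, showing a ranked table instead) whenever the result is `None`.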
- Customer segmentation (NB01): K-Means on 6 features (operational scale, digital contact flags, company age, regional context). Optimal k selected via Calinski-Harabasz index with a business minimum of k = 4. Company age imputed with sector median for records with no DENUE registration date (58% of records). Decision Tree classifier trained on cluster outputs as a deployable scoring model.
- Competitive intelligence (NB02): Competitor classification via auditable keyword rules + manual override CSV. Competitive Intensity Index = weighted composite of density (0.45), Gini (0.20), HHI (0.20), CAGR (0.15). States with fewer than 5 Direct competitors are assigned CII = 0 to avoid HHI artefacts.
- Market entry framework (NB03): 2×2 matrix using median split on both axes (data-driven, not fixed at 0.5). Entry score = (0.7 × opportunity_score + 0.3 × volume_norm) − 0.6 × competitive_intensity, combining tier quality with total addressable market volume (log-dampened) before penalising competition. Municipality drill-down ranks sub-state markets by customer/competitor ratio.
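
The CII composite, the entry score, and the median split described above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the notebook code: it assumes each CII component has already been scaled to 0–1, treats a value strictly above the median as "high", and uses invented toy values for four hypothetical states. The function and variable names are made up for the example.

```python
from statistics import median

# Weights and threshold as stated in the methodology notes.
CII_WEIGHTS = {"density": 0.45, "gini": 0.20, "hhi": 0.20, "cagr": 0.15}
MIN_DIRECT_COMPETITORS = 5


def competitive_intensity(components, n_direct):
    """Weighted CII; assumes each component is already scaled to 0-1."""
    if n_direct < MIN_DIRECT_COMPETITORS:
        return 0.0  # too few Direct competitors for a stable HHI
    return sum(weight * components[name] for name, weight in CII_WEIGHTS.items())


def entry_score(opportunity_score, volume_norm, cii):
    """Opportunity blended with market volume, then penalised by competition."""
    return (0.7 * opportunity_score + 0.3 * volume_norm) - 0.6 * cii


def assign_quadrants(states):
    """Median split on both axes.

    Returns (high_opportunity, high_competition) per state. Counting only
    values strictly above the median as high is an assumption of this sketch.
    """
    opp_cut = median(s["opportunity_score"] for s in states.values())
    cii_cut = median(s["cii"] for s in states.values())
    return {
        name: (s["opportunity_score"] > opp_cut, s["cii"] > cii_cut)
        for name, s in states.items()
    }


# Invented toy inputs for four hypothetical states, not results from the notebooks.
toy_states = {
    "State A": {"opportunity_score": 0.9, "volume_norm": 0.8, "cii": 0.2},
    "State B": {"opportunity_score": 0.8, "volume_norm": 0.9, "cii": 0.7},
    "State C": {"opportunity_score": 0.3, "volume_norm": 0.2, "cii": 0.1},
    "State D": {"opportunity_score": 0.2, "volume_norm": 0.4, "cii": 0.6},
}

quadrants = assign_quadrants(toy_states)
ranking = sorted(
    toy_states,
    key=lambda name: entry_score(**toy_states[name]),
    reverse=True,
)
```

In this toy data, a state with high opportunity and low competition (such as "State A") is the kind of market the priority quadrant is meant to capture; the median cut-offs move with the data, which is the point of not fixing them at 0.5.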
- DENUE records no financial data. Revenue is not estimated. Firm size (employee band) is used as a deal-size proxy.
- Digital Contact Score measures contact-data presence in DENUE, not verified digital maturity.
- Competitive intensity is computed from registered DENUE entities only; informal and foreign competitors are not captured.
- Company age imputation assumes the sector median is representative for all states — a stated, conservative assumption.
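
The median imputation of company age mentioned above could be sketched as below. This is a minimal illustration that assumes the input list holds the ages for one sector, with `None` marking records that have no registration date. The function name and the imputation flag are additions made for this example and are not confirmed features of NB01.

```python
from statistics import median


def impute_company_age(ages):
    """Fill missing ages (None) with the median of the known ages.

    Returns the filled list plus a parallel list of booleans marking which
    rows were imputed (the flag is an illustrative extra, not from NB01).
    """
    known = [age for age in ages if age is not None]
    if not known:
        raise ValueError("no known ages to compute a median from")
    fill_value = median(known)
    filled = [fill_value if age is None else age for age in ages]
    was_imputed = [age is None for age in ages]
    return filled, was_imputed


# Invented example ages in years; None marks a record with no registration date.
ages = [3, None, 12, None, 7]
filled, was_imputed = impute_company_age(ages)
```

Keeping a flag alongside the filled values makes it possible to check later whether clusters are driven by real or imputed ages, which matters when a large share of records is imputed.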
Data: INEGI DENUE (public domain). Analysis: original work.