macash916/IMADATA-Market-Intelligence

IMADATA Market Intelligence Framework — Mexico

MSc Business Analytics Dissertation Project

A three-notebook market intelligence pipeline that uses Mexican open business registry data (DENUE) to identify customer segments, map the competitive landscape, and produce a data-driven market entry framework for IMADATA — a Mexico-based data analytics company.


Notebooks

| Notebook | Purpose | Key outputs |
| --- | --- | --- |
| NB01_customer_segmentation.ipynb | K-Means segmentation of 43,330 wholesale-trade firms; state-level opportunity scoring | state_opportunity_scores.csv |
| NB02_competitive_intelligence.ipynb | Competitive intensity analysis across 20,874 PST firms; HHI, Gini, CAGR, CII composite | state_competitive_intensity.csv, municipality_hotspots.csv |
| NB03_market_entry_framework.ipynb | 2×2 Market Opportunity Matrix; entry sequencing; municipality drill-down; strategic recommendations | market_entry_matrix.csv, priority_markets.csv |

Data

Source: DENUE — Directorio Estadístico Nacional de Unidades Económicas, INEGI (open data, public domain).

| File | Description |
| --- | --- |
| Potential_customers_raw_data.csv | Sector 43 (Wholesale Trade): potential customer universe |
| Competitors_raw_data.csv | Sector 541 (Professional, Scientific & Technical Services): competitor universe |
| Competitors_activities.csv | Activity-code lookup table for sector 541 |
| General_state_muncipality.csv | State and municipality name lookup (INEGI geo codes) |
| activity_type_overrides.csv | Auditable competitor classification corrections (Direct / Indirect / Exclude) |

Note: Raw data files are included in this repository. They are sourced from DENUE and are in the public domain.


Outputs

Small result files are committed to outputs/:

| File | Description |
| --- | --- |
| state_opportunity_scores.csv | Normalised opportunity score (0–1) for all 32 Mexican states |
| state_competitive_intensity.csv | Competitive Intensity Index (CII) per state |
| market_entry_matrix.csv | Full 32-state matrix with quadrant, entry score, and ranking |
| priority_markets.csv | Priority-quadrant states with top municipality drill-down |
| municipality_hotspots.csv | Municipality-level competitor hotspot flags |
| market_opportunity_matrix.png | Static 2×2 market opportunity chart |

Large generated files (cleaned_customers.csv, customer_segments.csv, competitors_filtered.csv) are excluded via .gitignore — they are regenerated by running the notebooks.


Setup

pip install -r requirements.txt

Run notebooks in order: NB01 → NB02 → NB03. NB03 reads outputs from NB01 and NB02.
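For a non-interactive run, the ordering above could be scripted roughly as follows. This is a minimal sketch, assuming `jupyter nbconvert` is available in the environment (it may or may not be listed in requirements.txt); the helper names `build_command` and `run_pipeline` are hypothetical and not part of the repository.

```python
import subprocess

# Notebook order from the README: NB03 depends on outputs of NB01 and NB02.
NOTEBOOKS = [
    "NB01_customer_segmentation.ipynb",
    "NB02_competitive_intelligence.ipynb",
    "NB03_market_entry_framework.ipynb",
]


def build_command(notebook):
    """Return the nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_pipeline(notebooks=NOTEBOOKS):
    """Execute the notebooks sequentially, stopping at the first failure."""
    for notebook in notebooks:
        # check=True raises CalledProcessError if a notebook fails,
        # so a later notebook never runs on missing inputs.
        subprocess.run(build_command(notebook), check=True)


# Usage (from the repository root): run_pipeline()
```

Stopping on the first failure matters here because NB03 would otherwise read stale or missing CSVs from outputs/.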

The choropleth maps in NB01 and NB03 fetch a Mexico GeoJSON from GitHub at runtime, so those cells need an internet connection; they fall back gracefully if it is unavailable.
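One way such a fallback might look is sketched below. The URL is a placeholder (the README does not state the real one), and `load_geojson` is a hypothetical helper; the notebooks may handle this differently.

```python
import json
import urllib.request

# Placeholder only: the README does not give the actual GeoJSON location.
GEOJSON_URL = "https://example.com/mexico_states.geojson"


def load_geojson(url=GEOJSON_URL, timeout=10):
    """Return the parsed GeoJSON dict, or None if it cannot be loaded."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers network failures (URLError, timeouts);
        # ValueError covers malformed URLs and invalid JSON.
        print(f"Skipping choropleth: could not load GeoJSON ({exc})")
        return None
```

A map cell would then check for `None` and skip the plot, leaving the rest of the notebook unaffected.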


Methodology Notes

  • Customer segmentation (NB01): K-Means on 6 features (operational scale, digital contact flags, company age, regional context). Optimal k is selected via the Calinski-Harabasz index, subject to a business minimum of k = 4. Company age is imputed with the sector median for records with no DENUE registration date (58% of records). A Decision Tree classifier is trained on the cluster assignments to serve as a deployable scoring model.

  • Competitive intelligence (NB02): Competitor classification via auditable keyword rules + manual override CSV. Competitive Intensity Index = weighted composite of density (0.45), Gini (0.20), HHI (0.20), CAGR (0.15). States with fewer than 5 Direct competitors are assigned CII = 0 to avoid HHI artefacts.

  • Market entry framework (NB03): 2×2 matrix using median split on both axes (data-driven, not fixed at 0.5). Entry score = (0.7 × opportunity_score + 0.3 × volume_norm) − 0.6 × competitive_intensity, combining tier quality with total addressable market volume (log-dampened) before penalising competition. Municipality drill-down ranks sub-state markets by customer/competitor ratio.
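The two scoring formulas and the median split can be sketched in plain Python as follows. The weights, the 5-competitor threshold, and the entry-score formula come from the notes above; everything else is an assumption made for illustration: the function names, the requirement that CII components arrive already scaled to 0–1, the exact form of the log dampening (log1p followed by min-max), and the handling of ties at the median.

```python
import math

# Weights as stated in the methodology notes (they sum to 1.0).
CII_WEIGHTS = {"density": 0.45, "gini": 0.20, "hhi": 0.20, "cagr": 0.15}
MIN_DIRECT_COMPETITORS = 5


def competitive_intensity(components, n_direct):
    """Weighted composite of components assumed to be pre-normalised to 0-1."""
    if n_direct < MIN_DIRECT_COMPETITORS:
        # Too few Direct competitors: CII is set to 0 to avoid HHI artefacts.
        return 0.0
    return sum(weight * components[name] for name, weight in CII_WEIGHTS.items())


def log_minmax(volumes):
    """Assumed log dampening: log1p, then min-max scaling to 0-1."""
    logs = [math.log1p(v) for v in volumes]
    low, high = min(logs), max(logs)
    if high == low:
        return [0.0 for _ in logs]
    return [(x - low) / (high - low) for x in logs]


def entry_score(opportunity_score, volume_norm, cii):
    """Entry score = (0.7 * opportunity + 0.3 * volume) - 0.6 * competition."""
    return (0.7 * opportunity_score + 0.3 * volume_norm) - 0.6 * cii


def quadrant(opportunity_score, cii, opportunity_median, cii_median):
    """Median split on both axes; ties go to 'high' (an assumption)."""
    opportunity = "high" if opportunity_score >= opportunity_median else "low"
    competition = "high" if cii >= cii_median else "low"
    return opportunity, competition
```

For example, a state with opportunity 0.5, normalised volume 0.5, and CII 0.5 scores 0.5 − 0.3 = 0.2. The quadrant function returns plain high/low labels rather than named quadrants, since the README names only the Priority quadrant.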


Limitations

  • DENUE records no financial data. Revenue is not estimated. Firm size (employee band) is used as a deal-size proxy.
  • Digital Contact Score measures contact-data presence in DENUE, not verified digital maturity.
  • Competitive intensity is computed from registered DENUE entities only; informal and foreign competitors are not captured.
  • Company age imputation assumes the sector median is representative for all states — a stated, conservative assumption.
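The sector-median age imputation referred to above can be illustrated with a short plain-Python sketch. The function name and the list-based interface are hypothetical; the notebooks presumably work on DataFrames and may differ in detail.

```python
from statistics import median


def impute_age_with_sector_median(ages):
    """Fill missing ages (None) with the median of the known ages.

    Returns the filled list plus a parallel list of flags marking
    which values were imputed.
    """
    known = [age for age in ages if age is not None]
    if not known:
        raise ValueError("no known ages to compute a median from")
    sector_median = median(known)
    imputed = [sector_median if age is None else age for age in ages]
    was_imputed = [age is None for age in ages]
    return imputed, was_imputed


# Usage: impute_age_with_sector_median([3, None, 7, None, 5])
# fills both gaps with 5, the median of 3, 7 and 5.
```

Keeping the imputation flag alongside the values is one way to let later analysis check how sensitive the clusters are to the imputed records.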

Data: INEGI DENUE (public domain). Analysis: original work.