A large-scale, spatio-temporally aligned multi-source traffic dataset at metropolitan scale, built for deep learning and traffic state inference. MMTD standardizes heterogeneous fixed sensors and floating car data onto a unified road network, providing a reproducible benchmark for data-driven traffic dynamics modeling.
The processed fixed-sensor data (excluding floating car data) is publicly available on Zenodo:
- Zenodo Record: https://doi.org/10.5281/zenodo.18781181
- Version: v1.0
- Contents: Processed fixed-sensor data, graph topology, and static road attributes
Note on Floating Car Data: The floating car data used in this project is TomTom Traffic Stats, a commercial data product, so it is not included in the Zenodo release. Users can obtain TomTom Traffic Stats data directly from the TomTom Developer Portal under TomTom's terms. We provide binding scripts/notebooks to align your own TomTom data with our processed fixed-sensor data and construct the complete MMTD.
- Spatial scope: Madrid metropolitan area
- Temporal resolution: 15 minutes (aligned across sources)
- Road segments: 25,166 (OSM-based, intersection-to-intersection)
- Fixed sensors: 2,476 (2,371 urban + 105 highway)
- Floating car (TomTom): full network coverage except for living streets.
MMTD includes traffic state variables (flow, speed, travel time, occupancy, congestion index where available), road static attributes (length, lanes, speed limit), and graph topology (adjacency, betweenness/closeness centrality).
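As a rough illustration of the segment-level graph topology mentioned above, the sketch below derives an adjacency structure from intersection-to-intersection segments. It assumes that two segments are neighbours whenever they share an endpoint node and it ignores driving direction and turn restrictions; the function name, segment IDs, and node labels are hypothetical and are not taken from the MMTD code.

```python
from collections import defaultdict
from itertools import combinations


def segment_adjacency(segments):
    """Build an undirected segment-to-segment adjacency.

    segments: dict mapping segment_id -> (start_node, end_node).
    Assumption: segments sharing an endpoint node are adjacent.
    """
    # Group segment IDs by the intersection nodes they touch.
    by_node = defaultdict(set)
    for seg_id, (start, end) in segments.items():
        by_node[start].add(seg_id)
        by_node[end].add(seg_id)

    # Every pair of segments meeting at the same node becomes an edge.
    adjacency = {seg_id: set() for seg_id in segments}
    for touching in by_node.values():
        for a, b in combinations(sorted(touching), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Toy network: s1, s2 and s3 meet at node B; s3 and s4 meet at node D.
segments = {
    "s1": ("A", "B"),
    "s2": ("B", "C"),
    "s3": ("B", "D"),
    "s4": ("D", "E"),
}
adjacency = segment_adjacency(segments)
```

Centrality measures such as betweenness or closeness would then be computed on this segment graph; the choice of library for that step is left open here.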
This dataset is derived from the following original data sources:
| Source | Description | Original Provider |
|---|---|---|
| Fixed sensors | ~4,678 loop detectors; 15-min historical data; flow, occupancy, speed (highway), congestion index (urban). | Ayuntamiento de Madrid |
| Lane counts | Lane-level data for flow normalization. | Ayuntamiento de Madrid |
| Road network | Road topology and geometry. | OpenStreetMap |
| Floating car | 15-min aggregates (flow, speed, travel time) on TomTom's segment geometry. | TomTom |
Madrid fixed-sensor and lane-count data are reusable under the Madrid Open Data terms (Real Decreto Legislativo 1/1996), which require attribution to the source and author.
Analysis period: August 2024.
Left: Fixed sensors (Madrid). Right: TomTom floating car coverage.
MMTD is built through a standardized pipeline:
- Layer matching
  - Point–line: Match fixed sensors to lane-count layer, then to OSM segments (distance + heading + road-type constraints; search radius 15 m urban / 20 m highway).
  - Line–line: Match OSM segments to TomTom network via HMM-based map matching (e.g. leuvenmapmatching), with heading consistency and overlap-ratio checks.
- Spatial normalization
  - OSM: merge degree-2 nodes to get intersection-to-intersection segments.
  - TomTom: map and aggregate TomTom segments onto these OSM segments (length-weighted flow; harmonic-mean speed). Highway: further split into ~500 m sub-segments.
- Fixed-sensor merge & cleaning
  - Merge co-located lane-level sensors (sum flow, average speed).
  - Drop sensors whose valid-data ratio is below 80%; remove outliers adaptively (IQR or 3σ, chosen by distribution). No imputation is applied, so that reconstruction methods can be evaluated against unaltered observations.
- Graph & packaging
  - Road segments → graph nodes; connectivity → edges. Features: geography, length, lanes, speed limit, betweenness/closeness centrality.
  - Outputs: adjacency matrix, node features, fixed-sensor and floating-car state tensors (e.g. T × N × F), HDF5 + PyG-style dataset and loaders with optional observation masks.
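The spatial-normalization step above aggregates several TomTom pieces onto one OSM segment. The following is a minimal sketch of that idea, not the repository's implementation: flow is averaged with overlap length as the weight, and speed is taken as a length-weighted harmonic mean (total length divided by total travel time). Weighting the harmonic mean by length is an assumption, as are the function name and the tuple layout.

```python
def aggregate_to_osm(pieces):
    """Aggregate TomTom pieces that overlap a single OSM segment.

    pieces: list of (overlap_length, flow, speed) tuples.
    Returns (flow, speed) for the OSM segment.
    Assumption: speeds are strictly positive and lengths share one unit.
    """
    total_length = sum(length for length, _, _ in pieces)
    if total_length <= 0:
        raise ValueError("pieces must have positive total overlap length")
    if any(speed <= 0 for _, _, speed in pieces):
        raise ValueError("speeds must be strictly positive")

    # Length-weighted arithmetic mean of flow.
    flow = sum(length * q for length, q, _ in pieces) / total_length

    # Length-weighted harmonic mean of speed: distance over time.
    # The length unit cancels, so the result keeps the unit of the input speeds.
    travel_time = sum(length / speed for length, _, speed in pieces)
    speed = total_length / travel_time
    return flow, speed


# Two pieces: 100 m at 30 km/h with flow 120, 300 m at 60 km/h with flow 80.
pieces = [(100.0, 120.0, 30.0), (300.0, 80.0, 60.0)]
flow, speed = aggregate_to_osm(pieces)
```

The harmonic mean is the natural choice for a space-mean speed because it is equivalent to dividing the total distance by the total time needed to traverse all pieces, whereas an arithmetic mean would overweight fast pieces.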
- Graph: adjacency (sparse), node features (geo + static + centrality).
- Fixed sensor: T × N_s × F_s (e.g. flow, occupancy, speed/congestion).
- Floating car: T × N_f × F_f (e.g. flow, speed, travel time); N_f covers full network.
- Alignment: common segment IDs link all components.
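To make the role of the common segment IDs concrete, here is a small sketch that scatters sparse fixed-sensor readings onto the full-network index and records which entries are observed. It uses plain Python lists for a single feature; the function name, the example IDs, and the use of NaN for missing readings are assumptions for illustration, not the dataset's documented storage format.

```python
import math


def scatter_to_network(network_ids, sensor_ids, sensor_values):
    """Place sensor readings on the full segment index.

    network_ids: ordered segment IDs of the full network (length N).
    sensor_ids: segment IDs that carry a sensor (length N_s).
    sensor_values: T rows, each with N_s readings (NaN = missing).
    Returns (dense, mask), both T x N.
    """
    column = {seg_id: j for j, seg_id in enumerate(network_ids)}
    n_segments = len(network_ids)

    dense = [[math.nan] * n_segments for _ in sensor_values]
    mask = [[False] * n_segments for _ in sensor_values]
    for t, row in enumerate(sensor_values):
        for seg_id, value in zip(sensor_ids, row):
            if not math.isnan(value):
                j = column[seg_id]  # shared segment ID gives the column
                dense[t][j] = value
                mask[t][j] = True
    return dense, mask


network_ids = ["a", "b", "c", "d"]       # hypothetical segment IDs
sensor_ids = ["b", "d"]                  # segments with a fixed sensor
sensor_values = [[10.0, 20.0], [math.nan, 25.0]]
dense, mask = scatter_to_network(network_ids, sensor_ids, sensor_values)
```

Keeping the gaps as NaN alongside an explicit mask is consistent with the no-imputation policy of the cleaning step.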
- Processing code: pipeline for layer matching, spatial normalization, cleaning, and graph/dataset construction.
- Processed fixed-sensor data: publicly available on Zenodo (CC BY 4.0 license).
- TomTom binding: scripts/notebooks to align fixed-sensor data with TomTom (users supply their own TomTom data under TomTom’s terms).
- Data loaders: PyG-compatible dataset interface with graph sampling and configurable observation masks.
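One way a configurable observation mask might work is sketched below: a fraction of the observed entries is hidden from the model input and kept as evaluation targets. This is an assumption about typical usage for state estimation and imputation experiments, not the interface of the provided loaders; the function name and parameters are hypothetical.

```python
import random


def split_observation_mask(observed, hide_ratio, seed=0):
    """Split observed entries into model inputs and held-out targets.

    observed: T x N nested lists of booleans (True = reading available).
    hide_ratio: fraction of observed entries to hold out.
    Returns (input_mask, target_mask) with the same shape as `observed`.
    """
    rng = random.Random(seed)  # seeded for reproducible splits
    observed_idx = [
        (t, n)
        for t, row in enumerate(observed)
        for n, available in enumerate(row)
        if available
    ]
    n_hidden = round(hide_ratio * len(observed_idx))
    hidden = set(rng.sample(observed_idx, n_hidden))

    input_mask = [
        [available and (t, n) not in hidden for n, available in enumerate(row)]
        for t, row in enumerate(observed)
    ]
    target_mask = [
        [(t, n) in hidden for n in range(len(row))]
        for t, row in enumerate(observed)
    ]
    return input_mask, target_mask


observed = [[True, True, False], [True, True, True]]
input_mask, target_mask = split_observation_mask(observed, hide_ratio=0.4)
```

Entries that were never observed stay False in both masks, so they are neither fed to the model nor counted in the evaluation.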
Raw TomTom data is not redistributable; only processing methodology and processed fixed-sensor outputs are shared.
- Clone the repo and install dependencies (see requirements.txt or environment files in the repo).
- Use the provided scripts/notebooks to reproduce the pipeline or load preprocessed fixed-sensor data.
- To obtain full MMTD (fixed + TomTom), acquire TomTom data for the same period/area and run the binding notebooks.
- Load the PyG dataset via the custom loader for training/evaluation (e.g. state estimation, imputation, forecasting).
Specific commands and example snippets will be added as the repository is populated.
If you use MMTD in your work, please cite the Zenodo record for the processed data and the accompanying PhD thesis:
@dataset{zhou2026mmtd,
author = {Zhou, Qishen},
title = {MMTD: Madrid Fixed-Sensor Traffic Data (Processed
Sample for Reproducible Research)},
month = feb,
year = {2026},
publisher = {Zenodo},
version = {v1.0},
doi = {10.5281/zenodo.18781181},
url = {https://doi.org/10.5281/zenodo.18781181}
}

@phdthesis{zhouphd,
author = {Qishen Zhou},
title = {Resilient Inference and Uncertainty Quantification for Global Traffic State Using Sparse Data},
school = {Zhejiang University},
year = {2026}
}

- Code and processing scripts: MIT License. Use, modify, and distribute freely; keep the copyright notice.
- Processed data we distribute in this repository: CC BY 4.0. Attribution is required; otherwise use freely.
- Underlying data sources (Madrid, TomTom, OSM) are subject to their own terms. We do not grant rights beyond what those sources allow.


