MMTD: Metropolitan Multi-source Traffic Data

A large-scale, spatio-temporally aligned multi-source traffic dataset at metropolitan scale, built for deep learning and traffic state inference. MMTD standardizes heterogeneous fixed sensors and floating car data onto a unified road network, providing a reproducible benchmark for data-driven traffic dynamics modeling.

Data Access

The processed fixed-sensor data (excluding floating car data) is publicly available on Zenodo:

Zenodo Record: https://doi.org/10.5281/zenodo.18781181
Version: v1.0
Contents: Processed fixed-sensor data, graph topology, and static road attributes

Note on Floating Car Data: The floating car data used in this project is TomTom Traffic Stats, a commercial data product, so it is not included in the Zenodo release. Users can obtain TomTom Traffic Stats data directly from the TomTom Developer Portal under TomTom's terms. We provide binding scripts/notebooks that align your own TomTom data with our processed fixed-sensor data to construct the complete MMTD.

Overview

Spatial scope: Madrid metropolitan area
Temporal resolution: 15 minutes (aligned across sources)
Road segments: 25,166 (OSM-based, intersection-to-intersection)
Fixed sensors: 2,476 (2,371 urban + 105 highway)
Floating car (TomTom): full network coverage except for living streets.

MMTD includes traffic state variables (flow, speed, travel time, occupancy, congestion index where available), road static attributes (length, lanes, speed limit), and graph topology (adjacency, betweenness/closeness centrality).

Data Source Attribution

This dataset is derived from the following original data sources:

Source	Description	Original Provider
Fixed sensors	~4,678 loop detectors; 15-min historical data; flow, occupancy, speed (highway), congestion index (urban).	Ayuntamiento de Madrid
Lane counts	Lane-level data for flow normalization.	Ayuntamiento de Madrid
Road network	Road topology and geometry.	OpenStreetMap
Floating car	15-min aggregates (flow, speed, travel time) on TomTom's segment geometry.	TomTom

Madrid fixed-sensor and lane-count data are reusable under the Madrid Open Data terms (Real Decreto Legislativo 1/1996), which require attribution to the source and author.

Analysis period: August 2024.

Left: Fixed sensors (Madrid). Right: TomTom floating car coverage.

Pipeline Summary

MMTD is built through a standardized pipeline:

Layer matching
- Point–line: Match fixed sensors to lane-count layer, then to OSM segments (distance + heading + road-type constraints; search radius 15 m urban / 20 m highway).
- Line–line: Match OSM segments to TomTom network via HMM-based map matching (e.g. leuvenmapmatching), with heading consistency and overlap-ratio checks.
Spatial normalization
- OSM: merge degree-2 nodes to get intersection-to-intersection segments.
- TomTom: map and aggregate TomTom segments onto these OSM segments (length-weighted flow; harmonic-mean speed). Highway: further split into ~500 m sub-segments.
Fixed-sensor merge & cleaning
- Merge co-located lane-level sensors (sum flow, average speed).
- Drop sensors whose valid-data ratio is below 80%; remove outliers adaptively (IQR or 3σ rule, chosen according to the distribution). No imputation is applied, so that reconstruction methods can be evaluated against the original gaps.
Graph & packaging
- Road segments → graph nodes; connectivity → edges. Features: geography, length, lanes, speed limit, betweenness/closeness centrality.
- Outputs: adjacency matrix, node features, fixed-sensor and floating-car state tensors (e.g. T × N × F), HDF5 + PyG-style dataset and loaders with optional observation masks.
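
As a rough illustration of the spatial normalization and cleaning steps above, the following sketch aggregates several TomTom pieces onto one OSM segment and applies the 80% valid-ratio rule and an IQR outlier check to a sensor series. It is not the repository's implementation: the function names, the use of `None` for missing readings, the length weighting inside the harmonic mean, and the IQR multiplier `k=1.5` are all assumptions made for this example.

```python
from statistics import quantiles


def length_weighted_flow(lengths_m, flows):
    """Length-weighted mean flow of the pieces mapped onto one target segment."""
    return sum(l * q for l, q in zip(lengths_m, flows)) / sum(lengths_m)


def harmonic_mean_speed(lengths_m, speeds_kmh):
    """Length-weighted harmonic mean speed: total length / total travel time."""
    travel_time = sum(l / v for l, v in zip(lengths_m, speeds_kmh))
    return sum(lengths_m) / travel_time


def valid_ratio(series):
    """Fraction of non-missing readings (None marks a missing interval)."""
    return sum(x is not None for x in series) / len(series)


def iqr_outlier_flags(series, k=1.5):
    """Flag readings outside [Q1 - k*IQR, Q3 + k*IQR]; gaps are never flagged."""
    values = [x for x in series if x is not None]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x is not None and not (low <= x <= high) for x in series]


# Two hypothetical TomTom pieces covering one OSM segment.
flow = length_weighted_flow([100.0, 300.0], [400.0, 800.0])
speed = harmonic_mean_speed([100.0, 100.0], [60.0, 30.0])

# A hypothetical sensor series with one gap and one spike.
series = [10, 12, 11, None, 13, 12, 11, 200]
keep_sensor = valid_ratio(series) >= 0.80  # assumed: sensors below 80% are dropped
flags = iqr_outlier_flags(series)          # only the spike is flagged
```

The harmonic mean is used for speed because it keeps the aggregated speed consistent with total travel time over the segment; an arithmetic mean would overstate speed whenever one piece is much slower than the others.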

Dataset Structure (High Level)

Graph: adjacency (sparse), node features (geo + static + centrality).
Fixed sensor: T × N_s × F_s (e.g. flow, occupancy, speed/congestion).
Floating car: T × N_f × F_f (e.g. flow, speed, travel time); N_f covers full network.
Alignment: common segment IDs link all components.
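
To make the role of the common segment IDs concrete, here is a minimal sketch of how a shared node ordering could tie the sparse adjacency and an observation mask together. The toy network, the segment IDs, the helper names, and the choice of directed edges (segment a links to segment b when a ends where b starts) are assumptions for illustration; the source only states that road segments become nodes and connectivity becomes edges.

```python
from collections import defaultdict


def segment_edges(segments):
    """Directed edges between segments: a -> b when a ends where b starts.

    segments maps segment_id -> (from_node, to_node).
    """
    starts_at = defaultdict(list)
    for seg_id, (start, _end) in segments.items():
        starts_at[start].append(seg_id)
    edges = []
    for seg_id, (_start, end) in segments.items():
        for nxt in starts_at[end]:
            if nxt != seg_id:  # skip self-loops
                edges.append((seg_id, nxt))
    return edges


def observation_mask(node_order, sensor_segment_ids):
    """True where a segment in the full node order carries a fixed sensor."""
    observed = set(sensor_segment_ids)
    return [seg in observed for seg in node_order]


# Hypothetical toy network: four segments between intersections A-D.
segments = {
    "s1": ("A", "B"),
    "s2": ("B", "C"),
    "s3": ("B", "D"),
    "s4": ("C", "A"),
}

# One shared ordering of segment IDs indexes every component.
node_order = sorted(segments)
index = {seg: i for i, seg in enumerate(node_order)}

# Sparse adjacency as (source, target) index pairs.
edge_index = sorted((index[a], index[b]) for a, b in segment_edges(segments))

# Fixed sensors sit on a subset of segments (N_s <= N_f).
mask = observation_mask(node_order, ["s2", "s4"])
```

With this layout, the fixed-sensor tensor can be scattered into the full node order wherever `mask` is true, while the floating-car tensor already covers every position.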

Repository Contents

Processing code: pipeline for layer matching, spatial normalization, cleaning, and graph/dataset construction.
Processed fixed-sensor data: publicly available on Zenodo (CC BY 4.0 license).
TomTom binding: scripts/notebooks to align fixed-sensor data with TomTom (users supply their own TomTom data under TomTom’s terms).
Data loaders: PyG-compatible dataset interface with graph sampling and configurable observation masks.

Raw TomTom data is not redistributable; only processing methodology and processed fixed-sensor outputs are shared.

Getting Started

Clone the repo and install dependencies (see requirements.txt or environment files in the repo).
Use the provided scripts/notebooks to reproduce the pipeline or load preprocessed fixed-sensor data.
To obtain full MMTD (fixed + TomTom), acquire TomTom data for the same period/area and run the binding notebooks.
Load the PyG dataset via the custom loader for training/evaluation (e.g. state estimation, imputation, forecasting).

Specific commands and example snippets will be added as the repository is populated.

Citation

If you use MMTD in your work, please cite:

Dataset (Zenodo)

When using the processed data, please cite the Zenodo record:

@dataset{zhou2026mmtd,
  author       = {Zhou, Qishen},
  title        = {MMTD: Madrid Fixed-Sensor Traffic Data (Processed
                   Sample for Reproducible Research)},
  month        = feb,
  year         = {2026},
  publisher    = {Zenodo},
  version      = {v1.0},
  doi          = {10.5281/zenodo.18781181},
  url          = {https://doi.org/10.5281/zenodo.18781181}
}

Thesis Reference

@phdthesis{zhouphd,
  author = {Qishen Zhou},
  title  = {Resilient Inference and Uncertainty Quantification for Global Traffic State Using Sparse Data},
  school = {Zhejiang University},
  year   = {2026}
}

License

Code and processing scripts: MIT License — use, modify, and distribute freely; keep the copyright notice.
Processed data we distribute in this repository: CC BY 4.0; attribution is required, otherwise use freely.
Underlying data sources (Madrid, TomTom, OSM) are subject to their own terms. We do not grant rights beyond what those sources allow.
