Advanced vehicular traffic analysis system for the General Manuel Belgrano Bridge (Corrientes, Argentina), optimized for SISE dynamic surveillance cameras and long-duration video. Three-module pipeline: bootstrap (archived), data preparation (one-time training), and production (YOLO 11 + MLP classifier + self-improving feedback loop).
```mermaid
flowchart LR
    subgraph "Module 0 — Bootstrap (ARCHIVED)"
        A0[01_legacy_collection.ipynb] -->|Generated| TD[(traffic_data)]
    end
    subgraph "Module 1 — Data Preparation (one-time)"
        TD -->|Telemetry| M1[data_preparation.ipynb]
        M1 -->|9→14 features| M1
        M1 -->|Auto-label + SMOTE| M1
        M1 -->|Train MLP| ART[.keras + .joblib]
    end
    subgraph "Module 2 — Production (ongoing)"
        USR[User] -->|Upload .mp4| M2[traffic_analyzer.ipynb]
        M2 -->|YOLO 11 + SORT| GPU[GPU T4/V100]
        GPU -->|Detections| M2
        ART -->|Load model| M2
        M2 -->|Classify| M2
        M2 -->|Persist| DB[(AWS RDS<br/>PostgreSQL)]
        M2 -->|Feedback loop| M2
        M2 -->|Annotated video + state| USR
    end
    subgraph "Shared Code"
        SRC[src/] -.->|imports| M1
        SRC -.->|imports| M2
    end
```
- Module 0 — Bootstrap (archived): `archive/00_bootstrap/01_legacy_collection.ipynb` — Historical YOLO 11 pipeline that generated `traffic_data`. Never runs again.
- Module 1 — Data Preparation: `notebooks/01_data_prep/data_preparation.ipynb` — Feature engineering + auto-labeling + SMOTE + MLP training → exports `.keras` and `.joblib` artifacts
- Module 2 — Production: `notebooks/02_production/traffic_analyzer.ipynb` — YOLO 11 + SORT + speed estimation + trained MLP classifier + DB persistence + self-improving feedback loop
- Shared code: `src/` — Reusable Python modules imported by Modules 1 and 2
- Persistence: PostgreSQL on AWS RDS (optional). 3 tables: `traffic_data` (legacy), `telemetry_raw` + `traffic_classifications` (active)
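As a rough illustration of the 9 → 14 feature expansion, the derived columns stored in `telemetry_raw` can be computed from the base telemetry along these lines. This is a minimal sketch with illustrative thresholds and window sizes; the real rules live in `src/features.py`:

```python
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the derived telemetry_raw columns (see src/features.py for the real rules)."""
    out = df.copy()
    # Share of heavy vehicles (trucks + buses) in the total count
    out["heavy_vehicle_ratio"] = (
        (out["count_truck"] + out["count_bus"]) / out["total_vehicles"].clip(lower=1)
    )
    # First differences between consecutive records
    out["delta_speed"] = out["avg_speed"].diff().fillna(0.0)
    out["delta_count"] = out["total_vehicles"].diff().fillna(0).astype(int)
    # Flag abrupt speed changes as potential state transitions (15 km/h is illustrative)
    out["transition_flag"] = (out["delta_speed"].abs() > 15).astype(int)
    # Short-window variance of speed
    out["speed_variance"] = out["avg_speed"].rolling(3, min_periods=1).var().fillna(0.0)
    # Temporal context; weather defaults to 0 as in the schema
    out["hour_of_day"] = pd.to_datetime(out["record_time"]).dt.hour
    out["weather_condition"] = 0
    return out
```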
```
vaaet/
├── archive/
│   └── 00_bootstrap/
│       ├── 01_legacy_collection.ipynb   # Module 0: Historical YOLO pipeline (FROZEN)
│       └── README.md                    # Explains deprecated status
├── notebooks/
│   ├── 01_data_prep/
│   │   └── data_preparation.ipynb       # Module 1: Feature eng. + model training
│   └── 02_production/
│       └── traffic_analyzer.ipynb       # Module 2: YOLO + classifier + feedback
├── src/
│   ├── __init__.py
│   ├── config.py                # Single source of truth: constants, paths, thresholds
│   ├── db.py                    # SQLAlchemy engine factory, credential handling
│   ├── features.py              # Feature engineering (9 → 14 columns)
│   ├── labeling.py              # Auto-labeling rules (4 traffic states)
│   └── perception/
│       ├── __init__.py
│       ├── detector.py          # YOLODetector wrapper
│       ├── tracker.py           # SORTTracker wrapper
│       └── speed.py             # Physics-based speed estimation
├── models/
│   ├── perception/              # YOLO weights (downloaded at runtime, gitignored)
│   └── intelligence/            # Module 1 artifacts (.keras, .joblib, gitignored)
├── data/
│   ├── raw/                     # DB backups (gitignored)
│   ├── processed/               # Feature CSVs (gitignored)
│   └── samples/                 # Example data
├── docs/
│   ├── PRD.md                   # Product requirements
│   ├── DDS.md                   # Software design
│   ├── USER_GUIDE.md            # User guide
│   ├── DATA_LINEAGE.md          # Data lineage
│   ├── BIAS_AND_LIMITATIONS.md  # Biases and limitations
│   ├── KPIs/KPIs.md             # Metrics and validation
│   ├── adr/                     # Architecture Decision Records (9 ADRs)
│   └── diagrams/                # Mermaid diagrams
├── README.md, AGENTS.md, CONTRIBUTING.md, CHANGELOG.md
├── requirements.txt, llms.txt, llms-full.txt
└── LICENSE
```
- Python 3.8+ (or Google Colab with free GPU)
- PostgreSQL on AWS RDS (optional — system degrades gracefully without DB)
```bash
pip install -r requirements.txt
```

- Open `notebooks/01_data_prep/data_preparation.ipynb` in Google Colab
- Run all 9 code cells in order (Cell 0 through Cell 8)
- Configure DB credentials via environment variables in Cell 2 (`DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`)
- The system extracts telemetry from `traffic_data`, engineers 14 features, auto-labels 4 traffic states, balances classes with SMOTE, and trains an MLP classifier
- Artifacts exported: `models/intelligence/traffic_classifier.keras`, `feature_scaler.joblib`, `label_mapping.joblib`
- Target metric: F1-macro ≥ 0.85
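The F1-macro target can be checked with scikit-learn, which Module 1 already depends on. A minimal sketch (`meets_target` is an illustrative helper, not a function from the notebooks):

```python
from sklearn.metrics import f1_score

def meets_target(y_true, y_pred, target: float = 0.85) -> bool:
    """Compare macro-averaged F1 across the 4 traffic states against the 0.85 target."""
    return f1_score(y_true, y_pred, average="macro", zero_division=0) >= target
```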
- Open `notebooks/02_production/traffic_analyzer.ipynb` in Google Colab
- Run Cell 0 (environment setup) and Cell 1 (load trained model)
- Upload a video clip named `bridge_YYYY-MM-DD_HH-MM-SS_to_HH-MM-SS.mp4`
- Run Cell 2 to process the clip (YOLO 11 detection + SORT tracking + speed estimation)
- Run Cell 3 to classify the traffic state using the trained MLP
- Run Cell 4 to persist results to the DB (optional)
- Run Cell 5 for HITL feedback and re-training (optional)
- Run Cell 6 for the visualization dashboard
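The clip filename convention above can be validated with a short regex. This is a sketch of the enforced format; the production notebook's exact check may differ:

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# bridge_YYYY-MM-DD_HH-MM-SS_to_HH-MM-SS.mp4
CLIP_RE = re.compile(
    r"^bridge_(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})_to_(\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_clip_name(name: str) -> Optional[Tuple[str, str, str]]:
    """Return (date, start, end) if the name matches the convention, else None."""
    m = CLIP_RE.match(name)
    if not m:
        return None
    date, start, end = m.groups()
    # Reject names whose fields are not real calendar/clock values
    datetime.strptime(f"{date} {start}", "%Y-%m-%d %H-%M-%S")
    datetime.strptime(f"{date} {end}", "%Y-%m-%d %H-%M-%S")
    return date, start, end
```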
| Feature | Description |
|---|---|
| Adaptive YOLO selection | 5 model variants selected by video duration (<1h: yolo11x, 1-3h: yolo11l, etc.) |
| Hybrid speed estimation | 70% physics + 30% MLP smoothing, with plausibility filters [2-120 km/h] |
| 4 traffic states | Normal, Reduced, Congested, Accident — classified by MLP from 14 engineered features |
| Self-improving feedback | HITL corrections feed back into retraining pipeline |
| Multi-camera support | Auto-detects 1, 2, or 4 camera layouts |
| Silent degradation | Continues without DB if unavailable; falls back to physics-only speed |
| Strict validation | Video filename format enforced; speeds outside range silently discarded |
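The hybrid speed estimate in the table (70% physics + 30% MLP, plausibility window [2, 120] km/h, physics-only fallback) can be sketched as follows; the function name is illustrative:

```python
from typing import Optional

def fuse_speed(physics_kmh: float, mlp_kmh: Optional[float]) -> Optional[float]:
    """Blend physics-based and MLP-smoothed speed, then apply the plausibility filter."""
    if mlp_kmh is None:
        fused = physics_kmh  # Silent degradation: physics-only fallback
    else:
        fused = 0.7 * physics_kmh + 0.3 * mlp_kmh
    # Speeds outside [2, 120] km/h are discarded as implausible
    return fused if 2.0 <= fused <= 120.0 else None
```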
```sql
CREATE TABLE IF NOT EXISTS traffic_data (
    id SERIAL PRIMARY KEY,
    clip_id TEXT NOT NULL,
    record_time TIMESTAMP NOT NULL,
    avg_speed NUMERIC(5,2) NOT NULL,
    count_car INTEGER NOT NULL,
    count_truck INTEGER NOT NULL,
    count_bus INTEGER NOT NULL,
    count_motorcycle INTEGER NOT NULL,
    count_bicycle INTEGER NOT NULL,
    total_vehicles INTEGER NOT NULL,
    UNIQUE (clip_id, record_time)
);

CREATE TABLE IF NOT EXISTS telemetry_raw (
    id SERIAL PRIMARY KEY,
    source_record_id INTEGER REFERENCES traffic_data(id),
    record_time TIMESTAMP NOT NULL,
    avg_speed NUMERIC(5,2),
    total_vehicles INTEGER,
    count_car INTEGER, count_truck INTEGER, count_bus INTEGER,
    count_motorcycle INTEGER, count_bicycle INTEGER,
    heavy_vehicle_ratio NUMERIC(5,4),
    delta_speed NUMERIC(6,2), delta_count INTEGER,
    transition_flag SMALLINT DEFAULT 0,
    speed_variance NUMERIC(6,2),
    hour_of_day SMALLINT, weather_condition SMALLINT DEFAULT 0,
    UNIQUE (source_record_id)
);

CREATE TABLE IF NOT EXISTS traffic_classifications (
    id SERIAL PRIMARY KEY,
    telemetry_id INTEGER REFERENCES telemetry_raw(id),
    classified_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    traffic_state SMALLINT NOT NULL,
    state_label TEXT NOT NULL,
    confidence NUMERIC(5,4) NOT NULL,
    model_version TEXT NOT NULL,
    is_human_validated BOOLEAN DEFAULT FALSE,
    human_override_state SMALLINT,
    validated_at TIMESTAMP,
    UNIQUE (telemetry_id, model_version)
);
```

- Credentials are never exposed in cell outputs
- DB persistence only activates when all environment variables are present
- No hardcoded connection strings
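The "persist only when all variables are present" rule can be sketched like this; `make_db_url` is an illustrative helper, not the exact code in `src/db.py`:

```python
import os
from typing import Optional

REQUIRED_VARS = ("DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD")

def make_db_url() -> Optional[str]:
    """Build a SQLAlchemy URL only when every credential is set; never hardcode one."""
    values = {k: os.environ.get(k) for k in REQUIRED_VARS}
    if not all(values.values()):
        return None  # Graceful degradation: the pipeline runs without persistence
    # Note: a production version should URL-escape the password
    return (
        "postgresql+psycopg2://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
        .format(**values)
    )
```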
- Bridge: General Manuel Belgrano, 1700m length, 8.3m roadway width
- Cameras: SISE dynamic at 60m height, with zoom, pan, night vision
- Vehicle types: car, truck, bus, motorcycle, bicycle
- Typical speeds: 40-80 km/h normal flow, 0-20 km/h congestion
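To illustrate how the auto-labeling rules in `src/labeling.py` might map these speed bands to the four traffic states, here is a sketch with hypothetical thresholds chosen to be consistent with the figures above; the real rules also use vehicle counts, deltas, and variance:

```python
STATES = {0: "Normal", 1: "Reduced", 2: "Congested", 3: "Accident"}

def label_state(avg_speed_kmh: float, total_vehicles: int) -> int:
    """Hypothetical single-feature thresholds; src/labeling.py uses more features."""
    if avg_speed_kmh < 5 and total_vehicles > 0:
        return 3  # Accident: near-standstill with vehicles present
    if avg_speed_kmh <= 20:
        return 2  # Congested: the 0-20 km/h band
    if avg_speed_kmh < 40:
        return 1  # Reduced: below the normal-flow band
    return 0      # Normal: 40-80 km/h typical flow
```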
Core: numpy, pandas, sqlalchemy, psycopg2-binary, joblib
Data Preparation (Module 1): tensorflow, scikit-learn, imbalanced-learn, matplotlib, seaborn
Production (Module 2): ultralytics, opencv-python, tensorflow, scikit-learn
```bash
pip install -r requirements.txt
```

| Document | Description |
|---|---|
| docs/PRD.md | Product requirements |
| docs/DDS.md | Software design and diagrams |
| docs/USER_GUIDE.md | User guide |
| docs/KPIs/KPIs.md | Metrics and validation |
| docs/DATA_LINEAGE.md | Data lineage |
| docs/BIAS_AND_LIMITATIONS.md | Biases and limitations |
| docs/adr/ | Architecture Decision Records (9 ADRs) |
| docs/diagrams/ | Mermaid diagrams |
| AGENTS.md | Agentic context for AI agents |
| CONTRIBUTING.md | Contribution guide |
Module 0 includes a synthetic video generator (archived) for portfolio demos without requiring real bridge footage. Scenarios: light, normal, busy, mixed, stationary_test.
For calibration, advanced integration, or questions, consult the notebooks, the user guide, and the inline comments in each cell.