This project implements an intelligent video-based traffic anomaly detection system that identifies unusual patterns, congestion levels, and potential accidents in traffic footage using deep learning. The system leverages Vision Transformers (ViT) for feature extraction and Isolation Forest for unsupervised anomaly detection.
Traffic monitoring is critical for urban safety and congestion management. Manual analysis of video footage is time-consuming and inefficient. This project automates the detection of anomalous events (accidents, congestion) by analyzing visual patterns in traffic videos without requiring labeled training data.
- Extracts all frames from input video files (.avi, .mp4, .MOV)
- Resizes frames to 224×224 for model compatibility
- Applies smart quantization (takes every N-th frame) to reduce computational load while preserving temporal information
- Example: 3000 raw frames → ~1000 quantized frames for analysis
- Uses pre-trained vit_base_patch16_224 model from timm library
- Extracts high-dimensional feature vectors (768-dim) from each frame
- Vision Transformers capture global context better than CNNs, making them ideal for detecting subtle anomalies
- Processes features in batches for efficiency
- Applies StandardScaler to normalize feature vectors
- Ensures all features have zero mean and unit variance
- Essential for anomaly detection algorithms to work effectively
- Trains Isolation Forest on normalized features (contamination=5%)
- Unsupervised approach—no labeled data required
- Generates anomaly scores: lower scores indicate more anomalous frames
- Classifies frames as normal or anomalous based on isolation paths in random decision trees
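Normalization and anomaly detection together reduce to a few lines of scikit-learn (variable names are illustrative; contamination and tree count follow the configuration listed below):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest


def score_frames(features, contamination=0.05, n_estimators=100, seed=0):
    """Normalize ViT features, fit Isolation Forest, return (scores, labels).

    decision_function: lower score = more anomalous.
    predict: -1 = anomaly, +1 = normal.
    """
    scaled = StandardScaler().fit_transform(features)  # zero mean, unit variance
    forest = IsolationForest(contamination=contamination,
                             n_estimators=n_estimators,
                             random_state=seed).fit(scaled)
    return forest.decision_function(scaled), forest.predict(scaled)
```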
- Maps anomaly scores to three congestion levels:
  - HIGH: Bottom 33% of scores (most anomalous patterns)
  - MEDIUM: Middle 33% of scores
  - LOW: Top 34% of scores (most normal patterns)
- Provides actionable intelligence for traffic management
- Identifies frames in the bottom 1st percentile of anomaly scores
- Flags these as potential accidents for human review
- Timestamps provided for quick reference
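The percentile cut-offs above can be applied directly to the score array; this is a sketch, with thresholds following the 33/67 and 1st-percentile splits described here:

```python
import numpy as np


def classify_frames(scores):
    """Map anomaly scores (lower = more anomalous) to congestion levels
    and flag the bottom 1st percentile as potential accidents."""
    lo, hi = np.percentile(scores, [33, 67])
    levels = np.where(scores <= lo, "HIGH",
                      np.where(scores <= hi, "MEDIUM", "LOW"))
    accidents = scores <= np.percentile(scores, 1)
    return levels, accidents
```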
✅ Fully Automated Pipeline - No manual labeling or preprocessing required
✅ Temporal Awareness - Calculates precise timestamps for each detected anomaly
✅ Comprehensive Reporting - Outputs detailed JSON with scores, timestamps, and classifications
✅ Visual Analytics - Generates charts comparing raw vs. quantized frames and anomaly distributions
✅ Efficient Processing - Smart quantization reduces computation while maintaining analytical quality
✅ Production-Ready - Includes error handling, progress tracking, and verification steps
The system generates a comprehensive JSON report containing:
{
"metadata": {
"model_name": "vit_base_patch16_224",
"anomaly_detector": "IsolationForest"
},
"summary": {
"total_frames_analyzed": 1000,
"anomalies_detected": 50,
"potential_accidents": 10,
"congestion_level_counts": {...}
},
"frames_data": [
{
"frame_id": "frame_001",
"timestamp": "00:05.30",
"anomaly_score": -0.2451,
"congestion_level": "HIGH",
"is_accident": true
}
]
}

| Component | Technology |
|---|---|
| Video Processing | OpenCV |
| Deep Learning | PyTorch, torchvision |
| Vision Model | Timm (vit_base_patch16_224) |
| Anomaly Detection | Scikit-learn (IsolationForest) |
| Data Processing | NumPy, Pandas |
| Visualization | Matplotlib, Seaborn |
| Environment | Google Colab (GPU accelerated) |
Input Videos → Frame Extraction → Quantization → ViT Feature Extraction
↓
Feature Normalization → Isolation Forest Training → Anomaly Scoring
↓
Congestion Classification → Accident Detection → JSON Report Generation
↓
Visualizations & Download
- Contamination Rate: 5% (expected anomalies)
- Isolation Forest Estimators: 100 trees
- Feature Dimension: 768 (from ViT)
- Processing: GPU-accelerated (CUDA if available)
- Upload video files to the /content/ directory
- Run the notebook cells sequentially
- System automatically:
  - Extracts and processes frames
  - Detects anomalies
  - Generates timestamps and congestion levels
  - Creates comprehensive JSON report
- Download the traffic_anomaly_detection_report.json file
- No manual annotation needed (unsupervised learning)
- Captures complex visual patterns that rule-based systems miss
- Scalable to large video datasets
- Real-time insights with precise timestamps
- Adaptable to different traffic scenarios without retraining
- Real-time video stream processing
- Multi-model ensemble for improved accuracy
- Integration with traffic management systems
- Fine-tuning ViT on domain-specific traffic data
- Spatial anomaly localization (bounding boxes)
This project demonstrates how combining Vision Transformers with unsupervised anomaly detection creates a powerful tool for intelligent traffic monitoring. The system identifies congestion and accident patterns automatically, enabling faster emergency response and better urban traffic management.
Status: ✅ Production Ready | Last Updated: December 2025
This project uses Artificial Intelligence to watch traffic videos and automatically detect unusual events like accidents, congestion, and traffic anomalies. Instead of manually watching hours of video footage, this system analyzes every frame and creates a detailed report showing exactly when and where problems occurred.
Imagine a traffic management center in a city with 100 surveillance cameras. Instead of having people watch each camera 24/7, this system watches all cameras automatically, detects problems in seconds, and alerts operators with timestamps and severity levels.
INPUT: Traffic video file (e.g., highway, intersection)
↓
PROCESS: AI analyzes each frame to detect patterns
↓
OUTPUT: JSON report with:
- When anomalies occurred (timestamp)
- How severe they are (anomaly score)
- Traffic congestion level (HIGH/MEDIUM/LOW)
- Potential accident locations
Input: 254 videos, 14,000 frames in total
↓
System extracts every single frame as an image
↓
Result: 14,000 individual JPG images saved
Why? Videos are too large to process directly. We break them into frames.
14,000 frames extracted
↓
Take 1 frame every 3 frames
↓
Result: ~4,667 frames (33% of original)
Why? We keep enough frames to detect patterns but reduce processing time.
Frame image
↓
Vision Transformer (ViT) model analyzes it
↓
Creates a "fingerprint" (768 numbers) representing what the model sees
↓
Result: Feature vector stored for analysis
Why? Raw pixels are meaningless. The AI converts images into meaningful numbers it can understand.
4,667 feature vectors
↓
Isolation Forest model learns what "normal traffic" looks like
↓
Scores each frame: Lower score = more unusual
↓
Result: Anomaly score for each frame (-0.85 to +0.92)
Why? The model identifies frames that don't match normal patterns (accidents, congestion, incidents).
For each frame, the system determines:
✓ Timestamp (MM:SS)
✓ Anomaly score (numerical)
✓ Congestion level (HIGH/MEDIUM/LOW)
✓ Is it a potential accident? (YES/NO)
All analyzed data → Organized into JSON format
↓
Includes summary statistics
↓
Ready for download and analysis
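Assembling the report might look like the sketch below; the record keys follow the JSON schema shown earlier, and the function names are illustrative:

```python
import json
from collections import Counter


def build_report(frame_records, model_name="vit_base_patch16_224"):
    """Bundle per-frame records into the report layout with summary stats."""
    return {
        "metadata": {"model_name": model_name,
                     "anomaly_detector": "IsolationForest"},
        "summary": {
            "total_frames_analyzed": len(frame_records),
            "anomalies_detected": sum(r["is_anomaly"] for r in frame_records),
            "potential_accidents": sum(r["is_accident"] for r in frame_records),
            "congestion_level_counts": dict(
                Counter(r["congestion_level"] for r in frame_records)),
        },
        "frames_data": frame_records,
    }


def save_report(report, path="traffic_anomaly_detection_report.json"):
    """Serialize the report for download."""
    with open(path, "w") as f:
        json.dump(report, f, indent=2)
```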
┌─────────────────────────────────┐
│ Your Traffic Video File │
│ (e.g., highway.avi) │
└────────────┬────────────────────┘
│
▼
┌────────────────────┐
│ PHASE 1: EXTRACT │
│ ALL FRAMES │
│ No skipping │
└────────┬───────────┘
│
▼
14,000 raw frames
(complete video data)
│
▼
┌────────────────────┐
│ PHASE 2: QUANTIZE │
│ (Smart sampling) │
│ Take 1/3 frames │
└────────┬───────────┘
│
▼
4,667 quantized frames
(33% of original)
│
▼
┌────────────────────────┐
│ PHASE 3: ViT FEATURES │
│ AI extracts patterns │
│ 768-dim vectors │
└────────┬───────────────┘
│
▼
┌────────────────────────────┐
│ PHASE 4: ANOMALY DETECTION │
│ Isolation Forest model │
│ Scores each frame │
└────────┬───────────────────┘
│
▼
┌────────────────────────────┐
│ PHASE 5: CLASSIFICATION │
│ - Timestamps │
│ - Congestion levels │
│ - Accident detection │
└────────┬───────────────────┘
│
▼
┌────────────────────────┐
│ PHASE 6: JSON REPORT │
│ Ready for download │
└────────────────────────┘
smart-traffic-anomaly-detection/
├── README.md # This file
├── requirements.txt # Python dependencies
│
├── frames_raw/ # Step 1 output
│ └── traffic_video_1/
│ ├── frame_000001.jpg
│ ├── frame_000002.jpg
│ └── ... (14,000 frames)
│
├── frames_quantized/ # Step 2 output
│ └── traffic_video_1/
│ ├── frame_00001.jpg
│ ├── frame_00002.jpg
│ └── ... (~4,667 frames)
│
└── results/ # Step 6 output
├── traffic_anomaly_detection_report.json
├── 01_raw_frames_distribution.png
├── 02_raw_vs_quantized_comparison.png
└── 03_sample_frames.png
- Google account
- Traffic video file (MP4, AVI, or MOV format)
- 2-3 hours of GPU time (free from Colab)
- Open Google Colab: go to colab.research.google.com and create a new notebook
- Upload your video: Files → Upload → select your video, then wait for the upload to complete
- Run Cells 1-8 (Extract & Quantize): run each cell in order (expected time: 5-10 minutes)
- Run Cells 9-18 (Analysis): continue through Cell 18 (expected time: 20-30 minutes)
- The JSON report will download automatically
{
"metadata": {
"model_name": "vit_base_patch16_224",
"anomaly_detector": "IsolationForest"
},
"summary": {
"total_frames_analyzed": 4667,
"anomalies_detected": 233,
"potential_accidents": 23,
"congestion_level_counts": {
"HIGH": 1555,
"MEDIUM": 1556,
"LOW": 1556
}
},
"frames_data": [
{
"frame_id": "frame_00001",
"timestamp": "00:10.50",
"anomaly_score": -0.4523,
"is_anomaly": true,
"congestion_level": "HIGH",
"is_accident": false
}
]
}

| Field | Meaning | Example |
|---|---|---|
| timestamp | When in video (MM:SS.ss) | "00:10.50" = 10.5 seconds |
| anomaly_score | How unusual (-1 to +1) | -0.45 = very unusual, +0.92 = normal |
| is_anomaly | Is frame abnormal? | true = yes, false = no |
| congestion_level | Traffic density | HIGH/MEDIUM/LOW |
| is_accident | Potential accident? | true = bottom 1% scores |
An anomaly is anything unusual in the video that differs from normal traffic patterns:
- 🚗 Stalled vehicles
- 💥 Accidents or collisions
- 🚦 Unusual congestion patterns
- 🚧 Road blockages
- 👤 People on roadway
Traffic density classification:
- HIGH: Heavy anomalies (unusual traffic patterns detected)
- MEDIUM: Moderate anomalies (some unusual activity)
- LOW: Normal traffic (routine patterns)
A number showing how unusual a frame is:
-0.85 ← Very unusual (likely anomaly)
-0.50 ← Unusual
0.00 ← Neutral
+0.50 ← Normal
+0.92 ← Very normal
Vision Transformer (ViT):
- Pre-trained on ImageNet
- Extracts 768-dimensional features
- Better at understanding context than CNNs
Isolation Forest:
- Unsupervised anomaly detection
- Identifies outliers in high dimensions
- Contamination rate: 5% (expects 5% anomalies)
14,000 raw frames
→ Resize to 224×224 pixels
→ Normalize using ImageNet standards
→ Quantize to 4,667 frames
→ Extract ViT features
→ Normalize with StandardScaler
→ Train Isolation Forest
→ Score each frame
→ Classify by percentiles
| Metric | Value |
|---|---|
| Frames Analyzed | 4,667 |
| Processing Time | 25-30 minutes |
| GPU Memory | ~4-6 GB |
| Output File Size | ~2-3 MB |
| Anomalies Detected | ~233 (5%) |
| Accuracy | Depends on training data |
# In Cell 3
FRAME_SKIP_FOR_QUANTIZATION = 3 # Current: 1/3 frames
# To get more frames:
FRAME_SKIP_FOR_QUANTIZATION = 2 # Gets ~50% of frames
FRAME_SKIP_FOR_QUANTIZATION = 1 # Gets all frames
# To get fewer frames:
FRAME_SKIP_FOR_QUANTIZATION = 4  # Gets ~25% of frames

# In Cell 12
isolation_forest = IsolationForest(
contamination=0.05 # Current: 5% anomalies
# Change to 0.10 for 10% anomalies (more sensitive)
# Change to 0.02 for 2% anomalies (less sensitive)
)

Problem: Processing is slow or runs out of memory
Solution:
- Use higher quantization (FRAME_SKIP_FOR_QUANTIZATION = 4)
- Process shorter videos
- Use free GPU in Colab
Problem: Too many frames flagged as anomalous
Solution:
- Lower contamination rate (0.02 instead of 0.05)
- Check if video has actual anomalies
- Verify quantization is working
Problem: Timestamps look wrong
Solution:
- Verify video FPS is detected correctly
- Check FRAME_SKIP_FOR_QUANTIZATION value
- Frame index × (skip_interval / FPS) = timestamp
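The timestamp formula in the last bullet, expressed as a small helper (the function name is illustrative; the MM:SS.ss output format matches the report samples above):

```python
def frame_timestamp(quantized_index, skip_interval, fps):
    """Map a quantized frame index back to its position in the source video."""
    seconds = quantized_index * skip_interval / fps
    return f"{int(seconds // 60):02d}:{seconds % 60:05.2f}"
```

For example, quantized frame 105 with a skip interval of 3 at 30 FPS falls at 10.5 seconds into the video.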
- Vision Transformer (ViT): https://arxiv.org/abs/2010.11929
- Isolation Forest: https://cs.anu.edu.au/wp-content/uploads/2015/06/Isolation-Forest.pdf
- PyTorch: https://pytorch.org/
- Timm (PyTorch Image Models): https://github.com/rwightman/pytorch-image-models
- Scikit-learn: https://scikit-learn.org/
- Traffic Management Centers
  - Monitor multiple cameras
  - Alert operators to incidents
  - Reduce response time
- Highway Monitoring
  - Detect accidents in real-time
  - Track congestion patterns
  - Optimize traffic flow
- Parking Lot Surveillance
  - Detect unusual behavior
  - Monitor traffic density
- City Planning
  - Analyze traffic patterns
  - Identify problem areas
  - Plan infrastructure improvements
Project Name: Smart Traffic Anomaly Detection System
Technology Stack:
- Vision Transformer (Google Research)
- Isolation Forest (Scikit-learn)
- PyTorch Deep Learning Framework
If you encounter issues:
- Check the troubleshooting section above
- Verify all dependencies are installed (requirements.txt)
- Ensure video file is valid and not corrupted
- Check GPU memory availability in Colab
| Aspect | Details |
|---|---|
| Purpose | Detect traffic anomalies in surveillance videos |
| Input | Video file (AVI, MP4, MOV) |
| Output | JSON report with timestamps & classifications |
| Processing Time | ~30 minutes for 14k frames |
| Accuracy | Depends on training data & model tuning |
| Use Case | Traffic management, incident detection |
| Scalability | Can handle multiple videos sequentially |
This project demonstrates how Artificial Intelligence can automate surveillance analysis, making traffic management more efficient and responsive. By using advanced deep learning models and anomaly detection techniques, we can identify unusual events automatically, saving time and improving public safety.
Key Takeaway: Instead of humans watching videos 24/7, AI does the watching and alerts humans only when something unusual happens.