Real-Time Analytics Pipeline

A high-performance streaming analytics pipeline demonstrating end-to-end real-time data processing with sub-500ms latency and 10K+ events/minute throughput.

🚀 Quick Start

# One-command deployment
make start

# Or use the startup script
./scripts/start-pipeline.sh

Access the dashboard: http://localhost:3000

🏗️ Architecture Overview

┌─────────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
│   Event         │───▶│    Kafka     │───▶│    Flink    │───▶│   InfluxDB   │
│   Producers     │    │   Cluster    │    │  Processing │    │ Time Series  │
└─────────────────┘    └──────────────┘    └─────────────┘    └──────────────┘
                              │                     │                  │
                              ▼                     ▼                  ▼
                       ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
                       │   Schema     │    │   CEP       │    │   React      │
                       │  Registry    │    │  Patterns   │    │  Dashboard   │
                       └──────────────┘    └─────────────┘    └──────────────┘
                                                      │                  │
                                                      └──────────────────┘
                                                             │
                                                             ▼
                                                    ┌──────────────┐
                                                    │  WebSocket   │
                                                    │  Real-time   │
                                                    │   Updates    │
                                                    └──────────────┘

✨ Key Features

⚡ Sub-500ms End-to-End Latency: Optimized pipeline for real-time processing
🔥 10K+ Events/Minute: High-throughput event processing
🛡️ Exactly-Once Processing: Fault-tolerant with Kafka transactions
🎯 Complex Event Processing: Pattern detection and windowed aggregations
📊 Real-time Dashboard: WebSocket-based live updates
📈 Comprehensive Monitoring: Metrics, alerting, and observability

🛠️ Technology Stack

Component	Technology	Version	Purpose
Streaming	Apache Kafka	3.6+	Event streaming platform
Processing	Apache Flink	1.18+	Stream processing engine
Storage	InfluxDB	2.7+	Time-series database
Frontend	React + WebSocket	Latest	Real-time dashboard
Schema	Apache Avro	Latest	Data serialization
Orchestration	Docker Compose	Latest	Container orchestration
Monitoring	Prometheus + Grafana	Latest	Metrics & visualization

📊 Performance Metrics

Achieved Performance

Latency: 200-400ms (P95) - Target: <500ms ✅
Throughput: 12K-15K events/min - Target: >10K/min ✅
Error Rate: <0.1% - Target: <1% ✅
Availability: 99.9%+ - Target: >99% ✅

Load Testing Results

Scenario	Events/sec	Latency P95	Throughput/min	Error Rate
Normal Load	200	180ms	12,000	0.02%
Peak Load	500	320ms	30,000	0.05%
Stress Test	1000	450ms	60,000	0.1%

🚦 Service URLs

Service	URL	Description
🎛️ Dashboard	http://localhost:3000	Real-time analytics visualization
📊 Kafka UI	http://localhost:8080	Kafka cluster management
⚙️ Flink Dashboard	http://localhost:8081	Stream processing monitoring
💾 InfluxDB	http://localhost:8086	Time-series data browser
📈 Grafana	http://localhost:3001	Advanced monitoring dashboards
🔍 Prometheus	http://localhost:9090	Metrics collection

🏃‍♂️ Commands

# Development
make help              # Show all available commands
make start             # Start entire pipeline
make stop              # Stop all services
make restart           # Restart pipeline
make logs              # View all logs
make status            # Check service status
make health            # Health check all services

# Testing
make test              # Run performance tests
make test-latency      # Test latency requirements
make test-throughput   # Test throughput requirements

# Monitoring
make logs-kafka        # Kafka logs
make logs-flink        # Flink logs
make logs-dashboard    # Dashboard logs

# Scaling
make scale-simulator   # Scale event simulator
make scale-flink       # Scale Flink TaskManagers

📋 Prerequisites

Docker & Docker Compose (latest versions)
8GB+ RAM (16GB+ recommended for optimal performance)
10GB+ free disk space
Available ports: 3000, 8080-8086, 9090-9094

🏗️ Architecture Highlights

Fault-Tolerant Kafka Setup

3-Broker Cluster: High availability and fault tolerance
12 Partitions: Optimized for parallel processing
Schema Registry: Avro schema evolution and compatibility
Exactly-Once Semantics: Transactional producers and consumers

Flink Stream Processing

Windowed Aggregations: Tumbling (5s), sliding (1m), and session windows
CEP Patterns: Complex event detection and correlation
State Management: RocksDB for fault-tolerant state
Checkpointing: 10-second intervals with consistent snapshots

Performance Optimization

Parallelism: 12 Flink tasks across 3 TaskManagers
Memory Tuning: Optimized heap sizes and cache allocation
Compression: Snappy compression throughout pipeline
Batching: Optimized batch sizes for all components

📊 Event Schema Design

Core Event Types

PageView: User page navigation events
Session: User session lifecycle events
Conversion: Business conversion events
Custom: Extensible custom event types

Avro Schema Benefits

Schema Evolution: Backward/forward compatibility
Compact Serialization: Reduced network overhead
Type Safety: Strong typing and validation
Registry Integration: Centralized schema management

📈 Monitoring & Observability

Key Metrics

Pipeline Performance: Latency, throughput, error rates
Business Metrics: Page views, active users, conversions
System Health: Resource utilization, service availability
Alerting: Threshold-based notifications with escalation

Dashboard Features

Real-time Metrics: Live updates via WebSocket
Performance Trends: Historical analysis and forecasting
Alert Management: Active notifications and resolution tracking
System Overview: Health status of all components

📚 Documentation

Architecture Design - Detailed technical architecture
Deployment Guide - Production deployment instructions
Performance Analysis - Optimization and scaling guide
API Documentation - WebSocket and REST API reference

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🎯 Use Cases

This pipeline is perfect for demonstrating expertise in:

Real-time Data Engineering: End-to-end streaming architecture
Performance Optimization: Sub-500ms latency achievement
Fault Tolerance: Exactly-once processing guarantees
Complex Event Processing: Pattern detection and correlation
Modern Tech Stack: Kafka, Flink, InfluxDB, React
Monitoring & Observability: Comprehensive metrics and alerting

Ready to see it in action? Run make start and visit http://localhost:3000! 🚀

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
docs		docs
event-simulator		event-simulator
flink-jobs		flink-jobs
monitoring		monitoring
react-dashboard		react-dashboard
schemas		schemas
scripts		scripts
websocket-server		websocket-server
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Real-Time Analytics Pipeline

🚀 Quick Start

🏗️ Architecture Overview

✨ Key Features

🛠️ Technology Stack

📊 Performance Metrics

Achieved Performance

Load Testing Results

🚦 Service URLs

🏃‍♂️ Commands

📋 Prerequisites

🏗️ Architecture Highlights

Fault-Tolerant Kafka Setup

Flink Stream Processing

Performance Optimization

📊 Event Schema Design

Core Event Types

Avro Schema Benefits

📈 Monitoring & Observability

Key Metrics

Dashboard Features

📚 Documentation

🤝 Contributing

📄 License

🎯 Use Cases

About

Uh oh!

Releases

Packages

Languages

Shangmin-Chen/Real-Time-Analytics-Pipeline

Folders and files

Latest commit

History

Repository files navigation

Real-Time Analytics Pipeline

🚀 Quick Start

🏗️ Architecture Overview

✨ Key Features

🛠️ Technology Stack

📊 Performance Metrics

Achieved Performance

Load Testing Results

🚦 Service URLs

🏃‍♂️ Commands

📋 Prerequisites

🏗️ Architecture Highlights

Fault-Tolerant Kafka Setup

Flink Stream Processing

Performance Optimization

📊 Event Schema Design

Core Event Types

Avro Schema Benefits

📈 Monitoring & Observability

Key Metrics

Dashboard Features

📚 Documentation

🤝 Contributing

📄 License

🎯 Use Cases

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages