This project is a scalable log analytics system that processes large web server logs and serves an interactive dashboard with AI-generated insights. It combines distributed data processing (PySpark), an analytics UI (Streamlit), and LLM-based analysis (Gemini) to simulate the kind of monitoring system used in production environments.

Log exploration:
- Filter logs by IP, endpoint, and status code
- Paginate results for efficient browsing
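
The filtering and pagination behaviour can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the record shape, the sample rows, and the helper names `filter_logs` and `paginate` are assumptions made for the example.

```python
# Hypothetical record shape: one dict per parsed log line.
LOGS = [
    {"ip": "10.0.0.1", "endpoint": "/home", "status": 200},
    {"ip": "10.0.0.2", "endpoint": "/login", "status": 401},
    {"ip": "10.0.0.1", "endpoint": "/login", "status": 200},
    {"ip": "10.0.0.3", "endpoint": "/home", "status": 500},
    {"ip": "10.0.0.1", "endpoint": "/cart", "status": 200},
]


def filter_logs(rows, ip=None, endpoint=None, status=None):
    """Keep rows that match every filter that is not None."""
    return [
        r for r in rows
        if (ip is None or r["ip"] == ip)
        and (endpoint is None or r["endpoint"] == endpoint)
        and (status is None or r["status"] == status)
    ]


def paginate(rows, page, page_size):
    """Return the 1-indexed page of rows; pages past the end are empty."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return rows[start:start + page_size]


matches = filter_logs(LOGS, ip="10.0.0.1")
first_page = paginate(matches, page=1, page_size=2)
```

Slicing only the requested page keeps the amount of data handed to the UI small, which is the point of paginating a large log table.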

Traffic analytics:
- Requests over time (time-series visualization)
- Top endpoints analysis
- Status code distribution
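
The three aggregations above can be illustrated with the standard library. This is a sketch under assumptions: the sample records and the one-minute bucket size are invented for the example, and the project itself may compute these metrics in PySpark or Pandas instead.

```python
from collections import Counter
from datetime import datetime

# Hypothetical parsed records: (timestamp, endpoint, status).
RECORDS = [
    (datetime(2024, 1, 1, 10, 0, 5), "/home", 200),
    (datetime(2024, 1, 1, 10, 0, 40), "/login", 401),
    (datetime(2024, 1, 1, 10, 1, 10), "/home", 200),
    (datetime(2024, 1, 1, 10, 1, 30), "/home", 500),
]

# Requests over time: truncate each timestamp to its minute, then count.
per_minute = Counter(
    ts.replace(second=0, microsecond=0) for ts, _, _ in RECORDS
)

# Top endpoints: the two most frequently requested paths.
top_endpoints = Counter(ep for _, ep, _ in RECORDS).most_common(2)

# Status code distribution: number of responses per status code.
status_dist = Counter(status for _, _, status in RECORDS)
```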

Anomaly detection:
- Detects traffic spikes using statistical methods
- Identifies abnormal request patterns
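
The README does not say which statistical method is used for spike detection. One common choice is a z-score test on per-interval request counts, sketched below; the function name `find_spikes`, the threshold of 2.0, and the sample counts are assumptions for illustration only.

```python
from statistics import mean, pstdev


def find_spikes(counts, threshold=2.0):
    """Return indices whose count lies more than `threshold` population
    standard deviations above the mean of all counts."""
    if len(counts) < 2:
        return []
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # All intervals are identical, so nothing stands out.
        return []
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > threshold]


# Hypothetical requests-per-minute series with one obvious burst.
requests_per_minute = [100, 102, 98, 101, 99, 100, 400, 97]
spikes = find_spikes(requests_per_minute)
```

Because a large burst also inflates the mean and standard deviation, a production system might prefer a rolling window or a median-based score; the plain z-score is shown here only because it is the simplest instance of the idea.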

AI insights:
- Automatic summarization of traffic trends
- Natural language Q&A over log metrics
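
Q&A over log metrics usually means sending the model a compact summary of the aggregated metrics together with the user's question, rather than the raw logs. The sketch below shows only that prompt-building step; the prompt wording, the metric names, and the helper `build_prompt` are assumptions, and the actual call to the Gemini API is deliberately left out.

```python
import json


def build_prompt(metrics, question):
    """Embed aggregated metrics as JSON so the model answers from the
    summarized data instead of from raw log lines."""
    return (
        "You are a log analytics assistant. "
        "Answer using only the metrics below.\n"
        f"Metrics (JSON):\n{json.dumps(metrics, indent=2, sort_keys=True)}\n"
        f"Question: {question}\n"
    )


# Hypothetical metrics, shaped like the output of the aggregation step.
metrics = {
    "total_requests": 4,
    "status_distribution": {"200": 2, "401": 1, "500": 1},
    "top_endpoints": [["/home", 3], ["/login", 1]],
}
prompt = build_prompt(metrics, "Which endpoint receives the most traffic?")
```

Keeping the prompt limited to precomputed metrics bounds its size regardless of how large the underlying log files are.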

Data processing (PySpark):
- Parses raw web server logs
- Extracts structured data (IP, timestamp, endpoint, status, etc.)
- Computes aggregated metrics
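
The parsing step can be illustrated with a regular expression. The README does not state the log format, so this sketch assumes the Apache/NGINX Common Log Format; the pattern, the field names, and the helper `parse_line` are illustrative. It uses plain Python rather than PySpark so that it runs without a cluster.

```python
import re
from datetime import datetime

# Assumed format (Common Log Format):
# host ident user [timestamp] "METHOD path PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<endpoint>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of structured fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Convert text fields into typed values.
    rec["timestamp"] = datetime.strptime(rec.pop("ts"), "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
record = parse_line(sample)
```

Returning `None` for lines that do not match lets the caller count or skip malformed entries instead of failing the whole job.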

Dashboard (Streamlit):
- Loads preprocessed data
- Provides interactive visualization and filtering

AI layer (Gemini):
- Generates insights from metrics
- Answers user queries in natural language

Tech stack:
- PySpark – Distributed data processing
- Streamlit – Interactive dashboard
- Pandas & Plotly – Data handling and visualization
- Google Gemini API – AI insights and Q&A
- Python – Core development

Live demo: https://loganalyst-kushagra-gupta.streamlit.app/

This system mimics a real-world log monitoring platform, useful for:
- Observability dashboards
- Traffic analysis
- Error monitoring
- AI-assisted debugging

This project integrates data engineering, analytics, and AI into a single system and shows how an application can use LLMs to support log monitoring.