Professional log analysis tool with ML-powered insights and comprehensive format support
```bash
# Clone repository
git clone https://github.com/mdskun/Log-analyser.git
cd Log-analyser

# Install dependencies
pip install -r requirements.txt

# Run application
streamlit run app.py
```

Open your browser to http://localhost:8501 and upload a log file.
Sample files are in the examples/ folder if you want to try it immediately.
- Standard Logs: Syslog, Apache (Common/Combined), custom `[timestamp][level][module]` format
- JSON Logs: Generic JSON, Docker, Kubernetes, AWS CloudWatch, GCP Cloud Logging
- XML Logs: Windows Event Logs (exported XML)
- Auto-detection: Smart format detection – just upload and go
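Detection can also be exercised directly on the first lines of a file. A minimal sketch reusing the `detect_format` and `iter_lines` helpers from the programmatic example further below (the file name is illustrative):

```python
from src.utils.io_utils import detect_format, iter_lines

# Peek at the first 50 lines and let the detector pick a format name;
# this mirrors what the app does on upload
with open("app.log", "rb") as f:
    sample = tuple(line for _, line in zip(range(50), iter_lines(f)))

print(detect_format(sample))
```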
- Statistical Metrics: Error rates, volume analysis, time-series aggregation
- ML Clustering: K-means clustering of error messages by semantic similarity
- Anomaly Detection: Statistical spike detection with 24-hour rolling z-scores (sketched after this list)
- Sequence Mining: Discover recurring event patterns that precede errors
- Heatmaps: Activity visualisation by time-of-day and module
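The rolling z-score idea behind the anomaly detection can be sketched in a few lines of pandas. This is a minimal illustration of the technique, not the project's exact implementation, and the column names in the usage comment are assumptions:

```python
import pandas as pd

def flag_spikes(counts: pd.Series, window: str = "24h", threshold: float = 3.0) -> pd.Series:
    """Return a boolean mask marking hours whose volume spikes.

    `counts` is an hourly event count with a DatetimeIndex. A point is
    flagged when it lies more than `threshold` standard deviations above
    the trailing 24-hour rolling mean.
    """
    rolling = counts.rolling(window, min_periods=3)
    z_scores = (counts - rolling.mean()) / rolling.std()
    return z_scores > threshold

# Hypothetical usage: resample parsed logs to hourly counts, then flag
# counts = df.set_index("timestamp").resample("1h").size()
# spikes = flag_spikes(counts)
```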
- PII Redaction: Automatic removal of emails, IPs, UUIDs, JWT/AWS/GCP tokens (see the example after this list)
- Configurable: Toggle redaction on/off per session
- Export Safe: Redacted data flows through to CSV/JSON exports
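The redaction step boils down to substituting placeholder tags for each match. A minimal sketch, with illustrative patterns rather than the project's exact rule set:

```python
import re

# Illustrative patterns only; the real set also covers UUIDs and
# JWT/AWS/GCP tokens with more precise expressions
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def redact(text: str) -> str:
    """Replace each PII match with a stable placeholder like <email>."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{name}>", text)
    return text

print(redact("user alice@example.com logged in from 10.0.0.7"))
# -> "user <email> logged in from <ipv4>"
```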
- Pre-compiled regex: All patterns compiled once at import – zero per-line overhead
- Streaming I/O: Files are never fully loaded into memory – suitable for 100 MB+ files
- LRU caching: `detect_line_type` and `parse_user_agent` are cached across repeated values (combined in the sketch below)
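These three ideas combine roughly as follows. A sketch with hypothetical names (`ERROR_LINE`, `classify`, `stream_lines`), not the real helpers in `src/utils/`:

```python
import re
from functools import lru_cache
from typing import BinaryIO, Iterator

# Compiled once at import time, so there is no per-line compile cost
ERROR_LINE = re.compile(rb"\b(ERROR|CRITICAL)\b")

@lru_cache(maxsize=4096)
def classify(line: bytes) -> str:
    """Cheap line classification; repeated identical lines hit the cache."""
    return "error" if ERROR_LINE.search(line) else "other"

def stream_lines(f: BinaryIO) -> Iterator[bytes]:
    """Yield lines lazily so the whole file never sits in memory."""
    for line in f:
        yield line.rstrip(b"\r\n")

# with open("app.log", "rb") as f:
#     errors = sum(classify(line) == "error" for line in stream_lines(f))
```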
```bash
pip install -r requirements.txt
streamlit run app.py
```

```bash
# Clone
git clone https://github.com/mdskun/Log-analyser.git
cd Log-analyser

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install with dev extras
pip install -e ".[dev]"

# Verify setup
pytest
streamlit run app.py
```

- Run `streamlit run app.py`
- Upload any supported log file via the sidebar
- Configure settings: max lines to parse, PII redaction toggle
- Explore the eight analysis tabs:
| Tab | What it shows |
|---|---|
| Data | Paginated log viewer with column filters |
| Charts | Level distribution, HTTP status, time-series trends |
| Heatmaps | Activity by hour-of-day and day-of-week |
| Types & Ranking | Module error ranking, line-type breakdown |
| Clusters | K-means grouping of similar error messages |
| Anomalies | Rolling z-score spike detection |
| Sequences | Event patterns that precede errors |
| Export | Download filtered data as CSV or JSON |
```python
from src.parsers import LogParser
from src.utils.io_utils import iter_lines, detect_format
from src.utils.enrichment import add_enrichments
from src.analytics.metrics import module_ranking, hourly_metrics

with open("app.log", "rb") as f:
    lines = list(iter_lines(f))

fmt = detect_format(tuple(lines[:50]))
df = LogParser.parse(iter(lines), fmt)
df = add_enrichments(df)

print(module_ranking(df).head())
print(hourly_metrics(df)[lambda x: x["spike"]].head())
```

See examples/analyse_programmatically.py for a complete walkthrough covering all analytics functions.
```python
import pandas as pd
from typing import Iterator

from src.parsers import LogParser

def analyze_my_format(lines: Iterator[str]) -> pd.DataFrame:
    data = []
    for line in lines:
        # Replace each Ellipsis with real field extraction for your format
        data.append({"timestamp": ..., "level": ..., "module": ..., "message": line})
    return pd.DataFrame(data)

LogParser.register_parser("my_format", analyze_my_format)
df = LogParser.parse(iter(lines), "my_format")
```

```
Log-analyser/
├── app.py            Streamlit entry point
├── src/
│   ├── parsers/      Log format parsers + factory
│   ├── analytics/    Statistical and ML analysis
│   ├── utils/        Regex patterns, enrichment, I/O
│   └── ui/tabs/      One file per Streamlit tab
├── tests/            pytest test suite (~80 tests)
├── docs/             Architecture, API, changelog, roadmap
└── examples/         Sample log files + usage script
```
Data flows in one direction: parse → enrich → analyse → render.
No module calls back into app.py or the UI layer.
See docs/ARCHITECTURE.md for the full design document,
including a flow diagram, module responsibilities, and guidance on extending the project.
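Concretely, a tab module under this design only receives a prepared DataFrame and renders it. A hypothetical sketch (the real files in src/ui/tabs/ define their own interfaces and column usage):

```python
import pandas as pd
import streamlit as st

def render(df: pd.DataFrame) -> None:
    """A self-contained tab: enriched data in, UI out.

    Hypothetical example; note that nothing here imports app.py or
    mutates shared state, matching the one-directional flow above.
    """
    st.subheader("Module error ranking")
    errors = df[df["level"] == "ERROR"]
    st.bar_chart(errors["module"].value_counts())
```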
```bash
# Run all tests
pytest

# With coverage report
pytest --cov=src --cov-report=term-missing

# Single file
pytest tests/test_parsers.py -v
```

Tests cover parsers, analytics, enrichment, PII redaction, pattern correctness, and the IPv4 octet-range validation. See tests/ for details.
| File Size | Lines | Parse Time | Peak Memory |
|---|---|---|---|
| 5 MB | ~50 000 | ~2 s | ~180 MB |
| 25 MB | ~250 000 | ~9 s | ~620 MB |
| 100 MB | ~1 000 000 | ~38 s | ~2.1 GB |
Tested on: Intel Core i5-12400, 12 GB RAM, NVMe SSD, Python 3.11. Parse time includes format detection, enrichment, and all analytics. For very large files, use the max-lines sidebar limit for initial exploration.
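When working programmatically, the same cap can be applied with `itertools.islice`. A minimal sketch built on the helpers from the API example above (the file name and line budget are illustrative):

```python
from itertools import islice

from src.parsers import LogParser
from src.utils.io_utils import detect_format, iter_lines

MAX_LINES = 100_000  # explore a slice first, parse everything later

with open("huge.log", "rb") as f:
    lines = list(islice(iter_lines(f), MAX_LINES))

fmt = detect_format(tuple(lines[:50]))
df = LogParser.parse(iter(lines), fmt)
```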
See docs/ROADMAP.md for the full roadmap with status labels.
| Document | Description |
|---|---|
| ARCHITECTURE.md | System design, data flow, module responsibilities |
| API.md | Full programmatic API reference |
| CONTRIBUTING.md | How to contribute |
| CHANGELOG.md | Version history |
| ROADMAP.md | Planned features |
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Contact: manthandsoni@gmail.com
MIT License – see LICENSE for details.
Built with Streamlit · Pandas · scikit-learn · Altair
Made by Manthan D Soni · ⭐ Star on GitHub