# Event Analysis Tool

A comprehensive Python tool for analyzing temporal patterns in event datasets such as system logs, transaction records, and activity timelines. It provides time-series analysis, anomaly detection, pattern recognition, and visualization.
## Features

### Data Loading
- CSV files with flexible parsing options
- JSON files (arrays or nested objects)
- SQLite databases (extensible to other databases)
- Robust data validation and error handling
### Timestamp Handling
- Automatic parsing of various timestamp formats
- Multi-timezone support and normalization
- Intelligent handling of malformed timestamps
- Configurable target timezone conversion (illustrated in the sketch below)
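The tool's parser is not reproduced here, but the underlying pandas technique looks roughly like this minimal sketch (the sample values and target timezone are illustrative):

```python
import pandas as pd

raw = pd.Series(["2024-01-01 10:00:00", "not-a-date", "2024-01-01 12:30:00"])

# errors="coerce" turns malformed values into NaT instead of raising;
# utc=True normalizes everything to UTC before converting to a target zone
parsed = pd.to_datetime(raw, errors="coerce", utc=True)
local = parsed.dt.tz_convert("US/Eastern")  # illustrative target timezone
```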
### Temporal Analysis
- Configurable time intervals: minute, hour, day, week, month (see the sketch below)
- Peak and idle period identification
- Event frequency distributions over time
- Recurring pattern detection (hourly, daily, weekly, monthly patterns)
- Statistical analysis with comprehensive metrics
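As a rough illustration of interval grouping and peak detection with pandas (not the tool's internal code; the data and names are illustrative):

```python
import pandas as pd

# Synthetic events: one DataFrame row per event, tz-aware timestamps
events = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=200, freq="15min", tz="UTC")
})

counts = events.set_index("timestamp").resample("1h").size()  # events per hour
peak = counts.idxmax()   # busiest interval
idle = counts.idxmin()   # quietest interval
```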
### Anomaly Detection
- Statistical anomaly detection using z-scores (outlined below)
- Configurable sensitivity thresholds
- Classification of anomalies (spikes vs. drops)
- Visual highlighting of detected anomalies
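In outline, z-score anomaly detection amounts to the following sketch (a generic illustration of the technique, not the tool's exact implementation):

```python
import pandas as pd

def zscore_anomalies(counts: pd.Series, threshold: float = 2.0) -> pd.DataFrame:
    """Flag intervals whose event count deviates more than `threshold`
    standard deviations from the mean."""
    z = (counts - counts.mean()) / counts.std()
    mask = z.abs() > threshold
    return pd.DataFrame({
        "count": counts[mask],
        "z_score": z[mask],
        "kind": z[mask].map(lambda s: "spike" if s > 0 else "drop"),
    })
```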
### Filtering
- Filter by event type, user ID, category, or any column
- Time range filtering
- Chainable filter operations
- Memory-efficient filtering operations
### Visualization
- Time series line charts showing event counts over time
- Activity heatmaps (hour vs. day of week)
- Event count histograms for distribution analysis
- Anomaly detection plots with highlighted outliers
- High-quality PNG exports (300 DPI)
### Export
- CSV export for processed data and analysis results
- JSON export for structured analysis results
- PNG export for all visualizations
- Comprehensive HTML reports with embedded analysis
### Advanced Features
- Correlation analysis between event attributes
- Burst detection for sudden activity surges
- Seasonal decomposition support (see the sketch below)
- Modular architecture for easy extension
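Seasonal decomposition relies on the optional statsmodels dependency (see Installation). A minimal sketch of the underlying statsmodels call, assuming hourly counts with a daily cycle:

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# counts: a pandas Series of hourly event counts with a DatetimeIndex;
# period=24 assumes one seasonal cycle per day in hourly data
decomposition = seasonal_decompose(counts, model="additive", period=24)
trend, seasonal, residual = decomposition.trend, decomposition.seasonal, decomposition.resid
```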
## Requirements
- Python 3.8 or higher
- pip package manager
## Installation

```bash
pip install -r requirements.txt
```

For seasonal decomposition features:
```bash
pip install statsmodels
```

## Quick Start

```python
from event_analyzer import EventAnalysisTool, AnalysisConfig
# Configure analysis parameters
config = AnalysisConfig(
    time_interval='hour',
    timezone='UTC',
    anomaly_threshold=2.0
)
# Initialize the tool
tool = EventAnalysisTool(config)
# Load data from CSV
tool.load_data('your_events.csv')
# Preprocess data (specify timestamp column)
tool.preprocess_data('timestamp')
# Run comprehensive analysis
results = tool.run_analysis()
# Generate visualizations
tool.generate_visualizations('output/')
# Export all results
tool.export_results('output/')
# Print analysis summary
tool.print_summary()
```

## Applying Filters

```python
# Apply filters during preprocessing
filters = {
    'column': {
        'event_type': ['error', 'warning', 'critical'],
        'user_id': ['user_001', 'user_002']
    },
    'time_range': {
        'start_time': '2024-01-01 00:00:00',
        'end_time': '2024-01-31 23:59:59'
    }
}
tool.preprocess_data('timestamp', filters=filters)
```

## Loading from Different Sources

```python
# From JSON file
tool.load_data('events.json', source_type='json')
# From SQLite database
tool.load_data('sqlite:///database.db',
               source_type='database',
               query='SELECT * FROM events WHERE date > "2024-01-01"')
# From CSV with custom parameters
tool.load_data('events.csv',
               source_type='csv',
               sep=';',
               encoding='utf-8')
```

## Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `time_interval` | str | `'hour'` | Grouping interval: `'minute'`, `'hour'`, `'day'`, `'week'`, `'month'` |
| `timezone` | str | `'UTC'` | Target timezone for timestamp normalization |
| `anomaly_threshold` | float | `2.0` | Standard deviations for anomaly detection |
| `min_samples_for_anomaly` | int | `10` | Minimum data points required for anomaly detection |
| `pattern_detection_window` | int | `7` | Days window for pattern detection |
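Putting the table together, a fully specified configuration looks like the following (using the constructor style from the Quick Start; the chosen values are only examples):

```python
from event_analyzer import AnalysisConfig

config = AnalysisConfig(
    time_interval='day',          # 'minute', 'hour', 'day', 'week', or 'month'
    timezone='US/Eastern',        # target timezone for normalization
    anomaly_threshold=3.0,        # stricter than the 2.0 default
    min_samples_for_anomaly=10,   # skip anomaly detection on tiny datasets
    pattern_detection_window=7,   # look for recurring patterns over 7 days
)
```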
### Configuration Templates

```python
from event_analyzer import ConfigTemplates
# For system logs (high frequency)
config = ConfigTemplates.get_system_logs_config()
# For transaction data (medium frequency)
config = ConfigTemplates.get_transaction_config()
# For user activity (lower frequency)
config = ConfigTemplates.get_user_activity_config()
```

## Data Format

### CSV

Your CSV file should contain at least a timestamp column. Example:

```csv
timestamp,event_type,user_id,category
2024-01-01 10:00:00,login,user_001,authentication
2024-01-01 10:05:00,transaction,user_001,payment
2024-01-01 10:10:00,logout,user_001,authentication
```

### JSON

```json
[
  {
    "timestamp": "2024-01-01T10:00:00Z",
    "event_type": "login",
    "user_id": "user_001",
    "category": "authentication"
  },
  {
    "timestamp": "2024-01-01T10:05:00Z",
    "event_type": "transaction",
    "user_id": "user_001",
    "category": "payment"
  }
]
```

## Output Files

After running the analysis, you'll find the following files in your output directory:

```
output/
├── analysis_results.json # Complete analysis results
├── processed_data.csv # Cleaned and processed data
├── peak_periods.csv # Identified peak activity periods
├── idle_periods.csv # Identified low activity periods
├── time_series.png # Time series visualization
├── activity_heatmap.png # Weekly activity heatmap
├── event_histogram.png # Event count distribution
├── anomalies.png # Anomaly detection plot
└── analysis_report.html   # Comprehensive HTML report
```
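The JSON export can be inspected directly. The exact key names depend on the tool's output schema, so this is just a generic sketch:

```python
import json

with open('output/analysis_results.json') as f:
    results = json.load(f)

print(list(results.keys()))  # top-level sections of the analysis
```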
## Analysis Results

The tool provides comprehensive analysis including:

### Statistical Summary
- Total event counts
- Mean events per time period
- Standard deviation and variance
- Min/max activity periods
### Pattern Detection
- Hourly patterns: Peak hours and activity distribution
- Daily patterns: Weekday vs weekend activity
- Weekly patterns: Day-of-week variations
- Monthly patterns: Seasonal trends (when sufficient data)
### Anomaly Detection
- Statistical outliers using z-score analysis
- Classification of spikes vs drops
- Anomaly timestamps and severity scores
- Overall anomaly rate calculations
### Peak Analysis
- Top N highest activity periods
- Lowest activity periods (excluding zero counts)
- Activity distribution statistics
## Advanced Usage

### Correlation and Burst Analysis

```python
from event_analyzer import AdvancedAnalyzer
analyzer = AdvancedAnalyzer()
# Correlate event attributes
correlations = analyzer.correlation_analysis(processed_data, ['event_type', 'category'])

# Detect sudden activity surges within a 60-unit window
bursts = analyzer.burst_detection(processed_data, window_size=60)
```

### Report Generation

```python
from event_analyzer import ReportGenerator
report_gen = ReportGenerator(tool)
report_gen.generate_html_report('comprehensive_report.html')
```

## Use Cases

### System Monitoring
- Monitor server performance and identify bottlenecks
- Detect unusual error patterns
- Analyze peak usage times for capacity planning
### Financial Transactions
- Identify peak trading hours
- Detect fraudulent activity patterns
- Monitor payment system performance
### User Behavior
- Understand user engagement patterns
- Optimize system maintenance windows
- Detect unusual user behavior
### Security Analysis
- Monitor login patterns and detect brute force attacks
- Analyze security incident timelines
- Identify recurring security threats
## Performance Considerations

- Memory Usage: Handles large datasets efficiently using pandas
- Processing Speed: Optimized for datasets with millions of events
- Storage: Visualizations are saved as high-quality PNG files (300 DPI)
- Scalability: Modular design allows for distributed processing extensions
## Error Handling

The tool includes comprehensive error handling for:
- Malformed timestamp data
- Missing columns
- Database connection issues
- File I/O problems
- Insufficient data for analysis
All errors are logged with detailed information for debugging.
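The specific exception classes the tool raises are not listed here, so a defensive caller can start with a broad handler (a sketch, not prescribed usage):

```python
import logging

logger = logging.getLogger(__name__)

try:
    tool.load_data('your_events.csv')
    tool.preprocess_data('timestamp')
    results = tool.run_analysis()
except Exception:
    # The tool logs details internally; capture context at the call site too
    logger.exception("Event analysis failed")
    raise
```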
## Logging

The tool uses Python's built-in logging module. Configure the logging level:
```python
import logging

# Pick one level; logging.basicConfig is a no-op after the first call
logging.basicConfig(level=logging.DEBUG)      # For detailed logs
# logging.basicConfig(level=logging.INFO)     # For standard output
# logging.basicConfig(level=logging.WARNING)  # For warnings only
```

## Extending the Tool

To extend the tool with custom analysis methods:
- Add new analyzers: Inherit from the base analyzer classes
- Custom visualizations: Extend the `Visualizer` class
- New data sources: Add methods to the `DataLoader` class
- Export formats: Extend the `ResultExporter` class
Example custom analyzer:
```python
from event_analyzer import EventAnalyzer  # assumes the base class is importable

class CustomAnalyzer(EventAnalyzer):
    def custom_pattern_detection(self, df):
        # Your custom analysis logic goes here
        results = {}
        return results
```

## Troubleshooting

**"No module named 'statsmodels'"**
- Install the optional dependency: `pip install statsmodels`
"Timestamp parsing errors"
- Ensure timestamp column contains valid datetime strings
- Check timezone format consistency
"Insufficient data for analysis"
- Verify dataset has enough records (minimum 10-20 time periods)
- Check that timestamp column is correctly specified
"Memory errors with large datasets"
- Process data in chunks or filter before analysis
- Consider using more specific time ranges
## Performance Tips

- Filter early: Apply filters during preprocessing to reduce data size (a chunked-filtering sketch follows this list)
- Choose appropriate intervals: Use larger intervals (hour/day) for massive datasets
- Memory monitoring: Monitor memory usage for very large datasets
- Batch processing: Process multiple files separately for better resource management
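A minimal sketch of chunked pre-filtering with pandas, assuming a large CSV and the example columns used earlier (this is not a built-in feature of the tool):

```python
import pandas as pd

# Stream the CSV in 100k-row chunks and keep only the rows of interest,
# so the full file is never held in memory at once
chunks = []
for chunk in pd.read_csv('your_events.csv', chunksize=100_000):
    chunks.append(chunk[chunk['event_type'].isin(['error', 'warning', 'critical'])])

filtered = pd.concat(chunks, ignore_index=True)
filtered.to_csv('filtered_events.csv', index=False)  # feed this smaller file to the tool
```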
## License

This project is open source and available under the MIT License.
## Support

For issues, feature requests, or questions:
- Check the troubleshooting section above
- Review the example code and configuration options
- Create detailed issue reports with sample data and error messages
Version: 1.0.0
Python Version: 3.8+
Last Updated: 2024