Add Comprehensive Logging System #3

Open
TrivCodez wants to merge 4 commits into ruijindev0801:main from TrivCodez:add-comprehensive-logging

@TrivCodez

Summary

This PR adds a comprehensive logging system to GitHub Scraper 2026, addressing critical production needs for debugging, monitoring, and error tracking.

Why This Change is Critical

Problem

The current application lacks any logging infrastructure, making it nearly impossible to:

  • Debug API failures and rate limiting issues
  • Track application behavior in production
  • Diagnose user-reported problems
  • Monitor export operations
  • Audit search and filtering decisions

Impact

Users experience:

  • Silent failures with unhelpful error messages
  • No way to diagnose why searches return fewer results than expected
  • Inability to troubleshoot Google Sheets/Apps Script integration issues
  • No record of operations for compliance or analysis

Changes Made

1. New Logging Module (github_scraper/logger.py)

  • Structured logging with multiple handlers (console, rotating files)
  • Sensitive data filtering - automatically masks tokens and API keys
  • Context-aware logging - includes operation details and user context
  • Log rotation - prevents disk space issues with automatic rotation
  • Multiple log levels - DEBUG, INFO, WARNING, ERROR, CRITICAL
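
The handler setup described above could look roughly like the following. This is a minimal sketch built on the standard library, not the PR's actual code: `build_logger` and its parameters are hypothetical names, and the 10 MB / 5 file limits are taken from the figures quoted later in this description.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(name: str, log_dir: Path, level: str = "INFO") -> logging.Logger:
    """Hypothetical helper: console handler plus a size-rotated file handler."""
    logger = logging.getLogger(name)
    logger.setLevel(level)      # setLevel accepts level names such as "DEBUG"
    logger.propagate = False    # keep records out of the root logger's handlers
    if logger.handlers:         # already configured: avoid duplicate handlers
        return logger

    formatter = logging.Formatter(
        "%(asctime)s | %(levelname)s | %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)

    log_dir.mkdir(parents=True, exist_ok=True)
    rotating = RotatingFileHandler(
        log_dir / f"{name}.log",
        maxBytes=10 * 1024 * 1024,  # roll over once the active file reaches 10 MB
        backupCount=5,              # keep five rotated backups next to the active file
        encoding="utf-8",
    )
    rotating.setFormatter(formatter)
    logger.addHandler(rotating)
    return logger
```

Returning early when handlers already exist is one way to keep repeated setup calls from writing each record twice.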

2. Enhanced Scraper Module (github_scraper/scraper.py)

  • API request logging - tracks all GitHub API calls with status codes
  • Rate limit detection - logs detailed rate limit warnings
  • Retry tracking - logs each retry attempt with error details
  • Progress logging - tracks search phases and user processing
  • Error context - includes the username and operation in error logs
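
As an illustration of retry tracking and rate limit detection, here is a sketch under stated assumptions. `call_with_retries` and the `fetch` callable are hypothetical and are not taken from scraper.py. The `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers and the 403/429 status codes are the ones the GitHub REST API uses to signal an exhausted rate limit.

```python
import logging
import time
from typing import Callable, Dict, Tuple

logger = logging.getLogger("github_scraper.scraper")

# Hypothetical response shape: (status code, response headers, decoded body)
Response = Tuple[int, Dict[str, str], dict]


def call_with_retries(
    fetch: Callable[[], Response],
    *,
    username: str,
    operation: str,
    max_attempts: int = 3,
    delay: float = 0.0,
) -> dict:
    """Call fetch() until it returns HTTP 200, logging every attempt."""
    context = {"username": username, "operation": operation}
    for attempt in range(1, max_attempts + 1):
        status, headers, body = fetch()
        logger.debug("API response", extra={**context, "status": status, "attempt": attempt})
        if status == 200:
            return body
        if status in (403, 429) and headers.get("X-RateLimit-Remaining") == "0":
            logger.warning(
                "Rate limit reached",
                extra={**context, "reset": headers.get("X-RateLimit-Reset")},
            )
        else:
            logger.warning(
                "Request failed, will retry",
                extra={**context, "status": status, "attempt": attempt},
            )
        if attempt < max_attempts:
            time.sleep(delay)  # no pause after the final attempt
    logger.error("Giving up after retries", extra={**context, "attempts": max_attempts})
    raise RuntimeError(f"{operation} failed for {username} after {max_attempts} attempts")
```

Passing the request as a callable keeps the sketch independent of any particular HTTP client.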

3. Enhanced Exporter Module (github_scraper/exporter.py)

  • Contact extraction logging - tracks email/LinkedIn/Discord extraction
  • Filtering decisions - logs why users are skipped (gender, contact mode)
  • Export progress - tracks CSV/Google Sheets operations
  • Schema upgrades - logs CSV format migrations
  • API errors - detailed Apps Script and Google Sheets error logging
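
Logging of filtering decisions might be sketched as follows. `build_export_rows` and the user dictionary keys (`login`, `email`, `linkedin`) are assumptions made for illustration; only the summary field names (`total_rows`, `skipped_users`, `original_users`) come from the example log output in this description.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("github_scraper.exporter")


def build_export_rows(users: List[Dict], require_contact: bool = True) -> List[Dict]:
    """Keep users that pass the contact filter and log why the others were skipped."""
    rows = []
    skipped = 0
    for user in users:
        has_contact = bool(user.get("email") or user.get("linkedin"))
        if require_contact and not has_contact:
            skipped += 1
            logger.debug(
                "Skipping user",
                extra={"username": user["login"], "reason": "no_contact"},
            )
            continue
        rows.append(user)
    logger.info(
        "Export rows built",
        extra={
            "total_rows": len(rows),
            "skipped_users": skipped,
            "original_users": len(users),
        },
    )
    return rows
```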

4. Enhanced Main Entry Point (main.py)

  • Startup/shutdown logging - application lifecycle tracking
  • Environment info - Python version, working directory
  • Crash detection - logs unhandled exceptions with full tracebacks
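
One common way to log unhandled exceptions with full tracebacks is to replace `sys.excepthook`. The sketch below assumes that approach; the PR does not say which mechanism main.py uses, and both function names are hypothetical.

```python
import logging
import sys

logger = logging.getLogger("github_scraper")


def log_unhandled_exception(exc_type, exc_value, exc_traceback):
    """Record an uncaught exception at CRITICAL level with its traceback."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Let Ctrl+C keep its default behaviour instead of logging it as a crash.
        sys.__excepthook__(exc_type, exc_value, exc_traceback)
        return
    logger.critical(
        "Unhandled exception",
        exc_info=(exc_type, exc_value, exc_traceback),
    )


def install_crash_logging() -> None:
    """Route uncaught exceptions in the main thread through the logger."""
    sys.excepthook = log_unhandled_exception
```

Note that `sys.excepthook` only covers the main thread; exceptions raised in other threads go through `threading.excepthook` instead.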

Key Features

Security

  • Automatic sensitive data masking - Redacts GitHub tokens and service account keys
  • No credentials in logs - Ensures logs are safe to share
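
A masking step of this kind can be written as a `logging.Filter`. The following is a sketch only: the class name and regular expression are assumptions, not the PR's code. The pattern covers the documented GitHub token prefixes (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`, `github_pat_`); masking service account keys would need additional patterns.

```python
import logging
import re

# Assumed pattern: a GitHub token prefix followed by at least 20 token characters.
TOKEN_PATTERN = re.compile(
    r"\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})\b"
)


class SensitiveDataFilter(logging.Filter):
    """Replace anything that looks like a GitHub token in the rendered message."""

    def filter(self, record: logging.LogRecord) -> bool:
        rendered = record.getMessage()  # applies %-style args first
        record.msg = TOKEN_PATTERN.sub("[REDACTED]", rendered)
        record.args = ()                # args are already merged into msg
        return True                     # never drop the record, only rewrite it
```

The filter would be attached with `handler.addFilter(SensitiveDataFilter())`. As written it only rewrites the message text; values passed through `extra` would need the same treatment.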

Debuggability

  • Request/response tracking - See exactly what GitHub API returns
  • Operation context - Every log includes relevant context (username, operation, etc.)
  • Error breadcrumbs - Full traceback for unhandled exceptions

Production Readiness

  • Rotating log files - Up to 5 files of 10 MB each, plus a persistent log
  • Timestamped logs - Easy to find logs from specific runs
  • Performance impact - Minimal overhead; context fields are passed through the logging extra parameter rather than formatted into each message

Example Log Output

2026-04-26 00:47:23 | INFO | Starting GitHub user scraping | location=San Francisco | has_token=True
2026-04-26 00:47:23 | DEBUG | Built search query | query=location:"San Francisco"
2026-04-26 00:47:25 | INFO | Found 45 users, fetching details... | user_count=45
2026-04-26 00:47:25 | DEBUG | Fetching user details | username=johndoe
2026-04-26 00:47:26 | WARNING | No README found for user | username=johndoe
2026-04-26 00:47:28 | INFO | Export rows built | total_rows=38 | skipped_users=7 | original_users=45
2026-04-26 00:47:29 | INFO | CSV export completed | path=./results.csv | new_rows=38 | total_unique_users=156
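
A line layout like the one above can be produced by a formatter that appends every field passed via `extra` as `key=value`. This is a sketch of one possible approach; `ContextFormatter` is a hypothetical name and the PR's formatter may work differently.

```python
import logging

# Attributes every LogRecord carries; anything else on a record came from `extra`.
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class ContextFormatter(logging.Formatter):
    """Render 'timestamp | LEVEL | message | key=value | ...'."""

    def __init__(self) -> None:
        super().__init__(
            "%(asctime)s | %(levelname)s | %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )

    def formatMessage(self, record: logging.LogRecord) -> str:
        base = super().formatMessage(record)
        context = {
            key: value
            for key, value in vars(record).items()
            if key not in _STANDARD_ATTRS
        }
        # Appending here keeps the context on the first line, before any traceback.
        return base + "".join(f" | {key}={value}" for key, value in context.items())
```

With this formatter attached to a handler, a call such as `logger.info("Found 45 users, fetching details...", extra={"user_count": 45})` would end in `| user_count=45`.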

Testing

  • Tested logging initialization
  • Verified log file creation and rotation
  • Confirmed sensitive data masking
  • Tested all log levels (DEBUG → CRITICAL)
  • Verified context information in logs
  • Tested error logging with tracebacks

Usage

Logs are automatically created in the logs/ directory:

logs/
├── github_scraper_20260426_004723.log  # Current run
├── github_scraper_20260425_153022.log  # Previous runs
└── github_scraper_persistent.log       # Accumulated logs
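
The file names in this layout could be derived as in the sketch below. `run_log_paths` is a hypothetical helper; the `%Y%m%d_%H%M%S` stamp is inferred from the names shown in the tree.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple


def run_log_paths(
    log_dir: Path,
    name: str = "github_scraper",
    now: Optional[datetime] = None,
) -> Tuple[Path, Path]:
    """Return (per-run log path, persistent log path) inside log_dir."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return log_dir / f"{name}_{stamp}.log", log_dir / f"{name}_persistent.log"
```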

For verbose debugging, change log level to DEBUG in main.py:

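# Import path assumed from the module location github_scraper/logger.py
from github_scraper.logger import setup_logger

logger = setup_logger(
    name="github_scraper",
    log_level="DEBUG",  # More verbose logging
    log_to_file=True,
    log_to_console=True,
)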

Backward Compatibility

  • Fully backward compatible - No breaking changes to existing functionality
  • No new dependencies - Logging uses only the Python standard library
  • No UI changes - All logging is backend-only

Related Issues

This addresses the critical need for debugging and monitoring capabilities mentioned in the repository analysis. It enables proper production deployment and user support.

- Create centralized logging module with structured log format
- Support for console, file, and rotating log handlers
- Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
- Include timestamp, log level, module, and contextual info
- Configure different log levels for different environments
- Add detailed logging for API requests, rate limiting, and errors
- Log search parameters, user counts, and progress
- Add context to errors with username and operation details
- Log retry attempts and response status codes
- Track API calls for debugging rate limit issues
- Add logging for contact extraction and filtering operations
- Log gender detection and filtering decisions
- Track export progress and destination details
- Log API errors and service account authentication
- Add context to export failures with row counts and settings
- Set up logging system when application starts
- Create logs directory and configure handlers
- Log application startup and version info
- Add logging configuration to main entry point