feat: modular architecture refactoring with 95% test coverage#2
feat: modular architecture refactoring with 95% test coverage#2amargiovanni merged 14 commits intomainfrom
Conversation
- Add feature spec for 001-modular-refactor with 4 user stories - Add implementation plan with modular architecture design - Add research.md with technical decisions - Add data-model.md with entity definitions - Add module interface contracts - Add quickstart.md for post-refactor usage - Add tasks.md with 93 implementation tasks - Update constitution to include core/ module (v1.0.1) - Add CLAUDE.md agent context file
Complete implementation of the modular refactoring (spec 001): ## Architecture - src/github_analyzer/api/ - GitHub API client with requests/urllib fallback - src/github_analyzer/analyzers/ - Commit, PR, Issue, Quality, Productivity analyzers - src/github_analyzer/exporters/ - CSV export functionality - src/github_analyzer/cli/ - Command-line interface with argparse - src/github_analyzer/config/ - Configuration and input validation - src/github_analyzer/core/ - Custom exception hierarchy ## Features - Secure token handling via GITHUB_TOKEN env var (never exposed in logs) - Input validation with injection protection - CLI flags: --days, --output, --repos, --quiet, --full - Optimized PR fetching (--full for detailed data) - 68 tests passing with pytest - Ruff linter clean ## Files - 93 tasks completed per tasks.md - Backward compatible: python github_analyzer.py works identically - Updated README with full documentation
When CLI flags are not provided, the tool now prompts interactively: - Days to analyze (default: 30) - Quiet mode (default: no) - Full PR details (default: no) - Confirmation before starting analysis CLI flags still work to skip prompts: python3 github_analyzer.py -d 7 --quiet --full
- Fix offset-naive vs offset-aware datetime comparison error by using datetime.now(timezone.utc) instead of datetime.now() - Update banner: GITHUB in cyan, ANALYZER in orange - Add more color options (ORANGE, PINK, PURPLE) to Colors class
- screen1: Banner and interactive configuration - screen2: Analysis progress with repository details - screen3: Final summary with generated files
- Run tests on Python 3.9, 3.10, 3.11, 3.12 - Run ruff linter - Run coverage report - Test stdlib-only mode (without requests) - Add badges to README
Add unit tests for core modules: - analyzers: commits, issues, pull_requests, productivity - api: client, models - cli: output formatting - core: exceptions - exporters: csv_exporter Coverage improved from 54% to 80% with 236 passing tests.
- Add test_main.py for CLI main module (GitHubAnalyzer, parse_args, prompts) - Add test_quality.py for quality metrics calculation - Extend test_client.py with urllib error handling tests - Extend test_settings.py with _get_bool_env, _get_int_env, validate, to_dict - Add coverage and test count badges to README - Configure ruff per-file-ignores for test patterns (ARG001, ARG002, SIM117)
- Getting started: fork, clone, branch, dev dependencies - Development workflow: linter, tests, coverage (≥95%) - Code style: Python 3.9+, type hints, PEP 8, docstrings - Commit format: Conventional Commits with types and examples - PR guidelines: tests, coverage, documentation, focus - Testing requirements: unit tests, fixtures, mocking
Summary of ChangesHello @amargiovanni, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request represents a significant architectural overhaul of the GitHub Analyzer tool, transforming it from a single, large script into a well-organized, modular application. The primary goal was to enhance maintainability, testability, and security by separating concerns into distinct modules, implementing rigorous input validation, and establishing a robust testing framework. This refactoring ensures the tool's long-term stability and facilitates future development while maintaining backward compatibility for existing users. Highlights
Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This is an excellent and massive refactoring of the project. The monolithic script has been broken down into a clean, modular, and testable architecture, following modern Python best practices. The introduction of pyproject.toml, a comprehensive test suite with 95% coverage, and detailed documentation in README.md and the specs directory are all major improvements that significantly increase the project's quality, maintainability, and security. The new architecture with clear separation of concerns in api, analyzers, config, core, and exporters is very well done.
I have a few minor suggestions for improvement. I found a broken placeholder in an auto-generated documentation file. I also identified a potential logic issue in the pull request analyzer where filtering by created_at while sorting by updated_at can be inefficient and might not capture all relevant recent activity. Finally, there's a small piece of dead code in the API client's error handling. These are minor points in an otherwise outstanding pull request.
CLAUDE.md
Outdated
|
|
||
| ## Commands | ||
|
|
||
| cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check . |
There was a problem hiding this comment.
The Commands section contains unresolved placeholders [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES], which makes the example command invalid and confusing. This should be corrected to show a valid sequence of commands.
| cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check . | |
| cd src; pytest; ruff check . |
| created_at = raw.get("created_at", "") | ||
| if created_at: | ||
| try: | ||
| created = datetime.fromisoformat(created_at.replace("Z", "+00:00")) | ||
| if created < since: | ||
| continue | ||
| except ValueError: | ||
| pass | ||
|
|
There was a problem hiding this comment.
The current logic filters pull requests by created_at, but the API query sorts them by updated_at. This is inefficient and may lead to confusing results, as recently updated older PRs will be fetched but then discarded. To analyze all PRs with recent activity, it's more consistent to filter by updated_at. Since the results are sorted by updated_at descending, you can also break the loop once a PR older than the analysis window is found, improving performance.
| created_at = raw.get("created_at", "") | |
| if created_at: | |
| try: | |
| created = datetime.fromisoformat(created_at.replace("Z", "+00:00")) | |
| if created < since: | |
| continue | |
| except ValueError: | |
| pass | |
| # Check if PR was updated within our timeframe | |
| updated_at = raw.get("updated_at", "") | |
| if updated_at: | |
| try: | |
| updated = datetime.fromisoformat(updated_at.replace("Z", "+00:00")) | |
| if updated < since: | |
| # Since results are sorted by `updated` descending, | |
| # we can stop processing further PRs. | |
| break | |
| except ValueError: | |
| pass |
src/github_analyzer/api/client.py
Outdated
| except TimeoutError as e: | ||
| raise APIError( | ||
| "Request timed out", | ||
| details=f"Timeout after {self._config.timeout}s", | ||
| ) from e |
There was a problem hiding this comment.
There was a problem hiding this comment.
Code Review
This is an outstanding pull request that accomplishes a major and much-needed refactoring of the project. The transition from a monolithic script to a clean, modular architecture is executed beautifully. The new structure is logical, testable, and follows modern Python best practices. I'm particularly impressed by the thorough planning documents, the comprehensive test suite with high coverage, and the focus on security with proper token handling and input validation. The updated README.md is excellent and makes the tool much more accessible to new users and contributors.
I found only a couple of minor issues to address: a malformed command in a generated documentation file and a small discrepancy in the test coverage configuration. Overall, this is a high-quality contribution that significantly improves the maintainability and professionalism of the project. Great work!
CLAUDE.md
Outdated
|
|
||
| ## Commands | ||
|
|
||
| cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check . |
There was a problem hiding this comment.
This command appears to be malformed. The [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] placeholders look like they were not correctly replaced during file generation, resulting in an invalid command. A corrected version might look like the suggestion below.
| cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check . | |
| cd src && pytest && ruff check . |
pytest.ini
Outdated
| raise NotImplementedError | ||
| if TYPE_CHECKING: | ||
| if __name__ == .__main__.: | ||
| fail_under = 80 |
There was a problem hiding this comment.
There's a discrepancy between the coverage threshold here and the project's stated goals. The README.md and PR description mention a goal of ≥95% test coverage, but fail_under is set to 80. To ensure CI enforces the project's quality standard, it would be best to align this value with the documented goal.
fail_under = 95
Update pytest.ini fail_under from 80% to 95% to match the project's stated coverage requirements in README.md and contributing guidelines.
Replace placeholder text [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] with actual project commands for testing, linting, and running.
urllib.request.urlopen raises URLError wrapping socket.timeout, not the built-in TimeoutError. The previous except TimeoutError block was unreachable dead code. Now correctly checks if URLError.reason is a TimeoutError instance to provide specific timeout error messages.
The API sorts PRs by updated_at desc, but the code was filtering by created_at. This was inconsistent and could miss recently updated old PRs. Now filters by updated_at and breaks early when an old PR is found, improving both correctness and performance.
There was a problem hiding this comment.
Pull request overview
This PR refactors a monolithic 1000+ line script into a clean, modular architecture with comprehensive test coverage. The refactoring introduces:
- Organized module structure following clean architecture principles (api, analyzers, exporters, cli, config, core)
- 308 unit and integration tests achieving 95% coverage
- Type hints throughout for better code maintainability
- Secure token handling with masking utilities
- Input validation against injection attacks
Key Changes:
- Modular architecture with separate concerns (API client, analyzers, exporters, CLI)
- Comprehensive test suite with fixtures for API responses and sample data
- Security improvements: token masking, input validation, no token exposure in logs/errors
Reviewed changes
Copilot reviewed 64 out of 69 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/conftest.py | Shared pytest fixtures for API responses, configurations, and test utilities |
| tests/unit/* | Unit tests for all modules with 95% coverage target |
| tests/integration/* | Integration tests verifying full workflow without network calls |
| tests/fixtures/* | Sample data and API response fixtures for testing |
| src/github_analyzer/core/* | Custom exception hierarchy with exit codes and token masking |
| src/github_analyzer/config/* | Configuration loading from environment with validation |
| src/github_analyzer/api/* | GitHub API client with pagination, rate limiting, and data models |
| src/github_analyzer/analyzers/* | Commit, PR, issue, quality, and productivity analyzers |
| src/github_analyzer/exporters/* | CSV export functionality for all analysis results |
| src/github_analyzer/cli/* | Command-line interface with colored output and user prompts |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """Verify no circular import issues exist.""" | ||
| # Import in order of dependencies (leaf to root) | ||
| import src.github_analyzer.core.exceptions | ||
| import src.github_analyzer.config.validation |
There was a problem hiding this comment.
Module 'src.github_analyzer.config.validation' is imported with both 'import' and 'import from'.
| # Import in order of dependencies (leaf to root) | ||
| import src.github_analyzer.core.exceptions | ||
| import src.github_analyzer.config.validation | ||
| import src.github_analyzer.config.settings |
There was a problem hiding this comment.
Module 'src.github_analyzer.config.settings' is imported with both 'import' and 'import from'.
| import src.github_analyzer.core.exceptions | ||
| import src.github_analyzer.config.validation | ||
| import src.github_analyzer.config.settings | ||
| import src.github_analyzer.api.models |
There was a problem hiding this comment.
Module 'src.github_analyzer.api.models' is imported with both 'import' and 'import from'.
| import src.github_analyzer.config.validation | ||
| import src.github_analyzer.config.settings | ||
| import src.github_analyzer.api.models | ||
| import src.github_analyzer.api.client |
There was a problem hiding this comment.
Module 'src.github_analyzer.api.client' is imported with both 'import' and 'import from'.
| import src.github_analyzer.analyzers.commits | ||
| import src.github_analyzer.analyzers.pull_requests | ||
| import src.github_analyzer.analyzers.issues | ||
| import src.github_analyzer.analyzers.quality |
There was a problem hiding this comment.
Module 'src.github_analyzer.analyzers.quality' is imported with both 'import' and 'import from'.
| import src.github_analyzer.analyzers.productivity | ||
| import src.github_analyzer.exporters.csv_exporter | ||
| import src.github_analyzer.cli.output | ||
| import src.github_analyzer.cli.main |
There was a problem hiding this comment.
Module 'src.github_analyzer.cli.main' is imported with both 'import' and 'import from'.
|
|
||
| from __future__ import annotations | ||
|
|
||
| import json |
There was a problem hiding this comment.
Import of 'json' is not used.
| from __future__ import annotations | ||
|
|
||
| import json | ||
| from datetime import datetime, timedelta |
There was a problem hiding this comment.
Import of 'datetime' is not used.
Import of 'timedelta' is not used.
| import json | ||
| from datetime import datetime, timedelta | ||
| from pathlib import Path | ||
| from unittest.mock import MagicMock, patch |
There was a problem hiding this comment.
Import of 'patch' is not used.
| if updated < since: | ||
| # All remaining PRs will also be older, so stop processing | ||
| break | ||
| except ValueError: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except ValueError: | |
| except ValueError: | |
| # Ignore PRs with invalid 'updated_at' format; skip processing this PR. |
Summary
Modular Architecture: Refactored monolithic
github_analyzer.py(1000+ lines) into organized modules following clean architecture principlesapi/- GitHub API client with retry logic and rate limitinganalyzers/- Commit, PR, Issue, Quality, and Productivity analyzersexporters/- CSV export functionalitycli/- Command-line interface with colored outputconfig/- Settings and input validationcore/- Custom exception hierarchyComprehensive Test Suite: 308 tests with 95% coverage
CI/CD Pipeline: GitHub Actions workflow
Documentation: Updated README with
Code Quality
requests)Test plan
github_analyzer.pyentry point