feat: modular architecture refactoring with 95% test coverage by amargiovanni · Pull Request #2 · Oltrematica/github_analyzer

amargiovanni · 2025-11-28T12:42:39Z

Summary

Modular Architecture: Refactored monolithic github_analyzer.py (1000+ lines) into organized modules following clean architecture principles
- api/ - GitHub API client with retry logic and rate limiting
- analyzers/ - Commit, PR, Issue, Quality, and Productivity analyzers
- exporters/ - CSV export functionality
- cli/ - Command-line interface with colored output
- config/ - Settings and input validation
- core/ - Custom exception hierarchy
Comprehensive Test Suite: 308 tests with 95% coverage
- Unit tests for all modules
- Integration tests for analyzer flow
- Fixtures for API responses and sample data
CI/CD Pipeline: GitHub Actions workflow
- Tests on Python 3.9, 3.10, 3.11, 3.12
- Ruff linting
- Coverage reporting with Codecov
Documentation: Updated README with
- Screenshots and badges
- Comprehensive contributing guidelines
- Conventional Commits format
- Testing requirements (≥95% coverage)
Code Quality
- Type hints throughout
- Secure token handling (never logged/exposed)
- Input validation against injection attacks
- Zero required dependencies (optional requests)

Test plan

All 308 tests pass locally
Ruff linting passes
Coverage at 95%
CI workflow runs on push
Backward compatibility via github_analyzer.py entry point

- Add feature spec for 001-modular-refactor with 4 user stories - Add implementation plan with modular architecture design - Add research.md with technical decisions - Add data-model.md with entity definitions - Add module interface contracts - Add quickstart.md for post-refactor usage - Add tasks.md with 93 implementation tasks - Update constitution to include core/ module (v1.0.1) - Add CLAUDE.md agent context file

Complete implementation of the modular refactoring (spec 001): ## Architecture - src/github_analyzer/api/ - GitHub API client with requests/urllib fallback - src/github_analyzer/analyzers/ - Commit, PR, Issue, Quality, Productivity analyzers - src/github_analyzer/exporters/ - CSV export functionality - src/github_analyzer/cli/ - Command-line interface with argparse - src/github_analyzer/config/ - Configuration and input validation - src/github_analyzer/core/ - Custom exception hierarchy ## Features - Secure token handling via GITHUB_TOKEN env var (never exposed in logs) - Input validation with injection protection - CLI flags: --days, --output, --repos, --quiet, --full - Optimized PR fetching (--full for detailed data) - 68 tests passing with pytest - Ruff linter clean ## Files - 93 tasks completed per tasks.md - Backward compatible: python github_analyzer.py works identically - Updated README with full documentation

When CLI flags are not provided, the tool now prompts interactively: - Days to analyze (default: 30) - Quiet mode (default: no) - Full PR details (default: no) - Confirmation before starting analysis CLI flags still work to skip prompts: python3 github_analyzer.py -d 7 --quiet --full

- Fix offset-naive vs offset-aware datetime comparison error by using datetime.now(timezone.utc) instead of datetime.now() - Update banner: GITHUB in cyan, ANALYZER in orange - Add more color options (ORANGE, PINK, PURPLE) to Colors class

- screen1: Banner and interactive configuration - screen2: Analysis progress with repository details - screen3: Final summary with generated files

- Run tests on Python 3.9, 3.10, 3.11, 3.12 - Run ruff linter - Run coverage report - Test stdlib-only mode (without requests) - Add badges to README

Add unit tests for core modules: - analyzers: commits, issues, pull_requests, productivity - api: client, models - cli: output formatting - core: exceptions - exporters: csv_exporter Coverage improved from 54% to 80% with 236 passing tests.

- Add test_main.py for CLI main module (GitHubAnalyzer, parse_args, prompts) - Add test_quality.py for quality metrics calculation - Extend test_client.py with urllib error handling tests - Extend test_settings.py with _get_bool_env, _get_int_env, validate, to_dict - Add coverage and test count badges to README - Configure ruff per-file-ignores for test patterns (ARG001, ARG002, SIM117)

- Getting started: fork, clone, branch, dev dependencies - Development workflow: linter, tests, coverage (≥95%) - Code style: Python 3.9+, type hints, PEP 8, docstrings - Commit format: Conventional Commits with types and examples - PR guidelines: tests, coverage, documentation, focus - Testing requirements: unit tests, fixtures, mocking

gemini-code-assist · 2025-11-28T12:44:15Z

Summary of Changes

Hello @amargiovanni, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents a significant architectural overhaul of the GitHub Analyzer tool, transforming it from a single, large script into a well-organized, modular application. The primary goal was to enhance maintainability, testability, and security by separating concerns into distinct modules, implementing rigorous input validation, and establishing a robust testing framework. This refactoring ensures the tool's long-term stability and facilitates future development while maintaining backward compatibility for existing users.

Highlights

Modular Architecture: The monolithic github_analyzer.py script has been refactored into a clean, modular structure with dedicated api/, analyzers/, exporters/, cli/, config/, and core/ modules, enhancing maintainability and separation of concerns.
Comprehensive Test Suite: A robust test suite has been introduced, comprising 308 tests and achieving 95% code coverage, including both unit and integration tests to ensure reliability and prevent regressions.
Secure Token Handling: GitHub tokens are now securely managed via environment variables, eliminating interactive prompts and preventing sensitive credentials from being exposed in logs or command history.
Robust Input Validation: Strict input validation has been implemented for repository names and URLs, safeguarding against injection attacks and providing immediate, clear feedback on malformed inputs.
Updated Documentation & CI/CD: The README.md has been extensively updated with new features, quick start guides, and contributing guidelines. A CI/CD pipeline (GitHub Actions) is configured for automated testing and linting across multiple Python versions.

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/tests.yml

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

amargiovanni · 2025-11-28T12:47:46Z

/gemini review

gemini-code-assist

Code Review

This is an excellent and massive refactoring of the project. The monolithic script has been broken down into a clean, modular, and testable architecture, following modern Python best practices. The introduction of pyproject.toml, a comprehensive test suite with 95% coverage, and detailed documentation in README.md and the specs directory are all major improvements that significantly increase the project's quality, maintainability, and security. The new architecture with clear separation of concerns in api, analyzers, config, core, and exporters is very well done.

I have a few minor suggestions for improvement. I found a broken placeholder in an auto-generated documentation file. I also identified a potential logic issue in the pull request analyzer where filtering by created_at while sorting by updated_at can be inefficient and might not capture all relevant recent activity. Finally, there's a small piece of dead code in the API client's error handling. These are minor points in an otherwise outstanding pull request.

gemini-code-assist · 2025-11-28T12:48:13Z

CLAUDE.md

+
+## Commands
+
+cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check .


The Commands section contains unresolved placeholders [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES], which makes the example command invalid and confusing. This should be corrected to show a valid sequence of commands.

Suggested change

cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check .

cd src; pytest; ruff check .

gemini-code-assist · 2025-11-28T12:48:13Z

src/github_analyzer/analyzers/pull_requests.py

+            created_at = raw.get("created_at", "")
+            if created_at:
+                try:
+                    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
+                    if created < since:
+                        continue
+                except ValueError:
+                    pass
+


The current logic filters pull requests by created_at, but the API query sorts them by updated_at. This is inefficient and may lead to confusing results, as recently updated older PRs will be fetched but then discarded. To analyze all PRs with recent activity, it's more consistent to filter by updated_at. Since the results are sorted by updated_at descending, you can also break the loop once a PR older than the analysis window is found, improving performance.

Suggested change

created_at = raw.get("created_at", "")

if created_at:

try:

created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))

if created < since:

continue

except ValueError:

pass

# Check if PR was updated within our timeframe

updated_at = raw.get("updated_at", "")

if updated_at:

try:

updated = datetime.fromisoformat(updated_at.replace("Z", "+00:00"))

if updated < since:

# Since results are sorted by `updated` descending,

# we can stop processing further PRs.

break

except ValueError:

pass

gemini-code-assist · 2025-11-28T12:48:13Z

src/github_analyzer/api/client.py

+        except TimeoutError as e:
+            raise APIError(
+                "Request timed out",
+                details=f"Timeout after {self._config.timeout}s",
+            ) from e


The except TimeoutError block is unreachable because urllib.request.urlopen raises a URLError (which may wrap a socket.timeout), not the built-in TimeoutError. The existing URLError handler already catches timeouts, making this block dead code. Removing it will improve code clarity and correctness.

gemini-code-assist

Code Review

This is an outstanding pull request that accomplishes a major and much-needed refactoring of the project. The transition from a monolithic script to a clean, modular architecture is executed beautifully. The new structure is logical, testable, and follows modern Python best practices. I'm particularly impressed by the thorough planning documents, the comprehensive test suite with high coverage, and the focus on security with proper token handling and input validation. The updated README.md is excellent and makes the tool much more accessible to new users and contributors.

I found only a couple of minor issues to address: a malformed command in a generated documentation file and a small discrepancy in the test coverage configuration. Overall, this is a high-quality contribution that significantly improves the maintainability and professionalism of the project. Great work!

gemini-code-assist · 2025-11-28T12:49:49Z

CLAUDE.md

+
+## Commands
+
+cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check .


This command appears to be malformed. The [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] placeholders look like they were not correctly replaced during file generation, resulting in an invalid command. A corrected version might look like the suggestion below.

Suggested change

cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check .

cd src && pytest && ruff check .

gemini-code-assist · 2025-11-28T12:49:50Z

pytest.ini

+    raise NotImplementedError
+    if TYPE_CHECKING:
+    if __name__ == .__main__.:
+fail_under = 80


There's a discrepancy between the coverage threshold here and the project's stated goals. The README.md and PR description mention a goal of ≥95% test coverage, but fail_under is set to 80. To ensure CI enforces the project's quality standard, it would be best to align this value with the documented goal.

fail_under = 95

Update pytest.ini fail_under from 80% to 95% to match the project's stated coverage requirements in README.md and contributing guidelines.

Replace placeholder text [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] with actual project commands for testing, linting, and running.

urllib.request.urlopen raises URLError wrapping socket.timeout, not the built-in TimeoutError. The previous except TimeoutError block was unreachable dead code. Now correctly checks if URLError.reason is a TimeoutError instance to provide specific timeout error messages.

The API sorts PRs by updated_at desc, but the code was filtering by created_at. This was inconsistent and could miss recently updated old PRs. Now filters by updated_at and breaks early when an old PR is found, improving both correctness and performance.

Copilot

Pull request overview

This PR refactors a monolithic 1000+ line script into a clean, modular architecture with comprehensive test coverage. The refactoring introduces:

Organized module structure following clean architecture principles (api, analyzers, exporters, cli, config, core)
308 unit and integration tests achieving 95% coverage
Type hints throughout for better code maintainability
Secure token handling with masking utilities
Input validation against injection attacks

Key Changes:

Modular architecture with separate concerns (API client, analyzers, exporters, CLI)
Comprehensive test suite with fixtures for API responses and sample data
Security improvements: token masking, input validation, no token exposure in logs/errors

Reviewed changes

Copilot reviewed 64 out of 69 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
tests/conftest.py	Shared pytest fixtures for API responses, configurations, and test utilities
tests/unit/*	Unit tests for all modules with 95% coverage target
tests/integration/*	Integration tests verifying full workflow without network calls
tests/fixtures/*	Sample data and API response fixtures for testing
src/github_analyzer/core/*	Custom exception hierarchy with exit codes and token masking
src/github_analyzer/config/*	Configuration loading from environment with validation
src/github_analyzer/api/*	GitHub API client with pagination, rate limiting, and data models
src/github_analyzer/analyzers/*	Commit, PR, issue, quality, and productivity analyzers
src/github_analyzer/exporters/*	CSV export functionality for all analysis results
src/github_analyzer/cli/*	Command-line interface with colored output and user prompts

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-28T13:00:13Z