Skip to content

feat: modular architecture refactoring with 95% test coverage#2

Merged
amargiovanni merged 14 commits intomainfrom
001-modular-refactor
Nov 28, 2025
Merged

feat: modular architecture refactoring with 95% test coverage#2
amargiovanni merged 14 commits intomainfrom
001-modular-refactor

Conversation

@amargiovanni
Copy link
Contributor

Summary

  • Modular Architecture: Refactored monolithic github_analyzer.py (1000+ lines) into organized modules following clean architecture principles

    • api/ - GitHub API client with retry logic and rate limiting
    • analyzers/ - Commit, PR, Issue, Quality, and Productivity analyzers
    • exporters/ - CSV export functionality
    • cli/ - Command-line interface with colored output
    • config/ - Settings and input validation
    • core/ - Custom exception hierarchy
  • Comprehensive Test Suite: 308 tests with 95% coverage

    • Unit tests for all modules
    • Integration tests for analyzer flow
    • Fixtures for API responses and sample data
  • CI/CD Pipeline: GitHub Actions workflow

    • Tests on Python 3.9, 3.10, 3.11, 3.12
    • Ruff linting
    • Coverage reporting with Codecov
  • Documentation: Updated README with

    • Screenshots and badges
    • Comprehensive contributing guidelines
    • Conventional Commits format
    • Testing requirements (≥95% coverage)
  • Code Quality

    • Type hints throughout
    • Secure token handling (never logged/exposed)
    • Input validation against injection attacks
    • Zero required dependencies (optional requests)

Test plan

  • All 308 tests pass locally
  • Ruff linting passes
  • Coverage at 95%
  • CI workflow runs on push
  • Backward compatibility via github_analyzer.py entry point

- Add feature spec for 001-modular-refactor with 4 user stories
- Add implementation plan with modular architecture design
- Add research.md with technical decisions
- Add data-model.md with entity definitions
- Add module interface contracts
- Add quickstart.md for post-refactor usage
- Add tasks.md with 93 implementation tasks
- Update constitution to include core/ module (v1.0.1)
- Add CLAUDE.md agent context file
Complete implementation of the modular refactoring (spec 001):

## Architecture
- src/github_analyzer/api/ - GitHub API client with requests/urllib fallback
- src/github_analyzer/analyzers/ - Commit, PR, Issue, Quality, Productivity analyzers
- src/github_analyzer/exporters/ - CSV export functionality
- src/github_analyzer/cli/ - Command-line interface with argparse
- src/github_analyzer/config/ - Configuration and input validation
- src/github_analyzer/core/ - Custom exception hierarchy

## Features
- Secure token handling via GITHUB_TOKEN env var (never exposed in logs)
- Input validation with injection protection
- CLI flags: --days, --output, --repos, --quiet, --full
- Optimized PR fetching (--full for detailed data)
- 68 tests passing with pytest
- Ruff linter clean

## Files
- 93 tasks completed per tasks.md
- Backward compatible: python github_analyzer.py works identically
- Updated README with full documentation
When CLI flags are not provided, the tool now prompts interactively:
- Days to analyze (default: 30)
- Quiet mode (default: no)
- Full PR details (default: no)
- Confirmation before starting analysis

CLI flags still work to skip prompts:
  python3 github_analyzer.py -d 7 --quiet --full
- Fix offset-naive vs offset-aware datetime comparison error
  by using datetime.now(timezone.utc) instead of datetime.now()
- Update banner: GITHUB in cyan, ANALYZER in orange
- Add more color options (ORANGE, PINK, PURPLE) to Colors class
- screen1: Banner and interactive configuration
- screen2: Analysis progress with repository details
- screen3: Final summary with generated files
- Run tests on Python 3.9, 3.10, 3.11, 3.12
- Run ruff linter
- Run coverage report
- Test stdlib-only mode (without requests)
- Add badges to README
Add unit tests for core modules:
- analyzers: commits, issues, pull_requests, productivity
- api: client, models
- cli: output formatting
- core: exceptions
- exporters: csv_exporter

Coverage improved from 54% to 80% with 236 passing tests.
- Add test_main.py for CLI main module (GitHubAnalyzer, parse_args, prompts)
- Add test_quality.py for quality metrics calculation
- Extend test_client.py with urllib error handling tests
- Extend test_settings.py with _get_bool_env, _get_int_env, validate, to_dict
- Add coverage and test count badges to README
- Configure ruff per-file-ignores for test patterns (ARG001, ARG002, SIM117)
- Getting started: fork, clone, branch, dev dependencies
- Development workflow: linter, tests, coverage (≥95%)
- Code style: Python 3.9+, type hints, PEP 8, docstrings
- Commit format: Conventional Commits with types and examples
- PR guidelines: tests, coverage, documentation, focus
- Testing requirements: unit tests, fixtures, mocking
@gemini-code-assist
Copy link

Summary of Changes

Hello @amargiovanni, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents a significant architectural overhaul of the GitHub Analyzer tool, transforming it from a single, large script into a well-organized, modular application. The primary goal was to enhance maintainability, testability, and security by separating concerns into distinct modules, implementing rigorous input validation, and establishing a robust testing framework. This refactoring ensures the tool's long-term stability and facilitates future development while maintaining backward compatibility for existing users.

Highlights

  • Modular Architecture: The monolithic github_analyzer.py script has been refactored into a clean, modular structure with dedicated api/, analyzers/, exporters/, cli/, config/, and core/ modules, enhancing maintainability and separation of concerns.
  • Comprehensive Test Suite: A robust test suite has been introduced, comprising 308 tests and achieving 95% code coverage, including both unit and integration tests to ensure reliability and prevent regressions.
  • Secure Token Handling: GitHub tokens are now securely managed via environment variables, eliminating interactive prompts and preventing sensitive credentials from being exposed in logs or command history.
  • Robust Input Validation: Strict input validation has been implemented for repository names and URLs, safeguarding against injection attacks and providing immediate, clear feedback on malformed inputs.
  • Updated Documentation & CI/CD: The README.md has been extensively updated with new features, quick start guides, and contributing guidelines. A CI/CD pipeline (GitHub Actions) is configured for automated testing and linting across multiple Python versions.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/tests.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@amargiovanni
Copy link
Contributor Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is an excellent and massive refactoring of the project. The monolithic script has been broken down into a clean, modular, and testable architecture, following modern Python best practices. The introduction of pyproject.toml, a comprehensive test suite with 95% coverage, and detailed documentation in README.md and the specs directory are all major improvements that significantly increase the project's quality, maintainability, and security. The new architecture with clear separation of concerns in api, analyzers, config, core, and exporters is very well done.

I have a few minor suggestions for improvement. I found a broken placeholder in an auto-generated documentation file. I also identified a potential logic issue in the pull request analyzer where filtering by created_at while sorting by updated_at can be inefficient and might not capture all relevant recent activity. Finally, there's a small piece of dead code in the API client's error handling. These are minor points in an otherwise outstanding pull request.

CLAUDE.md Outdated

## Commands

cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check .

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Commands section contains unresolved placeholders [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES], which makes the example command invalid and confusing. This should be corrected to show a valid sequence of commands.

Suggested change
cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check .
cd src; pytest; ruff check .

Comment on lines 63 to 71
created_at = raw.get("created_at", "")
if created_at:
try:
created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
if created < since:
continue
except ValueError:
pass

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current logic filters pull requests by created_at, but the API query sorts them by updated_at. This is inefficient and may lead to confusing results, as recently updated older PRs will be fetched but then discarded. To analyze all PRs with recent activity, it's more consistent to filter by updated_at. Since the results are sorted by updated_at descending, you can also break the loop once a PR older than the analysis window is found, improving performance.

Suggested change
created_at = raw.get("created_at", "")
if created_at:
try:
created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
if created < since:
continue
except ValueError:
pass
# Check if PR was updated within our timeframe
updated_at = raw.get("updated_at", "")
if updated_at:
try:
updated = datetime.fromisoformat(updated_at.replace("Z", "+00:00"))
if updated < since:
# Since results are sorted by `updated` descending,
# we can stop processing further PRs.
break
except ValueError:
pass

Comment on lines 225 to 229
except TimeoutError as e:
raise APIError(
"Request timed out",
details=f"Timeout after {self._config.timeout}s",
) from e

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The except TimeoutError block is unreachable because urllib.request.urlopen raises a URLError (which may wrap a socket.timeout), not the built-in TimeoutError. The existing URLError handler already catches timeouts, making this block dead code. Removing it will improve code clarity and correctness.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is an outstanding pull request that accomplishes a major and much-needed refactoring of the project. The transition from a monolithic script to a clean, modular architecture is executed beautifully. The new structure is logical, testable, and follows modern Python best practices. I'm particularly impressed by the thorough planning documents, the comprehensive test suite with high coverage, and the focus on security with proper token handling and input validation. The updated README.md is excellent and makes the tool much more accessible to new users and contributors.

I found only a couple of minor issues to address: a malformed command in a generated documentation file and a small discrepancy in the test coverage configuration. Overall, this is a high-quality contribution that significantly improves the maintainability and professionalism of the project. Great work!

CLAUDE.md Outdated

## Commands

cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check .

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This command appears to be malformed. The [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] placeholders look like they were not correctly replaced during file generation, resulting in an invalid command. A corrected version might look like the suggestion below.

Suggested change
cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check .
cd src && pytest && ruff check .

pytest.ini Outdated
raise NotImplementedError
if TYPE_CHECKING:
if __name__ == .__main__.:
fail_under = 80

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a discrepancy between the coverage threshold here and the project's stated goals. The README.md and PR description mention a goal of ≥95% test coverage, but fail_under is set to 80. To ensure CI enforces the project's quality standard, it would be best to align this value with the documented goal.

fail_under = 95

Update pytest.ini fail_under from 80% to 95% to match
the project's stated coverage requirements in README.md
and contributing guidelines.
Replace placeholder text [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES]
with actual project commands for testing, linting, and running.
urllib.request.urlopen raises URLError wrapping socket.timeout,
not the built-in TimeoutError. The previous except TimeoutError
block was unreachable dead code.

Now correctly checks if URLError.reason is a TimeoutError instance
to provide specific timeout error messages.
The API sorts PRs by updated_at desc, but the code was filtering
by created_at. This was inconsistent and could miss recently
updated old PRs.

Now filters by updated_at and breaks early when an old PR is found,
improving both correctness and performance.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors a monolithic 1000+ line script into a clean, modular architecture with comprehensive test coverage. The refactoring introduces:

  • Organized module structure following clean architecture principles (api, analyzers, exporters, cli, config, core)
  • 308 unit and integration tests achieving 95% coverage
  • Type hints throughout for better code maintainability
  • Secure token handling with masking utilities
  • Input validation against injection attacks

Key Changes:

  • Modular architecture with separate concerns (API client, analyzers, exporters, CLI)
  • Comprehensive test suite with fixtures for API responses and sample data
  • Security improvements: token masking, input validation, no token exposure in logs/errors

Reviewed changes

Copilot reviewed 64 out of 69 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
tests/conftest.py Shared pytest fixtures for API responses, configurations, and test utilities
tests/unit/* Unit tests for all modules with 95% coverage target
tests/integration/* Integration tests verifying full workflow without network calls
tests/fixtures/* Sample data and API response fixtures for testing
src/github_analyzer/core/* Custom exception hierarchy with exit codes and token masking
src/github_analyzer/config/* Configuration loading from environment with validation
src/github_analyzer/api/* GitHub API client with pagination, rate limiting, and data models
src/github_analyzer/analyzers/* Commit, PR, issue, quality, and productivity analyzers
src/github_analyzer/exporters/* CSV export functionality for all analysis results
src/github_analyzer/cli/* Command-line interface with colored output and user prompts

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"""Verify no circular import issues exist."""
# Import in order of dependencies (leaf to root)
import src.github_analyzer.core.exceptions
import src.github_analyzer.config.validation
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module 'src.github_analyzer.config.validation' is imported with both 'import' and 'import from'.

Copilot uses AI. Check for mistakes.
# Import in order of dependencies (leaf to root)
import src.github_analyzer.core.exceptions
import src.github_analyzer.config.validation
import src.github_analyzer.config.settings
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module 'src.github_analyzer.config.settings' is imported with both 'import' and 'import from'.

Copilot uses AI. Check for mistakes.
import src.github_analyzer.core.exceptions
import src.github_analyzer.config.validation
import src.github_analyzer.config.settings
import src.github_analyzer.api.models
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module 'src.github_analyzer.api.models' is imported with both 'import' and 'import from'.

Copilot uses AI. Check for mistakes.
import src.github_analyzer.config.validation
import src.github_analyzer.config.settings
import src.github_analyzer.api.models
import src.github_analyzer.api.client
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module 'src.github_analyzer.api.client' is imported with both 'import' and 'import from'.

Copilot uses AI. Check for mistakes.
import src.github_analyzer.analyzers.commits
import src.github_analyzer.analyzers.pull_requests
import src.github_analyzer.analyzers.issues
import src.github_analyzer.analyzers.quality
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module 'src.github_analyzer.analyzers.quality' is imported with both 'import' and 'import from'.

Copilot uses AI. Check for mistakes.
import src.github_analyzer.analyzers.productivity
import src.github_analyzer.exporters.csv_exporter
import src.github_analyzer.cli.output
import src.github_analyzer.cli.main
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module 'src.github_analyzer.cli.main' is imported with both 'import' and 'import from'.

Copilot uses AI. Check for mistakes.

from __future__ import annotations

import json
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'json' is not used.

Copilot uses AI. Check for mistakes.
from __future__ import annotations

import json
from datetime import datetime, timedelta
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'datetime' is not used.
Import of 'timedelta' is not used.

Copilot uses AI. Check for mistakes.
import json
from datetime import datetime, timedelta
from pathlib import Path
from unittest.mock import MagicMock, patch
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'patch' is not used.

Copilot uses AI. Check for mistakes.
if updated < since:
# All remaining PRs will also be older, so stop processing
break
except ValueError:
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except ValueError:
except ValueError:
# Ignore PRs with invalid 'updated_at' format; skip processing this PR.

Copilot uses AI. Check for mistakes.
@amargiovanni amargiovanni merged commit 26f1903 into main Nov 28, 2025
10 checks passed
@amargiovanni amargiovanni deleted the 001-modular-refactor branch November 28, 2025 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant