jugaad-lab/debtector

debtector 🔍 — AI-Generated Technical Debt Scanner

Python 3.10+ License: MIT Code style: black

debtector scans codebases specifically for AI-generated technical debt patterns, scoring and reporting them so teams can catch debt before it compounds.

Problem Statement

Per the 2025 Stack Overflow Developer Survey, the #1 frustration for developers is "AI solutions that look correct but are slightly wrong." Ars Technica (Jan 2026) reports growing concern about AI coding agents "building up technical debt — making poor design choices early that snowball into worse problems over time."

The problem: AI code generators produce code that works but accumulates subtle debt patterns — copy-paste duplication with slight variations, inconsistent error handling, over-engineered abstractions, dead code, naming inconsistencies, and "looks right but isn't idiomatic" patterns. Existing linters catch syntax issues but miss these higher-level AI-specific debt patterns.

debtector addresses this gap by specifically scanning for AI-generated technical debt patterns, providing early warning before they compound.

Target Users

  • Dev teams using AI coding assistants (Copilot, Claude Code, Cursor, etc.)
  • Tech leads doing code review on AI-assisted PRs
  • Solo developers who want a second opinion on AI-generated code quality

Quick Start

Installation

# Install from source
git clone https://github.com/debtector/debtector.git
cd debtector
pip install -e .

# Or install from PyPI (when published)
pip install debtector

30-Second Scan

# Scan current directory
debtector scan .

# Scan with JSON output
debtector scan ./src --format json

# Fail CI if score > 60
debtector scan . --threshold 60

# Pretty terminal report with colors
debtector report .

Example Output

🔍 DEBTECTOR REPORT
Overall Debt Score: 73.2/100

📊 Summary
Files Analyzed: 12
Total Issues: 28
AI-Generated Files: 7

🏷️ Category Breakdown
🔄 Duplication: 85.3 (HIGH)
❌ Error Handling: 67.1 (MEDIUM)  
🌀 Complexity: 45.8 (MEDIUM)
💀 Dead Code: 23.4 (LOW)

🔥 Files Needing Attention
📁 ai_service.py - Score: 89.2 | Issues: 8 | AI: 94%
  • HIGH: Functions 'process_user_data' and 'handle_user_data' are 91% similar
  • MEDIUM: Inconsistent error handling patterns across functions

Core Features

1. AI-Specific Pattern Detection

Near-duplicate functions: Functions that are 70-95% similar (AI loves generating slight variations instead of abstracting)

# ❌ AI-generated debt pattern
def process_user_data(user_data):
    result = {}
    if user_data:
        result['name'] = user_data.get('name', '')
        result['processed'] = True
    return result

def process_admin_data(admin_data):  # 91% similar!
    result = {}
    if admin_data:
        result['name'] = admin_data.get('name', '')
        result['processed'] = True
    return result
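For contrast, a deduplicated version collapses both functions into one parameterized helper (`process_entity_data` is a hypothetical name for illustration):

```python
# ✅ Refactored: one helper serves both the user and admin call sites
def process_entity_data(entity_data):
    result = {}
    if entity_data:
        result['name'] = entity_data.get('name', '')
        result['processed'] = True
    return result
```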

Copy-paste with mutations: Similar code blocks with small parameter changes

Inconsistent error handling: Mix of try/catch styles, some functions handle errors, others don't

Over-abstraction: Unnecessary wrapper functions, classes with single methods

Generic naming: AI's love for result, data, response, output variables

2. Intelligent Scoring System

  • 0-100 scale: Higher = more debt
  • Category breakdown: Duplication, error handling, complexity, dead code, naming, AI markers
  • Severity levels: Low/Medium/High/Critical
  • AI confidence: Estimates likelihood code was AI-generated

3. Multi-Language Support

  • Python (full support)
  • JavaScript/TypeScript (full support)
  • Pluggable architecture for adding languages

4. Advanced CLI

# Watch mode - re-scan on changes
debtector watch . --interval 5

# Compare debt between commits  
debtector diff HEAD~5

# CI/CD integration
debtector scan . --threshold 60 --format json

# Custom output file
debtector scan . --output report.json

Configuration

Create .debtector.yaml in your project root:

# Supported languages
languages: [python, javascript, typescript]

# Ignore patterns (glob-style)
ignore:
  - "tests/**"
  - "node_modules/**"
  - "*.min.js"

# Score thresholds per category
thresholds:
  duplication: 70
  error_handling: 60
  dead_code: 50
  naming: 40
  complexity: 80
  overall: 60

# Custom patterns (regex)
custom_patterns:
  - name: "hardcoded-urls"
    regex: "https?://[^\"'\\s]+"
    severity: medium
    message: "Hardcoded URL detected"

# File size limit (bytes)
max_file_size: 1048576  # 1MB

# AI confidence threshold
exclude_ai_confidence_threshold: 0.3
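To see what a custom_patterns entry matches, the hardcoded-urls regex above can be exercised directly with Python's re module (this illustrates the assumed regex semantics, not debtector's internal matching code):

```python
import re

# The regex from the "hardcoded-urls" custom pattern above
pattern = re.compile(r"https?://[^\"'\s]+")

source = 'BASE = "https://api.example.com/v1"\ntimeout = 30\n'
matches = pattern.findall(source)
# matches -> ['https://api.example.com/v1']
```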

How Scoring Works

Category Weights

  • Duplication: 25% (most important for AI-generated code)
  • Complexity: 25% (over-engineering is common)
  • Error Handling: 20% (inconsistency is a key AI pattern)
  • Dead Code: 15% (AI often generates unused code)
  • Naming: 10% (affects readability)
  • AI Markers: 5% (meta-indicator)

Severity Impact

  • Critical: 10x weight
  • High: 5x weight
  • Medium: 2.5x weight
  • Low: 1x weight

AI Confidence Penalty

Files with >70% AI confidence receive a 20% score penalty (AI code warrants extra scrutiny).
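Putting the weights, severity multipliers, and AI penalty together, the scoring could be sketched as follows (illustrative only; debtector's actual formula may differ, and the 10.0 scale factor in category_score is an assumption):

```python
# Category weights and severity multipliers from the tables above
CATEGORY_WEIGHTS = {
    "duplication": 0.25, "complexity": 0.25, "error_handling": 0.20,
    "dead_code": 0.15, "naming": 0.10, "ai_markers": 0.05,
}
SEVERITY_MULTIPLIER = {"low": 1.0, "medium": 2.5, "high": 5.0, "critical": 10.0}

def category_score(severities):
    """Turn a category's issue severities into a 0-100 score.
    The 10.0 scale factor is an assumption for illustration."""
    return min(10.0 * sum(SEVERITY_MULTIPLIER[s] for s in severities), 100.0)

def overall_score(category_scores, ai_confidence):
    """Weighted average of per-category scores, plus the 20% AI penalty."""
    score = sum(CATEGORY_WEIGHTS[c] * s for c, s in category_scores.items())
    if ai_confidence > 0.7:
        score *= 1.2  # files likely AI-generated get extra scrutiny
    return min(score, 100.0)
```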

CI/CD Integration

GitHub Actions

name: Technical Debt Check
on: [push, pull_request]

jobs:
  debt-scan:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v3
      with:
        python-version: '3.10'
    - name: Install debtector
      run: pip install debtector
    - name: Scan for debt
      run: debtector scan . --threshold 70 --format json

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: debtector
        name: debtector debt scan
        entry: debtector scan --threshold 80
        language: system
        pass_filenames: false

Architecture Overview

debtector/
├── analyzers/          # Debt pattern detectors
│   ├── duplication.py     # Near-duplicate detection
│   ├── error_handling.py  # Error pattern analysis  
│   ├── dead_code.py       # Unused code detection
│   ├── naming.py          # Naming convention analysis
│   ├── complexity.py      # Over-abstraction detection
│   └── ai_markers.py      # AI-generated code heuristics
├── parsers/            # Language-specific parsers
│   ├── python_parser.py   # Python AST analysis
│   └── js_parser.py       # JavaScript parsing
├── reporters/          # Output formatters
│   ├── terminal.py        # Rich terminal output
│   ├── json_reporter.py   # JSON/CI output
│   └── diff_reporter.py   # Git diff analysis
├── scanner.py          # Main orchestrator
├── scoring.py          # Debt scoring engine
├── config.py          # Configuration management
└── history.py         # Trend tracking

Detected Patterns

Duplication Patterns

  • Near-duplicates (70-95% similar functions)
  • Copy-paste mutations (same logic, different variable names)
  • Structural similarity (same flow, different details)
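The 70-95% similarity band can be illustrated with a token-level comparison using difflib.SequenceMatcher (an illustrative check, not necessarily debtector's internal algorithm):

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Token-level similarity ratio between two code snippets."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()

f1 = "result = {} ; result['name'] = user_data.get('name', '')"
f2 = "result = {} ; result['name'] = admin_data.get('name', '')"
ratio = similarity(f1, f2)
# A ratio in the 0.70-0.95 band would be reported as a near-duplicate.
```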

Error Handling Issues

  • Mixed patterns (try/catch + callbacks + promise.catch in same file)
  • Missing error handling (risky operations without protection)
  • Overly broad exceptions (bare except, catching Exception)

Complexity Anti-patterns

  • God functions (doing too many different things)
  • Single-method classes (unnecessary abstraction)
  • Wrapper functions (just calling another function)
  • Deep nesting (if/for/while pyramids)

AI-Specific Markers

  • Generic variable names (result, data, response, output)
  • Verbose explanatory comments ("This function does...")
  • Boilerplate-heavy structure (unnecessary initialization)
  • Perfect but generic naming (user_input, api_response)
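Markers like generic variable names can be detected mechanically. A minimal sketch using Python's ast module (illustrative; not the actual ai_markers.py implementation):

```python
import ast

GENERIC_NAMES = {"result", "data", "response", "output"}

def generic_name_hits(source: str) -> list:
    """Return (line, name) pairs for assignments to generic variable names."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Name nodes with a Store context are assignment targets
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id in GENERIC_NAMES:
                hits.append((node.lineno, node.id))
    return hits

code = "def f(x):\n    result = x * 2\n    data = [result]\n    return data\n"
# generic_name_hits(code) -> [(2, 'result'), (3, 'data')]
```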

Contributing

We welcome contributions! Here's how to add new analyzers:

Adding a New Language

  1. Create parser in parsers/new_language_parser.py
  2. Update analyzers to support the new AST format
  3. Add file extension to config.py
  4. Write tests with realistic fixtures

Adding a New Analyzer

  1. Inherit from BaseAnalyzer:

from typing import List

from .base import BaseAnalyzer

class MyAnalyzer(BaseAnalyzer):
    def get_analyzer_name(self) -> str:
        return "my_pattern"

    def get_supported_extensions(self) -> List[str]:
        return [".py", ".js"]

    def analyze_file(self, file_path, content, parsed_ast):
        issues = []
        # Your detection logic here
        return issues

  2. Add to scanner.py analyzer list
  3. Write comprehensive tests
  4. Update documentation

Running Tests

# Install development dependencies
pip install -e .[dev]

# Run all tests
pytest

# Run with coverage
pytest --cov=debtector --cov-report=html

# Run specific test
pytest tests/test_duplication.py -v

Roadmap

Near Term (v0.2)

  • Java/C# support
  • IDE integrations (VS Code extension)
  • Web dashboard for team debt tracking
  • Advanced ML models for AI detection

Medium Term (v0.3)

  • Automatic refactoring suggestions
  • Integration with AI coding tools
  • Team collaboration features
  • Historical debt analysis

Long Term (v1.0)

  • Real-time scanning during development
  • AI-powered code review assistant
  • Cross-repository debt tracking
  • Predictive debt modeling

FAQ

Q: How is this different from traditional linters?

A: Traditional linters catch syntax and style issues. debtector focuses on higher-level patterns specific to AI-generated code — things like near-duplicates, inconsistent abstractions, and "looks right but isn't idiomatic" patterns.

Q: Does debtector require an internet connection?

A: No. All analysis is local and deterministic. No AI models or cloud services required.

Q: What's the performance impact?

A: Minimal. Scanning a 50K line codebase takes ~10-30 seconds on modern hardware.

Q: Can I customize the scoring weights?

A: Yes, via .debtector.yaml. You can adjust category weights, severity thresholds, and add custom regex patterns.

Q: Does it work with legacy codebases?

A: Yes, but it's optimized for AI-generated patterns. You might want to adjust thresholds for older codebases written entirely by humans.

Q: How accurate is the AI detection?

A: The AI confidence score is heuristic-based (comment patterns, naming conventions, code structure). It's ~70-80% accurate for obvious AI-generated code, but should be used as a hint rather than definitive judgment.

License

MIT License. See LICENSE for details.

Credits

Created in response to growing concerns about AI-generated technical debt in software development. Special thanks to the Stack Overflow and Ars Technica reports that highlighted this problem.


Built with ❤️ for developers dealing with AI code debt.

Found a bug or have a suggestion? Open an issue or contribute!

About

AI-generated technical debt scanner — detects near-duplicates, copy-paste mutations, inconsistent error handling, dead code clusters, and more. Zero dependencies.
