debtector scans codebases specifically for AI-generated technical debt patterns, scoring and reporting them so teams can catch debt before it compounds.
Per the 2025 Stack Overflow Developer Survey, the #1 frustration for developers is "AI solutions that look correct but are slightly wrong." Ars Technica (Jan 2026) reports growing concern about AI coding agents "building up technical debt — making poor design choices early that snowball into worse problems over time."
The problem: AI code generators produce code that works but accumulates subtle debt patterns — copy-paste duplication with slight variations, inconsistent error handling, over-engineered abstractions, dead code, naming inconsistencies, and "looks right but isn't idiomatic" patterns. Existing linters catch syntax issues but miss these higher-level AI-specific debt patterns.
debtector addresses this gap by specifically scanning for AI-generated technical debt patterns, providing early warning before they compound.
- Dev teams using AI coding assistants (Copilot, Claude Code, Cursor, etc.)
- Tech leads doing code review on AI-assisted PRs
- Solo developers who want a second opinion on AI-generated code quality
```bash
# Install from source
git clone https://github.com/debtector/debtector.git
cd debtector
pip install -e .

# Or install from PyPI (when published)
pip install debtector
```

```bash
# Scan current directory
debtector scan .

# Scan with JSON output
debtector scan ./src --format json

# Fail CI if score > 60
debtector scan . --threshold 60

# Pretty terminal report with colors
debtector report .
```

```
🔍 DEBTECTOR REPORT

Overall Debt Score: 73.2/100

📊 Summary
  Files Analyzed: 12
  Total Issues: 28
  AI-Generated Files: 7

🏷️ Category Breakdown
  🔄 Duplication: 85.3 (HIGH)
  ❌ Error Handling: 67.1 (MEDIUM)
  🌀 Complexity: 45.8 (MEDIUM)
  💀 Dead Code: 23.4 (LOW)

🔥 Files Needing Attention
  📁 ai_service.py - Score: 89.2 | Issues: 8 | AI: 94%
    • HIGH: Functions 'process_user_data' and 'handle_user_data' are 91% similar
    • MEDIUM: Inconsistent error handling patterns across functions
```
- Near-duplicate functions: Functions that are 70-95% similar (AI loves generating slight variations instead of abstracting)

```python
# ❌ AI-generated debt pattern
def process_user_data(user_data):
    result = {}
    if user_data:
        result['name'] = user_data.get('name', '')
        result['processed'] = True
    return result

def process_admin_data(admin_data):  # 91% similar!
    result = {}
    if admin_data:
        result['name'] = admin_data.get('name', '')
        result['processed'] = True
    return result
```

- Copy-paste with mutations: Similar code blocks with small parameter changes
- Inconsistent error handling: Mix of try/catch styles; some functions handle errors, others don't
- Over-abstraction: Unnecessary wrapper functions, classes with single methods
- Generic naming: AI's love for `result`, `data`, `response`, `output` variables
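The "91% similar" figure above can be reproduced with a plain sequence comparison — a minimal sketch of near-duplicate scoring using only the standard library, not debtector's actual algorithm:

```python
from difflib import SequenceMatcher

a = '''\
def process_user_data(user_data):
    result = {}
    if user_data:
        result['name'] = user_data.get('name', '')
        result['processed'] = True
    return result
'''

b = '''\
def process_admin_data(admin_data):
    result = {}
    if admin_data:
        result['name'] = admin_data.get('name', '')
        result['processed'] = True
    return result
'''

# Character-level similarity of the two function bodies.
ratio = SequenceMatcher(None, a, b).ratio()
print(f"{ratio:.0%} similar")  # well above a 70% near-duplicate threshold
```

A real detector would compare every pair of functions and flag pairs in the 70-95% band, where abstraction (rather than outright deletion) is the likely fix.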
- 0-100 scale: Higher = more debt
- Category breakdown: Duplication, error handling, complexity, dead code, naming, AI markers
- Severity levels: Low/Medium/High/Critical
- AI confidence: Estimates likelihood code was AI-generated
- Python (full support)
- JavaScript/TypeScript (full support)
- Pluggable architecture for adding languages
```bash
# Watch mode - re-scan on changes
debtector watch . --interval 5

# Compare debt between commits
debtector diff HEAD~5

# CI/CD integration
debtector scan . --threshold 60 --format json

# Custom output file
debtector scan . --output report.json
```

Create `.debtector.yaml` in your project root:
```yaml
# Supported languages
languages: [python, javascript, typescript]

# Ignore patterns (glob-style)
ignore:
  - "tests/**"
  - "node_modules/**"
  - "*.min.js"

# Score thresholds per category
thresholds:
  duplication: 70
  error_handling: 60
  dead_code: 50
  naming: 40
  complexity: 80
  overall: 60

# Custom patterns (regex)
custom_patterns:
  - name: "hardcoded-urls"
    regex: "https?://[^\"'\\s]+"
    severity: medium
    message: "Hardcoded URL detected"

# File size limit (bytes)
max_file_size: 1048576  # 1MB

# AI confidence threshold
exclude_ai_confidence_threshold: 0.3
```

- Duplication: 25% (most important for AI-generated code)
- Complexity: 25% (over-engineering is common)
- Error Handling: 20% (inconsistency is a key AI pattern)
- Dead Code: 15% (AI often generates unused code)
- Naming: 10% (affects readability)
- AI Markers: 5% (meta-indicator)
- Critical: 10x weight
- High: 5x weight
- Medium: 2.5x weight
- Low: 1x weight
Files with >70% AI confidence receive a 20% score penalty (AI-generated code warrants extra scrutiny).
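Putting these pieces together — severity multipliers rolled up per category, a weighted average across categories, then the AI-confidence penalty — can be sketched as below. The normalization `scale`, and the naming and AI-marker scores used in the example, are illustrative choices, not debtector's exact formula:

```python
CATEGORY_WEIGHTS = {
    "duplication": 0.25, "complexity": 0.25, "error_handling": 0.20,
    "dead_code": 0.15, "naming": 0.10, "ai_markers": 0.05,
}
SEVERITY_WEIGHTS = {"critical": 10, "high": 5, "medium": 2.5, "low": 1}

def category_score(severities, scale=5.0):
    """Severity-weighted issue count mapped onto 0-100.
    `scale` is a hypothetical normalization factor."""
    return min(sum(SEVERITY_WEIGHTS[s] for s in severities) * scale, 100.0)

def overall_score(category_scores, ai_confidence):
    """Weighted average of per-category scores (0-100), with a 20%
    penalty when AI confidence exceeds 70%. Capped at 100."""
    score = sum(CATEGORY_WEIGHTS[c] * s for c, s in category_scores.items())
    if ai_confidence > 0.7:
        score *= 1.2
    return min(score, 100.0)

scores = {
    "duplication": 85.3, "complexity": 45.8, "error_handling": 67.1,
    "dead_code": 23.4, "naming": 30.0, "ai_markers": 50.0,
}
print(round(overall_score(scores, ai_confidence=0.94), 1))
```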
```yaml
name: Technical Debt Check
on: [push, pull_request]
jobs:
  debt-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v3
        with:
          python-version: '3.10'
      - name: Install debtector
        run: pip install debtector
      - name: Scan for debt
        run: debtector scan . --threshold 70 --format json
```

```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: debtector
        name: debtector debt scan
        entry: debtector scan --threshold 80
        language: system
        pass_filenames: false
```

```
debtector/
├── analyzers/            # Debt pattern detectors
│   ├── duplication.py    # Near-duplicate detection
│   ├── error_handling.py # Error pattern analysis
│   ├── dead_code.py      # Unused code detection
│   ├── naming.py         # Naming convention analysis
│   ├── complexity.py     # Over-abstraction detection
│   └── ai_markers.py     # AI-generated code heuristics
├── parsers/              # Language-specific parsers
│   ├── python_parser.py  # Python AST analysis
│   └── js_parser.py      # JavaScript parsing
├── reporters/            # Output formatters
│   ├── terminal.py       # Rich terminal output
│   ├── json_reporter.py  # JSON/CI output
│   └── diff_reporter.py  # Git diff analysis
├── scanner.py            # Main orchestrator
├── scoring.py            # Debt scoring engine
├── config.py             # Configuration management
└── history.py            # Trend tracking
```
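The flow this layout implies — parsers feed analyzers, analyzers emit issues, scoring folds them into a number — can be sketched in a few lines. Everything here (the `scan` signature, the toy `todo_analyzer`) is illustrative, not debtector's real API:

```python
import ast

def scan(files, analyzers, score):
    """files: {path: source}. Each analyzer gets (path, source, ast)
    and returns issue dicts; `score` folds the issue list into a number."""
    issues = []
    for path, source in files.items():
        tree = ast.parse(source)
        for analyzer in analyzers:
            issues.extend(analyzer(path, source, tree))
    return score(issues)

def todo_analyzer(path, source, tree):
    # Toy analyzer: flag every line containing a TODO marker.
    return [{"file": path, "severity": "low", "message": "TODO left in code"}
            for line in source.splitlines() if "TODO" in line]

report = scan({"app.py": "x = 1  # TODO: remove\n"}, [todo_analyzer], len)
print(report)  # prints 1
```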
- Near-duplicates (70-95% similar functions)
- Copy-paste mutations (same logic, different variable names)
- Structural similarity (same flow, different details)
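One way to catch "same logic, different variable names" is to compare name-blind AST skeletons: blank out every identifier, then compare structure. A sketch of the idea, not debtector's implementation:

```python
import ast

def skeleton(src: str) -> str:
    """AST dump with identifiers blanked out, so functions that differ
    only in names produce identical skeletons."""
    tree = ast.parse(src)
    for node in ast.walk(tree):
        for field in ("id", "name", "arg", "attr"):
            if hasattr(node, field):
                setattr(node, field, "_")
    return ast.dump(tree)

src_a = "def f(user):\n    total = user + 1\n    return total\n"
src_b = "def g(account):\n    amount = account + 1\n    return amount\n"

# Identical structure despite every name being different.
print(skeleton(src_a) == skeleton(src_b))  # prints True
```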
- Mixed patterns (try/catch + callbacks + promise.catch in same file)
- Missing error handling (risky operations without protection)
- Overly broad exceptions (bare except, catching Exception)
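Bare excepts and blanket `except Exception` handlers are easy to spot with Python's `ast` module — a minimal sketch of the check (debtector's analyzer may differ):

```python
import ast

def find_broad_handlers(src):
    """Return (line, kind) for bare excepts and `except Exception` handlers."""
    found = []
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.ExceptHandler):
            if node.type is None:
                found.append((node.lineno, "bare except"))
            elif isinstance(node.type, ast.Name) and node.type.id == "Exception":
                found.append((node.lineno, "catches Exception"))
    return found

SOURCE = """\
def load(path):
    try:
        return open(path).read()
    except:
        return None

def parse(text):
    try:
        return int(text)
    except Exception:
        return 0
"""
print(find_broad_handlers(SOURCE))  # prints [(4, 'bare except'), (10, 'catches Exception')]
```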
- God functions (doing too many different things)
- Single-method classes (unnecessary abstraction)
- Wrapper functions (just calling another function)
- Deep nesting (if/for/while pyramids)
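Nesting pyramids can be measured by walking the AST and tracking the deepest chain of control-flow nodes — a toy depth counter, not debtector's actual metric:

```python
import ast

def max_nesting(src: str) -> int:
    """Depth of the deepest if/for/while/try/with pyramid in the source."""
    tree = ast.parse(src)

    def depth(node, current=0):
        current += isinstance(node, (ast.If, ast.For, ast.While, ast.Try, ast.With))
        children = [depth(child, current) for child in ast.iter_child_nodes(node)]
        return max([current] + children)

    return depth(tree)

flat = "def f(x):\n    return x\n"
pyramid = (
    "def g(rows):\n"
    "    for row in rows:\n"
    "        if row:\n"
    "            for cell in row:\n"
    "                if cell:\n"
    "                    print(cell)\n"
)
print(max_nesting(flat), max_nesting(pyramid))  # prints: 0 4
```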
- Generic variable names (result, data, response, output)
- Verbose explanatory comments ("This function does...")
- Boilerplate-heavy structure (unnecessary initialization)
- Perfect but generic naming (user_input, api_response)
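The generic-naming heuristic can be approximated by measuring what fraction of assigned variables come from the usual suspects — a toy sketch where the `GENERIC` set is illustrative, not debtector's actual list:

```python
import ast

GENERIC = {"result", "data", "response", "output", "user_input", "api_response"}

def generic_name_ratio(src: str) -> float:
    """Fraction of assigned variable names drawn from AI-favorite generics."""
    names = [
        node.id
        for node in ast.walk(ast.parse(src))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    ]
    if not names:
        return 0.0
    return sum(n in GENERIC for n in names) / len(names)

src = "result = fetch()\ndata = result['items']\ncount = len(data)\n"
print(generic_name_ratio(src))  # two of the three assigned names are generic
```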
We welcome contributions! To add support for a new language:

- Create a parser in `parsers/new_language_parser.py`
- Update analyzers to support the new AST format
- Add the file extension to `config.py`
- Write tests with realistic fixtures

To add a new analyzer, inherit from `BaseAnalyzer`:

```python
from typing import List

from .base import BaseAnalyzer

class MyAnalyzer(BaseAnalyzer):
    def get_analyzer_name(self) -> str:
        return "my_pattern"

    def get_supported_extensions(self) -> List[str]:
        return [".py", ".js"]

    def analyze_file(self, file_path, content, parsed_ast):
        issues = []
        # Your detection logic here
        return issues
```

Then:

- Add it to the analyzer list in `scanner.py`
- Write comprehensive tests
- Update documentation
```bash
# Install development dependencies
pip install -e .[dev]

# Run all tests
pytest

# Run with coverage
pytest --cov=debtector --cov-report=html

# Run specific test
pytest tests/test_duplication.py -v
```

- Java/C# support
- IDE integrations (VS Code extension)
- Web dashboard for team debt tracking
- Advanced ML models for AI detection
- Automatic refactoring suggestions
- Integration with AI coding tools
- Team collaboration features
- Historical debt analysis
- Real-time scanning during development
- AI-powered code review assistant
- Cross-repository debt tracking
- Predictive debt modeling
Q: How is this different from a traditional linter?
A: Traditional linters catch syntax and style issues. debtector focuses on higher-level patterns specific to AI-generated code — things like near-duplicates, inconsistent abstractions, and "looks right but isn't idiomatic" patterns.

Q: Does debtector use AI or send my code anywhere?
A: No. All analysis is local and deterministic. No AI models or cloud services required.

Q: What is the performance impact?
A: Minimal. Scanning a 50K-line codebase takes ~10-30 seconds on modern hardware.

Q: Can I customize the scoring?
A: Yes, via .debtector.yaml. You can adjust category weights, severity thresholds, and add custom regex patterns.

Q: Does it work on human-written code too?
A: Yes, but it's optimized for AI-generated patterns. You may want to adjust thresholds for older codebases written entirely by humans.

Q: How accurate is the AI-confidence score?
A: The AI confidence score is heuristic-based (comment patterns, naming conventions, code structure). It's roughly 70-80% accurate for obvious AI-generated code, but should be treated as a hint rather than a definitive judgment.
MIT License. See LICENSE for details.
Created in response to growing concerns about AI-generated technical debt in software development. Special thanks to the Stack Overflow and Ars Technica reports that highlighted this problem.
Built with ❤️ for developers dealing with AI code debt.
Found a bug or have a suggestion? Open an issue or contribute!