
⚡ GitMetrix

AI-Powered Developer Intelligence Platform & Repository Analytics Engine
Index any GitHub repository • MoE LLM Router • AST-aware code parsing • Hybrid RAG retrieval • Architecture Visualization • AI PR Review • Code Health Analysis


Why GitMetrix?

Developers lack a single place to understand their GitHub activity at a glance — contribution velocity, language breakdown, streak tracking — and there's no easy way to have an AI conversation about a codebase without copy-pasting files. GitMetrix solves both problems in one tool, and goes further with architecture visualization, automated PR reviews, code health analysis, and developer productivity analytics.


Features

📊 Dashboard Analytics

  • Velocity Score — Weighted composite score measuring overall development output
  • Active Streak — Consecutive days of contributions with streak tracking
  • Total Output — Commits, PRs merged, and issues resolved with yearly breakdown
  • Language Breakdown — Pie chart distribution of languages across all repos
  • Activity Graph — Monthly commit activity visualization using Recharts
  • Top Repositories — Sorted by stars, showing language, stars, and forks
  • Code Health Tab — Static analysis with health gauge, risk files, and AI recommendations
  • Advanced Analytics Tab — Bus Factor, Contributor Influence scores, and team insights

🤖 AI Codebase Chat

  • User Repo Selector — Browse and select your own repositories to chat with
  • Universal Search — Paste any public GitHub URL to index and chat
  • AST-Aware Parsing — Multi-language symbol extraction (functions, classes, methods, imports, exports) across TypeScript, JavaScript, Python, Java, Go, Rust, C/C++, Ruby, and PHP
  • Intelligent Chunking — Code is chunked by logical structure (1 function = 1 chunk, 1 class = 1 chunk) instead of fixed-size splits, with a 100-char minimum to eliminate noise
  • Dependency Graph — Import/export relationships are tracked across files, enabling contextual reasoning about how modules connect
  • Hybrid Retrieval — Combines pgvector cosine similarity search with PostgreSQL full-text search for higher accuracy
  • Multi-Query Search — Each user question generates 3 semantic variations, all searched independently and merged for broader coverage
  • Cohere Cross-Encoder Reranking — Top results are reranked using Cohere's reranker API for precision (gracefully falls back if unavailable)
  • Neighbor Chunk Expansion — Retrieved chunks automatically include adjacent chunks (±1) for fuller code context
  • File Importance Weighting — Chunks from src/, app/, lib/, core/ get a 1.15× relevance boost over test/docs files
  • Streaming Responses — Real-time SSE streaming via MoE LLM Router
  • AI-Generated Suggestions — Three repo-specific starter questions generated by analyzing symbol metadata and language breakdown
  • File References — Each response includes clickable file path tags showing which files were referenced
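The chunking rule above (one symbol per chunk, 100-char minimum) can be sketched in a few lines. This is an illustration only; the interfaces and names here are hypothetical, not GitMetrix's actual internals.

```typescript
// Illustrative types; the project's real parser output may differ.
interface CodeSymbol {
  name: string;
  kind: "function" | "class" | "method";
  startLine: number;
  endLine: number;
  text: string;
}

interface Chunk {
  symbolName: string;
  content: string;
  startLine: number;
  endLine: number;
}

const MIN_CHUNK_CHARS = 100; // chunks shorter than this are treated as noise

// One symbol becomes one chunk; tiny symbols are dropped entirely.
function chunkBySymbols(symbols: CodeSymbol[]): Chunk[] {
  return symbols
    .filter((s) => s.text.length >= MIN_CHUNK_CHARS)
    .map((s) => ({
      symbolName: s.name,
      content: s.text,
      startLine: s.startLine,
      endLine: s.endLine,
    }));
}
```

Chunking on symbol boundaries keeps each retrieved unit semantically whole, which is why a function body never gets split across two embeddings the way fixed-size windows would split it.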

🏗️ Architecture Visualizer

  • Interactive Dependency Graph — React Flow–powered visualization of file-level dependencies
  • Circular Dependency Detection — DFS-based algorithm highlights circular imports in red
  • Node Importance Scoring — Nodes colored by in-degree + out-degree importance
  • Detail Panel — Click any node to see dependents, dependencies, language, and risk
  • Stats Overlay — Total files, edges, and circular dependency count at a glance
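The DFS-based circular-import detection mentioned above can be sketched with the classic white/gray/black coloring scheme. This is a generic sketch of the technique, not the project's actual code.

```typescript
// DFS cycle detection over a file dependency graph (adjacency list).
// A "gray" node is on the current DFS path; reaching one again means
// we found a back edge, i.e. a circular import.
type Graph = Record<string, string[]>;

function findCycles(graph: Graph): string[][] {
  const WHITE = 0, GRAY = 1, BLACK = 2;
  const color: Record<string, number> = {};
  const stack: string[] = [];
  const cycles: string[][] = [];

  function dfs(node: string): void {
    color[node] = GRAY;
    stack.push(node);
    for (const next of graph[node] ?? []) {
      if ((color[next] ?? WHITE) === WHITE) {
        dfs(next);
      } else if (color[next] === GRAY) {
        // Back edge: slice the current path from `next` to form the cycle.
        cycles.push(stack.slice(stack.indexOf(next)).concat(next));
      }
    }
    stack.pop();
    color[node] = BLACK;
  }

  for (const node of Object.keys(graph)) {
    if ((color[node] ?? WHITE) === WHITE) dfs(node);
  }
  return cycles;
}
```

Each reported cycle is a file path sequence (e.g. `a.ts → b.ts → c.ts → a.ts`), which maps directly onto the red-highlighted edges in the visualizer.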

🔍 AI Pull Request Reviewer

  • GitHub PR Diff Analysis — Paste any PR URL for automated code review
  • Categorized Findings — Bugs, security issues, optimizations, and suggestions
  • Severity Classification — Critical, warning, and info severity badges
  • Score Gauge — Overall quality score (0–100) with visual indicator
  • AI Summary — Concise actionable summary generated via deep reasoning
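Accepting "any PR URL" implies extracting the owner, repo, and pull number before hitting the GitHub API. A hypothetical helper for that step (not the project's actual parser) might look like:

```typescript
// Hypothetical helper: parse a GitHub PR URL into its parts, or null if invalid.
interface PrRef {
  owner: string;
  repo: string;
  pullNumber: number;
}

function parsePrUrl(url: string): PrRef | null {
  const match = url.match(
    /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/pull\/(\d+)/
  );
  if (!match) return null;
  return { owner: match[1], repo: match[2], pullNumber: Number(match[3]) };
}
```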

🛡️ Code Health Scanner

  • Static Analysis — Cyclomatic complexity, nesting depth, function lengths, risk scoring
  • Health Score — Aggregate health gauge for the entire repository
  • Risk File Listing — Top risky files ranked with progress bar visualization
  • AI Recommendations — DeepSeek-powered refactoring recommendations via OpenRouter
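Cyclomatic complexity is commonly approximated as one plus the number of branching constructs. A rough sketch of that kind of metric follows; the actual scanner may count differently per language.

```typescript
// Heuristic cyclomatic complexity: 1 + number of branch points.
// Branch points here: if/for/while/case/catch keywords, boolean
// short-circuit operators, and ternaries. An approximation only.
const BRANCH_PATTERN = /\b(if|for|while|case|catch)\b|&&|\|\||\?/g;

function cyclomaticComplexity(source: string): number {
  const matches = source.match(BRANCH_PATTERN);
  return 1 + (matches ? matches.length : 0);
}
```

Scores like this can then be combined with nesting depth and function length into the per-file risk score the dashboard displays.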

📈 Developer Productivity Analytics

  • Bus Factor — Contributor concentration risk analysis
  • Contributor Influence Score — Weighted scoring by commits and code changes
  • Influence Leaderboard — Animated progress bars ranking team members
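Bus factor is conventionally the smallest number of contributors who together account for a majority of the work. The README does not specify GitMetrix's exact formula, so the 50%-of-commits threshold below is an assumption.

```typescript
// Bus factor sketch: smallest set of top contributors covering >= 50%
// of commits. The 50% threshold is a common convention, assumed here.
function busFactor(commitsByAuthor: Record<string, number>): number {
  const counts = Object.values(commitsByAuthor).sort((a, b) => b - a);
  const total = counts.reduce((sum, c) => sum + c, 0);
  let covered = 0;
  let factor = 0;
  for (const c of counts) {
    covered += c;
    factor += 1;
    if (covered >= total / 2) break;
  }
  return factor;
}
```

A bus factor of 1 means a single contributor dominates the history, which is exactly the concentration risk the dashboard flags.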

🧭 Agentic Code Navigation

  • Entry Point Detection — Identifies starting files from natural language queries
  • Multi-Hop Dependency Traversal — Walks dependency graph up to 3 levels deep
  • Architecture Explanation — AI-generated code flow explanations referencing specific files
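The multi-hop traversal above amounts to a depth-limited BFS over the dependency graph. A generic sketch, using the 3-level limit the README mentions:

```typescript
// Collect all files reachable from an entry point within maxDepth hops.
// Generic BFS sketch, not the project's actual navigator code.
type DepGraph = Record<string, string[]>;

function reachableWithin(
  graph: DepGraph,
  entry: string,
  maxDepth = 3,
): string[] {
  const seen = new Set<string>([entry]);
  let frontier = [entry];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const dep of graph[file] ?? []) {
        if (!seen.has(dep)) {
          seen.add(dep);
          next.push(dep);
        }
      }
    }
    frontier = next;
  }
  return [...seen];
}
```

The resulting file set is what gets handed to the LLM as context when generating the code-flow explanation.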

🔐 Authentication & Security

  • GitHub OAuth via Clerk — Sign in with GitHub, access profile and repos
  • Protected Routes — Dashboard, Chat, Architecture, and PR Review require authentication
  • Environment-based Config — All secrets stored as environment variables

How It Works

User selects repo → GitHub API fetches file tree → Redis-cached tree (1hr TTL)
→ Files prioritized (src/app/lib first) → Capped at 1500 files
→ 15 concurrent file fetches with rate-limit backoff
→ AST parser extracts functions, classes, imports per file
→ Chunked by symbol boundaries (100-char min, 800-token max)
→ Embeddings generated via HuggingFace (batch 32)
→ Stored in Supabase pgvector (batch 100 rows)
→ Dependency graph built and stored
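The "15 concurrent file fetches" step boils down to a fixed-size worker pool. A generic concurrency limiter illustrating the idea (the real pipeline adds rate-limit backoff on top):

```typescript
// Run an async function over items with at most `limit` in flight at once.
// Results come back in input order. Sketch only.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let nextIndex = 0;

  // Each worker pulls the next unclaimed index until items run out.
  async function worker(): Promise<void> {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await fn(items[i]);
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```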

Indexing Pipeline (Inngest):

  1. /api/index receives a repo URL, creates a DB record, sends an Inngest event
  2. Inngest function index-github-repo runs in the background:
    • Validates repository exists in database
    • Fetches full repo tree via GitHub API (default branch → commit SHA → tree SHA)
    • Caches tree in Redis (repo-tree:{owner}/{repo}, 1-hour TTL)
    • Filters out 36+ excluded directories, binaries, lock files, and files > 200KB
    • Prioritizes src/, app/, lib/, packages/, core/ directories first
    • Caps at 1500 files for large repository stability
    • Fetches file content at 15 concurrent requests with rate-limit pause
    • Parses each file using regex-based AST extractors (10 languages supported)
    • Chunks by symbol boundaries — 1 function = 1 chunk, skips chunks < 100 chars
    • Generates embeddings in batches of 32 via HuggingFace MiniLM-L6-v2
    • Inserts rows in batches of 100 into Supabase
    • Builds a dependency graph from import/export relationships
    • Updates progress every 30 processed files
  3. /api/index/status polls for completion with enriched stats (files discovered, processed, chunks, vectors, languages)
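The prioritize-then-cap step (source directories first, 1500-file ceiling) can be sketched as a stable sort on a directory score. Constants mirror the values stated above; the function itself is illustrative.

```typescript
// Prioritize core source directories, then cap the file list.
// Values match the README (src/app/lib/packages/core first, 1500 max);
// the function itself is a sketch, not the project's code.
const PRIORITY_DIRS = ["src/", "app/", "lib/", "packages/", "core/"];
const MAX_FILES = 1500;

function prioritizeFiles(paths: string[], maxFiles = MAX_FILES): string[] {
  // 0 = priority directory, 1 = everything else; JS sort is stable,
  // so relative order within each group is preserved.
  const score = (p: string) =>
    PRIORITY_DIRS.some((dir) => p.startsWith(dir)) ? 0 : 1;
  return [...paths].sort((a, b) => score(a) - score(b)).slice(0, maxFiles);
}
```

Because the cap is applied after sorting, a huge repo loses low-value files (tests, docs) before it loses anything under `src/`.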

Chat Pipeline (Hybrid RAG with MoE Router):

  1. User query → LLM generates 2 alternative search queries (3 total) via MoE router
  2. All 3 queries embedded → pgvector cosine similarity search (8 results each)
  3. PostgreSQL full-text search runs in parallel using GIN index
  4. Vector + FTS results merged and deduplicated
  5. Top 5 matches expand: neighbor chunks (±1 chunk index) fetched
  6. Cohere cross-encoder reranking applied (top 12 results)
  7. Dependency graph neighbors retrieved for matched files
  8. Architecture query detection for enhanced dependency context
  9. File importance weighting applied (1.15× for src/, app/, lib/, core/)
  10. Top 12 context blocks assembled with symbol metadata and line ranges
  11. Context + conversation history → MoE Router selects optimal provider
  12. Response streamed back via Server-Sent Events
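Steps 4 and 9 above (merge, dedupe, importance boost) can be sketched together. The field names and the exact merge policy (keep the best score per chunk) are illustrative assumptions, not the project's actual schema.

```typescript
// Merge vector + full-text matches, dedupe by chunk id, and apply the
// 1.15x boost for core source directories. Field names are illustrative.
interface Match {
  chunkId: string;
  filePath: string;
  score: number;
}

const BOOSTED_DIRS = ["src/", "app/", "lib/", "core/"];
const BOOST = 1.15;

function mergeMatches(vector: Match[], fts: Match[]): Match[] {
  const best = new Map<string, Match>();
  for (const m of [...vector, ...fts]) {
    const boosted = BOOSTED_DIRS.some((d) => m.filePath.startsWith(d))
      ? { ...m, score: m.score * BOOST }
      : m;
    const prev = best.get(boosted.chunkId);
    if (!prev || boosted.score > prev.score) best.set(boosted.chunkId, boosted);
  }
  // Highest score first; the top of this list feeds the reranker.
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```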

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | Next.js 16 (App Router, Turbopack) |
| Language | TypeScript 5 |
| Styling | Tailwind CSS 4 (@theme tokens) |
| Auth | Clerk (GitHub OAuth) |
| Database | Supabase (PostgreSQL + pgvector) |
| Cache | Upstash Redis (tree caching, 1hr TTL) |
| Embeddings | HuggingFace Inference API (MiniLM-L6-v2, 384-dim) |
| LLM Router | Mixture-of-Experts task-based routing |
| LLM — Chat Streaming | Cerebras (LLaMA 3.3 70B) → Groq fallback |
| LLM — Query Expansion | Groq (LLaMA 3.3 70B) → OpenRouter fallback |
| LLM — Deep Reasoning | DeepSeek V3 via OpenRouter → Groq fallback |
| LLM — Large Context | Gemini Flash (1M context) → OpenRouter fallback |
| LLM — Consensus | OpenRouter (Qwen 2.5 7B) → Together AI fallback |
| Reranking | Cohere Cross-Encoder |
| Graph Visualization | React Flow (@xyflow/react) |
| Background Jobs | Inngest (serverless, runs on Vercel) |
| GitHub API | Octokit |
| Charts | Recharts |
| Animations | Framer Motion |
| Validation | Zod |
| Icons | Lucide React |

Project Structure

GitMetrix/
├── src/
│   ├── app/
│   │   ├── api/
│   │   │   ├── analytics/
│   │   │   │   └── route.ts              # Bus Factor + Contributor Influence API
│   │   │   ├── architecture/
│   │   │   │   └── graph/
│   │   │   │       └── route.ts          # Dependency graph API with circular detection
│   │   │   ├── chat/
│   │   │   │   ├── route.ts              # Hybrid RAG chat (multi-query + Cohere reranking)
│   │   │   │   └── suggestions/
│   │   │   │       └── route.ts          # Symbol-aware starter questions
│   │   │   ├── code-health/
│   │   │   │   └── route.ts              # Static analysis + AI recommendations
│   │   │   ├── index/
│   │   │   │   ├── route.ts              # Start repo indexing via Inngest
│   │   │   │   └── status/
│   │   │   │       └── route.ts          # Enriched indexing progress stats
│   │   │   ├── inngest/
│   │   │   │   └── route.ts              # Inngest serve endpoint
│   │   │   ├── pr-review/
│   │   │   │   └── route.ts              # AI PR review (diff analysis + scoring)
│   │   │   └── repos/
│   │   │       └── route.ts              # Fetch user's GitHub repos
│   │   ├── architecture/
│   │   │   ├── layout.tsx                # Architecture page layout with navigation
│   │   │   └── page.tsx                  # Interactive dependency graph visualizer
│   │   ├── chat/
│   │   │   └── page.tsx                  # User repo chat page
│   │   ├── dashboard/
│   │   │   └── page.tsx                  # Dashboard with Overview/Health/Analytics tabs
│   │   ├── pr-review/
│   │   │   ├── layout.tsx                # PR review layout with navigation
│   │   │   └── page.tsx                  # AI pull request reviewer
│   │   ├── search/
│   │   │   └── page.tsx                  # Universal repo search chat
│   │   ├── sign-in/[[...sign-in]]/
│   │   │   └── page.tsx
│   │   ├── sign-up/[[...sign-up]]/
│   │   │   └── page.tsx
│   │   ├── globals.css                   # Tailwind v4 theme + custom animations
│   │   ├── layout.tsx                    # Root layout with Clerk + fonts
│   │   └── page.tsx                      # Landing page
│   ├── components/
│   │   ├── ui/
│   │   │   ├── card.tsx                  # Glassmorphism card component
│   │   │   ├── charts.tsx                # Recharts activity + language charts
│   │   │   └── skeleton.tsx              # Loading skeleton component
│   │   ├── analytics-tab.tsx             # Bus Factor + Influence leaderboard
│   │   ├── animated-background.tsx       # Animated beams + glow background
│   │   ├── architecture-graph.tsx        # React Flow dependency graph component
│   │   ├── chat-interface.tsx            # Full chat UI (phases, streaming, markdown)
│   │   ├── code-health-tab.tsx           # Health gauge + risk files + AI recommendations
│   │   ├── dashboard-content.tsx         # Dashboard cards + grid layout
│   │   ├── dashboard-header.tsx          # Glassmorphism navbar + mobile drawer
│   │   ├── dashboard-tabs.tsx            # Tab switcher (Overview/Health/Analytics)
│   │   ├── navigation.tsx                # Shared nav bar with mobile hamburger
│   │   ├── repo-selector.tsx             # Repository browser with search/filter
│   │   └── username-search.tsx           # GitHub username search input
│   ├── inngest/
│   │   └── index-github-repo.ts          # Multi-step background indexing function
│   ├── lib/
│   │   ├── agentic-navigator.ts          # Multi-hop dependency traversal + AI explanation
│   │   ├── chunker.ts                    # AST-aware code chunking (100-char min, symbol boundaries)
│   │   ├── dependency-graph.ts           # Import/export graph builder + BFS traversal
│   │   ├── embeddings.ts                 # HuggingFace embedding client (batch 32, retry)
│   │   ├── github.ts                     # GitHub API helpers (dashboard stats, repos)
│   │   ├── groq.ts                       # Groq SDK singleton (legacy, used by llm/groq.ts)
│   │   ├── indexer.ts                    # Pipeline orchestrator (fetch→parse→chunk→embed→store)
│   │   ├── inngest.ts                    # Inngest client instance
│   │   ├── llm/
│   │   │   ├── cerebras.ts               # Cerebras provider (LLaMA 3.3 — fast streaming)
│   │   │   ├── cohere.ts                 # Cohere cross-encoder reranker
│   │   │   ├── gemini.ts                 # Gemini Flash provider (1M context window)
│   │   │   ├── groq.ts                   # Groq provider (LLaMA 3.3 70B)
│   │   │   ├── llmRouter.ts              # MoE task-based router (6 task types)
│   │   │   ├── openrouter.ts             # OpenRouter provider (Qwen 2.5 + DeepSeek V3)
│   │   │   └── together.ts               # Together AI provider (Qwen 2.5 7B Turbo)
│   │   ├── parser.ts                     # Multi-language AST symbol extractor (10 languages)
│   │   ├── redis.ts                      # Upstash Redis client
│   │   ├── static-analysis.ts            # Cyclomatic complexity + nesting + risk scoring
│   │   ├── supabase.ts                   # Supabase client
│   │   ├── types.ts                      # TypeScript type definitions
│   │   ├── utils.ts                      # Utility functions
│   │   └── validators.ts                 # Zod validation schemas
│   └── middleware.ts                     # Clerk auth middleware
├── supabase/
│   ├── migration.sql                     # Base schema (repositories, repository_files, pgvector)
│   ├── migration_v2.sql                  # Symbol metadata columns + dependency_edges table
│   ├── migration_v3.sql                  # GIN full-text search index
│   └── migration_v4.sql                  # Code metrics, PR reviews, file hashes tables
├── .gitignore
├── package.json
├── tsconfig.json
└── next.config.ts

Environment Variables

NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up

GITHUB_TOKEN=

NEXT_PUBLIC_SUPABASE_URL=
SUPABASE_SERVICE_ROLE_KEY=

UPSTASH_REDIS_REST_URL=
UPSTASH_REDIS_REST_TOKEN=

HUGGINGFACE_API_KEY=
GROQ_API_KEY=
OPENROUTER_API_KEY=
TOGETHER_API_KEY=

CEREBRAS_API_KEY=
GEMINI_API_KEY=
COHERE_API_KEY=

INNGEST_EVENT_KEY=
INNGEST_SIGNING_KEY=

Deployment

GitMetrix is designed for Vercel deployment. Inngest runs as a serverless function via the /api/inngest route — no separate server or CLI required.

  1. Push to GitHub
  2. Import into Vercel
  3. Add all environment variables
  4. Connect Inngest Cloud to your Vercel deployment URL
  5. Run the Supabase migrations in order:
    • supabase/migration.sql — base schema
    • supabase/migration_v2.sql — symbol metadata + dependency edges
    • supabase/migration_v3.sql — full-text search index
    • supabase/migration_v4.sql — code metrics, PR reviews, file hashes

LLM Mixture-of-Experts Router

User Query → MoE Router → Task Classification
                            │
                            ├── chat_stream     → Cerebras → Groq → OpenRouter → Together
                            ├── query_expansion → Groq → OpenRouter → Together
                            ├── large_context   → Gemini Flash → OpenRouter → Together
                            ├── deep_reasoning  → DeepSeek V3 (via OpenRouter) → Groq
                            ├── consensus       → OpenRouter (Qwen) → Together → Groq
                            └── general         → Groq → Cerebras → OpenRouter → Together

The system uses a task-based Mixture-of-Experts router:

  • Each task type has a prioritized fallback chain of LLM providers
  • DeepSeek V3 is accessed through OpenRouter (no direct API) for deep reasoning tasks
  • Cerebras handles fast chat streaming with Groq as fallback
  • Gemini Flash processes large context tasks (up to 1M tokens)
  • Cohere provides cross-encoder reranking for RAG retrieval precision
  • Auto-escalates to large_context when token count exceeds 30K
  • The system works with only GROQ_API_KEY + OPENROUTER_API_KEY — other providers activate when their keys are present
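The prioritized fallback chains described above reduce to "try each provider in order until one succeeds." A generic sketch of that behavior (not the router's actual code):

```typescript
// Try providers in priority order; return the first success.
// Generic fallback-chain sketch, not the router's actual implementation.
type Provider = (prompt: string) => Promise<string>;

async function runWithFallback(
  chain: Provider[],
  prompt: string,
): Promise<string> {
  let lastError: unknown;
  for (const provider of chain) {
    try {
      return await provider(prompt);
    } catch (err) {
      lastError = err; // provider down or key missing; try the next one
    }
  }
  throw lastError ?? new Error("no providers configured");
}
```

Providers whose API keys are absent simply fail fast and pass control down the chain, which is how the system stays functional with only the Groq and OpenRouter keys set.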

Performance

| Metric | Value |
| --- | --- |
| File Fetch Concurrency | 15 parallel requests |
| Max Files Per Repo | 1,500 |
| Embedding Batch Size | 32 |
| DB Insert Batch Size | 100 rows |
| Redis Tree Cache TTL | 1 hour |
| Supported Languages | 10 (TS, JS, Python, Java, Go, Rust, C/C++, Ruby, PHP) |
| Retrieval Queries Per Chat | 3 (multi-query) |
| Context Chunks Per Response | Up to 12 (Cohere-reranked) |
| LLM Providers | 6 (Cerebras, Groq, Gemini, DeepSeek via OpenRouter, OpenRouter, Together) |
| Task Types | 6 (chat_stream, query_expansion, large_context, deep_reasoning, consensus, general) |

License

MIT
