CodeMap is structured as a layered application with clean separation between concerns.
┌─────────────────────────────┐
│ CLI Layer │ Typer commands, argument parsing
│ codemap/cli/ │ Thin — delegates to Application
├─────────────────────────────┤
│ Application Layer │ Use-case orchestration
│ codemap/application/ │ scan, analyze, graph, report
├─────────────────────────────┤
│ Domain Layer │ Core graph model, metrics, protocols
│ codemap/domain/ │ Zero external dependencies
├──────────────┬──────────────┤
│ Infrastructure│ Rendering │ Infrastructure: Git, filesystem, extractors
│ codemap/ │ codemap/ │ Rendering: JSON, HTML, terminal
│ infrastructure│ rendering/ │
└──────────────┴──────────────┘
The domain layer is the core — it has no external dependencies (only Python stdlib).
| Type | Purpose |
|---|---|
| `Node` | A file, directory, or module in the graph |
| `Edge` | A directed dependency between two nodes |
| `NodeGroup` | A logical cluster (directory, package) |
| `CodeGraph` | The central aggregate: nodes, edges, groups |
| `NodeMetrics` | Fan-in, fan-out, centrality, churn per node |
| `OwnershipInfo` | Contributor data attached to a node |
| `ContributorInfo` | Individual contributor snapshot |
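As a rough illustration of two of the metrics `NodeMetrics` carries, fan-in and fan-out can be derived from the edge list alone. The sketch below is not CodeMap's implementation: the `Edge` fields (`source`, `target`) and the `fan_counts` helper are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical stand-in for the domain Edge: `source` depends on `target`."""
    source: str
    target: str


def fan_counts(edges):
    """Return (fan_in, fan_out) counters keyed by node id."""
    fan_in = Counter(e.target for e in edges)   # edges pointing at the node
    fan_out = Counter(e.source for e in edges)  # edges leaving the node
    return fan_in, fan_out


edges = [
    Edge("cli.py", "app.py"),
    Edge("app.py", "graph.py"),
    Edge("cli.py", "graph.py"),
]
fan_in, fan_out = fan_counts(edges)
```

Nodes that never appear as a target simply report a fan-in of zero, because `Counter` returns 0 for missing keys.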
| Protocol | Purpose |
|---|---|
| `DependencyExtractor` | Extracts edges from a source file |
| `OwnershipProvider` | Provides contributor data per file |
| `GraphRenderer` | Renders a `CodeGraph` to an output format |
All protocols use typing.Protocol with runtime_checkable for structural subtyping — implementors don't need to inherit from anything.
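A minimal sketch of what structural subtyping means in practice. The `extract` method name and its signature are assumptions for illustration and may not match the real `DependencyExtractor` protocol; `TomlExtractor` is a hypothetical implementor.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DependencyExtractor(Protocol):
    # Hypothetical signature; the real protocol may differ.
    def extract(self, path: str, source: str) -> list[str]: ...


class TomlExtractor:
    """Does not inherit from DependencyExtractor, yet satisfies it structurally."""

    def extract(self, path: str, source: str) -> list[str]:
        return []  # placeholder: a real extractor would return dependency targets


is_extractor = isinstance(TomlExtractor(), DependencyExtractor)
```

Note that a `runtime_checkable` `isinstance` check only verifies that the required methods exist, not that their signatures match, so static type checking is still what catches a mismatched implementation.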
The infrastructure layer handles all external I/O and language-specific logic.
The filesystem scanner walks the repository tree, applying configurable include/exclude rules and a list of ignored directories, and produces a `ScanResult` containing `ScannedFile` entries.
The Git-backed ownership provider shells out to `git log` to extract per-file contributor and churn data. It implements `OwnershipProvider` and degrades gracefully when git is unavailable.
Batch mode (default): Uses prefetch() to load ownership and churn for all files in two git log calls total, regardless of repository size. This replaces the previous 2×N per-file subprocess approach and is critical for performance on large repositories (e.g. Angular: ~6200 files).
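The idea behind batching can be sketched as parsing the output of one repository-wide `git log` call into per-file data, instead of spawning a subprocess per file. This is an illustration only, not CodeMap's `prefetch()`: the command shown in the docstring, the NUL marker, and the `parse_batch_log` helper are all assumptions.

```python
from collections import Counter, defaultdict

MARK = "\x00"  # assumed marker byte printed before each commit's author name


def parse_batch_log(text):
    """Parse output assumed to come from a single call such as:

        git log --name-only --format=%x00%an

    Returns {file path: Counter({author: commit count})}.
    """
    authors_by_file = defaultdict(Counter)
    current_author = None
    for line in text.split("\n"):
        if line.startswith(MARK):
            current_author = line[len(MARK):]      # a new commit begins
        elif line.strip() and current_author is not None:
            authors_by_file[line][current_author] += 1  # a file touched by it
    return authors_by_file


# Hand-written sample in the assumed format (no git call is made here).
sample = "\x00Alice\n\nsrc/a.py\nsrc/b.py\n\x00Bob\n\nsrc/a.py\n"
authors = parse_batch_log(sample)
```

Because the whole history is read in one pass, the number of subprocess calls stays constant as the repository grows.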
Each extractor implements DependencyExtractor:
| Extractor | Strategy |
|---|---|
| `PythonExtractor` | AST parsing of `import` / `from … import` statements |
| `JavaScriptExtractor` | Regex-based parsing of ES `import` and CommonJS `require` |
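To make the AST strategy concrete, here is a small sketch using the standard `ast` module. It is not the actual `PythonExtractor`; the `extract_imports` helper and its return shape (a flat list of module names) are assumptions.

```python
import ast


def extract_imports(source):
    """Return the module names referenced by import statements in `source`."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a, b` yields one alias per imported module
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports carry a level; keep it visible as leading dots
            modules.append("." * node.level + (node.module or ""))
    return modules


found = extract_imports("import os\nfrom pathlib import Path\nfrom . import utils\n")
```

Parsing the syntax tree avoids false positives from the word `import` appearing inside strings or comments, which is the usual weakness of a regex-based approach.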
Orchestrates domain and infrastructure into use-cases:
| Module | Use-case |
|---|---|
| `scanner.py` | Scan a repository → `ScanResult` |
| `analyzer.py` | Scan + extract deps + git + metrics → `CodeGraph` |
| `grapher.py` | Render a `CodeGraph` to HTML or JSON |
| `reporter.py` | Generate a `CodeMapReport` with hotspots and ownership |
| Renderer | Output |
|---|---|
| `JsonRenderer` | Machine-readable `codemap.json` |
| `HtmlGraphRenderer` | Interactive D3.js visualization with multiple layouts, view modes, risk overlay, and exploration controls (`codemap.html`) |
| `PdfReportRenderer` | Multi-page PDF report with tables and metrics (`codemap.pdf`) |
| `terminal_renderer` | Rich console output for `codemap report` |
The HTML template is separated into _html_template.py for maintainability. The renderer pre-computes risk scores and topological depth in Python and passes enriched data to the D3.js client-side visualization.
The HTML output is structured to be extensible:
- Layout engines are separate JS functions (`computeHierarchy`, `computeRadial`, `computeCluster`, `computeFlow`); adding a new layout means adding one function and one toolbar button. The Manual layout mode stops the simulation and pins nodes via `fx`/`fy`; a `manualTick()` function updates the DOM during drag. Save/restore uses `localStorage` with per-page keys.
- Color modes are handled by a single `nodeColor(d)` function dispatching on `colorMode` state.
- View modes (All / Neighborhood / Impact) are applied via the `applyView()` function, which sets node dimming state.
- Display modes (Overview / Readable / Focus / Presentation / Spacious) control label visibility and force spacing via `modeSpacing()` and `labelCollisionR()`.
- Focused Node Exploration is a first-class mode: `activateFocusedNode()` / `applyFocusedNodeView()` isolate a selected node's subgraph with sub-modes (local, deps, reverse, impact, flow).
- Sidebar tabs are independent panels; new tabs can be added without affecting existing ones.
- Minimap provides navigation for large, spacious graphs via `buildMinimap()`.
- Node Notes (`nodeNotesMap`) allow annotations on any node, with in-graph indicators and a modal editor (hidden by default, shown only on user action). Notes are stored in-memory per session.
- Author accordion (`buildAuthorsTab()`) renders expandable per-contributor panels with commit counts, file lists, risk metrics, and clickable file navigation. Selecting an author dims unrelated nodes on the graph.
- I18N uses a `T(key)` function backed by per-language dictionaries; all ~100 UI strings go through this function. Polish and English are fully supported.
The CLI layer consists of thin Typer commands that parse arguments and delegate to the application layer. No business logic lives here.
The AnalysisProgress module provides Rich-based progress feedback for long-running operations:
- Spinners for indeterminate work (e.g. scanning, git analysis)
- Stage labels (`► Extracting dependencies…`) for each pipeline phase
- Success/warning indicators (`✓` / `!`) for completed stages
- Progress bars for countable work (available but optional)
All CLI commands wrap operations with `AnalysisProgress` and handle Ctrl+C gracefully with a clean `Cancelled.` message and exit code 130.
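The Ctrl+C behavior can be sketched in plain Python as follows. This is a simplified illustration rather than the actual command wrapper: `run_cancellable` is a hypothetical helper, and writing the message to stderr is an assumption.

```python
import sys


def run_cancellable(operation, out=sys.stderr):
    """Run `operation` and return a process exit code."""
    try:
        operation()
    except KeyboardInterrupt:
        print("Cancelled.", file=out)
        return 130  # 128 + SIGINT (signal 2), the conventional shell exit code
    return 0


def _interrupted():
    raise KeyboardInterrupt  # simulates the user pressing Ctrl+C mid-operation


code = run_cancellable(_interrupted)
```

In a Typer command, the returned code would typically be turned into the process exit status, for example by raising `typer.Exit(code=code)`.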
The HTML visualization uses a progressive rendering strategy for large graphs (>200 nodes).
When DATA.meta.nodeCount > 200, the visualization automatically:
- Collapses groups: directory groups with >5 files become single cluster nodes
- Remaps edges: edges between collapsed nodes are merged into cluster-level edges
- Supports click-to-expand: clicking a cluster node expands its member files
- Rebuilds the simulation: the `rebuildGraph()` function reinitializes D3 data joins, forces, and event handlers
The simParams() function scales force simulation parameters based on node count:
| Node Count | Link Distance | Charge | Distance Max | Collision Iterations |
|---|---|---|---|---|
| ≤200 | 280 | -800 | 1200 | 4 |
| 201–500 | 320 | -1000 | 1600 | 3 |
| >500 | 400 | -1200 | 2000 | 2 |
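The table maps directly onto a tiered lookup. The Python sketch below mirrors it for illustration; the real `simParams()` is a JavaScript function, the dictionary key names are assumptions, and a count of exactly 200 is assumed to fall in the first tier.

```python
def sim_params(node_count):
    """Return force-simulation parameters for a graph of `node_count` nodes."""
    if node_count <= 200:
        return {"link_distance": 280, "charge": -800,
                "distance_max": 1200, "collision_iterations": 4}
    if node_count <= 500:
        return {"link_distance": 320, "charge": -1000,
                "distance_max": 1600, "collision_iterations": 3}
    # Largest graphs: spread nodes further apart, spend less on collisions
    return {"link_distance": 400, "charge": -1200,
            "distance_max": 2000, "collision_iterations": 2}


medium = sim_params(350)
```

Larger graphs get longer links and stronger repulsion to reduce overlap, while fewer collision iterations keep each tick affordable.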
On large graphs, the simulation tick handler runs expensive DOM operations (hulls, hotspot rings, entry markers, note indicators) every 3rd tick instead of every tick, reducing CPU overhead.
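Tick throttling of this kind amounts to a counter check. The sketch below shows the idea in Python; the `TickThrottle` class is hypothetical, and which tick within each group of three does the expensive work is an assumption.

```python
class TickThrottle:
    """Decides on which simulation ticks the expensive DOM work should run."""

    def __init__(self, large_graph, every=3):
        # Small graphs update on every tick; large graphs on every `every`-th
        self.every = every if large_graph else 1
        self.count = 0

    def should_run(self):
        self.count += 1
        return self.count % self.every == 0


throttle = TickThrottle(large_graph=True)
pattern = [throttle.should_run() for _ in range(6)]
```

Cheap per-tick work, such as updating node positions, would still run on every tick; only the costly overlays are skipped.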
The toolbar includes a Fast / Quality toggle (visible only for large graphs):
- Fast — collapsed clusters, stricter zoom-gated labels, throttled ticks
- Quality — all clusters expanded, full label visibility
The `--fast` flag on the `analyze` and `graph` commands skips git analysis (ownership, churn) entirely. For large repositories this cuts processing time by limiting the run to structure-only analysis.
CLI → Application → Domain
↘ Infrastructure → Domain
↘ Rendering → Domain
The domain layer is depended on by all other layers but depends on nothing. This ensures the core model can be tested in isolation and evolved independently.
- New language analyzer → implement `DependencyExtractor`, register in `analyzer.py`
- New output format → implement `GraphRenderer`, register in `grapher.py`
- New ownership strategy → implement `OwnershipProvider`
- New metrics → extend `compute_all_metrics()` in `domain/metrics.py`
- CI validation rules → consume `CodeGraph` or `CodeMapReport`
- New performance strategy → extend `simParams()` in the HTML template or add analysis depth levels in `analyzer.py`
The HTML visualization supports three theme modes:
| Mode | Behavior |
|---|---|
| System (default) | Follows the OS prefers-color-scheme media query |
| Dark | GitHub-inspired dark palette (:root variables) |
| Light | GitHub-inspired light palette ([data-theme="light"] override) |
Theme selection is persisted in localStorage (codemap-theme key). The applyTheme() function sets or removes the data-theme attribute on <html>. All UI elements use CSS custom properties (--bg0–--bg3, --text, --accent, etc.) ensuring consistent theming across sidebar, toolbar, graph, and overlays.