
dux — a fast, pretty, interactive disk usage analyzer

Parallel disk usage analyzer for macOS and Linux. Scans directories with multi-threaded I/O, categorizes files (temp, cache, build artifacts), and presents results as rich CLI tables or an interactive TUI with vim-style navigation.

100% AI-written. The vast majority of this codebase was written by Claude (Anthropic), with contributions from Codex (OpenAI). Human involvement was limited to directing, reviewing, and benchmarking.

Screenshots

CLI summary (uv run dux): [screenshot: CLI summary table]

TUI overview (uv run dux -i): [screenshot: TUI overview tab]

TUI browse, an expandable directory tree with disk usage bars: [screenshot: TUI browse tab]

Features

  • Parallel scanning with configurable thread pool (default 4 workers)
  • Interactive TUI with 5 views, vim keybindings, search/filter, pagination
  • Composable CLI flags: --top-temp, --top-cache, --top-dirs, --top-files each print their own table and can be freely combined
  • 59 built-in pattern rules for detecting temp files, caches, and build artifacts across dozens of ecosystems (Node, Python, Rust, Go, JVM, Swift, C++, and more)
  • Fully configurable via JSON config with pattern overrides and custom paths
  • Analysis-only — never deletes, moves, or modifies your files

Quick Start

Requires Python 3.13+ and uv.

# Clone and install
git clone https://github.com/mdemirhan/dux.git
cd dux
uv sync

# Analyze current directory (summary table)
uv run dux

# Analyze a specific path
uv run dux ~/src

# Include apparent (logical) file size column
uv run dux -A ~/src

# Largest temp/build artifacts
uv run dux -t ~/src

# Combine flags: summary + cache + temp
uv run dux -c -t ~/src

# Interactive TUI
uv run dux -i ~/src

Shell Alias

To call dux directly from anywhere, add an alias to your shell config (~/.bashrc or ~/.zshrc):

# Using free-threaded Python 3.14t (recommended for best performance)
alias dux='uv run --python 3.14t --project /path/to/dux dux'

# Or with standard Python
alias dux='uv run --project /path/to/dux dux'

Replace /path/to/dux with the actual clone path. Then:

dux ~/src          # CLI summary
dux -i ~/src       # Interactive TUI
dux -v ~/src       # Verbose (shows GIL status, scanner, timing)

TUI Views

Switch views with Tab/Shift+Tab or press the shortcut key directly.

| Key | View | Description |
| --- | --- | --- |
| o | Overview | Total disk usage, file/dir counts, temp/cache/build totals, largest directories |
| b | Browse | Expandable directory tree with disk usage bars |
| d | Directories by Size | Paginated list of largest directories |
| f | Files by Size | Paginated list of largest individual files |
| t | Temporary Files | All detected temp, cache, and build artifact items |

Keybindings

Navigation

| Key | Action |
| --- | --- |
| j / k / Arrow keys | Move up/down |
| gg / G | Jump to top/bottom |
| Ctrl+U / Ctrl+D | Page up/down |
| [ / ] | Previous/next page (paginated views) |

Browse View

| Key | Action |
| --- | --- |
| l / Right / Enter | Expand or drill into directory |
| h / Left / Backspace | Collapse or go to parent |
| Space | Toggle expand/collapse |

General

| Key | Action |
| --- | --- |
| / | Search/filter rows |
| Escape | Clear active filter |
| y | Yank full path to clipboard |
| Y | Yank display name to clipboard |
| ? | Toggle help overlay |
| q | Quit |

CLI Options

uv run dux [PATH] [OPTIONS]

By default dux prints a CLI summary table. Use --interactive / -i to launch the TUI. The --top-* flags are composable — use multiple at once to print additional tables.

| Option | Description |
| --- | --- |
| --interactive / -i | Launch interactive TUI |
| --apparent-size / -A | Show apparent size column (logical file size) |
| --top-temp / -t | Largest temp/build artifacts |
| --top-cache / -c | Largest cache files/directories |
| --top-dirs / -d | Largest directories |
| --top-files / -f | Largest files |
| --top | Number of items in --top-* views (default: 15) |
| --workers / -w | Number of scan threads (default: 4) |
| --max-depth | Maximum directory depth to scan |
| --max-insights | Max insights per category |
| --overview-dirs | Top directories shown in TUI overview |
| --scroll-step | Lines to jump on PgUp/PgDn in TUI |
| --page-size | Rows per page in TUI |
| --scanner / -S | Scanner variant: auto, python, posix, macos (default: auto) |
| --verbose / -v | Print GIL status, scanner, and timing info |
| --sample-config | Print full sample config and exit |

Configuration

Config file: ~/.config/dux/config.json

Generate a sample config with all defaults:

# Create the config directory first if it does not exist yet
mkdir -p ~/.config/dux
uv run dux --sample-config > ~/.config/dux/config.json

Key settings:

{
  "scanWorkers": 4,
  "maxDepth": null,
  "topCount": 15,
  "pageSize": 100,
  "overviewTopDirs": 100,
  "scrollStep": 20,
  "maxInsightsPerCategory": 1000,
  "additionalTempPaths": [],
  "additionalCachePaths": [],
  "tempPatterns": [...],
  "cachePatterns": [...],
  "buildArtifactPatterns": [...]
}

Each pattern rule:

{
  "name": "npm cache",
  "pattern": "**/.npm/**",
  "category": "cache",
  "applyTo": "both",
  "stopRecursion": false
}
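As a rough illustration of how a custom rule could be added programmatically, the sketch below appends one rule to the cachePatterns list of an in-memory config dict. The field names and values are copied from the rule shown above; the helper name add_cache_rule and the "pip cache" rule are hypothetical examples, not part of dux.

```python
import json


def add_cache_rule(config: dict, name: str, pattern: str) -> dict:
    """Return a copy of config with one extra rule appended to cachePatterns.

    Hypothetical helper: the rule fields mirror the sample rule in this README.
    """
    rule = {
        "name": name,
        "pattern": pattern,
        "category": "cache",
        "applyTo": "both",
        "stopRecursion": False,
    }
    updated = dict(config)
    # Build a new list so the caller's config is left untouched.
    updated["cachePatterns"] = [*config.get("cachePatterns", []), rule]
    return updated


if __name__ == "__main__":
    # Illustrative in-memory config; a real one would be loaded from
    # ~/.config/dux/config.json with json.loads().
    sample = {"scanWorkers": 4, "cachePatterns": []}
    updated = add_cache_rule(sample, "pip cache", "**/.cache/pip/**")
    print(json.dumps(updated, indent=2))
```

Writing the result back with json.dumps and then running dux is the natural way to check that the rule is accepted.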

Performance

Benchmarks

Measured with hyperfine (5 runs, 1 warmup) on a MacBook Pro M4 with Python 3.14t (free-threaded), 4 workers:

| Command | ~/src (295k files, 38k dirs) | ~ (2.1M files, 323k dirs) |
| --- | --- | --- |
| du -sh | 1.150s | 37.818s |
| dux --scanner macos | 0.811s | 19.784s |
| dux --scanner posix | 2.429s | 24.417s |
| dux --scanner python | 2.577s | 25.156s |

Relative to the fastest (dux --scanner macos):

| Command | ~/src | ~ |
| --- | --- | --- |
| dux --scanner macos | 1.00x | 1.00x |
| du -sh | 1.42x | 1.91x |
| dux --scanner posix | 3.00x | 1.23x |
| dux --scanner python | 3.18x | 1.27x |
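The relative figures are each timing divided by the fastest timing in the same column. A small sketch of that arithmetic, using the measured times from the first table (the function name relative_to_fastest is only illustrative):

```python
# Wall-clock seconds from the benchmark table: (~/src, ~)
TIMINGS = {
    "dux --scanner macos": (0.811, 19.784),
    "du -sh": (1.150, 37.818),
    "dux --scanner posix": (2.429, 24.417),
    "dux --scanner python": (2.577, 25.156),
}


def relative_to_fastest(timings):
    """Divide every timing by the fastest timing in its column."""
    columns = len(next(iter(timings.values())))
    fastest = [min(t[i] for t in timings.values()) for i in range(columns)]
    return {
        cmd: tuple(round(t[i] / fastest[i], 2) for i in range(columns))
        for cmd, t in timings.items()
    }


if __name__ == "__main__":
    for cmd, ratios in relative_to_fastest(TIMINGS).items():
        print(cmd, " ".join(f"{r:.2f}x" for r in ratios))
```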

The macOS scanner (getattrlistbulk) fetches stat info in bulk per directory, avoiding per-file syscalls. Single-threaded du falls further behind as the tree grows. The posix and python scanners are close on macOS because readdir doesn't bundle stat info (unlike Linux), so both end up doing per-file lstat calls.

Note on the du comparison: du -sh only traverses, stats, and sums. dux does all of that and also builds a full in-memory tree, pattern-matches every node against 59 rules (Aho-Corasick + hash lookups), generates categorized insights, and renders Rich output. So dux does strictly more work and is still faster with the macOS scanner. One caveat: du counts hard-linked files once (deduplicating by inode) while dux does not, so dux can report a larger total on trees with many hard links. On most home directories the difference is negligible.
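To make the hard-link caveat concrete, here is a minimal sketch of what deduplicating by inode means. It is not dux's code or du's code; the function total_size is hypothetical, and it sums logical sizes (st_size) rather than on-disk blocks so the result is easy to reason about.

```python
import os
import tempfile


def total_size(root: str, dedupe_hard_links: bool) -> int:
    """Sum logical file sizes under root, optionally counting each inode once."""
    seen: set[tuple[int, int]] = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if dedupe_hard_links and st.st_nlink > 1:
                # (device, inode) identifies the underlying file uniquely.
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue
                seen.add(key)
            total += st.st_size
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "data.bin")
        with open(original, "wb") as fh:
            fh.write(b"x" * 1000)
        # A second directory entry pointing at the same inode.
        os.link(original, os.path.join(root, "data-link.bin"))
        print("without dedup:", total_size(root, dedupe_hard_links=False))
        print("with dedup:   ", total_size(root, dedupe_hard_links=True))
```

Without deduplication the linked file is counted once per directory entry, which is the behavior the caveat describes for dux.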

Scanner Backends

The scanner is I/O-bound. dux ships three scanner backends and automatically selects the best one for your platform:

| Scanner | Platform | Mechanism |
| --- | --- | --- |
| NativeScanner (macos) | macOS (default) | C extension using getattrlistbulk: fetches all entries + stat data in a single syscall per batch |
| NativeScanner (posix) | Linux (GIL enabled) | C extension using readdir + lstat: releases the GIL during I/O for better thread utilization |
| PythonScanner | Fallback / GIL disabled | Pure Python via os.scandir; also used for testing via the FileSystem abstraction |

Override with --scanner posix|macos|python.
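For intuition about the pure-Python approach, the following is a minimal sketch of a thread-pool scan built on os.scandir. It is not dux's PythonScanner: the names scan_dir and parallel_scan are hypothetical, it processes the tree level by level for simplicity, and it ignores the blocks used by directories themselves.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def scan_dir(path: str) -> tuple[int, int, list[str]]:
    """Scan one directory (non-recursively): (bytes on disk, files, subdirs)."""
    size = 0
    files = 0
    subdirs: list[str] = []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        subdirs.append(entry.path)
                    else:
                        st = entry.stat(follow_symlinks=False)
                        # st_blocks is in 512-byte units on Linux and macOS.
                        size += st.st_blocks * 512
                        files += 1
                except OSError:
                    continue  # entry vanished or is unreadable
    except OSError:
        pass  # directory unreadable; treat as empty
    return size, files, subdirs


def parallel_scan(root: str, workers: int = 4) -> tuple[int, int]:
    """Return (total bytes on disk, file count), scanning each level in parallel."""
    total_size = 0
    total_files = 0
    level = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while level:
            next_level: list[str] = []
            for size, files, subdirs in pool.map(scan_dir, level):
                total_size += size
                total_files += files
                next_level.extend(subdirs)
            level = next_level
    return total_size, total_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        for rel in ("top.txt", "a/mid.txt", "a/b/leaf.txt"):
            with open(os.path.join(root, rel), "w") as fh:
                fh.write("hello")
        print(parallel_scan(root, workers=4))
```

A level-by-level walk leaves workers idle at the end of each level; a work queue would keep them busier, at the cost of more bookkeeping.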

Free-Threaded Python

dux supports free-threaded Python (3.13t+). All three C extensions (_walker, _ac_matcher, _prefix_trie) declare Py_MOD_GIL_NOT_USED, enabling true parallel execution without GIL contention. Use --verbose to see GIL status and active scanner at runtime.
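Independently of dux, the GIL status of the running interpreter can be checked from Python itself. The sketch below uses sys._is_gil_enabled() (available from Python 3.13) and the Py_GIL_DISABLED build variable; the function name gil_status is hypothetical, and this is not necessarily how --verbose obtains its information.

```python
import sys
import sysconfig


def gil_status() -> dict[str, bool]:
    """Report whether this is a free-threaded build and whether the GIL is on."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on Python 3.13+; older versions always have a GIL.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = True if probe is None else bool(probe())
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


if __name__ == "__main__":
    print(gil_status())
```

Even on a free-threaded build the GIL can be re-enabled at runtime, for example when an extension module that does not declare Py_MOD_GIL_NOT_USED is imported, which is why the two values are reported separately.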

When the GIL is disabled, default_scanner() selects PythonScanner: with true multi-threading, the advantage of the C readdir wrapper (which releases the GIL during I/O) becomes negligible, and the pure Python scanner has the added benefit of working through the FileSystem abstraction layer.

Pattern Matching

Pattern matching (insight generation) is the second-hottest path after scanning. dux avoids naive fnmatch-per-rule by classifying all 59 rules at compile time into fast string operations:

  • EXACT: dict lookup on lowercased basename, O(1)
  • CONTAINS + ENDSWITH: Aho-Corasick automaton (C extension) for multi-pattern search in a single pass over the path, O(path_length). ENDSWITH suffixes are added as end-only keys, matched only when they occur at the end of the path
  • STARTSWITH: PrefixTrie (C extension) walks the basename once, collecting all matching prefixes, O(basename_length) regardless of pattern count
  • GLOB: fallback to fnmatch only for patterns that can't be decomposed
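The idea behind this classification can be sketched in plain Python. The decomposition rules below are assumptions made for illustration, not dux's actual classifier, and simple string operations stand in for the Aho-Corasick automaton and the PrefixTrie. The names classify and matches are hypothetical, and apart from **/.npm/** the sample patterns are made up.

```python
from fnmatch import fnmatchcase

WILDCARDS = set("*?[")


def has_wildcard(text: str) -> bool:
    return any(ch in WILDCARDS for ch in text)


def classify(pattern: str) -> tuple[str, str]:
    """Map one glob to (kind, value). Illustrative rules only."""
    p = pattern.lower()
    body = p[3:] if p.startswith("**/") else p
    if body.endswith("/**"):
        inner = body[:-3]
        if inner and not has_wildcard(inner):
            return ("CONTAINS", "/" + inner + "/")
        return ("GLOB", p)
    if "/" not in body:
        if not has_wildcard(body):
            return ("EXACT", body)
        if len(body) > 1 and body.startswith("*") and not has_wildcard(body[1:]):
            return ("ENDSWITH", body[1:])
        if len(body) > 1 and body.endswith("*") and not has_wildcard(body[:-1]):
            return ("STARTSWITH", body[:-1])
    return ("GLOB", p)


def matches(path: str, kind: str, value: str) -> bool:
    """Apply one classified rule to a path, case-insensitively."""
    p = path.lower()
    basename = p.rsplit("/", 1)[-1]
    if kind == "EXACT":
        return basename == value
    if kind == "CONTAINS":
        return value in p
    if kind == "ENDSWITH":
        return p.endswith(value)
    if kind == "STARTSWITH":
        return basename.startswith(value)
    return fnmatchcase(p, value)  # GLOB fallback


if __name__ == "__main__":
    for pat in ("**/.npm/**", "*.pyc", "**/.DS_Store", "npm-debug.log*", "**/build-*/out?"):
        print(pat, "->", classify(pat))
```

In a real matcher the EXACT values would sit in a dict and the CONTAINS, ENDSWITH, and STARTSWITH values would be checked together in one pass, instead of rule by rule as here.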

Brace expansion ({a,b}) is resolved at compile time. All matcher values are lowercased once at build time; paths are lowercased once per node for case-insensitive matching.
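Brace expansion at compile time can be pictured as turning one pattern into several plain patterns before any classification happens. A minimal sketch, assuming non-nested groups only (the function expand_braces and the sample pattern are hypothetical, and dux's implementation may differ):

```python
def expand_braces(pattern: str) -> list[str]:
    """Expand {a,b,...} groups into separate patterns (nested groups unsupported)."""
    start = pattern.find("{")
    if start == -1:
        return [pattern]
    end = pattern.find("}", start)
    if end == -1:
        return [pattern]  # unmatched brace: leave the pattern as-is
    head, body, tail = pattern[:start], pattern[start + 1:end], pattern[end + 1:]
    results: list[str] = []
    for option in body.split(","):
        # Recurse so that later groups in the tail are expanded too.
        results.extend(expand_braces(head + option + tail))
    return results


if __name__ == "__main__":
    # Lowercase once at build time, as described above.
    compiled = [p.lower() for p in expand_braces("**/*.{PYC,pyo}")]
    print(compiled)
```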

See docs/aho-corasick.md and docs/prefix-trie.md for detailed algorithm explanations, and docs/architecture.md for the full end-to-end pipeline.

Development

# Install with dev dependencies
uv sync

# Run tests
uv run pytest

# Lint and format
uv run ruff check
uv run ruff format

# Type check
uv run basedpyright

Tech Stack

| Component | Tool |
| --- | --- |
| CLI framework | Typer |
| TUI framework | Textual |
| Terminal rendering | Rich |
| Error handling | result (Rust-style Result[T, E]) |
| Type checking | basedpyright (standard mode) |
| Linting/formatting | Ruff |
| Testing | pytest |
| Package management | uv |

License

MIT
