Sia Code

Local-first codebase search with semantic understanding and multi-hop code discovery.

Benchmark Results

89.9% Recall@5 on RepoEval benchmark (1,600 queries, 8 repositories)

+12.9 percentage points better than cAST (77.0%)
Lexical-only search outperforms hybrid (BM25 > BM25+embeddings)
Publication-quality results with ±1.5% confidence interval

See docs/BENCHMARK_RESULTS.md for full analysis.

Features

89.9% Recall@5 - State-of-the-art code search performance on RepoEval benchmark
Lexical-First Search - BM25 + FTS5 optimized for code queries (outperforms semantic-only)
Multi-Hop Research - Automatically discover code relationships and call graphs
AST-Aware Chunking - Tree-sitter preserves function/class boundaries
Project Auto-Detection - Automatic language detection and indexing strategy
Tiered Search - Filter by project code, dependencies, or both
12 Languages - Python, JS/TS, Go, Rust, Java, C/C++, C#, Ruby, PHP (full AST support)
Watch Mode - Auto-reindex on file changes with incremental updates
Portable Index - Usearch HNSW + SQLite FTS5 in .sia-code/ directory

Installation

# From PyPI (recommended)
pip install sia-code

# Or with uv
uv tool install sia-code

# Or from source
uv tool install git+https://github.com/DxTa/sia-code.git

# Try without installing (ephemeral run)
uvx sia-code --version
uvx sia-code search "authentication logic"

# Verify installation
sia-code --version

Quick Start

# Initialize and index
sia-code init
sia-code index .

# Search
sia-code search "authentication logic"           # Hybrid search (default: BM25 + semantic)
sia-code search --regex "def.*login"             # Lexical-only search (BM25)
sia-code search --semantic-only "handle errors"  # Semantic-only search

# Multi-hop research (discover relationships)
sia-code research "how does the API handle errors?"

# Check index health
sia-code status

Commands

Command	Description
`sia-code init`	Initialize index in current directory
`sia-code index .`	Index codebase
`sia-code index --update`	Re-index changed files only
`sia-code index --watch`	Auto-reindex on file changes
`sia-code search "query"`	Hybrid search (BM25 + semantic)
`sia-code search --regex "pattern"`	Lexical-only search
`sia-code search --semantic-only "query"`	Semantic-only search
`sia-code research "question"`	Multi-hop code discovery
`sia-code status`	Index health and staleness metrics
`sia-code compact`	Remove stale chunks
`sia-code memory list`	List timeline/changelogs/decisions
`sia-code memory changelog`	Generate changelog from git
`sia-code memory sync-git`	Import events from git history
`sia-code config show`	Display configuration
`sia-code interactive`	Live search mode

See docs/CLI_FEATURES.md for complete command reference with all options and examples.

Configuration

Recommended: Lexical-only search (best performance, no API key needed)

sia-code init
sia-code index .
# Search uses BM25 by default (89.9% Recall@5)

Optional: Hybrid search (adds semantic embeddings):

export OPENAI_API_KEY=sk-your-key-here
sia-code config set embedding.enabled true
sia-code config set search.vector_weight 0.0  # 0.0 = lexical-only (recommended!)
sia-code index --clean

Edit config at .sia-code/config.json to:

Set vector_weight (0.0 = lexical-only, 0.5 = hybrid, 1.0 = semantic-only)
Change embedding model (BAAI/bge-small-en-v1.5, openai-small)
Exclude patterns (node_modules/, __pycache__/, etc.)
Adjust chunk sizes (max_chunk_size, min_chunk_size)

View config: sia-code config show

AI Summarization (optional, enhances git changelogs):

{
  "summarization": {
    "enabled": true,
    "model": "google/flan-t5-base",
    "max_commits": 20
  }
}

Output Formats

sia-code search "query" --format json            # JSON output
sia-code search "query" --format table           # Rich table
sia-code search "query" --format csv             # CSV for Excel
sia-code search "query" --output results.json    # Save to file

Supported Languages

Full AST Support (12): Python, JavaScript, TypeScript, JSX, TSX, Go, Rust, Java, C, C++, C#, Ruby, PHP

Recognized: Kotlin, Groovy, Swift, Bash, Vue, Svelte, and more (indexed as text)

Troubleshooting

Issue	Solution
No API key warning	Normal - searches fallback to lexical mode
Index growing large	Run `sia-code compact` to remove stale chunks
Slow indexing	Use `sia-code index --update` for incremental
Stale search results	Run `sia-code index --clean` to rebuild

How It Works

Parse - Tree-sitter generates language-agnostic AST for each file
Chunk - AST-aware chunking preserves function/class boundaries (max 1200 chars)
Index - Usearch HNSW (vectors) + SQLite FTS5 (lexical search with BM25)
Store - Portable .sia-code/ directory (17-25 MB per repo)
Search - Lexical-first (BM25) with optional hybrid fusion (RRF)

Key Innovation: Lexical-only search (BM25) outperforms hybrid (BM25+embeddings) for code queries because code contains precise identifiers that benefit from exact keyword matching.

Documentation

Architecture & Implementation

docs/ARCHITECTURE.md - System design, data structures, and technology stack
docs/CODE_STRUCTURE.md - Codebase organization and key classes
docs/INDEXING.md - Indexing pipeline and AST-aware chunking
docs/QUERYING.md - Search methods and hybrid fusion

Benchmark Results

docs/BENCHMARK_RESULTS.md - 89.9% Recall@5 full results and analysis
docs/BENCHMARK_METHODOLOGY.md - RepoEval benchmark setup
docs/PERFORMANCE_ANALYSIS.md - Why sia-code outperforms cAST by +12.9 pts

Usage & Configuration

docs/CLI_FEATURES.md - Complete CLI reference and examples
examples/ - Test results and usage examples
ROADMAP.md - Development progress
KNOWN_LIMITATIONS.md - Current limitations and workarounds

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 89 Commits
.github/workflows		.github/workflows
.pci		.pci
docs		docs
examples		examples
results/academic		results/academic
scripts		scripts
sia_code		sia_code
test_project/.pci		test_project/.pci
tests		tests
.gitignore		.gitignore
DAEMON_USAGE_GUIDE.md		DAEMON_USAGE_GUIDE.md
EMBEDDING_SERVER_VERIFICATION.md		EMBEDDING_SERVER_VERIFICATION.md
FINAL_SUMMARY.md		FINAL_SUMMARY.md
MODEL_COMPARISON.md		MODEL_COMPARISON.md
README.md		README.md
ROADMAP.md		ROADMAP.md
TEST_RESULTS.md		TEST_RESULTS.md
pyproject.toml		pyproject.toml
test_auto_unload.py		test_auto_unload.py
test_embedding_server.py		test_embedding_server.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sia Code

Benchmark Results

Features

Installation

Quick Start

Commands

Configuration

Output Formats

Supported Languages

Troubleshooting

How It Works

Documentation

Architecture & Implementation

Benchmark Results

Usage & Configuration

License

About

Uh oh!

Releases 7

Packages

Contributors 2

Uh oh!

Languages

DxTa/sia-code

Folders and files

Latest commit

History

Repository files navigation

Sia Code

Benchmark Results

Features

Installation

Quick Start

Commands

Configuration

Output Formats

Supported Languages

Troubleshooting

How It Works

Documentation

Architecture & Implementation

Benchmark Results

Usage & Configuration

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Contributors 2

Uh oh!

Languages

Packages