chisel

AST-aware code chunking for semantic search and embeddings. Chisel parses source code into meaningful units—functions, classes, methods—preserving the context that makes code searchable.

From Syntax to Semantics

source := []byte(`
func New(cfg Config) *Handler { ... }

func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { ... }

type Config struct {
    Timeout time.Duration
    Logger  *slog.Logger
}
`)

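// c is a chunker created with chisel.New and ctx is a context.Context (see Quick Start).
chunks, err := c.Chunk(ctx, chisel.Go, "api.go", source)
if err != nil {
    panic(err)
}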

for _, chunk := range chunks {
    fmt.Printf("[%s] %s (lines %d-%d)\n", chunk.Kind, chunk.Symbol, chunk.StartLine, chunk.EndLine)
}
// [function] New (lines 2-2)
// [method] Handler.ServeHTTP (lines 4-4)
// [class] Config (lines 6-9)

Every chunk carries its symbol name, kind, line range, and parent context. Methods know their receiver. Nested types know their enclosing scope.

chunk := chunks[1]
// chunk.Symbol    → "Handler.ServeHTTP"
// chunk.Kind      → "method"
// chunk.Context   → ["Handler"]
// chunk.Content   → the full method source
// chunk.StartLine → 4
// chunk.EndLine   → 4

Feed chunks to an embedding model, store the resulting vectors in a vector database, and search code by meaning rather than by text.
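As a rough sketch of what "feeding a chunk to an embedding model" might look like, the snippet below builds the text to embed from a chunk's metadata plus its content. The `Chunk` struct here is a local stand-in that mirrors only the fields shown above (it is not chisel's actual type definition), and `EmbeddingText` and its header layout are hypothetical choices, not part of the library.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a local stand-in mirroring the fields shown in the README.
// It is an assumption for illustration, not chisel's own type.
type Chunk struct {
	Symbol    string
	Kind      string
	Context   []string
	Content   string
	StartLine int
	EndLine   int
}

// EmbeddingText (hypothetical helper) prefixes the chunk source with a short
// metadata header so the embedding sees the symbol, kind, and parent scope
// alongside the code itself.
func EmbeddingText(path string, c Chunk) string {
	var b strings.Builder
	fmt.Fprintf(&b, "file: %s (lines %d-%d)\n", path, c.StartLine, c.EndLine)
	fmt.Fprintf(&b, "%s: %s\n", c.Kind, c.Symbol)
	if len(c.Context) > 0 {
		fmt.Fprintf(&b, "scope: %s\n", strings.Join(c.Context, " > "))
	}
	b.WriteString(c.Content)
	return b.String()
}

func main() {
	c := Chunk{
		Symbol:    "Handler.ServeHTTP",
		Kind:      "method",
		Context:   []string{"Handler"},
		Content:   "func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { ... }",
		StartLine: 4,
		EndLine:   4,
	}
	fmt.Println(EmbeddingText("api.go", c))
}
```

Whether a metadata header helps retrieval depends on the embedding model; embedding `chunk.Content` alone and keeping the rest as stored metadata is an equally reasonable design.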

Install

go get github.com/zoobz-io/chisel

Language providers (install only what you need):

go get github.com/zoobz-io/chisel/golang     # Go (stdlib, no deps)
go get github.com/zoobz-io/chisel/markdown   # Markdown (no deps)
go get github.com/zoobz-io/chisel/typescript # TypeScript/JavaScript (tree-sitter)
go get github.com/zoobz-io/chisel/python     # Python (tree-sitter)
go get github.com/zoobz-io/chisel/rust       # Rust (tree-sitter)

Requires Go 1.24+.

Quick Start

package main

import (
    "context"
    "fmt"

    "github.com/zoobz-io/chisel"
    "github.com/zoobz-io/chisel/golang"
    "github.com/zoobz-io/chisel/typescript"
)

func main() {
    // Create a chunker with language providers
    c := chisel.New(
        golang.New(),
        typescript.New(),
        typescript.NewJavaScript(),
    )

    source := []byte(`
package auth

// Authenticate validates user credentials.
func Authenticate(username, password string) (*User, error) {
    // ...
}

// User represents an authenticated user.
type User struct {
    ID    string
    Email string
}
`)

    chunks, err := c.Chunk(context.Background(), chisel.Go, "auth.go", source)
    if err != nil {
        panic(err)
    }

    for _, chunk := range chunks {
        fmt.Printf("[%s] %s\n", chunk.Kind, chunk.Symbol)
        fmt.Printf("  Lines: %d-%d\n", chunk.StartLine, chunk.EndLine)
        if len(chunk.Context) > 0 {
            fmt.Printf("  Context: %v\n", chunk.Context)
        }
    }
}

Output:

[function] Authenticate
  Lines: 4-6
[class] User
  Lines: 8-12
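Quick Start passes the language explicitly. When walking a whole repository you would probably pick it per file; the sketch below shows one way to do that by extension. Only `chisel.Go` appears in this README, so the `Language` type and the other constants are local, hypothetical stand-ins, and the extension map is an assumption about which files each provider handles.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Language is a local stand-in for chisel's language identifier.
// Only chisel.Go is shown in the README; the other names are hypothetical.
type Language string

const (
	Go         Language = "go"
	TypeScript Language = "typescript"
	JavaScript Language = "javascript"
	Python     Language = "python"
	Rust       Language = "rust"
	Markdown   Language = "markdown"
)

// byExt maps lower-case file extensions to languages (assumed mapping).
var byExt = map[string]Language{
	".go": Go,
	".ts": TypeScript,
	".js": JavaScript,
	".py": Python,
	".rs": Rust,
	".md": Markdown,
}

// DetectLanguage reports the language for a path, or false when the
// extension is unknown and the file should be skipped.
func DetectLanguage(path string) (Language, bool) {
	lang, ok := byExt[strings.ToLower(filepath.Ext(path))]
	return lang, ok
}

func main() {
	for _, p := range []string{"auth.go", "web/app.ts", "README.MD", "Makefile"} {
		if lang, ok := DetectLanguage(p); ok {
			fmt.Printf("%s -> %s\n", p, lang)
		} else {
			fmt.Printf("%s -> skipped\n", p)
		}
	}
}
```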

Capabilities

Feature                   | Description                                            | Docs
Multi-language            | Go, TypeScript, JavaScript, Python, Rust, Markdown     | Providers
Semantic extraction       | Functions, methods, classes, interfaces, types, enums  | Concepts
Context preservation      | Parent chain for nested definitions                    | Architecture
Line mapping              | Precise source locations for each chunk                | Types
Zero-dependency providers | Go and Markdown use stdlib only                        | Architecture

Why Chisel?

  • Semantic boundaries — Chunks split at function/class boundaries, not arbitrary line counts
  • Embedding-ready — Output designed for vector databases and semantic search
  • Isolated dependencies — Tree-sitter only where needed; Go/Markdown have zero external deps
  • Context-aware — Methods know their parent class; nested functions know their scope
  • Consistent interface — Same Provider contract across all languages
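To make "semantic boundaries" concrete, here is a minimal sketch that splits Go source at declaration boundaries using only the standard library (`go/parser`, `go/ast`). It illustrates the general technique and is not chisel's implementation; the `Unit` type, the `Split` function, and the kind labels are assumptions chosen to resemble the output shown earlier.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Unit is a hypothetical, simplified chunk: kind, symbol, and line range.
type Unit struct {
	Kind      string
	Symbol    string
	StartLine int
	EndLine   int
}

// receiverName unwraps pointer and generic receivers down to the type name.
func receiverName(expr ast.Expr) string {
	switch t := expr.(type) {
	case *ast.StarExpr:
		return receiverName(t.X)
	case *ast.IndexExpr:
		return receiverName(t.X)
	case *ast.IndexListExpr:
		return receiverName(t.X)
	case *ast.Ident:
		return t.Name
	}
	return ""
}

// typeKind labels a type declaration; this mapping is a guess.
func typeKind(expr ast.Expr) string {
	switch expr.(type) {
	case *ast.StructType:
		return "class"
	case *ast.InterfaceType:
		return "interface"
	}
	return "type"
}

// Split returns one Unit per top-level function, method, and type.
func Split(filename string, src []byte) ([]Unit, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	line := func(p token.Pos) int { return fset.Position(p).Line }

	var units []Unit
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			u := Unit{Kind: "function", Symbol: d.Name.Name, StartLine: line(d.Pos()), EndLine: line(d.End())}
			if d.Recv != nil && len(d.Recv.List) > 0 {
				// Methods are qualified with their receiver type.
				u.Kind = "method"
				u.Symbol = receiverName(d.Recv.List[0].Type) + "." + d.Name.Name
			}
			units = append(units, u)
		case *ast.GenDecl:
			if d.Tok != token.TYPE {
				continue // skip imports, vars, and consts
			}
			for _, spec := range d.Specs {
				ts, ok := spec.(*ast.TypeSpec)
				if !ok {
					continue
				}
				units = append(units, Unit{Kind: typeKind(ts.Type), Symbol: ts.Name.Name, StartLine: line(ts.Pos()), EndLine: line(ts.End())})
			}
		}
	}
	return units, nil
}

func main() {
	src := []byte("package api\n\nfunc New() *Handler { return &Handler{} }\n\nfunc (h *Handler) Close() {}\n\ntype Handler struct{}\n")
	units, err := Split("api.go", src)
	if err != nil {
		panic(err)
	}
	for _, u := range units {
		fmt.Printf("[%s] %s (lines %d-%d)\n", u.Kind, u.Symbol, u.StartLine, u.EndLine)
	}
}
```

Because each unit ends where its declaration ends, a function is never cut in half the way it can be with fixed-size line windows. Doc comments are not included in the ranges here; attaching them is a separate design choice.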

Code Intelligence Pipelines

Chisel enables a pattern: parse once, search by meaning.

Your codebase becomes a corpus of semantic units. Each function, method, and type gets embedded with its full context — symbol name, parent scope, documentation. Queries match intent, not just text.

// Chunk your codebase
chunks, err := c.Chunk(ctx, chisel.Go, path, source)
if err != nil {
    return err
}

// Embed each chunk. embedder and vectorDB are placeholders for your own
// embedding provider and vector store; they are not part of chisel.
for _, chunk := range chunks {
    embedding := embedder.Embed(chunk.Content)
    vectorDB.Store(embedding, chunk.Symbol, chunk.Kind, path)
}

// Search by meaning
results := vectorDB.Query("authentication middleware")
// Returns: AuthMiddleware, ValidateToken, SessionHandler
// Not just files containing the word "authentication"

Symbol names and kinds become metadata. Line ranges enable source navigation. Context chains power hierarchical search.
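The roles described above (metadata, line ranges, context chains) can be sketched with a tiny in-memory store. Everything here is hypothetical: `Record`, `Cosine`, and `Search` are illustrative names, the vectors are hand-written rather than produced by a real embedding model, and a production setup would use an actual vector database.

```go
package main

import (
	"fmt"
	"math"
	"slices"
	"sort"
)

// Record is a hypothetical stored entry: chunk metadata plus its embedding.
type Record struct {
	Symbol    string
	Kind      string
	Path      string
	Context   []string
	StartLine int
	EndLine   int
	Vector    []float64
}

// Cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector has zero magnitude.
func Cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Search filters by kind and enclosing scope (empty string means "any"),
// then returns up to k records ranked by similarity to the query vector.
func Search(records []Record, query []float64, kind, scope string, k int) []Record {
	var matched []Record
	for _, r := range records {
		if kind != "" && r.Kind != kind {
			continue
		}
		if scope != "" && !slices.Contains(r.Context, scope) {
			continue
		}
		matched = append(matched, r)
	}
	sort.SliceStable(matched, func(i, j int) bool {
		return Cosine(matched[i].Vector, query) > Cosine(matched[j].Vector, query)
	})
	if k >= 0 && k < len(matched) {
		matched = matched[:k]
	}
	return matched
}

func main() {
	records := []Record{
		{Symbol: "ParseConfig", Kind: "function", Path: "config.go", StartLine: 10, EndLine: 30, Vector: []float64{0, 1}},
		{Symbol: "Server.ValidateToken", Kind: "method", Path: "auth.go", Context: []string{"Server"}, StartLine: 40, EndLine: 62, Vector: []float64{0.8, 0.6}},
		{Symbol: "AuthMiddleware", Kind: "function", Path: "auth.go", StartLine: 5, EndLine: 38, Vector: []float64{1, 0}},
	}
	query := []float64{1, 0} // stand-in for an embedded query string
	for _, r := range Search(records, query, "", "", 2) {
		// Path and line range are enough to jump to the source.
		fmt.Printf("%s:%d-%d  [%s] %s\n", r.Path, r.StartLine, r.EndLine, r.Kind, r.Symbol)
	}
}
```

Filtering on `Kind` and `Context` before ranking is one simple reading of "hierarchical search"; a real vector store would usually express the same idea as metadata filters on the query.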

Ecosystem

Chisel provides the chunking layer for code intelligence pipelines:

  • vicky — Code search and retrieval service

Documentation

Contributing

Contributions welcome. See CONTRIBUTING.md for guidelines.

License

MIT — see LICENSE for details.