SAE-instrumented LLM inference server. Run a local LLM with real-time Sparse Autoencoder (SAE) feature extraction — see what concepts the model is "thinking about" as it generates each token.
Neuroscope wraps mistral.rs for inference and hooks into the model's forward pass to extract activations, run them through a pre-trained SAE encoder, and stream the top activated features over SSE.
```
Chat API (:8080)                       Features API (:8081)
POST /v1/chat/completions              GET /v1/features/stream → SSE
GET /v1/models                         GET /v1/features/labels → JSON
          │                                        ▲
          ▼                                        │
┌──────────────────────────────────────────────────┤
│        Inference Engine (mistral.rs)             │
│                                                  │
│  Transformer layer 20 ──hook──► SAE Encoder      │
│                                                  │
│  Token output ───────────────► broadcast channel │
└──────────────────────────────────────────────────┘
```
The chat API is fully OpenAI-compatible — use it with any existing client (Continue, Open WebUI, curl, etc.). The features API is a separate port that streams SAE feature activations as the model generates tokens.
- Rust 1.88+
- ~5 GB disk space for model weights (downloaded automatically on first run)
- A HuggingFace account with access to google/gemma-2-2b-it
- macOS Metal builds: the Metal Toolchain must be installed. If you get `cannot execute tool 'metal'` errors, run: `xcodebuild -downloadComponent MetalToolchain`
```sh
# macOS (Metal GPU)
cargo build --release -p neuroscope-cli --features metal

# NVIDIA GPU
cargo build --release -p neuroscope-cli --features cuda

# CPU only
cargo build --release -p neuroscope-cli
```

All examples below use `neuroscope` as shorthand for `./target/release/neuroscope`.
On first run, model and SAE weights are downloaded from HuggingFace and cached locally. Before serving, you need calibration data (for filtering noisy features) and feature labels (for human-readable descriptions).
Option A: Pull pre-generated data (~1 min):

```sh
neuroscope pull
```

Option B: Generate from scratch (~2+ hours, requires GPU + API key):

```sh
neuroscope calibrate run     # ~10 min
neuroscope labels generate   # ~2 hours, requires OPENROUTER_API_KEY or ANTHROPIC_API_KEY
```

Both generate commands auto-download a WikiText corpus and resume if interrupted. See Environment variables for API key setup.
Then serve:

```sh
neuroscope serve
```

You can skip steps 1–2 and go straight to `serve` — it will fall back to Neuronpedia labels and disable filtering, but the results will be noisier.
In one terminal, listen for feature activations:

```sh
curl -N http://localhost:8081/v1/features/stream
```

In another, send a chat request:
```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-2-2b-it",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```

The features stream will emit SSE events like:

```
event: feature_activation
data: {"token_index":0,"token":"Paris","layer":20,"top_features":[{"index":4521,"label":"geography or place names","activation":3.82},{"index":12033,"label":"European countries and capitals","activation":2.14}]}
```
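A minimal Python sketch of consuming events in this shape, using only the standard library (the parser and sample payload below are illustrative, not part of Neuroscope):

```python
import json

def parse_sse_events(raw: str):
    """Split a raw SSE text stream into (event, parsed-JSON-data) pairs."""
    events = []
    event_name, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event_name:  # blank line dispatches the event
            events.append((event_name, json.loads("\n".join(data_lines))))
            event_name, data_lines = None, []
    return events

sample = (
    "event: feature_activation\n"
    'data: {"token_index":0,"token":"Paris","layer":20,'
    '"top_features":[{"index":4521,"label":"geography or place names","activation":3.82}]}\n'
    "\n"
)

for name, payload in parse_sse_events(sample):
    top = payload["top_features"][0]
    print(f'{payload["token"]!r}: #{top["index"]} {top["label"]} ({top["activation"]})')
    # → 'Paris': #4521 geography or place names (3.82)
```

In practice you would feed the same parser incrementally from the `curl -N`-style HTTP response body instead of a fixed string.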
Neuroscope loads a .env file from the working directory if present.
| Variable | Purpose |
|---|---|
| `OPENROUTER_API_KEY` | API key for OpenRouter (used for label generation) |
| `ANTHROPIC_API_KEY` | API key for Anthropic (fallback for label generation) |
| `LABELER_MODEL` | LLM model for label generation [default: `deepseek/deepseek-v3.2`] |
| `LABELER_BACKEND` | API backend: `auto`, `openrouter`, `anthropic`, or a custom base URL [default: `auto`] |
With the `auto` backend, Neuroscope checks `OPENROUTER_API_KEY` first, then falls back to `ANTHROPIC_API_KEY`.
Start the inference server with SAE instrumentation.
```
neuroscope serve [OPTIONS]

Options:
  --model <ID>               HuggingFace model ID [default: google/gemma-2-2b-it]
  --sae <ID>                 HuggingFace SAE repo ID [default: google/gemma-scope-2b-pt-res]
  --sae-path <PATH>          Path within SAE repo [default: layer_20/width_16k/average_l0_71/params.npz]
  --sae-local-path <PATH>    Local path to SAE npz file (bypasses HF download)
  --sae-layer <N>            Transformer layer to hook [default: 20]
  --top-k <N>                Number of top features per token [default: 10]
  --port <PORT>              Chat API port [default: 8080]
  --features-port <PORT>     Features SSE port [default: 8081]
  --feature-filter <MODE>    Filter mode: none, frequency, surprise, combined [default: combined]
  --filter-threshold <F>     Filter threshold (e.g. max firing rate) [default: 0.5]
  --calibration-path <PATH>  Path to calibration stats JSON (auto-detected from cache if omitted)
  --labeler <MODEL>          Labeler model whose labels to use [default: deepseek/deepseek-v3.2] [env: LABELER_MODEL]
```
Feature filtering is enabled by default. To disable it:

```sh
neuroscope serve --feature-filter none
```

Filter modes:

- `frequency` — exclude features that fire on more than a `--filter-threshold` fraction of tokens (removes "always-on" features)
- `surprise` — rank features by how surprising their activation is relative to calibration statistics
- `combined` — exclude high-frequency features, then rank the rest by surprise (default)
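A rough sketch of how combined-mode filtering could work, with made-up calibration numbers and a z-score standing in for the surprise metric (an illustrative approximation, not Neuroscope's actual implementation):

```python
# Hypothetical per-feature calibration stats (illustrative numbers only):
# firing_rate is the fraction of tokens a feature fires on; mean/std describe
# its activation magnitude when it does fire.
CALIBRATION = {
    4521:  {"firing_rate": 0.02, "mean": 1.0, "std": 0.5},
    12033: {"firing_rate": 0.01, "mean": 0.8, "std": 0.4},
    7:     {"firing_rate": 0.90, "mean": 2.0, "std": 1.0},  # "always-on"
}

def combined_filter(activations, threshold=0.5, top_k=10):
    """Frequency cut first, then rank the survivors by surprise."""
    kept = []
    for idx, act in activations.items():
        stats = CALIBRATION.get(idx)
        if stats is None or stats["firing_rate"] > threshold:
            continue  # frequency step: drop always-on (or unknown) features
        surprise = (act - stats["mean"]) / stats["std"]  # z-score as "surprise"
        kept.append({"index": idx, "activation": act, "surprise": surprise})
    kept.sort(key=lambda f: f["surprise"], reverse=True)
    return kept[:top_k]

top = combined_filter({4521: 3.82, 12033: 2.14, 7: 2.5})
print([f["index"] for f in top])  # → [4521, 12033]  (feature 7 is filtered out)
```

Feature 7 is excluded despite its sizeable activation because it fires on 90% of tokens, which is exactly the kind of uninformative feature the frequency step targets.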
Build or inspect the text corpus used for calibration and label generation.
```
neuroscope corpus build [OPTIONS]

Options:
  --samples <N>     Number of samples to download [default: 2000]
  --output <DIR>    Output directory [default: ~/.cache/neuroscope/corpus/]
  --dataset <NAME>  HuggingFace dataset [default: wikitext]
  --config <NAME>   Dataset config/subset [default: wikitext-103-raw-v1]

neuroscope corpus show [--path <DIR>]
```
Downloads WikiText-103 parquet files from HuggingFace Hub and extracts text samples locally. No Python dependency needed.
You don't need to run this manually — calibrate run and labels generate will auto-download the corpus if --corpus-path is not provided.
Collect per-feature firing statistics from a text corpus. This data is used by the feature filter.
```
neuroscope calibrate run [OPTIONS]

Options:
  --model <ID>             HuggingFace model ID [default: google/gemma-2-2b-it]
  --sae <ID>               HuggingFace SAE repo ID [default: google/gemma-scope-2b-pt-res]
  --sae-path <PATH>        Path within SAE repo [default: layer_20/width_16k/average_l0_71/params.npz]
  --sae-local-path <PATH>  Local path to SAE npz file
  --sae-layer <N>          Transformer layer [default: 20]
  --corpus-path <DIR>      Directory of .txt files (auto-downloads WikiText if omitted)
  --samples <N>            Max samples to process [default: 1000]
  --output <PATH>          Output path (defaults to ~/.cache/neuroscope/calibration/)

neuroscope calibrate show --stats <PATH> [--top <N>]
```
Generate, score, and inspect feature labels.
Run a corpus through the model+SAE to collect max-activating examples, then call an LLM API to auto-generate human-readable labels for each feature.
```
neuroscope labels generate [OPTIONS]

Options:
  --model <ID>                 HuggingFace model ID [default: google/gemma-2-2b-it]
  --sae <ID>                   HuggingFace SAE repo ID [default: google/gemma-scope-2b-pt-res]
  --sae-path <PATH>            Path within SAE repo
  --sae-local-path <PATH>      Local path to SAE npz file
  --sae-layer <N>              Transformer layer [default: 20]
  --corpus-path <DIR>          Directory of .txt files (auto-downloads WikiText if omitted)
  --samples <N>                Max samples to process [default: 5000]
  --examples-per-feature <N>   Max-activating examples per feature [default: 20]
  --labeler <MODEL>            LLM model for labeling [default: deepseek/deepseek-v3.2] [env: LABELER_MODEL]
  --labeler-backend <BACKEND>  API backend: auto, openrouter, anthropic, or custom URL [default: auto] [env: LABELER_BACKEND]
  --concurrency <N>            Max concurrent API requests [default: 50]
  --score                      Also run detection scoring after generation
  --output <PATH>              Output path (defaults to ~/.cache/neuroscope/autointerp_labels/<labeler>/)
```
Labels are namespaced by labeler model, so you can generate and compare labels from different LLMs:
```sh
# Generate with DeepSeek V3.2 (default, via OpenRouter)
neuroscope labels generate --samples 5000 --concurrency 50

# Generate with Claude Haiku for comparison
neuroscope labels generate --samples 5000 --concurrency 50 \
  --labeler anthropic/claude-haiku-4-5

# Generate with GPT-4o
neuroscope labels generate --samples 5000 --concurrency 50 \
  --labeler openai/gpt-4o
```

The pipeline has two phases, both with checkpointing:
- Corpus pass (~75 min for 5K samples) — runs text through the model+SAE, saves checkpoint every 50 samples. Shared across all labelers.
- Label generation (~30 min via OpenRouter) — calls the LLM API for each feature, saves each label as it completes.
Both phases resume automatically if interrupted.
Uses structured output (JSON schema) when the backend supports it (OpenRouter, OpenAI-compatible) for reliable label parsing.
Score existing labels using detection accuracy.
```
neuroscope labels score --labels <PATH> [--labeler <MODEL>] [--threshold <F>]
```
Show labels for specific features.
```
neuroscope labels show --features 4521,3022,11612
```
Manage cached data. All clean operations move data to a trash folder and are reversible.
```
neuroscope clean corpus        # Trash downloaded corpus text files
neuroscope clean checkpoints   # Trash corpus pass checkpoints
neuroscope clean labels        # Trash all generated feature labels
neuroscope clean calibration   # Trash calibration statistics
neuroscope clean all           # Trash everything
neuroscope clean show          # Show what's in the trash
neuroscope clean undo          # Restore the most recently trashed item
neuroscope clean empty         # Permanently delete everything in the trash
```

```
neuroscope models list       # Show HuggingFace cache info
neuroscope models pull <ID>  # Pre-download a model
neuroscope sae list          # Show cached SAEs
neuroscope sae pull <ID>     # Pre-download an SAE
```
Sync labels and calibration data with HuggingFace, so new users can skip the hours of compute needed to generate them.
```
neuroscope push [OPTIONS]

Options:
  --repo <ID>      HuggingFace dataset repo [default: cjroth/neuroscope]
  --message <MSG>  Commit message

neuroscope pull [OPTIONS]

Options:
  --repo <ID>  HuggingFace dataset repo [default: cjroth/neuroscope]
```
Pull downloads labels and calibration data directly into the local cache — no further setup needed; just run `neuroscope serve` after pulling.
Push requires a HuggingFace write token (`huggingface-cli login`).
When loading feature labels, Neuroscope checks sources in this order:
- Auto-interp labels (namespaced by `--labeler` model) — highest quality
- Auto-interp labels (legacy non-namespaced path)
- Neuronpedia cache (fetched from neuronpedia.org)
- Neuronpedia API (live fetch, cached on first use)
- Numeric fallback (`feature_0`, `feature_1`, ...)
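The precedence above amounts to a first-match-wins chain. An illustrative sketch (not the actual source; the dicts stand in for the real label stores):

```python
def resolve_label(index: int, sources) -> str:
    """Return the first label found across ordered sources, else a numeric fallback.

    `sources` is an ordered list of {feature_index: label} dicts standing in for
    auto-interp labels, the Neuronpedia cache, the live API, and so on.
    """
    for labels in sources:
        if index in labels:
            return labels[index]
    return f"feature_{index}"

# Hypothetical label stores, higher-priority first.
auto_interp = {4521: "geography or place names"}
neuronpedia = {4521: "place names", 7: "sentence-initial tokens"}

print(resolve_label(4521, [auto_interp, neuronpedia]))  # → geography or place names
print(resolve_label(99, [auto_interp, neuronpedia]))    # → feature_99
```

Feature 4521 resolves from the auto-interp store even though Neuronpedia also has a label for it, matching the "highest quality first" ordering.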
Gemma Scope SAEs were trained on RMS-normalized hidden states. Neuroscope applies the same normalization before encoding: each hidden state vector is divided by its root mean square, scaling it to unit RMS before the SAE matmul. This prevents distorted activations from magnitude differences between raw and training-time inputs.
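As a concrete sketch of that step in NumPy, with toy dimensions and a plain scalar threshold standing in for the learned per-feature JumpReLU parameters (not the actual encoder code):

```python
import numpy as np

def rms_normalize(h: np.ndarray) -> np.ndarray:
    """Scale a hidden-state vector to unit RMS before the SAE matmul."""
    return h / np.sqrt(np.mean(h ** 2))

# Toy shapes; the real setup maps d_model=2304 to 16,384 features at layer 20.
d_model, n_features = 8, 32
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(d_model, n_features))  # hypothetical encoder weights
b_enc = np.zeros(n_features)                    # hypothetical encoder bias
threshold = 0.5                                 # stand-in for JumpReLU thresholds

h = rng.normal(size=d_model) * 10.0             # raw residual-stream vector
pre = rms_normalize(h) @ W_enc + b_enc          # encode after normalization
acts = np.where(pre > threshold, pre, 0.0)      # JumpReLU-style gating
top = np.argsort(acts)[::-1][:3]                # indices of top activated features
```

Without the `rms_normalize` call, the large-magnitude raw vector would push far more features past the threshold than the SAE saw during training.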
Standard OpenAI chat completions API. Supports "stream": true.
Lists the loaded model.
SSE stream of feature activations. Each feature_activation event corresponds to one generated token. A generation_complete event is sent when generation finishes.
When feature filtering is enabled, events include a filtered_count field showing how many features were suppressed. Features may also include a surprise score when using surprise or combined filter modes.
Returns the full label map as JSON — a mapping from feature index to human-readable description.
The project is split into four crates:
| Crate | Purpose |
|---|---|
| `neuroscope-core` | SAE encoder, types, labels, calibration, filtering, auto-interp, corpus download, scoring |
| `neuroscope-engine` | Inference engine wrapping mistral.rs with SAE hook wiring, corpus runner |
| `neuroscope-server` | Axum HTTP servers for chat API and features SSE |
| `neuroscope-cli` | CLI binary tying everything together |
Model: Gemma 2 2B IT — small enough to run on a laptop, with the best SAE coverage of any open model.
SAE: Gemma Scope layer 20, 16K width — 16,384 learned features extracted from the residual stream at layer 20/26. Deep enough to capture semantic concepts (not just syntax). Uses JumpReLU activation with ~71 features firing per token on average.
The chat API on :8080 is strictly OpenAI-compatible so it works as a drop-in replacement with existing tools. The features stream on :8081 is a separate concern — this keeps the chat API clean and lets you consume features independently (terminal logger, web UI, or both at once via the broadcast channel).
```sh
# Unit tests (no model download needed)
cargo test

# Integration tests (requires model weights)
NEUROSCOPE_INTEGRATION_TESTS=1 cargo test
```

MIT