Neuroscope

SAE-instrumented LLM inference server. Run a local LLM with real-time Sparse Autoencoder (SAE) feature extraction — see what concepts the model is "thinking about" as it generates each token.

Neuroscope wraps mistral.rs for inference and hooks into the model's forward pass to extract activations, run them through a pre-trained SAE encoder, and stream the top activated features over SSE.

How it works

 Chat API (:8080)              Features API (:8081)
 POST /v1/chat/completions     GET /v1/features/stream → SSE
 GET  /v1/models               GET /v1/features/labels → JSON
        │                              ▲
        ▼                              │
 ┌─────────────────────────────────────┤
 │  Inference Engine (mistral.rs)      │
 │                                     │
 │  Transformer layer 20 ──hook──► SAE Encoder
 │                                  │
 │  Token output ───────────────► broadcast channel
 └─────────────────────────────────────┘

The chat API is fully OpenAI-compatible — use it with any existing client (Continue, Open WebUI, curl, etc.). The features API runs on a separate port and streams SAE feature activations as the model generates tokens.

Quickstart

Prerequisites

  • Rust 1.88+
  • ~5 GB disk space for model weights (downloaded automatically on first run)
  • A HuggingFace account with access to google/gemma-2-2b-it
  • macOS Metal builds: The Metal Toolchain must be installed. If you get cannot execute tool 'metal' errors, run:
    xcodebuild -downloadComponent MetalToolchain

Build

# macOS (Metal GPU)
cargo build --release -p neuroscope-cli --features metal

# NVIDIA GPU
cargo build --release -p neuroscope-cli --features cuda

# CPU only
cargo build --release -p neuroscope-cli

Getting started

All examples below use neuroscope as shorthand for ./target/release/neuroscope.

On first run, model and SAE weights are downloaded from HuggingFace and cached locally. Before serving, you need calibration data (for filtering noisy features) and feature labels (for human-readable descriptions).

Option A: Pull pre-generated data (~1 min):

neuroscope pull

Option B: Generate from scratch (~2+ hours, requires GPU + API key):

neuroscope calibrate run          # ~10 min
neuroscope labels generate        # ~2 hours, requires OPENROUTER_API_KEY or ANTHROPIC_API_KEY

Both commands auto-download a WikiText corpus and resume if interrupted. See Environment variables for API key setup.

Then serve:

neuroscope serve

You can also skip the data setup entirely and go straight to serve — it will fall back to Neuronpedia labels and disable filtering, but the results will be noisier.

Use

In one terminal, listen for feature activations:

curl -N http://localhost:8081/v1/features/stream

In another, send a chat request:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-2-2b-it",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
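The same request can be issued from Python using only the standard library. A minimal sketch (building the request needs no server; sending it requires neuroscope serve to be running):

```python
import json
import urllib.request

def chat_request(messages, model="gemma-2-2b-it", base="http://localhost:8080"):
    """Build an OpenAI-style chat completion request for the local server."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request([{"role": "user", "content": "What is the capital of France?"}])
# With the server running: urllib.request.urlopen(req).read()
```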

The features stream will emit SSE events like:

event: feature_activation
data: {"token_index":0,"token":"Paris","layer":20,"top_features":[{"index":4521,"label":"geography or place names","activation":3.82},{"index":12033,"label":"European countries and capitals","activation":2.14}]}
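Consuming this stream from code mostly means splitting the response on blank lines and JSON-decoding each `data:` payload. A minimal Python sketch of that parsing step (`parse_sse_event` is an illustrative helper of ours, not part of Neuroscope):

```python
import json

def parse_sse_event(block: str):
    """Split one SSE event block into its event name and decoded JSON payload."""
    event, data = None, None
    for line in block.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):].strip())
    return event, data

raw = (
    "event: feature_activation\n"
    'data: {"token_index":0,"token":"Paris","layer":20,'
    '"top_features":[{"index":4521,"label":"geography or place names","activation":3.82}]}'
)
event, payload = parse_sse_event(raw)
top = payload["top_features"][0]  # strongest feature for this token
```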

Environment variables

Neuroscope loads a .env file from the working directory if present.

Variable             Purpose
OPENROUTER_API_KEY   API key for OpenRouter (used for label generation)
ANTHROPIC_API_KEY    API key for Anthropic (fallback for label generation)
LABELER_MODEL        LLM model for label generation [default: deepseek/deepseek-v3.2]
LABELER_BACKEND      API backend: auto, openrouter, anthropic, or a custom base URL [default: auto]

With auto backend, Neuroscope checks OPENROUTER_API_KEY first, then falls back to ANTHROPIC_API_KEY.
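That selection order is easy to mirror in surrounding tooling. An illustrative sketch of the documented behavior (`resolve_backend` is a hypothetical helper, not Neuroscope code):

```python
def resolve_backend(env):
    """Mirror the documented 'auto' order: OpenRouter first, then Anthropic."""
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    raise RuntimeError("set OPENROUTER_API_KEY or ANTHROPIC_API_KEY")
```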

CLI reference

neuroscope serve

Start the inference server with SAE instrumentation.

neuroscope serve [OPTIONS]

Options:
  --model <ID>              HuggingFace model ID [default: google/gemma-2-2b-it]
  --sae <ID>                HuggingFace SAE repo ID [default: google/gemma-scope-2b-pt-res]
  --sae-path <PATH>         Path within SAE repo [default: layer_20/width_16k/average_l0_71/params.npz]
  --sae-local-path <PATH>   Local path to SAE npz file (bypasses HF download)
  --sae-layer <N>           Transformer layer to hook [default: 20]
  --top-k <N>               Number of top features per token [default: 10]
  --port <PORT>             Chat API port [default: 8080]
  --features-port <PORT>    Features SSE port [default: 8081]
  --feature-filter <MODE>   Filter mode: none, frequency, surprise, combined [default: combined]
  --filter-threshold <F>    Filter threshold (e.g. max firing rate) [default: 0.5]
  --calibration-path <PATH> Path to calibration stats JSON (auto-detected from cache if omitted)
  --labeler <MODEL>         Labeler model whose labels to use [default: deepseek/deepseek-v3.2] [env: LABELER_MODEL]

Feature filtering is enabled by default. To disable it:

neuroscope serve --feature-filter none

Filter modes:

  • frequency — exclude features that fire more than --filter-threshold fraction of the time (removes "always-on" features)
  • surprise — rank features by how surprising their activation is relative to calibration statistics
  • combined — exclude high-frequency features, then rank the rest by surprise (default)
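The way the modes compose can be sketched in a few lines of Python. This is an illustrative model of the behavior described above, not Neuroscope's internals; `apply_filter` and its argument names are our own:

```python
def apply_filter(features, firing_rate, surprise, mode="combined", threshold=0.5):
    """Illustrative composition of the three filter modes.

    features:    list of (feature_index, activation) pairs
    firing_rate: feature_index -> fraction of calibration tokens it fired on
    surprise:    feature_index -> surprise score from calibration stats
    """
    if mode in ("frequency", "combined"):
        # Drop "always-on" features that fire too often.
        features = [f for f in features if firing_rate.get(f[0], 0.0) <= threshold]
    if mode in ("surprise", "combined"):
        # Rank the remainder by how surprising the activation is.
        features = sorted(features, key=lambda f: surprise.get(f[0], 0.0), reverse=True)
    return features

features = [(1, 3.0), (2, 2.0)]
rates = {1: 0.9, 2: 0.1}   # feature 1 fires on 90% of tokens ("always on")
scores = {1: 0.2, 2: 5.0}
kept = apply_filter(features, rates, scores, mode="combined")
```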

neuroscope corpus

Build or inspect the text corpus used for calibration and label generation.

neuroscope corpus build [OPTIONS]

Options:
  --samples <N>             Number of samples to download [default: 2000]
  --output <DIR>            Output directory [default: ~/.cache/neuroscope/corpus/]
  --dataset <NAME>          HuggingFace dataset [default: wikitext]
  --config <NAME>           Dataset config/subset [default: wikitext-103-raw-v1]

neuroscope corpus show [--path <DIR>]

Downloads WikiText-103 parquet files from HuggingFace Hub and extracts text samples locally. No Python dependency needed.

You don't need to run this manually — calibrate run and labels generate will auto-download the corpus if --corpus-path is not provided.

neuroscope calibrate

Collect per-feature firing statistics from a text corpus. This data is used by the feature filter.

neuroscope calibrate run [OPTIONS]

Options:
  --model <ID>              HuggingFace model ID [default: google/gemma-2-2b-it]
  --sae <ID>                HuggingFace SAE repo ID [default: google/gemma-scope-2b-pt-res]
  --sae-path <PATH>         Path within SAE repo [default: layer_20/width_16k/average_l0_71/params.npz]
  --sae-local-path <PATH>   Local path to SAE npz file
  --sae-layer <N>           Transformer layer [default: 20]
  --corpus-path <DIR>       Directory of .txt files (auto-downloads WikiText if omitted)
  --samples <N>             Max samples to process [default: 1000]
  --output <PATH>           Output path (defaults to ~/.cache/neuroscope/calibration/)

neuroscope calibrate show --stats <PATH> [--top <N>]

neuroscope labels

Generate, score, and inspect feature labels.

labels generate

Run a corpus through the model+SAE to collect max-activating examples, then call an LLM API to auto-generate human-readable labels for each feature.

neuroscope labels generate [OPTIONS]

Options:
  --model <ID>                   HuggingFace model ID [default: google/gemma-2-2b-it]
  --sae <ID>                     HuggingFace SAE repo ID [default: google/gemma-scope-2b-pt-res]
  --sae-path <PATH>              Path within SAE repo
  --sae-local-path <PATH>        Local path to SAE npz file
  --sae-layer <N>                Transformer layer [default: 20]
  --corpus-path <DIR>            Directory of .txt files (auto-downloads WikiText if omitted)
  --samples <N>                  Max samples to process [default: 5000]
  --examples-per-feature <N>     Max-activating examples per feature [default: 20]
  --labeler <MODEL>              LLM model for labeling [default: deepseek/deepseek-v3.2] [env: LABELER_MODEL]
  --labeler-backend <BACKEND>    API backend: auto, openrouter, anthropic, or custom URL [default: auto] [env: LABELER_BACKEND]
  --concurrency <N>              Max concurrent API requests [default: 50]
  --score                        Also run detection scoring after generation
  --output <PATH>                Output path (defaults to ~/.cache/neuroscope/autointerp_labels/<labeler>/)

Labels are namespaced by labeler model, so you can generate and compare labels from different LLMs:

# Generate with DeepSeek V3.2 (default, via OpenRouter)
neuroscope labels generate --samples 5000 --concurrency 50

# Generate with Claude Haiku for comparison
neuroscope labels generate --samples 5000 --concurrency 50 \
  --labeler anthropic/claude-haiku-4-5

# Generate with GPT-4o
neuroscope labels generate --samples 5000 --concurrency 50 \
  --labeler openai/gpt-4o

The pipeline has two phases, both with checkpointing:

  1. Corpus pass (~75 min for 5K samples) — runs text through the model+SAE, saves checkpoint every 50 samples. Shared across all labelers.
  2. Label generation (~30 min via OpenRouter) — calls the LLM API for each feature, saves each label as it completes.

Both phases resume automatically if interrupted.

Uses structured output (JSON schema) when the backend supports it (OpenRouter, OpenAI-compatible) for reliable label parsing.

labels score

Score existing labels using detection accuracy.

neuroscope labels score --labels <PATH> [--labeler <MODEL>] [--threshold <F>]

labels show

Show labels for specific features.

neuroscope labels show --features 4521,3022,11612

neuroscope clean

Manage cached data. All clean operations move data to a trash folder and are reversible.

neuroscope clean corpus       # Trash downloaded corpus text files
neuroscope clean checkpoints  # Trash corpus pass checkpoints
neuroscope clean labels       # Trash all generated feature labels
neuroscope clean calibration  # Trash calibration statistics
neuroscope clean all          # Trash everything

neuroscope clean show         # Show what's in the trash
neuroscope clean undo         # Restore the most recently trashed item
neuroscope clean empty        # Permanently delete everything in the trash

neuroscope models / neuroscope sae

neuroscope models list      # Show HuggingFace cache info
neuroscope models pull <ID> # Pre-download a model
neuroscope sae list         # Show cached SAEs
neuroscope sae pull <ID>    # Pre-download an SAE

neuroscope push / neuroscope pull

Sync labels and calibration data with HuggingFace, so new users can skip the hours of compute needed to generate them.

neuroscope push [OPTIONS]

Options:
  --repo <ID>       HuggingFace dataset repo [default: cjroth/neuroscope]
  --message <MSG>   Commit message

neuroscope pull [OPTIONS]

Options:
  --repo <ID>       HuggingFace dataset repo [default: cjroth/neuroscope]

Pull downloads labels and calibration data directly into the local cache — no further setup needed, just neuroscope serve after pulling.

Push requires a HuggingFace write token (huggingface-cli login).

Label priority

When loading feature labels, Neuroscope checks sources in this order:

  1. Auto-interp labels (namespaced by --labeler model) — highest quality
  2. Auto-interp labels (legacy non-namespaced path)
  3. Neuronpedia cache (fetched from neuronpedia.org)
  4. Neuronpedia API (live fetch, cached on first use)
  5. Numeric fallback (feature_0, feature_1, ...)
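The fallback chain amounts to a simple ordered lookup. An illustrative Python sketch (all names are hypothetical; `fetch_live` stands in for the Neuronpedia API call):

```python
def resolve_label(index, autointerp, legacy, neuronpedia_cache, fetch_live):
    """Try each label source in the documented priority order."""
    for source in (autointerp, legacy, neuronpedia_cache):
        if index in source:
            return source[index]
    live = fetch_live(index)
    if live is not None:
        neuronpedia_cache[index] = live  # cached on first use
        return live
    return f"feature_{index}"  # numeric fallback
```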

Input normalization

Gemma Scope SAEs were trained on RMS-normalized hidden states. Neuroscope applies the same normalization before encoding: each hidden state vector is divided by its root mean square, scaling it to unit RMS before the SAE matmul. This prevents distorted activations from magnitude differences between raw and training-time inputs.
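The normalization itself is small. A pure-Python sketch of unit-RMS scaling (real implementations typically operate on tensors and add a small epsilon to guard against zero vectors):

```python
import math

def rms_normalize(x):
    """Scale a vector to unit RMS: divide each element by sqrt(mean(x_i^2))."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / rms for v in x]

y = rms_normalize([3.0, 4.0])  # the result has RMS exactly 1
```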

API

POST /v1/chat/completions (port 8080)

Standard OpenAI chat completions API. Supports "stream": true.

GET /v1/models (port 8080)

Lists the loaded model.

GET /v1/features/stream (port 8081)

SSE stream of feature activations. Each feature_activation event corresponds to one generated token. A generation_complete event is sent when generation finishes.

When feature filtering is enabled, events include a filtered_count field showing how many features were suppressed. Features may also include a surprise score when using surprise or combined filter modes.

GET /v1/features/labels (port 8081)

Returns the full label map as JSON — a mapping from feature index to human-readable description.

Architecture

The project is split into four crates:

Crate              Purpose
neuroscope-core    SAE encoder, types, labels, calibration, filtering, auto-interp, corpus download, scoring
neuroscope-engine  Inference engine wrapping mistral.rs with SAE hook wiring, corpus runner
neuroscope-server  Axum HTTP servers for chat API and features SSE
neuroscope-cli     CLI binary tying everything together

Default model and SAE

Model: Gemma 2 2B IT — small enough to run on a laptop, with the best SAE coverage of any open model.

SAE: Gemma Scope layer 20, 16K width — 16,384 learned features extracted from the residual stream at layer 20 of 26. Deep enough to capture semantic concepts (not just syntax). Uses JumpReLU activation with ~71 features firing per token on average.

Why two ports?

The chat API on :8080 is strictly OpenAI-compatible so it works as a drop-in replacement with existing tools. The features stream on :8081 is a separate concern — this keeps the chat API clean and lets you consume features independently (terminal logger, web UI, or both at once via the broadcast channel).

Tests

# Unit tests (no model download needed)
cargo test

# Integration tests (requires model weights)
NEUROSCOPE_INTEGRATION_TESTS=1 cargo test

License

MIT
