deucebucket/cerebellum

Cerebellum

Ablation-informed mixed-precision quantization for GGUF models.

Instead of applying the same quant level to every tensor, Cerebellum measures the actual sensitivity of each one and allocates bits where they matter.

How It Works

  1. Ablate — crush each tensor individually to Q2_K, measure the perplexity impact
  2. Allocate — sacred tensors (high PPL delta) get promoted to Q6_K/Q8_0, demotable tensors (negative delta) stay at Q2_K, everything else fills in to meet the size budget
  3. Build — llama-quantize --tensor-type @tensor_types.txt applies the per-tensor overrides
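The allocation rule in step 2 can be sketched as a simple classification by perplexity delta. This is an illustrative sketch, not Cerebellum's actual code; the threshold value and function name are assumptions.

```python
def classify(ppl_deltas, sacred_threshold=0.05):
    """Split tensors into sacred / demotable / flexible by PPL delta.

    ppl_deltas maps tensor name -> perplexity change when that tensor
    alone is crushed to Q2_K. The threshold is illustrative.
    """
    sacred, demotable, flexible = [], [], []
    for name, delta in ppl_deltas.items():
        if delta >= sacred_threshold:
            sacred.append(name)      # crushing hurts a lot -> promote to Q6_K/Q8_0
        elif delta < 0:
            demotable.append(name)   # PPL improved at Q2_K -> safe to keep crushed
        else:
            flexible.append(name)    # fills in to meet the size budget
    return sacred, demotable, flexible
```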

Install

pip install git+https://github.com/deucebucket/cerebellum.git

Requires PyTorch and Transformers. llama.cpp is required for the ablation sweep and the final quantization.

Usage

1. Generate Importance Matrix (~60 seconds, CPU only)

python -m cerebellum.imatrix_stream \
    --model Qwen/Qwen3.6-27B \
    --output imatrix.dat -v

Computes channel sensitivity directly from weight statistics. No calibration data, no GPU.
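A minimal sketch of the "weight statistics only" idea: score each input channel by the magnitude of the weights that read from it, with no activations involved. The statistic shown (mean squared weight per channel) is an assumption for illustration; the real imatrix_stream computation and output format may differ.

```python
import numpy as np

def channel_importance(weight):
    """Proxy importance per input channel from weights alone.

    weight: (out_features, in_features) array. Channels whose weights
    carry more energy get a higher score -- no calibration data needed.
    """
    return (weight.astype(np.float64) ** 2).mean(axis=0)

# Channel 0 carries far larger weights than channel 1:
w = np.array([[1.0, 0.1],
              [2.0, 0.1]])
imp = channel_importance(w)
```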

2. Run Ablation Sweep

python -m cerebellum.cerebellum ablate \
    --base-gguf model-Q2_K.gguf \
    --tensors ablation_plan.json \
    --output ablation_results.json

Crushes each tensor to Q2_K one at a time and measures the real perplexity delta.
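The sweep loop amounts to the following sketch. In the real tool the two callbacks would be subprocess calls to llama.cpp binaries; `quantize_one` and `perplexity` here are hypothetical stand-ins.

```python
def ablation_sweep(tensors, baseline_ppl, quantize_one, perplexity):
    """Crush each tensor to Q2_K in isolation and record the PPL delta.

    quantize_one(name, qtype) -> path to a GGUF where only `name` is crushed
    perplexity(gguf)          -> measured perplexity of that GGUF
    """
    results = {}
    for name in tensors:
        gguf = quantize_one(name, "Q2_K")          # only this tensor changes
        results[name] = perplexity(gguf) - baseline_ppl
    return results
```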

3. Allocate Budget

python -m cerebellum.cerebellum allocate \
    --ablation ablation_results.json \
    --budget 12.0 \
    --output tensor_types.txt

Generates per-tensor quant level assignments for a target file size.
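One plausible shape for this step is a greedy fill: start everything at Q2_K, then promote tensors in order of decreasing PPL delta until the budget is spent. The bits-per-weight figures below are approximate GGUF block sizes, and the whole function is a sketch of the idea rather than Cerebellum's actual allocator.

```python
# Approximate bits per weight for common GGUF quant types.
BPW = {"Q2_K": 2.625, "Q6_K": 6.5625, "Q8_0": 8.5}

def allocate(deltas, n_params, budget_gb):
    """Assign a quant level per tensor, keeping total size under budget_gb.

    deltas:   tensor name -> PPL delta from the ablation sweep
    n_params: tensor name -> parameter count
    """
    assign = {name: "Q2_K" for name in deltas}

    def size_gb():
        return sum(n_params[n] * BPW[assign[n]] / 8 for n in assign) / 1e9

    # Promote the most sensitive tensors first.
    for name in sorted(deltas, key=deltas.get, reverse=True):
        if deltas[name] <= 0:
            break                      # remaining tensors are demotable
        assign[name] = "Q6_K"
        if size_gb() > budget_gb:
            assign[name] = "Q2_K"      # promotion would exceed the budget
            break
    return assign
```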

4. Build the GGUF

llama-quantize --imatrix imatrix.dat \
    --tensor-type @tensor_types.txt \
    model-f16.gguf model-cerebellum.gguf Q2_K
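For reference, tensor_types.txt is a plain list of name=type overrides, one per line. The tensor names and type assignments below are an illustrative excerpt, not real output:

```
blk.0.attn_v.weight=Q8_0
blk.0.ffn_down.weight=Q6_K
blk.1.ffn_gate.weight=Q2_K
```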

Models

License

Apache 2.0
