Benchmark data, scripts, and configuration for running production AI on local hardware.
This repo accompanies the blog at localfirstai.eu.
Benchmarks for Gemma 4 26B (MoE, Q4_K_M) running via Ollama on an Apple Silicon M4 Pro with 64GB unified memory.
- Phase 1 — Context window vs inference speed (4K–130K tokens). Result: generation speed is flat at ~41 t/s across all context sizes. No memory cliff.
- Phase 1b — Thinking mode cost isolation. Result: thinking adds 5–15x token overhead on simple tasks for zero quality improvement.
Full analysis: Should We Stop Asking Local LLMs to Think?
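Phase 1b isolates the cost by sending the same prompt with and without thinking and comparing Ollama's reported token counts and timings. A minimal sketch of the idea (not the exact benchmark script; it assumes the installed Ollama accepts a `think` field on `/api/generate`, and the prompt is a placeholder):

```bash
# Same prompt, thinking off vs on; compare generated tokens and wall time.
PROMPT="Convert the JSON object {a: 1, b: 2} into a markdown table."

for THINK in false true; do
  jq -n --arg prompt "$PROMPT" --argjson think "$THINK" \
    '{model: "gemma4-think:26b", prompt: $prompt, think: $think, stream: false}' |
  curl -s http://localhost:11434/api/generate -d @- |
  jq --argjson think "$THINK" \
    '{think: $think, output_tokens: .eval_count, seconds: (.eval_duration / 1e9)}'
done
```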
| Component | Value |
|---|---|
| Hardware | Mac Mini M4 Pro, 64GB unified memory |
| Model | gemma4-think:26b (MoE, 25.8B params, Q4_K_M) |
| Runtime | Ollama 0.20.2, KEEP_ALIVE=-1, FLASH_ATTENTION=1 |
| Context | 130,000 tokens |
| Gateway | OpenClaw → Telegram |
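The Runtime row maps to Ollama server settings. A sketch of one way to reproduce them, assuming the shorthand in the table refers to the standard `OLLAMA_*` environment variables:

```bash
# Keep the model resident and enable flash attention, then (re)start the server.
export OLLAMA_KEEP_ALIVE=-1        # never unload the model between requests
export OLLAMA_FLASH_ATTENTION=1    # enable flash attention
ollama serve
```

The 130K window itself would then be requested per call, e.g. via the `num_ctx` option in the API payload.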
```bash
# Phase 1: context window vs performance
chmod +x benchmarks/nestor-bench-phase1.sh
./benchmarks/nestor-bench-phase1.sh

# Phase 1b: thinking token cost
chmod +x benchmarks/nestor-bench-phase1b.sh
./benchmarks/nestor-bench-phase1b.sh
```

Requires: `ollama`, `jq`, `python3`, and a model pulled via `ollama pull gemma4:26b`.
Results are saved as CSV to `benchmarks/results/`.
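The throughput figures are derived from the token counts and timings Ollama returns with every non-streaming response. A minimal sketch of how a `gen_tps`-style number can be computed (the fields are standard `/api/generate` response fields; the prompt and model tag are placeholders, and the actual CSV columns may differ):

```bash
# Generation throughput = generated tokens / generation time.
# eval_* fields cover generation; prompt_eval_* fields cover prompt processing.
curl -s http://localhost:11434/api/generate -d '{
  "model": "gemma4:26b",
  "prompt": "Summarize the benefits of unified memory in two sentences.",
  "stream": false
}' | jq '{
  gen_tps:    (.eval_count / (.eval_duration / 1e9)),
  prompt_tps: (.prompt_eval_count / (.prompt_eval_duration / 1e9))
}'
```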
- Context size doesn't matter on Apple Silicon. `gen_tps` holds at ~41 t/s from 4K to 130K.
- Thinking mode is the bottleneck. A JSON-to-table conversion: 31 tokens (1s) without thinking, 451 tokens (11s) with thinking. Same output.
- Ambiguous prompts + thinking = runaway. In production with accumulated session context, thinking generates 10,000–25,000 hidden tokens. At 38 t/s, that's 4–11 minutes per response.
- The fix is architectural. Default to think=off. Decompose tasks. Let the human be the reasoning layer.
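One way to make `think=off` the default is a thin wrapper that only enables thinking when the caller opts in. An illustrative sketch, not part of this repo (the `ask` helper is hypothetical, and it assumes the installed Ollama accepts a `think` field on `/api/chat`):

```bash
# ask "prompt" [think] -- thinking stays off unless the caller opts in.
ask() {
  local prompt="$1" think="${2:-false}"
  jq -n --arg prompt "$prompt" --argjson think "$think" \
    '{model: "gemma4-think:26b", think: $think, stream: false,
      messages: [{role: "user", content: $prompt}]}' |
  curl -s http://localhost:11434/api/chat -d @- |
  jq -r '.message.content'
}

ask "Convert the attached JSON to a markdown table."          # fast path, no thinking
ask "Plan a multi-step refactor of the billing module." true  # opt in when it pays off
```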
MIT