Open
Labels
bug (Something isn't working)
Description
Device & OS
- Hardware: Raspberry Pi 3B+
- OS: Raspberry Pi OS 64-bit, Debian 1:6.12.62-1+rpt1 (2025-12-18) aarch64 GNU/Linux
- Compiler: gcc 14.2.0
Model
- Model file: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
- Quantization: Q4_K_M
What happened?
I am getting nowhere near the claimed 4 tok/s for the Raspberry Pi 3B+.
Command you ran
picolm models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p "The capital of France is" -n 10 -t 0 -j 4
Expected output
Generation speed close to 4 tok/s
Actual output
Model config:
n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
n_layers=22, vocab_size=32000, max_seq=2048
head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 6 tokens, generating up to 10 (temp=0.00, top_p=0.90, threads=4)
---
Paris.
2. B.C. The
---
Prefill: 6 tokens in 166.62s (0.0 tok/s)
Generation: 11 tokens in 278.45s (0.0 tok/s)
Total: 445.07s
Memory: 45.17 MB runtime state (FP16 KV cache)
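For reference, the rates implied by the timing lines above can be recomputed directly; the "0.0 tok/s" in the log is just one-decimal rounding of rates in the hundredths:

```python
# Recompute tok/s from the figures reported in the log above.
prefill_tok, prefill_s = 6, 166.62   # "Prefill: 6 tokens in 166.62s"
gen_tok, gen_s = 11, 278.45          # "Generation: 11 tokens in 278.45s"

print(f"prefill:    {prefill_tok / prefill_s:.3f} tok/s")   # ~0.036 tok/s
print(f"generation: {gen_tok / gen_s:.3f} tok/s")           # ~0.040 tok/s
```

That is roughly 100x below the 4 tok/s target, which suggests a systemic bottleneck rather than a small tuning issue; one plausible culprit (an assumption, not confirmed here) is swapping, since the Pi 3B+ has 1 GB of RAM and the Q4_K_M model plus the 44 MB KV cache leaves little headroom.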