Hi, with the following patch:
diff --git a/picolm/Makefile b/picolm/Makefile
index 4fd3c7a..a298dce 100644
--- a/picolm/Makefile
+++ b/picolm/Makefile
@@ -1,6 +1,6 @@
CC = gcc
-CFLAGS = -O2 -std=c11 -D_GNU_SOURCE -Wall -Wextra -Wpedantic
-LDFLAGS = -lm -lpthread
+CFLAGS = -O3 -std=c11 -D_GNU_SOURCE -Wall -Wextra -Wpedantic -ffast-math -funroll-loops -flto
+LDFLAGS = -lm -lpthread -flto
SRCS = picolm.c model.c tensor.c quant.c tokenizer.c sampler.c grammar.c
TARGET = picolm
I went from
./picolm tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p 'write me a haiku'
Loading model: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Model config:
n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
n_layers=22, vocab_size=32000, max_seq=2048
head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 6 tokens, generating up to 256 (temp=0.80, top_p=0.90, threads=4)
---
about the feeling of being lost in a forest.</s>
---
Prefill: 6 tokens in 1.68s (3.6 tok/s)
Generation: 11 tokens in 2.63s (4.2 tok/s)
Total: 4.30s
Memory: 45.17 MB runtime state (FP16 KV cache)
to
./picolm tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p 'write me a haiku'
Loading model: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Model config:
n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
n_layers=22, vocab_size=32000, max_seq=2048
head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 6 tokens, generating up to 256 (temp=0.80, top_p=0.90, threads=4)
---
about the feeling of being lost in a forest.</s>
---
Prefill: 6 tokens in 0.47s (12.8 tok/s)
Generation: 11 tokens in 0.77s (14.2 tok/s)
Total: 1.24s
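One small refinement to the patch worth considering: with `-flto`, gcc performs the final optimization pass at link time, and the GCC manual recommends repeating the same optimization options at the link step. A hedged sketch of how the two lines from the patch might look with that applied:

```makefile
CFLAGS  = -O3 -std=c11 -D_GNU_SOURCE -Wall -Wextra -Wpedantic -ffast-math -funroll-loops -flto
LDFLAGS = -O3 -lm -lpthread -flto
```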