FYI got a little more speed #16

@stuartskelton

Hi, with the following patch:

diff --git a/picolm/Makefile b/picolm/Makefile
index 4fd3c7a..a298dce 100644
--- a/picolm/Makefile
+++ b/picolm/Makefile
@@ -1,6 +1,6 @@
 CC      = gcc
-CFLAGS  = -O2 -std=c11 -D_GNU_SOURCE -Wall -Wextra -Wpedantic
-LDFLAGS = -lm -lpthread
+CFLAGS  = -O3 -std=c11 -D_GNU_SOURCE -Wall -Wextra -Wpedantic -ffast-math -funroll-loops -flto
+LDFLAGS = -lm -lpthread -flto
 SRCS    = picolm.c model.c tensor.c quant.c tokenizer.c sampler.c grammar.c
 TARGET  = picolm

I went from

./picolm tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p 'write me a haiku'
Loading model: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Model config:
  n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
  n_layers=22, vocab_size=32000, max_seq=2048
  head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 6 tokens, generating up to 256 (temp=0.80, top_p=0.90, threads=4)
---
 about the feeling of being lost in a forest.</s>
---
Prefill: 6 tokens in 1.68s (3.6 tok/s)
Generation: 11 tokens in 2.63s (4.2 tok/s)
Total: 4.30s
Memory: 45.17 MB runtime state (FP16 KV cache)

to

./picolm tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p 'write me a haiku'
Loading model: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Model config:
  n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
  n_layers=22, vocab_size=32000, max_seq=2048
  head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 6 tokens, generating up to 256 (temp=0.80, top_p=0.90, threads=4)
---
 about the feeling of being lost in a forest.</s>
---
Prefill: 6 tokens in 0.47s (12.8 tok/s)
Generation: 11 tokens in 0.77s (14.2 tok/s)
Total: 1.24s
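
One caveat on `-ffast-math`: it lets the compiler reorder floating-point additions, so numeric results can differ slightly from the `-O2` build. The sketch below is only an illustration, not code from picolm. `dot_seq` and `dot_reassoc` are hypothetical helpers that show the kind of rewrite the flag permits: a strict left-to-right sum versus the same sum split over four independent accumulators, which is friendlier to unrolling and vectorization.

```c
#include <stddef.h>

/* Hypothetical helper (not from picolm): strict left-to-right dot product.
 * Without -ffast-math the compiler must keep this exact addition order. */
float dot_seq(const float *a, const float *b, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        s += a[i] * b[i];
    }
    return s;
}

/* Hypothetical helper: the same dot product with four independent partial
 * sums. This changes the order of the additions, which is the sort of
 * reassociation -ffast-math allows the compiler to do on its own. */
float dot_reassoc(const float *a, const float *b, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float s = (s0 + s1) + (s2 + s3);
    /* Handle the remaining 0..3 elements. */
    for (; i < n; i++) {
        s += a[i] * b[i];
    }
    return s;
}
```

For inputs whose products and sums are exactly representable the two functions agree; for general data they can differ in the low bits because float addition is not associative. It may be worth comparing output between the two builds at a fixed seed or `temp=0`, assuming picolm exposes such options.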
