Hi, with the following patch:
diff --git a/picolm/Makefile b/picolm/Makefile
index 4fd3c7a..a298dce 100644
--- a/picolm/Makefile
+++ b/picolm/Makefile
@@ -1,6 +1,6 @@
CC = gcc
-CFLAGS = -O2 -std=c11 -D_GNU_SOURCE -Wall -Wextra -Wpedantic
-LDFLAGS = -lm -lpthread
+CFLAGS = -O3 -std=c11 -D_GNU_SOURCE -Wall -Wextra -Wpedantic -ffast-math -funroll-loops -flto
+LDFLAGS = -lm -lpthread -flto
SRCS = picolm.c model.c tensor.c quant.c tokenizer.c sampler.c grammar.c
TARGET = picolm
I went from
./picolm tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p 'write me a haiku'
Loading model: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Model config:
n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
n_layers=22, vocab_size=32000, max_seq=2048
head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 6 tokens, generating up to 256 (temp=0.80, top_p=0.90, threads=4)
---
about the feeling of being lost in a forest.</s>
---
Prefill: 6 tokens in 1.68s (3.6 tok/s)
Generation: 11 tokens in 2.63s (4.2 tok/s)
Total: 4.30s
Memory: 45.17 MB runtime state (FP16 KV cache)
to
./picolm tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p 'write me a haiku'
Loading model: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Model config:
n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
n_layers=22, vocab_size=32000, max_seq=2048
head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 6 tokens, generating up to 256 (temp=0.80, top_p=0.90, threads=4)
---
about the feeling of being lost in a forest.</s>
---
Prefill: 6 tokens in 0.47s (12.8 tok/s)
Generation: 11 tokens in 0.77s (14.2 tok/s)
Total: 1.24s
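One small refinement to the patch worth considering: with `-flto`, gcc performs the final optimization pass at link time, and the GCC manual recommends repeating the same optimization options at the link step. A hedged sketch of how the two lines from the patch might look with that applied:

```makefile
CFLAGS  = -O3 -std=c11 -D_GNU_SOURCE -Wall -Wextra -Wpedantic -ffast-math -funroll-loops -flto
LDFLAGS = -O3 -lm -lpthread -flto
```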