dontmerge: support talkie 13b #508

Draft

georgewhewell wants to merge 3 commits into hellas-ai:master from georgewhewell:grw/feat/local-model+talkie

Conversation

georgewhewell (Contributor) commented Apr 29, 2026

```console
❯ cargo run --release --features metal --example llama -- --backend candle -p 'i think that category theory is ' -m /tmp/talkie-convert/talkie-base-out/ --raw -k -s 40 --dtype bf16
    Finished `release` profile [optimized] target(s) in 0.12s
     Running `target/release/examples/llama --backend candle -p 'i think that category theory is ' -m /tmp/talkie-convert/talkie-base-out/ --raw -k -s 40 --dtype bf16`
Model weights loaded for /tmp/talkie-convert/talkie-base-out/ in 2.46 seconds
i think that category theory is 1t is the only theory that will explain the facts. The facts are that the world is a world of change, and that the only thing that is permanent is the law of change. The law
40 tokens generated in 19 seconds. (2.10 tps)
```

`get_model_files` and `get_model_chat_template` now treat the model
identifier as a local directory if it's an existing path on disk; that
directory must look like a HuggingFace snapshot (config.json,
tokenizer.json, tokenizer_config.json, and either model.safetensors
or model.safetensors.index.json + shards). Otherwise the existing HF
hub download path is used unchanged.
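
Roughly, the local-path check amounts to the following (a Python sketch of the described behavior; the real code lives in `get_model_files`/`get_model_chat_template` on the Rust side, and every name below is illustrative):

```python
from pathlib import Path

REQUIRED = ("config.json", "tokenizer.json", "tokenizer_config.json")

def looks_like_hf_snapshot(model_id: str) -> bool:
    """True if model_id is a local directory shaped like a HF snapshot."""
    d = Path(model_id)
    if not d.is_dir():
        return False  # not a local path -> caller falls back to the HF hub
    if not all((d / f).is_file() for f in REQUIRED):
        return False
    # Weights: a single safetensors file, or an index plus its shards.
    return (d / "model.safetensors").is_file() or (
        (d / "model.safetensors.index.json").is_file()
        and any(d.glob("model-*.safetensors"))
    )
```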
Talkie is a 40-layer/40-head decoder-only transformer (talkie-lm.com,
github.com/talkie-lm/talkie) with the standard Llama backbone plus four
small departures, all expressible with existing catgrad operators
(sketched after the list):

  1. RMSNorm everywhere is unweighted (F.rms_norm with no gamma),
     including a norm immediately after the embedding.
  2. QK-norm — RMSNorm is applied to Q and K after RoPE.
  3. Per-head and per-layer learned gains — head_gain ([H]) on Q after
     QK-norm, and scalar attn_gain / mlp_gain / embed_skip on the
     residual branches.
  4. Embedding-skip residual — the post-input-norm activations are
     threaded through every block as e_x and added back via a learned
     scalar.
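
Put together, one block looks roughly like this (a minimal PyTorch sketch under the assumptions above; `attn.project`/`attn.rope`/`attn.out` and `mlp` are hypothetical callables standing in for the usual Llama projections, not catgrad's API):

```python
import torch
import torch.nn.functional as F

def talkie_block(x, e_x, attn, mlp, head_gain, attn_gain, mlp_gain, embed_skip):
    """One decoder block with Talkie's four departures from vanilla Llama.

    x          -- [B, T, D] residual stream
    e_x        -- [B, T, D] post-input-norm embeddings, threaded through every block
    head_gain  -- [H] learned per-head gain on Q
    attn_gain, mlp_gain, embed_skip -- learned scalars on the residual branches
    """
    norm = lambda t: F.rms_norm(t, (t.shape[-1],))   # (1) unweighted, no gamma

    h = norm(x)
    q, k, v = attn.project(h)                        # [B, H, T, d_head] each
    q, k = attn.rope(q), attn.rope(k)
    q, k = norm(q), norm(k)                          # (2) QK-norm after RoPE
    q = q * head_gain.view(1, -1, 1, 1)              # (3) per-head gain on Q
    a = attn.out(F.scaled_dot_product_attention(q, k, v, is_causal=True))
    x = x + attn_gain * a                            # (3) scalar attn_gain
    x = x + mlp_gain * mlp(norm(x))                  # (3) scalar mlp_gain
    return x + embed_skip * e_x                      # (4) embedding-skip residual
```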

The lm_head is an untied [V, D] parameter (not a Linear) scaled by a
learned scalar (lm_head_gain.w_g) before the final matmul. Talkie's
RoPE uses the opposite sin convention from catgrad's default; we negate
cache.sin once after init to match.
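
Concretely, with toy shapes (everything here is illustrative; only `lm_head`, `lm_head_gain.w_g`, and the sin negation come from the description above):

```python
import torch

B, T, D, V = 1, 8, 64, 1000                # toy shapes for illustration
h = torch.randn(B, T, D)                   # final hidden states
lm_head = torch.randn(V, D)                # untied [V, D] parameter, not a Linear
w_g = torch.tensor(0.9)                    # the learned lm_head_gain.w_g scalar

logits = (w_g * h) @ lm_head.T             # scale, then the final matmul -> [B, T, V]

# Sin convention: cos is even and sin is odd, so rotating by -theta is the
# same as rotating by +theta with the sin table negated. One negation of the
# cached table right after init therefore converts between the two conventions.
inv_freq = 1.0 / 10_000 ** (torch.arange(0, D, 2, dtype=torch.float32) / D)
sin_cache = torch.sin(torch.outer(torch.arange(T, dtype=torch.float32), inv_freq))
sin_cache = -sin_cache
```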

Architecture string: TalkieForCausalLM. End-to-end inference reproduces
the upstream PyTorch reference byte-for-byte at greedy argmax for short
sequences in bf16; on longer sequences the cross-implementation bf16
noise floor (Metal vs CPU) flips one borderline argmax per ~40 tokens
on some prompts. Test harness in scripts/compare/talkie_compare.sh.
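
The "token-level stability matrix" boils down to comparing greedy decodes token by token across backends; a minimal sketch of that comparison (the shell harness itself was more elaborate):

```python
def first_divergence(tokens_a, tokens_b):
    """Index of the first greedy-argmax disagreement between two decodes
    (e.g. Metal vs CPU, or catgrad vs the PyTorch reference), else None."""
    for i, (a, b) in enumerate(zip(tokens_a, tokens_b)):
        if a != b:
            return i
    return None
```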

Helpers:
  - scripts/convert_talkie.py: pickle -> safetensors + tokenizer + config
- scripts/llm_talkie.py:     greedy-argmax PyTorch reference
  - scripts/compare/talkie_compare.sh: token-level stability matrix

The decoder stack now reads from `model.embed.weight`,
`model.blocks.{i}.…` — matching the HF port at
`lewtun/talkie-1930-13b-it-hf` (`TalkieForCausalLM` with `self.model =
TalkieModel(…)` and `lm_head`/`lm_head_gain.w_g` at the root).
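
A quick way to sanity-check that a downloaded checkpoint matches this layout (an assumed snippet; only the key names quoted above come from the PR, and the block-internal names stay elided):

```python
from safetensors import safe_open

def check_talkie_layout(path: str) -> None:
    """Assert the root-level key layout described above is present."""
    with safe_open(path, framework="pt") as f:
        keys = set(f.keys())
    assert "model.embed.weight" in keys
    assert "lm_head" in keys and "lm_head_gain.w_g" in keys
    assert any(k.startswith("model.blocks.0.") for k in keys)
```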

That repo includes a full HF-format checkpoint plus a `tokenizer.json`
already in HF tokenizers form, so our pickle→safetensors converter and
greedy-argmax reference are no longer needed:

  - rm catgrad-llm/scripts/convert_talkie.py
  - rm catgrad-llm/scripts/llm_talkie.py
  - rm catgrad-llm/scripts/compare/talkie_compare.sh

End-to-end run:

```console
./target/release/examples/llama -m lewtun/talkie-1930-13b-it-hf \
  -k -s 60 --dtype bf16 -p "Write a short poem about the wireless telegraph."
```