llama-quantize crashes with ios_base::failbit when using --tensor-type-file or --tensor-type (regression from #20503) #68
Description
Describe the bug
After PR #20503, llama-quantize (and any tool using --tensor-type-file / --tensor-type) spams the following message for many models:
str: cannot properly format tensor name position_embd with suffix=weight bid=-1 xid=-1
str: cannot properly format tensor name token_types with suffix=weight bid=-1 xid=-1
This spam eventually causes:
llama_model_quantize: failed to quantize: ios_base::failbit set: iostream stream error
main: failed to quantize model from '...'
The crash happens reliably on Qwen3.5 models (and likely other architectures that have non-standard tensors like position_embd.weight, token_types.weight, token_embd.weight etc.).
Steps to reproduce
- Use any recent build of llama.cpp after PR #20503 (including llama-cpp-turboquant).
- Create a tensor-type config (example for Qwen3.5-4B with 32 layers):
position_embd.weight=Q8_0
token_types.weight=Q8_0
token_embd.weight=Q8_0
output.weight=Q8_0
output_norm.weight=Q8_0
# Middle layers (example)
blk.2.attn_q.weight=tq4_1s
... (all the way to blk.29)
- Run:
llama-quantize --allow-requantize \
--tensor-type-file config_i.txt \
Qwen3.5-4B-Q8_0.gguf \
Qwen3.5-4B-TQ4_1S.gguf \
Q8_0
→ The output is flooded with "cannot properly format tensor name" messages → crashes with ios_base::failbit.
Even using multiple --tensor-type "blk.*.weight=..." arguments produces the same spam and crash.
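For convenience, the per-layer entries in the config above can be generated with a short loop (the file name config_i.txt and the layer range 2..29 are taken from the steps above; the loop itself is just illustrative):

```shell
# Generate a tensor-type config matching the example above
# (non-layer tensors at Q8_0, middle layers 2..29 at tq4_1s).
{
  echo "position_embd.weight=Q8_0"
  echo "token_types.weight=Q8_0"
  echo "token_embd.weight=Q8_0"
  echo "output.weight=Q8_0"
  echo "output_norm.weight=Q8_0"
  for i in $(seq 2 29); do
    echo "blk.$i.attn_q.weight=tq4_1s"
  done
} > config_i.txt
```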
Expected behavior
- Custom tensor types should be applied silently.
- Non-critical tensors (bid=-1) should not spam logs or cause iostream failure.
- Quantization should complete successfully.
Actual behavior
Massive spam + hard crash before any output file is written.
Environment
- Model: Qwen3.5-4B (and probably any Qwen 3.x)
- llama.cpp version: post #20503 (tested on latest master + llama-cpp-turboquant)
- Command:
llama-quantize with --tensor-type-file or --tensor-type
- OS: Windows / Linux (reproduced on both)
Related issues
- This is the exact same regression described in #21115 ("Eval bug: regression introduced in #20503")
- The problem is much more severe during quantization because it aborts the entire process.
Workaround (temporary)
Downgrade to a commit before #20503. Note that git checkout does not accept a bare date, so resolve one first, e.g. git checkout $(git rev-list -1 --before="2026-03-20" master).
It would be great if the tensor name formatter could gracefully skip or handle tensors with bid=-1 / xid=-1 without spamming and without breaking the iostream.