llama-quantize crashes with ios_base::failbit when using --tensor-type-file or --tensor-type (regression from #20503) #68

@dogukanatakul

Description

Describe the bug

After PR #20503, llama-quantize (and any tool using --tensor-type-file / --tensor-type) spams the following message for many models:

str: cannot properly format tensor name position_embd with suffix=weight bid=-1 xid=-1
str: cannot properly format tensor name token_types with suffix=weight bid=-1 xid=-1

This spam eventually causes:

llama_model_quantize: failed to quantize: ios_base::failbit set: iostream stream error
main: failed to quantize model from '...'

The crash happens reliably on Qwen3.5 models (and likely on other architectures that have non-standard tensors such as position_embd.weight, token_types.weight, and token_embd.weight).

Steps to reproduce

  1. Use any recent build of llama.cpp after PR #20503 (including llama-cpp-turboquant).
  2. Create a tensor-type config (example for Qwen3.5-4B with 32 layers):
position_embd.weight=Q8_0
token_types.weight=Q8_0
token_embd.weight=Q8_0
output.weight=Q8_0
output_norm.weight=Q8_0

# Middle layers (example)
blk.2.attn_q.weight=tq4_1s
... (all the way to blk.29)
  3. Run:
llama-quantize --allow-requantize \
  --tensor-type-file config_i.txt \
  Qwen3.5-4B-Q8_0.gguf \
  Qwen3.5-4B-TQ4_1S.gguf \
  Q8_0

The output is flooded with "cannot properly format tensor name" messages, and the run then crashes with ios_base::failbit.

Even using multiple --tensor-type "blk.*.weight=..." arguments produces the same spam and crash.

Expected behavior

  • Custom tensor types should be applied silently.
  • Non-critical tensors (bid=-1) should not spam or cause iostream failure.
  • Quantization should complete successfully.

Actual behavior

Massive spam + hard crash before any output file is written.

Environment

  • Model: Qwen3.5-4B (and probably any Qwen 3.x)
  • llama.cpp version: post #20503 (tested on latest master + llama-cpp-turboquant)
  • Command: llama-quantize with --tensor-type-file or --tensor-type
  • OS: Windows / Linux (reproduced on both)

Related issues

  • This is the exact same regression described in #21115 ("Eval bug: regression introduced in #20503")
  • The problem is much more severe during quantization because it aborts the entire process.

Workaround (temporary)

Downgrade to a commit before #20503 (e.g. any commit from 2026-03-20 or earlier).

It would be great if the tensor name formatter could gracefully skip or handle tensors with bid=-1 / xid=-1 without spamming warnings and without breaking the iostream.
