P2: model_quantized.onnx filename not recognized as quantized — wrong param estimate #167

@Shreyas582

Description

The model parameter estimation logic uses filename patterns to detect quantization (e.g., q4, int8, q4f16). However, models named model_quantized.onnx (a common HuggingFace Optimum naming convention) do not match the quantization pattern, causing the parameter count to be estimated from raw file size without the quantization divisor.

This leads to incorrect capability tier classification — a quantized 0.5B model may be estimated at 1.4B parameters and still classified as Basic, or a quantized 3B model might be estimated at ~10B and classified as Strong.
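The failure mode can be sketched as a substring check over a fixed pattern list. The function name and pattern set below are illustrative assumptions, not the actual wraithrun implementation:

```rust
/// Hypothetical sketch of filename-based quantization detection:
/// quantization is inferred from filename substrings, so a quantized
/// model whose name lacks one of the listed markers falls through to
/// the unquantized size-based parameter estimate.
fn looks_quantized(filename: &str) -> bool {
    let name = filename.to_ascii_lowercase();
    // Patterns the estimator is assumed to check today (illustrative).
    ["q4f16", "q4", "int8"].iter().any(|p| name.contains(p))
}

fn main() {
    // HuggingFace Optimum's default export name is not matched:
    assert!(!looks_quantized("model_quantized.onnx"));
    // An explicitly suffixed export is:
    assert!(looks_quantized("model_q4.onnx"));
    println!("ok");
}
```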

Reproduction

wraithrun --live \
  --model C:\Models\Qwen2.5-0.5B-Instruct-ONNX\onnx\model_quantized.onnx \
  --tokenizer C:\Models\Qwen2.5-0.5B-Instruct-ONNX \
  --task ssh-keys
# Log shows: "Estimated 1.4B params" (should be ~0.5B)

Expected Behavior

Filename patterns should also match:

  • quantized / model_quantized
  • quant / model_quant
  • Or better: read ONNX metadata (if available) for actual quantization info
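A minimal patch along the lines suggested above: adding the single substring `quant` covers `quant`, `model_quant`, `quantized`, and `model_quantized`, since `quant` is a prefix of `quantized`. Function name and existing patterns are hypothetical:

```rust
/// Hypothetical fixed detector: one extra pattern, "quant", catches the
/// HuggingFace Optimum naming convention ("model_quantized.onnx") as well
/// as the shorter "model_quant.onnx" variant.
fn looks_quantized(filename: &str) -> bool {
    let name = filename.to_ascii_lowercase();
    ["q4f16", "q4", "int8", "quant"].iter().any(|p| name.contains(p))
}

fn main() {
    assert!(looks_quantized("model_quantized.onnx")); // Optimum default now matches
    assert!(looks_quantized("model_quant.onnx"));
    assert!(!looks_quantized("model.onnx")); // unquantized export still falls through
    println!("ok");
}
```

Note that a generic `quant` match only says the model is quantized, not at what bit width, so the estimator would still need a conservative default divisor, or the ONNX metadata route above, to pick the right one.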

Affected Files

  • inference_bridge/src/lib.rs or inference_bridge/src/onnx_vitis.rs (filename-based param estimation)

Metadata

Labels

  • area:inference (Inference engine, model loading, execution providers)
  • bug (Something isn't working)
  • live-testing-audit (From v1.6.0 live-mode comprehensive testing)
  • priority:p2 (Normal-priority issue)
