## Description

The model parameter estimation logic uses filename patterns to detect quantization (e.g., `q4`, `int8`, `q4f16`). However, models named `model_quantized.onnx` (a common HuggingFace Optimum naming convention) do not match the quantization pattern, causing the parameter count to be estimated from raw file size without the quantization divisor.

This leads to incorrect capability tier classification — a quantized 0.5B model may be estimated at 1.4B parameters and still classified as Basic, or a quantized 3B model might be estimated at ~10B and classified as Strong.
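The failing check can be sketched roughly as below. This is a minimal illustration of the behavior described in this issue, not the project's actual code: the function name `is_quantized` and the exact tag list are assumptions.

```rust
/// Hypothetical sketch of filename-based quantization detection.
/// Only explicit quantization tags are matched, so the common
/// Optimum name "model_quantized.onnx" slips through.
fn is_quantized(filename: &str) -> bool {
    let name = filename.to_lowercase();
    ["q4f16", "q4", "int8"].iter().any(|tag| name.contains(tag))
}

fn main() {
    assert!(is_quantized("model_q4.onnx"));
    assert!(is_quantized("model_int8.onnx"));
    // The bug: a quantized model is treated as unquantized,
    // so its parameter count is estimated without the divisor.
    assert!(!is_quantized("model_quantized.onnx"));
    println!("ok");
}
```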
## Reproduction

```shell
wraithrun --live \
  --model C:\Models\Qwen2.5-0.5B-Instruct-ONNX\onnx\model_quantized.onnx \
  --tokenizer C:\Models\Qwen2.5-0.5B-Instruct-ONNX \
  --task ssh-keys
# Log shows: "Estimated 1.4B params" (should be ~0.5B)
```
## Expected Behavior

Filename patterns should also match:

- `quantized` / `model_quantized`
- `quant` / `model_quant`
- Or better: read ONNX metadata (if available) for actual quantization info
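A possible fix for the pattern-matching half is simply to extend the tag list. Again a hedged sketch under the same assumed function shape, not the actual implementation; note that `quant` already matches any name containing `quantized`, so listing both is belt-and-braces.

```rust
/// Hypothetical fix: add the generic Optimum tags to the list.
/// Checking "quant" as a substring also covers "quantized" and
/// "model_quant"; the separate "quantized" entry is redundant but
/// mirrors the patterns requested in this issue.
fn is_quantized(filename: &str) -> bool {
    let name = filename.to_lowercase();
    ["q4f16", "q4", "int8", "quantized", "quant"]
        .iter()
        .any(|tag| name.contains(tag))
}

fn main() {
    assert!(is_quantized("model_quantized.onnx")); // now detected
    assert!(is_quantized("model_quant.onnx"));
    assert!(!is_quantized("model_fp32.onnx"));
    println!("ok");
}
```

Reading the quantization info from ONNX model metadata would be more robust than any filename heuristic, since it doesn't depend on naming conventions at all.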
## Affected Files

`inference_bridge/src/lib.rs` or `inference_bridge/src/onnx_vitis.rs` (filename-based param estimation)