Defect report: Quark BFPQuantizeDequantize abort during ORT session creation (YOLOv8x BFP16)
Title
BFPQuantizeDequantize custom op crashes ONNX Runtime during session creation (shape inference) for YOLOv8x BFP16; isolated to single node (idx=993); workaround: replace with Identity
Summary
Loading a BFP16-quantized YOLOv8x ONNX model in ONNX Runtime with Quark custom ops registered causes a native abort during InferenceSession creation.
I was experimenting with different data and op types in AMD Quark (from Ryzen AI Software 1.7.0, on a Ryzen AI HX 370 machine running Ubuntu 25.10, after building XRT from source and updating to the latest ROCm and ONNX runtimes). I used YOLOv8x as my float model and had successfully experimented with INT8 and BF16 before moving on to BFP16 and MX9.
Investigation shows the abort occurs inside the Quark custom ops library (libcustom_ops.so) during shape inference for BFPFixNeuron.
Graph bisection isolated the crash to a single BFPQuantizeDequantize node (node index 993). Replacing only that node with Identity unblocks session creation.
A minimal standalone reproducer containing a BFPQuantizeDequantize node with the same input tensor rank/shape ([1,80,8400]) loads successfully, suggesting the failure is context-dependent (graph metadata / surrounding topology), not purely based on tensor shape.
Environment
- Host: jc01 (Ryzen AI / AMD platform)
- OS: Linux-6.17.0-12-generic-x86_64-with-glibc2.42
- Python: 3.12.12 | packaged by conda-forge | (main, Jan 26 2026, 23:51:32) [GCC 14.3.0]
- ONNX Runtime: 1.22.1
- Quark: amd-quark 0.11 (import path: /home/johnk/miniforge3/envs/quark312/lib/python3.12/site-packages/quark/__init__.py)
- Custom ops library: quark/onnx/operators/custom_ops/lib/libcustom_ops.so
- Execution Provider: CPUExecutionProvider
- Custom op registration:

```python
import onnxruntime as ort
from quark.onnx import get_library_path

so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path("CPU"))
```

Problem details
- Failure mode: process abort during `onnxruntime.InferenceSession(...)` creation.
- Observed abort signature: assertion failure from `std::vector::operator[]` (native crash).
- Crash location: inside the Quark custom ops library `libcustom_ops.so`, in `BFPFixNeuron` shape inference.
Evidence (gdb backtrace)
Full backtrace captured on jc01 (to be attached). Key frames indicate a crash during custom-op shape inference inside libcustom_ops.so for BFPFixNeuron.
- gdb backtrace: TODO (paste full text)
Steps to reproduce
- Install Quark + ONNX Runtime (versions above).
- Ensure the Quark custom ops library is available.
- Register the custom ops library and create a session:

```python
import onnxruntime as ort
from quark.onnx import get_library_path

model_path = "yolov8x.bfp16.v1.onnx"  # failing model
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path("CPU"))
sess = ort.InferenceSession(model_path, sess_options=so, providers=["CPUExecutionProvider"])
```

- Observe the abort during session creation (native crash).
Bisection / isolation result
Bisecting the graph isolated the crash to a single node:
- Node index: 993
- Op type: `BFPQuantizeDequantize`
- Node name: `/model.22/Sigmoid_output_0_DequantizeLinear`
- Input producer: `Sigmoid`
- Observed input shape at that edge: `[1, 80, 8400]`
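The isolation step amounts to a binary search over node indices: neutralize a range of candidate nodes (e.g. by swapping them for Identity), test whether session creation survives, and narrow the range. A minimal, framework-agnostic sketch (the `session_creation_ok` callback is hypothetical; in the real experiment it would patch the ONNX graph and attempt `ort.InferenceSession`):

```python
def bisect_failing_node(node_indices, session_creation_ok):
    """Binary-search for the single node whose presence breaks session creation.

    `session_creation_ok(neutralized)` must return True when session creation
    succeeds with the given set of node indices neutralized (replaced by
    Identity), and False otherwise. Assumes exactly one culprit node.
    """
    lo, hi = 0, len(node_indices)
    # Invariant: neutralizing node_indices[lo:hi] makes session creation succeed.
    assert session_creation_ok(set(node_indices[lo:hi]))
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if session_creation_ok(set(node_indices[lo:mid])):
            hi = mid  # culprit is in the first half
        else:
            lo = mid  # culprit is in the second half
    return node_indices[lo]
```

Each probe rebuilds and reloads the model, so the search needs only O(log n) session-creation attempts rather than one per node.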
Workaround
Replace only node idx=993 with Identity (preserving input/output tensor name wiring) and save a patched model.
- Patched artifact: `yolov8x.bfp16.v1.patched-only-node993.onnx`
- Result: ORT session creation succeeds reliably.
Session creation confirmation (patched model)
Run:
```python
import onnxruntime as ort
from quark.onnx import get_library_path

model = "/home/johnk/experiments/quark-yolov8x-exp/2026-02-09-jc01/models/onnx/bfp16/yolov8x.bfp16.v1.patched-only-node993.onnx"
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
so.register_custom_ops_library(get_library_path("CPU"))
print("registered custom ops; optimizations disabled")
sess = ort.InferenceSession(model, sess_options=so, providers=["CPUExecutionProvider"])
print("PATCHED (only node 993) session OK")
```

Observed output:

```
registered custom ops; optimizations disabled
PATCHED (only node 993) session OK
```
Performance sanity check (patched model)
Configuration:
- ORT graph optimizations: `ORT_DISABLE_ALL`
- Provider: CPUExecutionProvider
- Input: random FP32 tensor with shape (1, 3, 640, 640)
- Warmup: 5
- Timed iterations: 50

Result:

```
sec_total=640.4786179065704
sec_per_iter=12.809572358131408
out0_shape=(1, 84, 8400)
out0_dtype=float32
```
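The timing numbers above come from a simple warmup-then-measure loop. A sketch of that measurement (the `run_fn` callable standing in for `sess.run` is an assumption about the harness, and the input name `"images"` in the usage note is a guess at the YOLOv8x export):

```python
import time


def benchmark(run_fn, warmup=5, iters=50):
    """Run `run_fn` a few untimed warmup iterations, then time `iters`
    iterations with a monotonic high-resolution clock."""
    for _ in range(warmup):
        run_fn()
    t0 = time.perf_counter()
    out = None
    for _ in range(iters):
        out = run_fn()
    sec_total = time.perf_counter() - t0
    return sec_total, sec_total / iters, out
```

Usage would look like `benchmark(lambda: sess.run(None, {"images": x})[0])`, after which `out.shape` and `out.dtype` give the `out0_shape` / `out0_dtype` fields reported above.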
Additional observations
- A minimal standalone ONNX model containing only `BFPQuantizeDequantize` with input shape `[1,80,8400]` loads successfully (after forcing ONNX IR version 10 in the minimal repro model).
- Earlier minimal repros also showed that rank-0 (scalar) inputs can crash `BFPQuantizeDequantize` shape inference (separate issue), but the YOLOv8x crash was isolated to the non-scalar node above.
- Since the standalone reproducer works, this points to a context-dependent shape-inference bug in the custom op implementation (graph metadata / value_info / dynamic dims / multiple consumers / etc.).
Related issues (ruled out / adjacent symptoms)
These issues are not the same root cause as this defect, but they are related to earlier symptoms we ruled out while diagnosing Linux + native plugin/custom-op availability.
- "On Linux, OGA NPU Execution Mode can't work because of no onnx_custom_ops.so" #265 (Linux OGA NPU execution mode fails; missing/expected `onnx_custom_ops.so`)
- "Linux NPU support blocked by missing proprietary blobs!" #335 (Linux packages missing actual shared libraries due to Git LFS pointer files)
Expected behavior
- ORT session creation should not abort.
- If an input is unsupported, the custom op should return a recoverable error with a descriptive message rather than aborting.
Actual behavior
- Native abort during session initialization, no Python exception.
Related investigation notes
See:
docs/articles/productize_amd_ai_workflow/quark-experiments/2026-02-09-yolov8x-quark-pipeline-experiment.md