
BFPQuantizeDequantize crashes ONNX Runtime during session creation #339

@jensjohansen

Description


Defect report: Quark BFPQuantizeDequantize abort during ORT session creation (YOLOv8x BFP16)

Title

BFPQuantizeDequantize custom op crashes ONNX Runtime during session creation (shape inference) for YOLOv8x BFP16; isolated to a single node (idx=993); workaround: replace it with Identity

Summary

Loading a BFP16-quantized YOLOv8x ONNX model in ONNX Runtime with Quark custom ops registered causes a native abort during InferenceSession creation.

I was experimenting with different data and op types in AMD Quark (from Ryzen AI Software 1.7.0, on a Ryzen AI HX 370 running Ubuntu 25.10, after building XRT from source and updating to the latest ROCm and ONNX runtimes). I used YOLOv8x as my float model and successfully experimented with INT8 and BF16 before moving on to BFP16 and MX9.

Investigation shows the abort occurs inside the Quark custom ops library (libcustom_ops.so) during shape inference for BFPFixNeuron.

Graph bisection isolated the crash to a single BFPQuantizeDequantize node (node index 993). Replacing only that node with Identity unblocks session creation.

A minimal standalone reproducer containing a BFPQuantizeDequantize node with the same input tensor rank/shape ([1,80,8400]) loads successfully, suggesting the failure is context-dependent (graph metadata / surrounding topology) rather than a function of tensor shape alone.

Environment

  • Host: jc01 (Ryzen AI / AMD platform)
  • OS: Linux-6.17.0-12-generic-x86_64-with-glibc2.42
  • Python: 3.12.12 | packaged by conda-forge | (main, Jan 26 2026, 23:51:32) [GCC 14.3.0]
  • ONNX Runtime: 1.22.1
  • Quark: amd-quark 0.11 (import path: /home/johnk/miniforge3/envs/quark312/lib/python3.12/site-packages/quark/__init__.py)
  • Custom ops library: quark/onnx/operators/custom_ops/lib/libcustom_ops.so
  • Execution Provider: CPUExecutionProvider
  • Custom op registration:
import onnxruntime as ort
from quark.onnx import get_library_path

so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path("CPU"))

Problem details

  • Failure mode: process abort during onnxruntime.InferenceSession(...) creation.
  • Observed abort signature: assertion failure from std::vector::operator[] (native crash).
  • Crash location: inside Quark custom ops library libcustom_ops.so in BFPFixNeuron shape inference.

Evidence (gdb backtrace)

The full backtrace captured on jc01 is to be attached; key frames indicate a crash during custom-op shape inference inside libcustom_ops.so for BFPFixNeuron.

  • gdb backtrace: TODO (paste full text)
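
Until the gdb text is pasted, a stdlib-only cross-check is possible (a sketch: faulthandler dumps only the Python-level stack at the fatal signal, not the native libcustom_ops.so frames, so it complements rather than replaces the gdb trace):

import faulthandler
faulthandler.enable()  # install handlers for SIGABRT/SIGSEGV/... and dump the Python stack on crash

import onnxruntime as ort
from quark.onnx import get_library_path

so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path("CPU"))
ort.InferenceSession("yolov8x.bfp16.v1.onnx", sess_options=so,
                     providers=["CPUExecutionProvider"])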

Steps to reproduce

  1. Install Quark + ONNX Runtime (versions above).
  2. Ensure the Quark custom ops library is available.
  3. Register custom ops library and create a session:
import onnxruntime as ort
from quark.onnx import get_library_path

model_path = "yolov8x.bfp16.v1.onnx"  # failing model

so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path("CPU"))

sess = ort.InferenceSession(model_path, sess_options=so, providers=["CPUExecutionProvider"])
  4. Observe abort during session creation (native crash).

Bisection / isolation result

Bisection narrowed the crash to a single node (a harness sketch follows this list):

  • Node index: 993
  • Op type: BFPQuantizeDequantize
  • Node name: /model.22/Sigmoid_output_0_DequantizeLinear
  • Input producer: Sigmoid
  • Observed input shape at that edge: [1, 80, 8400]
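
For reference, a sketch of the bisection harness (hypothetical helper names; it assumes a single crashing node). Because the failure is a native abort that kills the interpreter, each load attempt must run in a child process:

import subprocess
import sys

import onnx

# Child-process loader: a native abort here only kills the child.
LOADER = """
import sys
import onnxruntime as ort
from quark.onnx import get_library_path
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path("CPU"))
ort.InferenceSession(sys.argv[1], sess_options=so, providers=["CPUExecutionProvider"])
"""

def loads_ok(path):
    return subprocess.run([sys.executable, "-c", LOADER, path]).returncode == 0

def patch_range(src_path, dst_path, lo, hi):
    # Neutralize BFPQuantizeDequantize nodes in graph.node[lo:hi] by turning
    # them into Identity in place (same tensor wiring, attributes dropped).
    model = onnx.load(src_path)
    for node in model.graph.node[lo:hi]:
        if node.op_type == "BFPQuantizeDequantize":
            node.op_type = "Identity"
            node.domain = ""
            del node.attribute[:]
            del node.input[1:]  # Identity takes exactly one input
    onnx.save(model, dst_path)

SRC = "yolov8x.bfp16.v1.onnx"
lo, hi = 0, len(onnx.load(SRC).graph.node)  # invariant: culprit index is in [lo, hi)
while hi - lo > 1:
    mid = (lo + hi) // 2
    patch_range(SRC, "/tmp/bisect.onnx", lo, mid)
    if loads_ok("/tmp/bisect.onnx"):
        hi = mid  # patching [lo, mid) fixed the load, so the culprit is in there
    else:
        lo = mid  # still crashes, so the culprit is in [mid, hi)
print("crashing node index:", lo)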

Workaround

Replace only node idx=993 with Identity (preserving input/output tensor-name wiring) and save a patched model; a patch sketch follows the list below.

  • Patched artifact: yolov8x.bfp16.v1.patched-only-node993.onnx
  • Result: ORT session creation succeeds reliably.
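
A minimal patch sketch (it assumes graph.node order matches the bisection index and that the node's first input is the data tensor):

import onnx
from onnx import helper

model = onnx.load("yolov8x.bfp16.v1.onnx")
node = model.graph.node[993]
assert node.op_type == "BFPQuantizeDequantize", node.op_type

# Identity node that preserves the original input/output tensor-name wiring.
identity = helper.make_node(
    "Identity",
    inputs=[node.input[0]],
    outputs=list(node.output),
    name=node.name + "_identity_patch",
)
model.graph.node.remove(node)
model.graph.node.insert(993, identity)
onnx.save(model, "yolov8x.bfp16.v1.patched-only-node993.onnx")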

Session creation confirmation (patched model)

Run:

import onnxruntime as ort
from quark.onnx import get_library_path

model = "/home/johnk/experiments/quark-yolov8x-exp/2026-02-09-jc01/models/onnx/bfp16/yolov8x.bfp16.v1.patched-only-node993.onnx"

so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
so.register_custom_ops_library(get_library_path("CPU"))
print("registered custom ops; optimizations disabled")

sess = ort.InferenceSession(model, sess_options=so, providers=["CPUExecutionProvider"])
print("PATCHED (only node 993) session OK")

Observed output:

registered custom ops; optimizations disabled
PATCHED (only node 993) session OK

Performance sanity check (patched model)

Configuration:

  • ORT graph optimizations: ORT_DISABLE_ALL
  • Provider: CPUExecutionProvider
  • Input: random FP32 tensor with shape (1,3,640,640)
  • Warmup: 5
  • Timed iterations: 50

Result:

  • sec_total=640.4786179065704
  • sec_per_iter=12.809572358131408
  • out0_shape=(1, 84, 8400)
  • out0_dtype=float32
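
For reproducibility, a sketch of the timing loop behind these numbers (input/output names are read from the session rather than assumed):

import time

import numpy as np
import onnxruntime as ort
from quark.onnx import get_library_path

model = "yolov8x.bfp16.v1.patched-only-node993.onnx"
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
so.register_custom_ops_library(get_library_path("CPU"))
sess = ort.InferenceSession(model, sess_options=so, providers=["CPUExecutionProvider"])

feed = {sess.get_inputs()[0].name: np.random.rand(1, 3, 640, 640).astype(np.float32)}

for _ in range(5):  # warmup
    sess.run(None, feed)

t0 = time.perf_counter()
for _ in range(50):  # timed iterations
    outs = sess.run(None, feed)
sec_total = time.perf_counter() - t0

print(f"sec_total={sec_total}")
print(f"sec_per_iter={sec_total / 50}")
print(f"out0_shape={outs[0].shape} out0_dtype={outs[0].dtype}")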

Additional observations

  • A minimal standalone ONNX model containing only BFPQuantizeDequantize with input shape [1,80,8400] loads successfully (after forcing ONNX IR version 10 in the minimal repro model).
  • An earlier minimal repro also showed that rank-0 (scalar) inputs can crash BFPQuantizeDequantize shape inference (a separate issue); the YOLOv8x crash, however, was isolated to the non-scalar node above.
  • Since the standalone reproducer works, this points to a context-dependent shape-inference bug in the custom op implementation (graph metadata / value_info / dynamic dims / multiple consumers / etc.); a builder sketch for the reproducer follows this list.
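
A sketch of how the standalone reproducer can be built by copying the failing node (domain, attributes, and any initializer inputs) straight out of the full model, assuming a float32 data tensor, so no custom-op details need to be guessed:

import onnx
from onnx import TensorProto, helper

src = onnx.load("yolov8x.bfp16.v1.onnx")
node = src.graph.node[993]

# Carry over any initializers the node consumes (scales, block params, ...).
needed = set(node.input[1:])
inits = [t for t in src.graph.initializer if t.name in needed]

inp = helper.make_tensor_value_info(node.input[0], TensorProto.FLOAT, [1, 80, 8400])
out = helper.make_tensor_value_info(node.output[0], TensorProto.FLOAT, [1, 80, 8400])
graph = helper.make_graph([node], "bfp_minimal_repro", [inp], [out], initializer=inits)

model = helper.make_model(graph, opset_imports=list(src.opset_import))
model.ir_version = 10  # forcing IR version 10, as noted above
onnx.save(model, "bfp_minimal_repro.onnx")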

Related issues (ruled out / adjacent symptoms)

These issues do not share this defect's root cause, but they are related to earlier symptoms we ruled out while diagnosing native plugin / custom-op availability on Linux.

Expected behavior

  • ORT session creation should not abort.
  • If an input is unsupported, the custom op should return a recoverable error with a descriptive message rather than aborting.

Actual behavior

  • Native abort during session initialization, no Python exception.

Related investigation notes

See:

  • docs/articles/productize_amd_ai_workflow/quark-experiments/2026-02-09-yolov8x-quark-pipeline-experiment.md
