Defect report: Quark BFPQuantizeDequantize abort during ORT session creation (YOLOv8x BFP16)
Title
BFPQuantizeDequantize custom op crashes ONNX Runtime during session creation (shape inference) for YOLOv8x BFP16; isolated to single node (idx=993); workaround: replace with Identity
Summary
Loading a BFP16-quantized YOLOv8x ONNX model in ONNX Runtime with Quark custom ops registered causes a native abort during InferenceSession creation.
I was experimenting with different data and op types in AMD Quark (from Ryzen AI Software 1.7.0, on a Ryzen AI HX 370 machine running Ubuntu 25.10, after building XRT from source and updating to the latest ROCm and ONNX runtimes). I used YOLOv8x as my float model and had successfully experimented with INT8 and BF16 before moving on to BFP16 and MX9.
Investigation shows the abort occurs inside the Quark custom ops library (libcustom_ops.so) during shape inference for BFPFixNeuron.
Graph bisection isolated the crash to a single BFPQuantizeDequantize node (node index 993). Replacing only that node with Identity unblocks session creation.
A minimal standalone reproducer containing a BFPQuantizeDequantize node with the same input tensor rank/shape ([1,80,8400]) loads successfully, suggesting the failure is context-dependent (graph metadata / surrounding topology), not purely based on tensor shape.
Environment
- Host: jc01 (Ryzen AI / AMD platform)
- OS: Linux-6.17.0-12-generic-x86_64-with-glibc2.42
- Python: 3.12.12 | packaged by conda-forge | (main, Jan 26 2026, 23:51:32) [GCC 14.3.0]
- ONNX Runtime: 1.22.1
- Quark: amd-quark 0.11 (import path: /home/johnk/miniforge3/envs/quark312/lib/python3.12/site-packages/quark/__init__.py)
- Custom ops library: quark/onnx/operators/custom_ops/lib/libcustom_ops.so
- Execution Provider: CPUExecutionProvider
- Custom op registration:

```python
import onnxruntime as ort
from quark.onnx import get_library_path

so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path("CPU"))
```

Problem details
- Failure mode: process abort during `onnxruntime.InferenceSession(...)` creation.
- Observed abort signature: assertion failure from `std::vector::operator[]` (native crash).
- Crash location: inside the Quark custom ops library `libcustom_ops.so`, in `BFPFixNeuron` shape inference.
Evidence (gdb backtrace)
Full backtrace captured on jc01 (to be attached). Key frames indicate a crash during custom-op shape inference inside libcustom_ops.so for BFPFixNeuron.
- gdb backtrace: TODO (paste full text)
Steps to reproduce
- Install Quark + ONNX Runtime (versions above).
- Ensure the Quark custom ops library is available.
- Register the custom ops library and create a session:

```python
import onnxruntime as ort
from quark.onnx import get_library_path

model_path = "yolov8x.bfp16.v1.onnx"  # failing model
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path("CPU"))
sess = ort.InferenceSession(model_path, sess_options=so, providers=["CPUExecutionProvider"])
```

- Observe the abort during session creation (native crash).
Bisection / isolation result
Bisecting the graph isolated the crash to a single node:
- Node index: 993
- Op type: `BFPQuantizeDequantize`
- Node name: `/model.22/Sigmoid_output_0_DequantizeLinear`
- Input producer: `Sigmoid`
- Observed input shape at that edge: `[1, 80, 8400]`
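The isolation step amounts to a binary search over node indices: neutralize a range of candidate nodes (e.g. by swapping them for Identity), test whether session creation survives, and narrow the range. A minimal, framework-agnostic sketch (the `session_creation_ok` callback is hypothetical; in the real experiment it would patch the ONNX graph and attempt `ort.InferenceSession`):

```python
def bisect_failing_node(node_indices, session_creation_ok):
    """Binary-search for the single node whose presence breaks session creation.

    `session_creation_ok(neutralized)` must return True when session creation
    succeeds with the given set of node indices neutralized (replaced by
    Identity), and False otherwise. Assumes exactly one culprit node.
    """
    lo, hi = 0, len(node_indices)
    # Invariant: neutralizing node_indices[lo:hi] makes session creation succeed.
    assert session_creation_ok(set(node_indices[lo:hi]))
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if session_creation_ok(set(node_indices[lo:mid])):
            hi = mid  # culprit is in the first half
        else:
            lo = mid  # culprit is in the second half
    return node_indices[lo]
```

Each probe rebuilds and reloads the model, so the search needs only O(log n) session-creation attempts rather than one per node.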
Workaround
Replace only node idx=993 with Identity (preserving input/output tensor name wiring) and save a patched model.
- Patched artifact: `yolov8x.bfp16.v1.patched-only-node993.onnx`
- Result: ORT session creation succeeds reliably.
Session creation confirmation (patched model)
Run:
```python
import onnxruntime as ort
from quark.onnx import get_library_path

model = "/home/johnk/experiments/quark-yolov8x-exp/2026-02-09-jc01/models/onnx/bfp16/yolov8x.bfp16.v1.patched-only-node993.onnx"
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
so.register_custom_ops_library(get_library_path("CPU"))
print("registered custom ops; optimizations disabled")
sess = ort.InferenceSession(model, sess_options=so, providers=["CPUExecutionProvider"])
print("PATCHED (only node 993) session OK")
```

Observed output:

```
registered custom ops; optimizations disabled
PATCHED (only node 993) session OK
```
Performance sanity check (patched model)
Configuration:
- ORT graph optimizations: `ORT_DISABLE_ALL`
- Provider: CPUExecutionProvider
- Input: random FP32 tensor with shape (1, 3, 640, 640)
- Warmup: 5
- Timed iterations: 50

Result:

```
sec_total=640.4786179065704
sec_per_iter=12.809572358131408
out0_shape=(1, 84, 8400)
out0_dtype=float32
```
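The timing numbers above come from a simple warmup-then-measure loop. A sketch of that measurement (the `run_fn` callable standing in for `sess.run` is an assumption about the harness, and the input name `"images"` in the usage note is a guess at the YOLOv8x export):

```python
import time


def benchmark(run_fn, warmup=5, iters=50):
    """Run `run_fn` a few untimed warmup iterations, then time `iters`
    iterations with a monotonic high-resolution clock."""
    for _ in range(warmup):
        run_fn()
    t0 = time.perf_counter()
    out = None
    for _ in range(iters):
        out = run_fn()
    sec_total = time.perf_counter() - t0
    return sec_total, sec_total / iters, out
```

Usage would look like `benchmark(lambda: sess.run(None, {"images": x})[0])`, after which `out.shape` and `out.dtype` give the `out0_shape` / `out0_dtype` fields reported above.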
Additional observations
- A minimal standalone ONNX model containing only `BFPQuantizeDequantize` with input shape `[1,80,8400]` loads successfully (after forcing ONNX IR version 10 in the minimal repro model).
- Earlier minimal repros also showed that rank-0 (scalar) inputs can crash `BFPQuantizeDequantize` shape inference (separate issue), but the YOLOv8x crash was isolated to the non-scalar node above.
- Since the standalone reproducer works, this points to a context-dependent shape-inference bug in the custom op implementation (graph metadata / value_info / dynamic dims / multiple consumers / etc.).
Related issues (ruled out / adjacent symptoms)
These issues are not the same root cause as this defect, but they are related to earlier symptoms we ruled out while diagnosing Linux + native plugin/custom-op availability.
- "On Linux, OGA NPU Execution Mode can't work because of no onnx_custom_ops.so" #265 (Linux OGA NPU execution mode fails; missing/expected `onnx_custom_ops.so`)
- "Linux NPU support blocked by missing proprietary blobs!" #335 (Linux packages missing actual shared libraries due to Git LFS pointer files)
Expected behavior
- ORT session creation should not abort.
- If an input is unsupported, the custom op should return a recoverable error with a descriptive message rather than aborting.
Actual behavior
- Native abort during session initialization, no Python exception.
Related investigation notes
See:
docs/articles/productize_amd_ai_workflow/quark-experiments/2026-02-09-yolov8x-quark-pipeline-experiment.md