NPU Stability Issues on RadxaOS (Kernel 6.1.84-vendor-rk35xx)

## Environment

- **Board:** Rock 5C (RK3588)
- **OS:** RadxaOS (Kernel 6.1.84-8-rk2410)
- **RKNPU2 Package:** 2.3.0-1
- **RKNN Runtime:** 2.3.2
- **RKNPU Driver:** 0.9.8
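
For anyone reproducing this, part of the environment list above can be gathered with a short script. This is a minimal sketch: the helper names are illustrative, and the default path `/sys/kernel/debug/rknpu/version` is an assumption about where the vendor kernel exposes the driver version (it usually needs root and a mounted debugfs). The function returns `None` for the driver when that file cannot be read.

```python
import platform
from pathlib import Path
from typing import Dict, Optional


def read_first_line(path: str) -> Optional[str]:
    """Return the first line of a file, or None if it is missing, unreadable, or empty."""
    try:
        return Path(path).read_text().splitlines()[0].strip()
    except (OSError, IndexError):
        return None


def collect_environment(
    driver_version_path: str = "/sys/kernel/debug/rknpu/version",  # assumed location
) -> Dict[str, Optional[str]]:
    """Collect the kernel release and, if readable, the RKNPU driver version string."""
    return {
        "kernel": platform.release(),
        "rknpu_driver": read_first_line(driver_version_path),
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value if value is not None else 'unavailable'}")
```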

## Problem Description

llama.cpp built with the RKNPU2 backend is severely unstable on RadxaOS. The symptoms are:

1. **EFAULT errors** (`failed to allocate handle, errno: 14`) during inference, after about 60 s of model loading
2. **Frequent board crashes and reboots** during NPU operations
3. **CPU-only builds also crash** the board, which indicates the problem is not purely RKNPU-related

## Error Messages

```
E RKNN: failed to allocate handle, errno: 14 (EFAULT)
E RKNN: failed to submit matmul task
```
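
Errno 14 is `EFAULT` ("bad address"), which generally means the kernel was handed a pointer it could not access. The sketch below decodes an errno value with Python's standard library and counts RKNN error lines in a captured log. The function names and the regular expression are illustrative and derived only from the two messages quoted above; other RKNN log formats may not match.

```python
import errno
import os
import re
from collections import Counter

# Matches lines such as:
#   E RKNN: failed to allocate handle, errno: 14 (EFAULT)
#   E RKNN: failed to submit matmul task
RKNN_ERROR = re.compile(
    r"^E RKNN: (?P<msg>.+?)(?:, errno: (?P<errno>\d+))?(?: \(\w+\))?$"
)


def describe_errno(code: int) -> str:
    """Return the symbolic name and the OS description for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def summarize_rknn_errors(log_text: str) -> Counter:
    """Count RKNN error messages in a log, keyed by the message text."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = RKNN_ERROR.match(line.strip())
        if match:
            counts[match.group("msg")] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "E RKNN: failed to allocate handle, errno: 14 (EFAULT)\n"
        "E RKNN: failed to submit matmul task\n"
    )
    print(describe_errno(14))
    print(summarize_rknn_errors(sample))
```

Counting the messages over a whole session helps show whether the allocation failure always precedes the failed matmul submission.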

## Known Driver Bugs

1. **Spinlock recursion bug** in RKNPU 0.9.x with 4+ contexts in auto core mode
   - Reference: https://github.com/rockchip-linux/kernel/issues/329
   - Workaround: Force a single core (`RKNN_NPU_CORE_0`)

2. **Kernel panics** during NPU operations on 6.1.84-vendor-rk35xx
   - Reference: https://forum.armbian.com/topic/48399-6184-vendor-rk35xx-kernel-panic/

## Investigation Summary

| Test | Result |
|------|--------|
| Standalone SDK matmul test | PASS |
| SDK stress test (50 contexts) | PASS |
| CPU-only llama-cli build | FAIL (board crashes) |
| NPU-accelerated llama-cli | FAIL (EFAULT errors) |

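**Conclusion:** Because the CPU-only build also crashes the board while the standalone SDK tests pass, the evidence points to hardware- or system-level instability that predates our fork's changes.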

## Workarounds Applied

1. Force a single NPU core (`RKNN_NPU_CORE_0`)
2. Use static memory context for B-matrix allocations
3. Map quantization to power of 2 for context caching
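
One possible reading of workaround 3 is that sizes are rounded up to the next power of two so that many shapes share a small number of cached contexts, which also keeps the live context count below the four-context threshold listed under Known Driver Bugs. The sketch below illustrates that idea only. It is an assumption about the intent, not the fork's code: `next_pow2`, `ContextCache`, and the `create` callback are hypothetical, and no RKNN API is called.

```python
from collections import OrderedDict
from typing import Any, Callable


def next_pow2(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


class ContextCache:
    """Bounded LRU cache of contexts, keyed by a power-of-two bucket."""

    def __init__(self, create: Callable[[int], Any], max_contexts: int = 3):
        self._create = create          # hypothetical factory: bucket size -> context
        self._max = max_contexts       # stay below 4 live contexts
        self._items: "OrderedDict[int, Any]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._items)

    def get(self, size: int) -> Any:
        key = next_pow2(size)
        if key in self._items:
            self._items.move_to_end(key)   # mark as most recently used
            return self._items[key]
        if len(self._items) >= self._max:
            # Evict the least recently used entry. A real backend would
            # also have to release the evicted context here.
            self._items.popitem(last=False)
        ctx = self._create(key)
        self._items[key] = ctx
        return ctx


if __name__ == "__main__":
    cache = ContextCache(lambda bucket: f"ctx-{bucket}")
    print(cache.get(100), cache.get(120))  # both requests fall in the 128 bucket
```

The trade-off is padding: a request for size 129 is served by a 256 bucket, so memory is exchanged for fewer context creations.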

## Recommendation

Switch to **Armbian** for Rock 5C, which provides more stable NPU driver support.

## Expected Behavior

NPU should operate stably for sustained inference sessions without board crashes or EFAULT errors.

## Related

- Original driver issue: https://github.com/rockchip-linux/kernel/issues/329
- Forum discussion: https://forum.armbian.com/topic/48399-6184-vendor-rk35xx-kernel-panic/

🤖 Generated with [Claude Code](https://claude.com/claude-code)
