NPU Stability Issues on RadxaOS (Kernel 6.1.84-vendor-rk35xx) #2

@KHAEntertainment

Environment

  • Board: Rock 5C (RK3588)
  • OS: RadxaOS (Kernel 6.1.84-8-rk2410)
  • RKNPU2 Package: 2.3.0-1
  • RKNN Runtime: 2.3.2
  • RKNPU Driver: 0.9.8

Problem Description

The NPU is severely unstable when running llama.cpp with the RKNPU2 backend on RadxaOS. The issues manifest as:

  1. EFAULT errors (failed to allocate handle, errno: 14) during inference after ~60s of model loading
  2. Board crashes/reboots frequently during NPU operations
  3. CPU-only builds also crash the board, indicating the issue is not purely RKNPU-related

Error Messages

E RKNN: failed to allocate handle, errno: 14 (EFAULT)
E RKNN: failed to submit matmul task
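
For triage, the errno in these log lines can be decoded with the Python standard library. The snippet below is a minimal sketch, not part of the RKNN SDK: the helper name `decode_rknn_errno` and the regular expression are illustrative assumptions based only on the two log lines shown above.

```python
import errno
import os
import re

# Matches the errno field in lines such as:
#   E RKNN: failed to allocate handle, errno: 14 (EFAULT)
# The pattern is an assumption derived from the log lines in this issue.
ERRNO_RE = re.compile(r"errno:\s*(\d+)")


def decode_rknn_errno(line):
    """Return (code, symbolic name, OS message) for a log line, or None.

    Hypothetical helper for reading logs; it does not call into RKNN.
    """
    match = ERRNO_RE.search(line)
    if match is None:
        return None  # the line carries no errno field
    code = int(match.group(1))
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    for log_line in (
        "E RKNN: failed to allocate handle, errno: 14 (EFAULT)",
        "E RKNN: failed to submit matmul task",
    ):
        print(decode_rknn_errno(log_line))
```

On Linux, errno 14 is EFAULT ("Bad address"), meaning the kernel rejected a pointer passed to it. That matches the symbolic name the runtime already prints in the first line.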

Known Driver Bugs

  1. Spinlock recursion bug in RKNPU 0.9.x with 4+ contexts in auto core mode

  2. Kernel panic issues with NPU operations on 6.1.84-vendor-rk35xx

Investigation Summary

| Test | Result |
| --- | --- |
| Standalone SDK matmul test | PASS |
| SDK stress test (50 contexts) | PASS |
| CPU-only llama-cli build | FAIL (board crashes) |
| NPU-accelerated llama-cli | FAIL (EFAULT errors) |

Conclusion: Because the CPU-only build also crashes the board, this points to hardware- or system-level instability that predates our fork's changes.

Workarounds Applied

  1. Force single NPU core (RKNN_NPU_CORE_0)
  2. Use static memory context for B-matrix allocations
  3. Map quantization to power of 2 for context caching
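
Workaround 3 is stated only briefly. One possible reading is that a size parameter is rounded up to the next power of two, so that many requests share a small number of cached contexts. The sketch below illustrates that reading only. `next_pow2`, `ContextCache`, and the `factory` callback are hypothetical names, and the choice of which dimension to bucket is an assumption rather than the fork's actual implementation.

```python
def next_pow2(n: int) -> int:
    """Round a positive integer up to the nearest power of two."""
    if n < 1:
        raise ValueError("dimension must be positive")
    return 1 << (n - 1).bit_length()


class ContextCache:
    """Hypothetical cache that reuses one context per bucketed shape.

    `factory` stands in for whatever creates a matmul context; it is a
    placeholder here, not an RKNN API call.
    """

    def __init__(self, factory):
        self._factory = factory
        self._contexts = {}

    def get(self, m: int, k: int, n: int):
        # Assumption: only M varies between calls, so only M is bucketed.
        key = (next_pow2(m), k, n)
        if key not in self._contexts:
            self._contexts[key] = self._factory(*key)
        return self._contexts[key]

    def __len__(self):
        return len(self._contexts)


if __name__ == "__main__":
    cache = ContextCache(lambda m, k, n: {"M": m, "K": k, "N": n})
    cache.get(5, 4096, 4096)
    cache.get(7, 4096, 4096)   # same bucket (M=8), so the context is reused
    cache.get(9, 4096, 4096)   # new bucket (M=16)
    print(len(cache))
```

Under this reading, the number of live contexts grows with the logarithm of the largest size rather than with the number of distinct sizes, which would also help keep the context count low.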

Recommendation

Switch to Armbian for Rock 5C, which provides more stable NPU driver support.

Expected Behavior

NPU should operate stably for sustained inference sessions without board crashes or EFAULT errors.

Related

🤖 Generated with Claude Code
