P3 - Nvidia decoding sometimes returns CUDA_ERROR_UNKNOWN #239

@jailuthra

Description

Debug CUDA_ERROR_UNKNOWN errors. Why? We should follow up, but this is hard to debug until the P2s are addressed, and the errors seem to have stopped.

Describe the bug
GPU video decoding fails with CUDA_ERROR_UNKNOWN, and the user must restart the node before later segments can be decoded. The error is sometimes accompanied by CUDA_ERROR_OUT_OF_MEMORY or CUDA_ERROR_ILLEGAL_ADDRESS.

To Reproduce
Steps to reproduce the behavior:

  • Unclear as of now.

Expected behavior
Decrease the blast radius of these errors if possible, and figure out the root cause.
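
One possible way to limit the blast radius is to classify the CUDA result code before deciding how much to tear down. The sketch below is only an illustration, not LPMS code: the names `classify_cuda_error`, `recovery_action`, and the `SKETCH_` constants are hypothetical, and the recovery policy is an assumption. The numeric values mirror the `CUresult` enum in `cuda.h`. The CUDA driver documentation says that after CUDA_ERROR_ILLEGAL_ADDRESS the process must be terminated and relaunched; treating CUDA_ERROR_UNKNOWN the same way is a conservative guess based on the restart behavior described in this issue.

```c
/* Numeric values mirror the CUresult enum in cuda.h. They are redefined
 * here only so the sketch compiles without the CUDA toolkit. */
enum {
    SKETCH_CUDA_SUCCESS               = 0,
    SKETCH_CUDA_ERROR_OUT_OF_MEMORY   = 2,
    SKETCH_CUDA_ERROR_ILLEGAL_ADDRESS = 700,
    SKETCH_CUDA_ERROR_UNKNOWN         = 999
};

/* Hypothetical recovery actions, from least to most disruptive. */
typedef enum {
    RECOVERY_NONE,            /* no error, nothing to do */
    RECOVERY_RETRY_SEGMENT,   /* release resources, then retry the segment */
    RECOVERY_RESTART_PROCESS  /* CUDA state is unusable, relaunch the process */
} recovery_action;

/* Map a CUDA result code to a recovery action (assumed policy). */
recovery_action classify_cuda_error(int code)
{
    switch (code) {
    case SKETCH_CUDA_SUCCESS:
        return RECOVERY_NONE;
    case SKETCH_CUDA_ERROR_OUT_OF_MEMORY:
        /* Assumption: memory pressure may be transient. */
        return RECOVERY_RETRY_SEGMENT;
    case SKETCH_CUDA_ERROR_ILLEGAL_ADDRESS:
        /* Documented as leaving the process in an inconsistent state. */
        return RECOVERY_RESTART_PROCESS;
    case SKETCH_CUDA_ERROR_UNKNOWN:
    default:
        /* Unknown failures are treated conservatively. */
        return RECOVERY_RESTART_PROCESS;
    }
}

/* Human-readable name for logging. */
const char *recovery_action_name(recovery_action action)
{
    switch (action) {
    case RECOVERY_NONE:            return "none";
    case RECOVERY_RETRY_SEGMENT:   return "retry-segment";
    case RECOVERY_RESTART_PROCESS: return "restart-process";
    }
    return "invalid";
}
```

A caller could log `recovery_action_name(classify_cuda_error(code))` next to the failing segment, so that only failures classified as `RECOVERY_RESTART_PROCESS` trigger a full restart.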

Screenshots
[Screenshot: ERROR_UNKNOWN]
[Screenshot: ERROR_ILLEGAL_ADDRESS]
[Screenshot: ERROR_OUT_OF_MEMORY]

Additional context

Stack-trace for future reference:
  • LPMS: https://github.com/livepeer/lpms/blob/master/ffmpeg/decoder.c#L250
  • FFmpeg entry point: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext.c#L610
  • Most probable line causing the error: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext.c#L629
  • CUDA-specific context creation routine: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext_cuda.c#L379
  • cuCtxCreate call: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext_cuda.c#L363
