Description
Debug CUDA_ERROR_UNKNOWN errors. Why? We should follow up, but this is hard to debug until the P2s are addressed, and the errors seem to have stopped.
Describe the bug
GPU video decoding fails with CUDA_ERROR_UNKNOWN, and the user has to restart the node before any further segments can be processed. The error is sometimes accompanied by CUDA_ERROR_OUT_OF_MEMORY or CUDA_ERROR_ILLEGAL_ADDRESS.
To Reproduce
Steps to reproduce the behavior:
- Unclear as of now.
Expected behavior
Limit the blast radius of these errors where possible, and identify the root cause.
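One possible way to reduce the blast radius would be to classify the `CUresult` before deciding how much state to tear down, instead of treating every failure as node-fatal. The sketch below is hypothetical: `classify_cuda_error`, `recovery_scope` and the `SKETCH_` constants are illustrative names, not part of LPMS or FFmpeg. The numeric codes are copied from `CUresult` in the CUDA driver API's `cuda.h`. Only the handling of CUDA_ERROR_ILLEGAL_ADDRESS follows documented behavior (the CUDA docs state that the process must be relaunched after it); treating CUDA_ERROR_UNKNOWN the same way only mirrors what this issue observes, and treating CUDA_ERROR_OUT_OF_MEMORY as segment-local is an assumption.

```c
/* Hypothetical sketch: decide how much state to tear down after a failed
 * GPU decode, based on the CUresult returned by the CUDA driver API. */

/* Values copied from CUresult in cuda.h, redefined locally with a prefix so
 * this compiles without the CUDA toolkit and cannot clash with the real enum. */
enum {
    SKETCH_CUDA_SUCCESS               = 0,
    SKETCH_CUDA_ERROR_OUT_OF_MEMORY   = 2,
    SKETCH_CUDA_ERROR_ILLEGAL_ADDRESS = 700,
    SKETCH_CUDA_ERROR_UNKNOWN         = 999
};

/* Hypothetical recovery scopes, from narrowest to widest. */
typedef enum {
    RECOVER_NONE,            /* decode succeeded, nothing to do */
    RECOVER_RETRY_SEGMENT,   /* fail or retry only the current segment */
    RECOVER_RESTART_PROCESS  /* CUDA state presumed unusable: restart the node */
} recovery_scope;

static recovery_scope classify_cuda_error(int code)
{
    switch (code) {
    case SKETCH_CUDA_SUCCESS:
        return RECOVER_NONE;
    case SKETCH_CUDA_ERROR_OUT_OF_MEMORY:
        /* Assumption: may be transient, e.g. while other sessions hold memory. */
        return RECOVER_RETRY_SEGMENT;
    case SKETCH_CUDA_ERROR_ILLEGAL_ADDRESS:
        /* Documented as sticky: further CUDA work in this process fails. */
    case SKETCH_CUDA_ERROR_UNKNOWN:
        /* Assumption: treated as sticky because only a restart recovers here. */
    default:
        /* Conservative fallback for any code not classified above. */
        return RECOVER_RESTART_PROCESS;
    }
}
```

With this split, an out-of-memory failure would cost a single segment rather than the whole node. Whether that is safe depends on whether the OOM errors seen here really are transient, which this issue has not established.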
Additional context
Stack trace for future reference:
- LPMS: https://github.com/livepeer/lpms/blob/master/ffmpeg/decoder.c#L250
- FFmpeg entry point: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext.c#L610
- Most probable line causing the error: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext.c#L629
- CUDA-specific context creation routine: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext_cuda.c#L379
- `cuCtxCreate` call: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext_cuda.c#L363


