Abstract filters support #853

Conversation
I forgot to mention that I have not yet changed the CPU device interface in a way similar to CUDA.
I've added one more commit to update the CPU device interface to use the filters path introduced in this PR.
I've rebased the PR on top of the recently landed #831 prerequisite. The change is ready for review. @scotts @NicolasHug
I pushed an update to fix a linter issue which I had overlooked.
This CI test error seems unrelated to the change.
Pushed an update to address another lint issue in the last commit.
Rebased on top of:
Review comment on the following snippet:

```cpp
        avFrame->width,
        "x",
        avFrame->height);
}
```
I'm not sure what we're enabling by exposing DeviceInterface::initializeFiltersContext() and then calling it here in SingleStreamDecoder. That is, if DeviceInterface just didn't expose the concept of filter graphs at all, each device implementation could still decide to use a filter graph on its own.

We already have the ability to ask a device to do device-specific frame conversions with DeviceInterface::convertAVFrameToFrameOutput(). Pulling filter graph capabilities out of that, and forcing them to be orchestrated at the level of SingleStreamDecoder, does not, as far as I can tell, buy us any new capabilities. But it does come at a cost, because CpuDeviceInterface::convertAVFrameToFrameOutput() is now (to me) quite strange: it does swscale-based conversion and filter graph cleanup.
The logic in this PR is effectively:

```cpp
if (stream.kind == AUDIO) {
  convertAudio(inFrame, outFrame);
}
else if (stream.kind == VIDEO && device != nullptr) {
  filterGraph = device->getFilterGraph();
  if (filterGraph != nullptr) {
    inFrame = filterGraph->convert(inFrame);
  }
  device->convertVideo(inFrame, outFrame);
}
return outFrame;
```
I don't see how the above buys us anything over:

```cpp
if (stream.kind == AUDIO) {
  convertAudio(inFrame, outFrame);
}
else if (stream.kind == VIDEO && device != nullptr) {
  // device is free to use a filter graph if it wants; it's
  // entirely an implementation detail for the device
  device->convertVideo(inFrame, outFrame);
}
return outFrame;
```
I do understand that in the future, we'll likely want to tell a device, "Hey, here are the particular filters we want you to apply." But I don't think these interfaces actually buy us that capability. I think we can communicate that just as easily through the VideoStreamOptions we currently pass to DeviceInterface::convertAVFrameToFrameOutput().
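To illustrate, here is a minimal sketch of that idea, assuming a hypothetical filterDescription field on VideoStreamOptions and placeholder frame types; none of this is existing torchcodec API:

```cpp
#include <optional>
#include <string>

// Hypothetical sketch: the caller records the desired filters on the
// per-stream options that the device already receives, and each device
// implementation decides internally how to honor them (filter graph,
// swscale, NPP, ...). The filterDescription field is an assumption.
struct VideoStreamOptions {
  // ... existing options elided ...
  std::optional<std::string> filterDescription; // e.g. "scale=640:480"
};

struct AVFrameIn {};   // stand-in for AVFrame
struct FrameOutput {}; // stand-in for the real output type

void convertAVFrameToFrameOutput(const VideoStreamOptions& options,
                                 const AVFrameIn& avFrame,
                                 FrameOutput& frameOutput) {
  if (options.filterDescription.has_value()) {
    // The device builds and runs a filter graph here; it stays an
    // implementation detail invisible to SingleStreamDecoder.
  } else {
    // Plain swscale/NPP-style conversion path.
  }
  (void)avFrame;
  (void)frameOutput;
}
```

The point of the sketch is only that a filter request can travel through data the device already receives, with no new orchestration hooks at the decoder level.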
This PR moves ownership and management of the filter graph to SingleStreamDecoder. Otherwise this would be done in each device interface separately, while doing it in SingleStreamDecoder makes all devices behave the same way and share the logic. There are further considerations around that, which I highlight in #853 (comment).
@dvrogozh, thank you for this work! I left a detailed comment where I'm trying to understand the value we're getting out of the new abstractions - please help me understand that better. At a higher level, I wonder if we could take the following approach:
I'm sorry if that's the direction you were already going in - I know I pushed to see what the full generalization would look like. One quirk of the approach above is that each device may end up implementing the same patterns for its filter graph implementation. In that case, we could potentially add some member functions to DeviceInterface.
@scotts: you have highlighted the alternate design approach to take. I think we have 2 options on the table:
I am proposing to consider the 2nd direction, which gives a more modular structure for the project and simplifies the device interface to a set of trivial operations. I am trying to reduce the complexity of writing a device interface for non-CUDA GPUs by reusing as much as possible. In the sense of the 2nd direction, this PR is the first step. As you saw, it moves ownership and management of the filter graph out of the device interface (to SingleStreamDecoder).
Please share your thoughts on the above. If you want, we can start by going with the 1st direction (keep conversion in device interfaces) and continue weighing the 2nd approach. I think enabling encoders might contribute to its justification, as we will again need to consider filter operations, this time before encoding. If we decide to go with the 1st approach for now, then I will submit another PR focused on enabling 10-bit support in the CUDA device interface.
@dvrogozh, thanks for explaining the direction you're going in! Based on where we are, I still prefer the first option: keeping all decoder output conversions within the device implementation. I do think that in the future, we'll want better abstractions around filtergraph, swscale and NPP. That is, these are all ways to convert a raw decoded frame into our desired color space and potentially perform some native transforms. But what exact abstractions we should build is not obvious to me right now. In particular, this PR has us in an in-between state where some stuff remains in the device implementation, and some of it sits outside. By pursuing option 1, I think we'll eventually get to a state where we have three device implementations: CPU and CUDA in this repo, and XPU in an Intel repo. At that point, it will be clearer to us what the patterns are and what abstractions to build.
For: pytorch#776 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Changes:

- Added DeviceInterface::initializeFiltersContext() API, which returns device-specific FilterGraph initialization settings
- Moved FilterGraph ownership and management to the SingleStreamDecoder level
- For CUDA, frames are first converted to NV12 via scale_cuda (scale_cuda does not currently support conversion of YUV to RGB), and then the CUDA device interface converts NV12 to RGB (via the existing NPP path)

Basically, the idea behind this change is the following. Let the device interface perform trivial and performance-optimized conversions in the convertAVFrameToFrameOutput() method. If the device interface cannot handle a conversion in convertAVFrameToFrameOutput(), it can set up an ffmpeg filters pipeline by returning a valid description in initializeFiltersContext().

I tested the pipeline on 10-bit videos, h264 and h265. However, this setup should also be valid for 12-bit videos, which I did not try.
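A rough sketch of that contract, for orientation only - the placeholder types and the exact signatures here are assumptions and may differ from the PR:

```cpp
#include <optional>
#include <string>

// Placeholder types standing in for the real torchcodec/FFmpeg ones.
struct Frame {};
struct FrameOutput {};
struct FiltersContext {
  std::string filtersDescription; // e.g. "scale_cuda=format=nv12"
};
struct FilterGraph {
  explicit FilterGraph(const FiltersContext&) {}
  Frame convert(const Frame& in) { return in; } // runs the ffmpeg filters
};

struct DeviceInterface {
  // Returns filter settings only when the device cannot do the whole
  // conversion itself; std::nullopt means "no filtering needed".
  virtual std::optional<FiltersContext> initializeFiltersContext() = 0;
  // Performs the trivial, performance-optimized final conversion.
  virtual void convertAVFrameToFrameOutput(const Frame& in,
                                           FrameOutput& out) = 0;
  virtual ~DeviceInterface() = default;
};

// SingleStreamDecoder owns the FilterGraph and runs it before handing
// the frame to the device for the final conversion.
FrameOutput convertFrame(DeviceInterface& device, Frame frame) {
  if (auto ctx = device.initializeFiltersContext()) {
    FilterGraph graph(*ctx);      // owned at the decoder level
    frame = graph.convert(frame); // e.g. 10-bit YUV -> NV12 via scale_cuda
  }
  FrameOutput out;
  device.convertAVFrameToFrameOutput(frame, out); // e.g. NV12 -> RGB via NPP
  return out;
}
```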
Note:

- scale_cuda does not support format conversion in this ffmpeg version (support was added in n5.0)
- Comparing scale_cuda-converted outputs with the CPU implementation, I do see differences in the outputs, likely due to different handling of color standards. This is something I was not able to overcome at the moment.

CC: @scotts @NicolasHug @eromomon