[SYCL] Fix resource leak related to SYCL_FALLBACK_ASSERT#12532
[SYCL] Fix resource leak related to SYCL_FALLBACK_ASSERT#12532aelovikov-intel merged 4 commits intointel:syclfrom
Conversation
intel#6837 enabled asynchronous buffer destruction for buffers constructed without host data. However, initial fallback assert implementation in intel#3767 predates it and as such had to place the buffer inside `queue_impl` to avoid unintended synchronization point. I don't know if there was the same crash observed on the end-to-end test added as part of this PR prior to intel#3767, but it doesn't even matter because the "new" implementation is both simpler and doesn't result in a crash. I suspect that without it (with the buffer for fallback assert implementation being a data member of `sycl::queue_impl`) we had a cyclic dependency somewhere leading to resource leak and ultimately to the assert in `DeviceGlobalUSMMem::~DeviceGlobalUSMMem()`.
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
|
The newly added test is failing. Should I wait for further changes before review? |
|
My bet is those are yet another issue. AMD/HIP is totally different because it has The only remaining failure that could possibly be related is the leak on Lin/gen12 configuration, but even that is improvement over previous state where we'd fail with an assert because nothing was released. Here, at least, it's just one queue and one event. |
HIP failure is unrelated, so are (likely) extra memory leaks found in CI. The manifestation of the original bug is that almost nothing was freed resulting in an assert in `DeviceGlobalUSMMem::~DeviceGlobalUSMMem()`. That is what this PR addresses and is being verified by the test even without UR_L0_LEAKS_DEBUG.
|
@maarquitos14 , I've removed the leak tracking from the test as those issues are distinct from what is being fixed here. @KseniyaTikhomirova , @sergey-semenov I know you are busy but I think you're the most experienced in this area, would appreciate if you can take a quick look too. |
sergey-semenov
left a comment
There was a problem hiding this comment.
LGTM, relying on the deferred release is much better.
|
Post commit failure (Arc GPU): Looks same as the failure in #12555 (comment) (on a different test). |
It's been unused since intel#12532 awaiting ABI breaking window for the final removal.
) It's been unused since #12532 awaiting ABI breaking window for the final removal.
#6837 enabled asynchronous buffer destruction for buffers constructed without host data. However, initial fallback assert implementation in
#3767 predates it and as such had to place the buffer inside
queue_implto avoid unintended synchronization point. I don't know if there was the same crash observed on the end-to-end test added as part of this PR prior to#3767, but it doesn't even matter because the "new" implementation is both simpler and doesn't result in a crash.
I suspect that without it (with the buffer for fallback assert implementation being a data member of
sycl::queue_impl) we had a cyclic dependency somewhere leading to resource leak and ultimately to the assert inDeviceGlobalUSMMem::~DeviceGlobalUSMMem().