Adopt any_resource in BufferResource and RmmResourceAdaptor#772
Adopt any_resource in BufferResource and RmmResourceAdaptor#772bdice wants to merge 6 commits intorapidsai:mainfrom
Conversation
Change BufferResource::device_mr_ member from rmm::device_async_resource_ref to cuda::mr::any_resource<cuda::mr::device_accessible> so that the buffer resource owns the memory resource it wraps. This ensures the memory resource remains valid for the lifetime of the BufferResource, matching the ownership semantics now used by rmm::device_buffer. Changes: - Add #include <cuda/memory_resource> - Change device_mr_ type to any_resource - Make device_mr() accessor non-const (required because CCCL's resource_ref can only be constructed from non-const any_resource&)
Change RmmResourceAdaptor's memory resource members from rmm::device_async_resource_ref to cuda::mr::any_resource so that the adaptor owns the memory resources it wraps. Changes: - Add #include <cuda/memory_resource> - Change primary_mr_ to any_resource - Change fallback_mr_ to std::optional<any_resource> - Make get_upstream_resource() and get_fallback_resource() non-const (required because CCCL's resource_ref can only be constructed from non-const any_resource&) - Update get_fallback_resource() to construct resource_ref from optional - Update do_is_equal() to compare any_resource members directly, with explicit optional value extraction to avoid CCCL's recursive constraint satisfaction issue with std::optional::operator==
| [[nodiscard]] rmm::device_async_resource_ref device_mr() noexcept { | ||
| return device_mr_; | ||
| } |
There was a problem hiding this comment.
Can you explain why this "return device_mr_ as resource_ref cannot be const?
There was a problem hiding this comment.
Let’s discuss on this RMM PR thread, it is the same question: rapidsai/rmm#2201 (comment)
| * - Memory usage tracking. | ||
| * - Fallback memory resource support upon out-of-memory in the primary resource. | ||
| */ | ||
| class RmmResourceAdaptor final : public rmm::mr::device_memory_resource { |
There was a problem hiding this comment.
I suppose we're not yet reading to drop the inheritance like we have in RMM?
There was a problem hiding this comment.
I am planning to drop the inheritance and remove the device_memory_resource class as a later step. For now, my immediate goals are to drop pointer-based resources device_memory_resource* (in favor of refs), and, if a ref is used as a member (e.g. of an adaptor or container), switch to any_resource so it is owning. This prevents a class of use-after-free bugs that have plagued RMM.
| bool fallbacks_equal; | ||
| if (fallback_mr_.has_value() != cast->fallback_mr_.has_value()) { | ||
| fallbacks_equal = false; | ||
| } else if (!fallback_mr_.has_value()) { | ||
| fallbacks_equal = true; | ||
| } else { | ||
| fallbacks_equal = (*fallback_mr_ == *cast->fallback_mr_); | ||
| } |
There was a problem hiding this comment.
I prefer the old logic that initialises fallbacks_equal immediately, is there a functional reason for this change? I think no.
There was a problem hiding this comment.
I’ll investigate but I think it was failing that way.
Summary
cuda::mr::any_resource<cuda::mr::device_accessible>inBufferResourceandRmmResourceAdaptorto own upstream memory resourcesany_resourcemembers (requires non-constthis)do_is_equalcomparison by comparing owned resources directlyPart of RMM issue 2011.
Depends on rapidsai/rmm#2200.