Description
The current caching allocator keeps a separate cache per instance of the class, which is templated on <value_type, space>. A separate cache per space is necessary, but a separate cache per type is not; global per-space caches would be better. It may also be useful to expose pool allocation to clib/Fortran. Currently the container types use caching allocators, but the clib routines call the underlying backend allocation. I don't think we should change the default for clib, but we could add an option to pool allocate.
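To make the per-space idea concrete, here is a minimal sketch of what a global per-space cache could look like. It is not the existing implementation: `SpacePool`, `HostSpace`, and this `CachingAllocator` are hypothetical names, `std::malloc`/`std::free` stand in for the real backend calls, and blocks are only reused on an exact byte-size match. The point is just that the cache is keyed on bytes, so every value type in a given space shares one pool.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <mutex>
#include <new>
#include <vector>

// Hypothetical memory-space tag; the library's real space types are not shown here.
struct HostSpace {};

// One byte-level cache per memory space, shared by all value types.
template <typename Space>
class SpacePool {
public:
    static SpacePool& instance() {
        static SpacePool pool;  // one global pool per Space
        return pool;
    }
    SpacePool(const SpacePool&) = delete;
    SpacePool& operator=(const SpacePool&) = delete;
    ~SpacePool() { release(); }

    void* allocate(std::size_t bytes) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = free_blocks_.find(bytes);
            if (it != free_blocks_.end() && !it->second.empty()) {
                void* p = it->second.back();  // reuse a cached block of the same size
                it->second.pop_back();
                return p;
            }
        }
        void* p = std::malloc(bytes);  // assumption: stand-in for the backend allocation
        if (p == nullptr && bytes != 0) throw std::bad_alloc();
        return p;
    }

    // Return a block to the cache instead of freeing it.
    void deallocate(void* p, std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_blocks_[bytes].push_back(p);
    }

    // Hand all cached blocks back to the backend.
    void release() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& entry : free_blocks_)
            for (void* p : entry.second) std::free(p);
        free_blocks_.clear();
    }

    std::size_t cached_blocks() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t n = 0;
        for (const auto& entry : free_blocks_) n += entry.second.size();
        return n;
    }

private:
    SpacePool() = default;
    mutable std::mutex mutex_;
    std::map<std::size_t, std::vector<void*>> free_blocks_;  // byte size -> free blocks
};

// Typed front end: still templated on <value_type, space>, but holds no cache of its own.
template <typename T, typename Space>
struct CachingAllocator {
    using value_type = T;
    T* allocate(std::size_t n) {
        return static_cast<T*>(SpacePool<Space>::instance().allocate(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        SpacePool<Space>::instance().deallocate(p, n * sizeof(T));
    }
};
```

With this layout, a block released by `CachingAllocator<double, HostSpace>` can be picked up by `CachingAllocator<float, HostSpace>` as long as the byte sizes match. An opt-in pool path for clib could, under the same assumptions, call `SpacePool<Space>::instance()` directly instead of the backend allocation.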
It may be worth adding an option to plug in RMM as an alternative to direct CUDA calls, to see whether that improves things on NVIDIA. If it does, i.e. equal or better performance and less memory used in application runs, then it may be worth investing in a cross-backend implementation. We could also explore adapting Gator from YAKL.