feat: inplace pin memory for safetensors in /dev/shm/ #58
Pull request overview
This pull request adds an optimization for loading safetensors checkpoint files stored in /dev/shm/ by enabling in-place memory pinning, which avoids copying data and reduces memory consumption by half. When safetensors files are detected in /dev/shm/, the code now pins the memory-mapped file directly instead of allocating separate pinned memory and copying the tensors.
Key changes:
- Added an in-place pin-memory path for safetensors files in /dev/shm/ using CUDA's cudaHostRegister
- Implemented manual safetensors header parsing to extract tensor metadata without loading through the safetensors library
- Parallelized in-place pinning operations using ThreadPoolExecutor
- Preserved the existing checkpoint loading path as a fallback for non-safetensors files or files outside /dev/shm/
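The manual header parsing mentioned above can be sketched roughly as follows. This is not the code from this PR; it only relies on the published safetensors layout (an 8-byte little-endian header length, a JSON header, then the raw tensor bytes). The function name `read_safetensors_header` and the returned field names `offset` and `nbytes` are illustrative assumptions.

```python
import json
import struct


def read_safetensors_header(path):
    """Return (tensor_meta, data_start) without using the safetensors library.

    Hypothetical helper: name and return shape are assumptions for illustration.
    """
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))

    # Optional free-form metadata entry; it does not describe a tensor.
    header.pop("__metadata__", None)

    # Tensor bytes start right after the length prefix and the JSON header.
    data_start = 8 + header_len
    meta = {}
    for name, info in header.items():
        begin, end = info["data_offsets"]  # relative to data_start
        meta[name] = {
            "dtype": info["dtype"],
            "shape": info["shape"],
            "offset": data_start + begin,  # absolute offset in the file
            "nbytes": end - begin,
        }
    return meta, data_start
```

With absolute offsets in hand, each tensor can be viewed directly on top of a memory-mapped file instead of being copied into a separate buffer.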
blahgeek
left a comment
lgtm
resolve #60
Safetensors files have an aligned storage layout. If the safetensors files are in /dev/shm, we can pin them in place without copying, which avoids doubling memory usage.
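A minimal sketch of that idea, not taken from this PR: decide per file whether the in-place path applies, then pin the eligible files concurrently. The helper names `can_pin_inplace`, `pin_file_inplace`, and `pin_all` are hypothetical. The call `torch.cuda.cudart().cudaHostRegister(ptr, size, flags)` is assumed to be available in the installed PyTorch build and to return an error code that is 0 on success; treat that part as an assumption to verify.

```python
import ctypes
import mmap
import os
from concurrent.futures import ThreadPoolExecutor


def can_pin_inplace(path):
    """True for *.safetensors files located under /dev/shm/."""
    path = os.path.abspath(path)
    return path.startswith("/dev/shm/") and path.endswith(".safetensors")


def pin_file_inplace(path):
    """Map a file and page-lock the mapping itself (needs CUDA and torch)."""
    import torch  # lazy import so the dispatch logic works without torch

    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)  # shared, writable mapping of the whole file
    addr = ctypes.addressof(ctypes.c_char.from_buffer(mm))
    # Assumption: cudaHostRegister(ptr, size, flags) is exposed here and
    # returns a cudaError value whose integer form is 0 on success.
    err = torch.cuda.cudart().cudaHostRegister(addr, len(mm), 0)
    if int(err) != 0:
        raise RuntimeError(f"cudaHostRegister failed for {path}: {err}")
    return mm  # keep this object alive for as long as tensors view the mapping


def pin_all(paths, pin_fn=pin_file_inplace, max_workers=8):
    """Pin eligible files in parallel; return (pinned, fallback_paths)."""
    eligible = [p for p in paths if can_pin_inplace(p)]
    fallback = [p for p in paths if not can_pin_inplace(p)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pinned = dict(zip(eligible, pool.map(pin_fn, eligible)))
    return pinned, fallback
```

Files returned in `fallback_paths` would go through the existing copy-based loading path. Because the mapping is registered rather than copied, the pages in /dev/shm serve as the pinned buffer and no second allocation of the same size is needed.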