feat: inplace pin memory for safetensors in /dev/shm/ #58

specture724 · 2025-12-02T12:26:16Z

resolve #60
Safetensors files have an aligned storage layout. If safetensors files are in /dev/shm, we can pin them in place without copying, so memory usage is not doubled.
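A safetensors file is an 8-byte little-endian header length, a JSON header that maps each tensor name to its `dtype`, `shape` and `data_offsets`, and then one contiguous byte buffer. Because every tensor is a fixed byte range of the file, it can be used where it already lies instead of being copied out. The following is a minimal stdlib-only sketch of reading that header. `parse_safetensors_header` is a hypothetical name, not the function in checkpoint_engine/ps.py. Skipping `__metadata__` follows the commit "add header format check and key __metadata__ ignored".

```python
import json
import struct


def parse_safetensors_header(path):
    """Hypothetical helper: return (data_start, {name: (dtype, shape, begin, end)})
    where begin/end are absolute byte offsets into the file."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian u64 length of the JSON header.
        raw_len = f.read(8)
        if len(raw_len) != 8:
            raise ValueError(f"{path}: too short to be a safetensors file")
        (header_len,) = struct.unpack("<Q", raw_len)
        header_bytes = f.read(header_len)
        if len(header_bytes) != header_len:
            raise ValueError(f"{path}: truncated header")
    header = json.loads(header_bytes)
    if not isinstance(header, dict):
        raise ValueError(f"{path}: header is not a JSON object")
    data_start = 8 + header_len  # the tensor byte buffer starts right after the header
    tensors = {}
    for name, meta in header.items():
        if name == "__metadata__":  # free-form metadata, not a tensor
            continue
        begin, end = meta["data_offsets"]  # relative to data_start
        tensors[name] = (meta["dtype"], tuple(meta["shape"]), data_start + begin, data_start + end)
    return data_start, tensors
```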

Copilot

Pull request overview

This pull request adds an optimization for loading safetensors checkpoint files stored in /dev/shm/ by pinning their memory in place, which avoids copying the data and halves memory consumption. When safetensors files are detected in /dev/shm/, the code pins the memory-mapped file directly instead of allocating separate pinned memory and copying the tensors into it.
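In-place pinning works because /dev/shm is a tmpfs: a shared memory mapping of a file there aliases the RAM that already holds the file instead of reading it into a new buffer. The following minimal stdlib sketch covers only the mapping step. `map_file` and `mapping_address` are hypothetical names. The CUDA registration of the mapped range (cudaHostRegister, per the overview) is deliberately left out because it needs a GPU.

```python
import ctypes
import mmap
import os


def map_file(path):
    """Hypothetical helper: map a whole file shared and return (mmap, memoryview)."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        # MAP_SHARED aliases the file's own pages; nothing is copied here.
        mm = mmap.mmap(fd, size, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed
    return mm, memoryview(mm)


def mapping_address(mm):
    """Address of the first mapped byte, the kind of pointer a host-registration call takes."""
    return ctypes.addressof(ctypes.c_char.from_buffer(mm))
```

Slicing the returned view with the absolute offsets from the header (`view[begin:end]`) yields a per-tensor view over the same memory, again without a copy.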

Key changes:

- Added an in-place pin-memory path for safetensors files in /dev/shm/ using CUDA's cudaHostRegister
- Implemented manual safetensors header parsing to extract tensor metadata without loading files through the safetensors library
- Parallelized in-place pinning with ThreadPoolExecutor
- Kept the existing checkpoint loading path as a fallback for non-safetensors files and for files outside /dev/shm/
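The dispatch described in the list above could look roughly like the sketch below: eligible files are pinned in place in parallel on a ThreadPoolExecutor, and everything else goes through the existing copy path. `can_pin_inplace` and `load_checkpoint_files` are hypothetical names. The two loader callables stand in for the real per-file code in checkpoint_engine/ps.py. The eligibility rule (a `.safetensors` suffix and a resolved path under /dev/shm) is an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumed eligibility rule: a ".safetensors" file whose resolved path is inside /dev/shm.
SHM_ROOT = os.path.realpath("/dev/shm")


def can_pin_inplace(path):
    real = os.path.realpath(path)
    return real.endswith(".safetensors") and os.path.commonpath([SHM_ROOT, real]) == SHM_ROOT


def load_checkpoint_files(paths, pin_inplace, copy_to_pinned, max_workers=8):
    """Pin eligible files in place in parallel; route the rest through the copy path.

    pin_inplace and copy_to_pinned are stand-ins for the real per-file loaders.
    """
    inplace = [p for p in paths if can_pin_inplace(p)]
    fallback = [p for p in paths if not can_pin_inplace(p)]
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each path with its result.
        results.update(zip(inplace, pool.map(pin_inplace, inplace)))
    for p in fallback:
        results[p] = copy_to_pinned(p)
    return results
```

This sketch checks only the path and suffix. The commit history indicates that the real code also validates the header format before taking the in-place path.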


checkpoint_engine/ps.py

blahgeek

lgtm

specture724 requested review from HubertZhang, blahgeek, Copilot and weixiao-huang December 2, 2025 12:26

Copilot started reviewing on behalf of specture724 December 2, 2025 12:26

Copilot finished reviewing on behalf of specture724 December 2, 2025 12:29

Copilot AI reviewed Dec 2, 2025


specture724 force-pushed the feat/inplace-pin-memory branch from 648e61b to 272355d December 5, 2025 07:35

specture724 self-assigned this Dec 5, 2025

specture724 force-pushed the feat/inplace-pin-memory branch 2 times, most recently from 93f4470 to 3b3371b December 5, 2025 08:06

blahgeek reviewed Dec 8, 2025


checkpoint_engine/ps.py (outdated, resolved)

checkpoint_engine/ps.py (outdated, resolved)

checkpoint_engine/ps.py (outdated, resolved)

checkpoint_engine/ps.py (outdated, resolved)

specture724 force-pushed the feat/inplace-pin-memory branch from 9ce3f31 to 37d8f0b December 8, 2025 10:31

blahgeek approved these changes Dec 10, 2025


specture724 added 9 commits December 11, 2025 04:29

- 8b0bb10 feat: inplace pin memory for safetensors in /dev/shm/
- 3d2ad5d feat: inplace pin and normal pin compatible
- 20a8bf5 feat: inplace-pin-memory need synchronization barrier
- df3bd0e feat: test for inplace pin memory added
- 9794cf3 feat: add header format check and key __metadata__ ignored
- ded2865 fix: fix PR issues
- 892aea5 feat: remove temp files in test
- b7906db fix: fix PR issues
- c380f0c misc: add "files deleted" warning in doc string

specture724 force-pushed the feat/inplace-pin-memory branch 2 times, most recently from 15e8dba to 93f3fa9 December 11, 2025 06:00

- 4d68fb3 misc

specture724 force-pushed the feat/inplace-pin-memory branch from 93f3fa9 to 4d68fb3 December 11, 2025 06:05

blahgeek merged commit e88d462 into MoonshotAI:main Dec 11, 2025
2 checks passed

2 participants
