
explicitly permit JIT caching for model init #577

Open
matthew-frank wants to merge 1 commit into mlcommons:master from matthew-frank:mfrank/jit-cache-proposal


@matthew-frank
Contributor

Explicitly permit using JIT caching to reduce model init time.

@matthew-frank matthew-frank requested review from a team as code owners February 25, 2026 00:43
@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@ShriyaRishab
Contributor

Background:

Because resume-from-checkpoint time is important for large model training, it is now common industry practice for the JIT compilers used in LLM libraries to cache their compiled kernels on shared persistent storage and reuse them. Libraries such as Megatron-Core, Triton, TorchTitan and Hybrid EP all now support this initialization-time optimization.
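To make the mechanism concrete, here is a minimal Python sketch of a content-addressed kernel cache on shared storage. It is not the implementation used by Megatron-Core, Triton, TorchTitan or Hybrid EP; `cache_key`, `load_or_compile` and the `compile_fn` callback are hypothetical names, and the atomic-rename step assumes a POSIX filesystem.

```python
import hashlib
import os
import tempfile
from typing import Callable


def cache_key(source: str, options: dict) -> str:
    """Hash kernel source plus compile options into a stable key (hypothetical helper)."""
    h = hashlib.sha256()
    h.update(source.encode("utf-8"))
    # Sort options so that dict ordering does not change the key.
    for k in sorted(options):
        h.update(f"{k}={options[k]}".encode("utf-8"))
    return h.hexdigest()


def load_or_compile(cache_dir: str, source: str, options: dict,
                    compile_fn: Callable[[str, dict], bytes]) -> bytes:
    """Return the compiled kernel, invoking the JIT only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(source, options) + ".bin")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()  # cache hit: no JIT cost at init
    binary = compile_fn(source, options)  # cache miss: pay the JIT cost once
    # Write to a temp file in the same directory, then rename it into place,
    # so concurrent ranks never read a partially written entry (POSIX assumption).
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(binary)
    os.replace(tmp, path)
    return binary
```

Because the key covers both the source and the compile options, changing either one forces a recompile instead of silently reusing a stale kernel.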

The current rules neither prohibit nor explicitly permit using JIT caches.

Pros:

JIT time has been a real issue when trying to collect results for very large benchmarks in the final weeks before the submission deadline. GPU cost is one problem: 5 extra minutes of init per run on 2k-8k GPUs costs roughly 170-680 GPU-hours. Developer time is another, since runs may fail and have to be restarted, and waiting for big models to get through init and start training has been stressful. Allowing JIT caching during init would reduce both developer stress and hardware costs.
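The GPU-hour figures follow from extra minutes × GPU count / 60. This small Python check assumes "2k" and "8k" mean 2,048 and 8,192 GPUs; `extra_gpu_hours` is a hypothetical helper name.

```python
def extra_gpu_hours(extra_init_minutes: float, num_gpus: int) -> float:
    """GPU-hours consumed by extra init time in a single run."""
    return extra_init_minutes * num_gpus / 60.0


# Assumes 2k = 2048 and 8k = 8192 GPUs.
for gpus in (2048, 8192):
    print(gpus, round(extra_gpu_hours(5, gpus), 1))
# prints:
# 2048 170.7
# 8192 682.7
```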

Potential Cons:

If there is a case where JIT compilation currently takes more than 30 minutes (the maximum model init time), this change would allow that competitor to “hide” the JIT time by reading from the cache. We believe this is actually desirable: JIT compilation with caching is not much different from profile-directed compilation, and (offline) profile-directed compilation is already permitted (and encouraged).

@ShriyaRishab
Contributor

ShriyaRishab commented Mar 19, 2026

WG discussion: Generally OK with this. The only concern is that a submitter whose init time is close to 30 minutes might exceed the 30-minute limit without caching, so this rule would give them a bonus.
We don't think this is a normal scenario, but if it comes up, the review committee can make a decision.

Approved. Add a note to the rule change that if init time is very close to 30 minutes, the review committee can investigate further whether JIT caching gives a submitter an unfair advantage.
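One way the review committee might make the WG's concern concrete is to compare a cached run against a cold-cache run. The Python sketch below is purely illustrative: `needs_review`, and the idea of measuring `jit_saved_min` from a cold-cache run, are assumptions rather than anything the rules specify.

```python
INIT_LIMIT_MIN = 30.0  # maximum model init time, in minutes


def needs_review(cached_init_min: float, jit_saved_min: float,
                 limit_min: float = INIT_LIMIT_MIN) -> bool:
    """Hypothetical check: flag a run that fits under the init limit
    only because JIT caching removed jit_saved_min of compilation."""
    within_limit_with_cache = cached_init_min <= limit_min
    over_limit_without_cache = cached_init_min + jit_saved_min > limit_min
    return within_limit_with_cache and over_limit_without_cache
```

A run already over the limit with the cache is not flagged here because it fails the init-time rule on its own.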

