Adding L1-L4 design to RFC-0050 by can-gaa-hou · Pull Request #93 · pytorch/rfcs

can-gaa-hou · 2026-04-14T03:22:21Z

As @ZainRizvi mentioned here, I am opening this PR to add the L1-L4 design to our RFC. This should help us develop the L2-L4 implementation.

cc @fffrog @ZainRizvi @albanD

groenenboomj · 2026-04-15T13:30:37Z

+
+#### L4: Always-On PR Checks
+
+`L4` uses the same callback and synchronization behavior as `L3`, but it does not wait for a label. The `L4` scenario is effectively the same as `L3` Scenario 1 except that the downstream result is represented in upstream PR Checks by default, without requiring explicit labeling.
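The only behavioral difference the quoted text draws between `L3` and `L4` is whether a label is required before the downstream result shows up in upstream PR Checks. A minimal sketch of that decision follows. The helper `should_report_to_pr_checks`, the `PullRequest` model, and the label name are all hypothetical; the RFC excerpt does not name the real `L3` trigger label or describe how the callback service implements this check.

```python
from dataclasses import dataclass, field

# Placeholder label name (hypothetical); the RFC excerpt does not name the real L3 label.
HYPOTHETICAL_L3_LABEL = "example-accelerator-label"


@dataclass
class PullRequest:
    """Minimal stand-in for an upstream pytorch/pytorch PR (hypothetical model)."""
    number: int
    labels: set = field(default_factory=set)


def should_report_to_pr_checks(level: int, pr: PullRequest) -> bool:
    """Return True if a downstream result should appear in upstream PR Checks.

    Only L3 and L4 are modeled, since the excerpt only contrasts those two levels.
    """
    if level == 4:
        # L4: always-on; the result is shown by default, with no label required.
        return True
    if level == 3:
        # L3: same callback/sync path, but it waits until the PR carries the label.
        return HYPOTHETICAL_L3_LABEL in pr.labels
    raise ValueError(f"level {level} is not modeled in this sketch")
```

For example, an unlabeled PR gets an `L4` check but no `L3` check, while adding the label makes the `L3` result visible as well.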


Was there a plan to allow out-of-tree testing to block PRs? It seems like we'd need a very tight agreement on availability if out-of-tree tests can halt merging on pytorch/pytorch.

Hi @groenenboomj, we've discussed this in detail here: pytorch/pytorch#175022 (comment). L4-level repos can gate PyTorch PRs, but the bar for reaching L4 will be very high (and the PyTorch core team retains the right to decide whether a repo is L4 even if it meets all the requirements).

Okay, I see that was added in the last level breakdown expansion in March. Has there been more discussion on the requirements for gating?

The main requirement is going to be this one:

Very few accelerators would reach this level. One only becomes L4 if PyTorch maintainers are highly invested in it and find it worthwhile to add the signal to every PR.

It's intentionally left somewhat fuzzy: it's a balance of "how much value will this provide vs. how much pain does it inflict", which is inherently subjective. The automatic downgrade will need a firm reliability metric, but even that will likely need fine-tuning based on experimentation, so any hard numbers set in the short term are likely to be unreliable.

Do you have concerns about the approach, @groenenboomj?
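To illustrate what a "firm reliability metric" for the automatic downgrade could look like, here is a minimal sketch of a rolling-window check. Everything in it is hypothetical: the class name, the window size, and the threshold are placeholders, since the comment above explicitly says hard numbers are not settled. It tracks availability (whether a downstream run reported a result at all), not whether the tests passed, because a test failure caused by a bad PR is not a reliability problem.

```python
from collections import deque


class ReliabilityTracker:
    """Hypothetical rolling-window availability tracker for a downstream CI.

    window and min_success_rate are placeholders, not values from the RFC.
    """

    def __init__(self, window: int = 100, min_success_rate: float = 0.95):
        # Oldest results fall off automatically once the window is full.
        self.results = deque(maxlen=window)
        self.min_success_rate = min_success_rate

    def record(self, reported: bool) -> None:
        """Record whether a downstream run reported a result (pass or fail)."""
        self.results.append(reported)

    def success_rate(self) -> float:
        if not self.results:
            return 1.0
        return sum(self.results) / len(self.results)

    def should_downgrade(self) -> bool:
        # Only judge a full window, so a few early runs cannot trigger a downgrade.
        if len(self.results) < self.results.maxlen:
            return False
        return self.success_rate() < self.min_success_rate
```

A windowed rate like this is one simple option; tuning the window and threshold through experimentation is exactly the open question raised above.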

Okay, I see that was added in the last level breakdown expansion in March. Has there been more discussion on the requirements for gating?

Hey @groenenboomj, based on previous discussions with @albanD and @ZainRizvi, most accelerators will fall into L3. It’s difficult to set a well-defined standard for L4 because it is effectively equivalent to being in-tree, just with a different code location.

To me, a key requirement for L4 is broad community usage and official investment from core maintainers. Testing and feature coverage are necessary, but they aren't the only deciding factors.

@ZainRizvi @fffrog

No major concerns, other than making sure we document enough formality that advancing through the levels doesn't lead to much debate. With that addressed, it should be fine.

@groenenboomj

Absolutely, I'm with you on that.

In my view, the CRCR consists of four levels. The requirements for L1 and L2 should be clear and well defined; I don't think many people would disagree with that. L3 isn't much of a concern either: we can flesh it out progressively and make it more concrete with input from @albanD and @ZainRizvi.

Regarding L4, I don't believe it's an urgent issue that requires immediate resolution. Since it is considered equivalent to "in-tree" accelerators like CUDA, and most accelerators won't be able to reach this level in the short term, we can afford to de-prioritize it for now.

cc @albanD @ZainRizvi

ZainRizvi

Could you please sync this PR with the design specified in this one?

They're both discussing the same goal, so it would be good to keep the discussion in one place and not repeat points across them.

can-gaa-hou · 2026-04-17T08:35:47Z

Could you please sync this PR with the design specified in this one?

Sure! I will discuss this with @jewelkm89 ASAP and update the RFC accordingly.

fffrog · 2026-04-21T06:12:44Z

Hey @can-gaa-hou, "LF AI & Data Foundation" is incorrect. Could you please fix it in passing? It should be "LF PyTorch Foundation".

OK, I'll make this change.

can-gaa-hou · 2026-04-21T08:24:25Z

```
## HUD Integration
```

@jewelkm89, as mentioned in https://docs.google.com/document/d/1w10Agz_YkSA1IDurobWjuNoYo4u9aykprjuymDFE6MU/edit?tab=t.5c09vnsaacwx#heading=h.t7zh9oz5xu0s, you can add the HUD section here and paste the Google Doc link for reference.

ZainRizvi · 2026-04-24T23:44:00Z

@fffrog Do we have signup instructions documented anywhere?

E.g., if someone wants to join the L1 tier, they need to be added to the allowlist and install the GitHub app on their repo. How does one find the link to the GitHub app?

can-gaa-hou · 2026-04-25T02:00:44Z

@fffrog Do we have signup instructions documented anywhere?

@ZainRizvi, I added instructions to the PyTorch docs a few days ago; please refer to them: https://github.com/pytorch/pytorch/blob/main/docs/source/accelerator/ci.md

fffrog · 2026-04-25T02:15:58Z

@fffrog Do we have signup instructions documented anywhere?

Hi @ZainRizvi,

Sure thing. We've added a dedicated documentation page for this. It's currently a preliminary version and will be updated as the CRCR moves forward.

As @can-gaa-hou mentioned in the previous comment, the link is https://github.com/pytorch/pytorch/blob/main/docs/source/accelerator/ci.md. However, the docs workflow seems to be failing consistently due to timeouts.

As a result, the rendered pages aren't being generated correctly, so users can currently only access the raw Markdown file rather than the properly formatted web pages.

Could you please take a look at this workflow? @can-gaa-hou tried to fix it, but the problem seems to be beyond our scope or control.

ZainRizvi · 2026-04-28T19:51:55Z

However, the workflow seems to be consistently failing due to timeouts.
As a result, the rendered pages aren't being generated correctly, so users can currently only access the raw Markdown file rather than the properly formatted web pages.

Thanks for the pointer; I'll raise the problem internally. One bug causing that workflow to fail has already been fixed, but another one seems to be causing the deployment step to fail as well: https://github.com/pytorch/docs/actions/workflows/pages/pages-build-deployment

meta-cla Bot added the cla signed label Apr 14, 2026

Update RFC-0050

3590059

can-gaa-hou force-pushed the cjh branch from 7f51e31 to 3590059 on April 14, 2026 03:30

groenenboomj reviewed Apr 15, 2026

ZainRizvi reviewed Apr 16, 2026

fffrog reviewed Apr 21, 2026

update LF Pytorch Foundation

a924440

can-gaa-hou commented Apr 21, 2026

4 participants
