
Adding L1-L4 design to RFC-0050 #93

Open

can-gaa-hou wants to merge 2 commits into pytorch:master from can-gaa-hou:cjh

Conversation

@can-gaa-hou

As @ZainRizvi mentioned here, I am opening this PR to add the L1-L4 design to our RFC. This may help us better develop the L2-L4 implementation.

cc @fffrog @ZainRizvi @albanD

@meta-cla bot added the `cla signed` label on Apr 14, 2026

#### L4: Always-On PR Checks

`L4` uses the same callback and synchronization behavior as `L3`, but it does not wait for a label. The `L4` scenario is effectively the same as `L3` Scenario 1 except that the downstream result is represented in upstream PR Checks by default, without requiring explicit labeling.
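The L3 vs. L4 reporting rule described above can be sketched in Python. This is an illustrative sketch only, not the actual PyTorch CI implementation; the level names come from the RFC, but the function and the opt-in label name are hypothetical.

```python
# Hypothetical sketch of when a downstream result is surfaced in
# upstream PR Checks, per the L3/L4 behavior described in the RFC.
# The label name below is an assumption, not a real PyTorch label.
OPT_IN_LABEL = "ciflow/downstream"  # hypothetical L3 opt-in label


def should_report_downstream(level: str, pr_labels: set[str]) -> bool:
    """Decide whether a downstream repo's result appears in PR Checks."""
    if level == "L4":
        # L4: always-on, no explicit labeling required.
        return True
    if level == "L3":
        # L3: surfaced only when the opt-in label has been applied.
        return OPT_IN_LABEL in pr_labels
    # L1/L2: results are not surfaced on upstream PRs.
    return False
```

The only behavioral difference between L3 and L4 in this sketch is the label check, matching the description that L4 is "L3 Scenario 1" with the label requirement removed.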

Was there a plan to allow out-of-tree testing to block PRs? It seems like we'd need a very tight agreement on availability if out-of-tree tests can halt merging on pytorch/pytorch.

Author


Hi @groenenboomj, we've discussed this in detail here: pytorch/pytorch#175022 (comment). L4-level repos have the capability to gate PyTorch PRs, but the bar for achieving L4 will be very high (and the PyTorch core team retains the right to decide whether a repo is L4 even if it meets all the requirements).


Okay, I see that was added in the last level breakdown expansion in March. Has there been more discussion on the requirements for gating?

Contributor


The main requirement is going to be this one:

Very few accelerators would reach this level. One only becomes L4 if pytorch maintainers are highly invested in it and find it worthwhile to add the signal to every PR.

It's intentionally left somewhat fuzzy: it's a balance of "how much value will this provide vs. how much pain does this inflict", and that judgment is inherently subjective. The automatic downgrade will need a firm reliability metric, but even that will likely need fine-tuning based on experimentation, so hard numbers in the short term are likely to be unreliable.
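An automatic-downgrade rule based on a reliability metric might look like the following. This is a sketch under stated assumptions: the threshold, the pass-rate metric, and the function name are all hypothetical placeholders for whatever the experimentation settles on.

```python
# Hypothetical automatic-downgrade check for an L4 repo, keyed on a
# simple pass-rate metric. The threshold is an assumed example value;
# the RFC discussion notes it would need tuning via experimentation.
RELIABILITY_THRESHOLD = 0.95  # assumed, not an agreed number


def should_downgrade(recent_results: list[bool]) -> bool:
    """Return True if the downstream repo's recent pass rate has
    fallen below the reliability threshold."""
    if not recent_results:
        # No signal yet; don't downgrade on an empty window.
        return False
    pass_rate = sum(recent_results) / len(recent_results)
    return pass_rate < RELIABILITY_THRESHOLD
```

A real implementation would likely also need a sliding time window, a distinction between infra failures and genuine test failures, and a grace period before the downgrade takes effect.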

Do you have concerns about the approach @groenenboomj ?

Contributor


Okay, I see that was added in the last level breakdown expansion in March. Has there been more discussion on the requirements for gating?

Hey @groenenboomj, based on previous discussions with @albanD and @ZainRizvi, most accelerators will fall into L3. It’s difficult to set a well-defined standard for L4 because it is effectively equivalent to being in-tree, just with a different code location.

To me, a key requirement for L4 is broad community usage and official investment from core maintainers. Testing and feature coverage are necessary, but they aren't the only deciding factors.


@ZainRizvi @fffrog

No major concerns, other than making sure we have enough formality documented that advancing through the levels is rigorous enough not to invite as much debate. With that addressed, it should be fine.

Contributor


@groenenboomj

Absolutely, I'm with you on that.

In my view, the CRCR consists of four levels. The formality of L1 and L2 should be clear and well-defined; I don't think many people would disagree with that. L3 isn't much of a concern either—we can progressively flesh it out and make it more concrete with input from @albanD and @ZainRizvi.

Regarding L4, I don't believe it's an urgent issue that requires immediate resolution. Since it is considered equivalent to "in-tree" accelerators like CUDA, and most accelerators won't be able to reach this level in the short term, we can afford to de-prioritize it for now.

cc @albanD @ZainRizvi

Contributor

@ZainRizvi left a comment


Could you please sync this PR with the design specified in this one?

They're both discussing the same goal, so it would be good to keep the discussion in one place and not repeat points across them.

@can-gaa-hou
Author

Could you please sync this PR with the design specified in this one?

Sure! I will discuss this with @jewelkm89 ASAP and update the RFC as well.

Contributor


Hey @can-gaa-hou, the "LF AI & Data Foundation" name is wrong; could you please correct it in passing? It should be the LF PyTorch Foundation.

Author


OK, let me make this change.


## HUD Integration

Author


@jewelkm89, as mentioned in https://docs.google.com/document/d/1w10Agz_YkSA1IDurobWjuNoYo4u9aykprjuymDFE6MU/edit?tab=t.5c09vnsaacwx#heading=h.t7zh9oz5xu0s, you can add the HUD section here and paste the Google Doc link for reference.

@ZainRizvi
Contributor

ZainRizvi commented Apr 24, 2026

@fffrog Do we have signup instructions documented anywhere?

E.g., if someone wants to join the L1 tier, they need to join the allowlist and add the GitHub app to their repo. How does one find the link for the GitHub app?

@can-gaa-hou
Author

@fffrog Do we have signup instructions documented anywhere?

@ZainRizvi, I added instructions a few days ago to the PyTorch docs. Please refer to https://github.com/pytorch/pytorch/blob/main/docs/source/accelerator/ci.md

@fffrog
Contributor

fffrog commented Apr 25, 2026

@fffrog Do we have signup instructions documented anywhere?

Hi @ZainRizvi,

Sure thing. We've added a dedicated documentation page for this. It's currently a preliminary version and will be updated as the CRCR moves forward.

As mentioned in the previous comments by @can-gaa-hou, the link is https://github.com/pytorch/pytorch/blob/main/docs/source/accelerator/ci.md. However, the workflow seems to be consistently failing due to timeouts.

As a result, the rendered pages aren't being generated correctly, so users can currently only access the raw Markdown file rather than the properly formatted web pages.

Could you please take a look at this workflow? @can-gaa-hou tried to fix it, but it seems to be beyond our scope/control.

@ZainRizvi
Contributor

However, the workflow seems to be consistently failing due to timeouts.
As a result, the rendered pages aren't being generated correctly, so users can currently only access the raw Markdown file rather than the properly formatted web pages.

Thanks for the pointer; I'll raise the problem internally. One bug causing that workflow to fail has already been fixed, but there seems to be another one causing the deployment step to fail as well: https://github.com/pytorch/docs/actions/workflows/pages/pages-build-deployment

