Collaborator

@rithwik-db rithwik-db commented Mar 7, 2025

What does this PR do?

Updates PyTorch as in #167, and also disables sparse support (leaving only grouped support) until we fix the issue with triton. Also updates the pycln version that was breaking pre-commit checks.

For full context:

triton 3.2.0 changed how it handles dtype promotion when the two operands of a binary operation have different dtypes. As a result, we're encountering an int16 overflow in the stk dependency of megablocks, which results in an illegal memory access (IMA). A fix like https://github.com/stanford-futuredata/stk/pull/17/files needs to be applied to stk, rather than to megablocks or triton, to resolve this initial issue. Even if we don't hit an IMA, omitting this cast will lead to different outputs in triton 3.1 and triton 3.2 given the same inputs.
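
To illustrate the failure mode, here is a minimal sketch in plain Python. It is not triton or stk code: it uses `ctypes` to imitate fixed-width integer arithmetic, and the names `row_index` and `stride`, as well as the idea that the overflow comes from an index-times-stride offset, are assumptions made only for this example.

```python
import ctypes


def mul_as_int16(a: int, b: int) -> int:
    """Multiply and store the result in a signed 16-bit integer (wraps on overflow)."""
    return ctypes.c_int16(a * b).value


def mul_as_int32(a: int, b: int) -> int:
    """Multiply and store the result in a signed 32-bit integer."""
    return ctypes.c_int32(a * b).value


# Hypothetical values: an index that fits in int16 and a stride in elements.
row_index = 300
stride = 128

# If the product stays in int16, 38400 exceeds the int16 maximum (32767) and wraps.
wrapped_offset = mul_as_int16(row_index, stride)

# Casting to a wider type before multiplying keeps the offset correct.
widened_offset = mul_as_int32(row_index, stride)

print(wrapped_offset, widened_offset)
```

A wrapped, negative offset used as a memory address would point outside the intended buffer, which is consistent with an IMA. When the wrap happens to land inside valid memory, the kernel would instead silently read different data, which is one way the outputs could differ between versions.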

Contributor

dakinggg commented Mar 7, 2025

Please add more information to the PR description explaining why we are doing this.


XiaohanZhangCMU commented Mar 10, 2025

The change and PR description LGTM.

Although I'm not sure why the tests are still showing pytorch2.5.1 even though we already changed the workflow.

@rithwik-db rithwik-db requested a review from dakinggg March 11, 2025 19:43
@rithwik-db rithwik-db requested a review from dakinggg March 11, 2025 23:42
@rithwik-db rithwik-db merged commit cf059d9 into databricks:main Mar 12, 2025
3 checks passed
3 participants