Fix fp8 + some enhancement #42455

base: main
Conversation
Co-authored-by: Yang Kai <kai.yang@intel.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
There's one more question: For this test, should we explicitly add: Or, considering the case Could I have your thoughts on this? Thanks! I wrote a simple reproduction script to observe this situation: The script output is:
Any idea where the PyTorch caching allocator happens? We have our own caching allocator but it happens after
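For context on the discussion above, a caching allocator keeps freed blocks in a pool and reuses them on the next request instead of returning memory to the system, which is why "freed" memory can still appear reserved. The sketch below is a minimal illustration of that general idea in plain Python; it is not PyTorch's actual allocator, and the class and method names are made up for the example.

```python
# Minimal sketch of the caching-allocator idea (illustrative only,
# not PyTorch's implementation): freed blocks go into per-size free
# lists and are reused on the next allocation of the same size.

class CachingAllocator:
    def __init__(self):
        self.free_blocks = {}   # size -> list of cached buffers
        self.system_allocs = 0  # how often we actually hit the "system"

    def malloc(self, size):
        pool = self.free_blocks.get(size)
        if pool:
            return pool.pop()      # reuse a cached block, no system call
        self.system_allocs += 1
        return bytearray(size)     # stand-in for a real allocation

    def free(self, buf):
        # "freeing" only returns the block to the cache
        self.free_blocks.setdefault(len(buf), []).append(buf)

    def empty_cache(self):
        # analogous in spirit to torch.cuda.empty_cache():
        # drop cached blocks so memory is truly released
        self.free_blocks.clear()

alloc = CachingAllocator()
a = alloc.malloc(1024)
alloc.free(a)
b = alloc.malloc(1024)  # served from the cache: same buffer, no new alloc
```

In PyTorch the analogous behavior is why `torch.cuda.memory_reserved()` can stay high after tensors are deleted until `torch.cuda.empty_cache()` is called.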
Sorry for the confusion. Regarding torch, I was referring to here. Understood, I'll wait for the fix. Thanks!
btw @YangKai0616, even when setting
Using this PR, I can get the expected output as follows: My testing environment is: But I don't have a
[For maintainers] Suggested jobs to run (before merge) run-slow: finegrained_fp8, mxfp4 |
Thanks for confirming that it works on your hardware! I will update it so that it doesn't fail on your side either.
For the multi-GPU tests, I will probably fix those in a follow-up PR, as I will need to update a lot of methods.
What does this PR do?
This PR fixes several fp8-related issues and adds some enhancements to make the code simpler to maintain.
Related issue #42442
Thanks to @YangKai0616 for spotting those.
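As background for reviewers unfamiliar with the scheme being fixed, fine-grained fp8 quantization scales small blocks of a tensor independently so each block uses the full fp8 range. The sketch below illustrates that idea in plain Python, assuming the e4m3 format (largest finite value 448); it is a conceptual example with made-up helper names, not the `finegrained_fp8` implementation in transformers, and it skips the mantissa rounding a real fp8 cast would apply.

```python
# Illustrative per-block fp8 (e4m3) quantization: each block gets its
# own scale so its largest |value| maps to the fp8 max of 448.
# Conceptual sketch only; real fp8 also rounds values to the fp8 grid.

E4M3_MAX = 448.0  # largest finite e4m3 value

def quantize_block(block):
    """Return (quantized values, scale) for one block of floats."""
    amax = max(abs(v) for v in block) or 1.0  # avoid division by zero
    scale = amax / E4M3_MAX
    q = [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in block]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate original values from quantized block."""
    return [v * scale for v in q]

weights = [0.5, -2.0, 3.75, 0.0]
q, scale = quantize_block(weights)
restored = dequantize_block(q, scale)
```

Because each block carries its own scale, an outlier in one block does not crush the precision of values in other blocks, which is the main advantage over a single per-tensor scale.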