Use FP8_e5m2 automatically when using quantized kv cache FP8 on trillium #1136
base: main
3fde814 to 2669cde
Signed-off-by: zixi-qi <qizixi@meta.com>
2669cde to e12ab16
Also, this shouldn't be the case? Can you verify again?
Signed-off-by: zixi-qi <qizixi@meta.com>
I verified multiple times and still see the issue. Here is my setup:
Full server log:
@kyuyeunk Would you mind sharing some suggestions on how to debug this issue further? Thanks in advance!
Thanks for investigating this. I have a vague guess about where the issue is coming from. Instead of using
Noob question, is
But anyway, even after reverting this change, I still see the same error:
Ah, okay. I think I know what the problem is. I believe I used a recent JAX feature when I wrote #818. What is your JAX version, and can you update it to the latest one? I verified that your PR works without error (and automatically interprets fp8 as fp8_e5m2) when I ran the command you pasted.
Yeah, that seems to be the issue. Thanks for debugging this! The e2e test now passes.
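For anyone who hits the same error, a quick way to see which JAX version is installed is sketched below. It uses only the Python standard library; the helper name `installed_version` is made up for this illustration and is not part of JAX or of this project. The thread does not say which JAX release introduced the feature used in #818, so no minimum version is assumed here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper for illustration; not part of JAX or this repository.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed JAX version, or "None" if JAX is not installed.
print("jax:", installed_version("jax"))
```

If the reported version is older than the one the maintainers test against, upgrading JAX is the first thing to try, as suggested above.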
kyuyeunk left a comment
LGTM, but this will require our CI to pass before being merged.
I'll run it manually and get back to you when I have the results (it might take some time, a few days at most ¯\(ツ)/¯).
Description
Implement the changes described in #1112 to use FP8_e5m2 automatically when the quantized KV cache dtype is FP8 on Trillium.
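The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `resolve_kv_cache_dtype`, the constant `TRILLIUM_CHIP`, and the string-based chip check are all assumptions made for the example. The only facts taken from the PR are that a plain FP8 request should resolve to FP8_e5m2 on Trillium (TPU v6e).

```python
# Hypothetical names for illustration only; not the project's actual API.
TRILLIUM_CHIP = "v6e"  # Trillium is Google's sixth-generation TPU (v6e).


def resolve_kv_cache_dtype(requested: str, chip: str) -> str:
    """Map a user-facing KV cache dtype string to a concrete dtype name.

    A generic "fp8" request is resolved to "fp8_e5m2" on Trillium.
    Every other request is returned unchanged (lowercased).
    """
    requested = requested.lower()
    if requested == "fp8" and chip.lower() == TRILLIUM_CHIP:
        return "fp8_e5m2"
    return requested


print(resolve_kv_cache_dtype("fp8", "v6e"))       # resolved on Trillium
print(resolve_kv_cache_dtype("fp8_e4m3", "v6e"))  # explicit formats pass through
```

Leaving explicit formats untouched is a design choice in this sketch: only the ambiguous `fp8` spelling is rewritten, so a user who asks for a specific FP8 variant still gets it.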
Tests
Checklist
Before submitting this PR, please make sure: