ARM: MultiHeadAttention fp16s/a bf16s #4139
Closed
EdVince wants to merge 4 commits into Tencent:master from
Codecov Report
@@ Coverage Diff @@
## master #4139 +/- ##
==========================================
- Coverage 94.43% 94.40% -0.04%
==========================================
Files 748 749 +1
Lines 179004 180668 +1664
==========================================
+ Hits 169046 170551 +1505
- Misses 9958 10117 +159
1cfd819 to 127f414
Contributor
Author
|
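The checks that failed all failed in the tests because of insufficient precision.
What do you think? I can't tell whether my code is wrong or whether the MHA compute chain is simply too long: running everything in fp16sa can't hold the precision, and it collapses for values around 0. |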
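One way this kind of failure can arise is sketched below. It is only an illustration under stated assumptions, not a diagnosis of this PR and not ncnn code. The helpers `to_fp16`, `dot_fp16` and `dot_ref` and the sample data are hypothetical, and half precision is emulated with the `e` format of Python's `struct` module. When large intermediate terms cancel, rounding after every fp16 operation can wipe out a small true result, so a relative-error check near 0 fails even though the absolute error is tiny compared with the intermediates.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (emulation)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_fp16(a, b):
    """Dot product that rounds to fp16 after every multiply and every add."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + prod)
    return acc


def dot_ref(a, b):
    """Reference dot product in Python floats (double precision)."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical data: two large terms cancel and leave a small result.
a = [2048.0, 0.5, -2048.0]
b = [1.0, 1.0, 1.0]

ref = dot_ref(a, b)    # 0.5
half = dot_fp16(a, b)  # 0.0, because 2048.5 rounds to 2048 in fp16
print(ref, half, abs(half - ref))
```

In this toy case the fp16 result is off by 0.5, which is below the fp16 spacing at magnitude 2048 but is 100% of the expected value. Keeping the accumulator in fp32, or comparing with an absolute tolerance near 0, are the usual ways to tell the two explanations apart.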
8a8ec47 to 92d6fc5
Contributor
|
How about writing the int8 version... Doing it this way conflicts with PR 3940... |
Contributor
Author
It looks to me like you have already finished 3940, and your int8 is a naive implementation, so it shouldn't conflict, right? |
Contributor
|
I haven't written the ARM int8 version |
Contributor
Author
Sure, but I'll have to go learn how int8 is done first |
Contributor
|
It's simple: just use int8 for weight/input. Quantizing the softmax part costs accuracy; I have tried an int4 softmax. |
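A minimal sketch of what "int8 for weight/input" can look like, assuming symmetric per-tensor quantization with an absmax scale. The functions `quantize_int8` and `int8_dot` are hypothetical names for illustration; this is not the scheme used in PR 3940 or anywhere in ncnn. Inputs and weights are mapped to integers in [-127, 127], multiplied and accumulated as integers, then scaled back to float. Softmax is left out, in line with the comment that quantizing it costs accuracy.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (int8-range ints, scale)."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    # Round to nearest and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def int8_dot(x, w):
    """Dot product with int8 operands, integer accumulation, float output."""
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    acc = sum(a * b for a, b in zip(qx, qw))  # exact integer accumulate
    return acc * sx * sw  # dequantize once at the end


# Hypothetical input and weight rows.
x = [1.0, -0.5, 0.25]
w = [0.5, 0.5, 1.0]
print(int8_dot(x, w), sum(a * b for a, b in zip(x, w)))
```

Each operand carries a rounding error of at most half a quantization step, so the dequantized result stays close to the float dot product for well-scaled data.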
Contributor
|
When I have some free time I'll have to try int4 softmax again; I still haven't given up on it. |
Member
|
move to #4463 |
Currently the only ARM multiheadattention implementation is the NEON fp32 pack4 one I submitted in a PR several months ago; this PR fills in the rest: