EAGLE Support DP>1 #26086
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Nice work! Can you also add a quality correctness test using `test_eagle_correctness()` in a DP > 1 setting? The added test only checks whether the process hangs.
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Thank you for pointing this out! I've just modified the test to compare outputs with an engine without eagle.
Hey @benchislett @ekagra-ranjan @luccafong, any updates on this?
This pull request has merge conflicts that must be resolved before it can be
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
@benchislett this one is ready for review 🙏
benchislett
left a comment
LGTM, thanks!
@Flechman please fix the broken test.
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
20207fc to 62d14d2
Signed-off-by: remi <remi@mistral.ai>
Purpose
When using DP>1, `set_forward_context` gathers the number of tokens across all DP ranks so that every rank gets the same static batch size, for CUDA graph/EP compatibility. Even though CUDA graphs are not yet implemented for the new padded-speculative design, this DP all-reduce/gather already serves the target model and will serve the draft model later.
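As a rough illustration of the gather described above, the sketch below models it in plain Python with no distributed backend. The names `RankBatch`, `gather_num_tokens`, and `padded_num_tokens` are hypothetical, and padding every rank up to the maximum token count is an assumption about how the static batch size is chosen, not a description of the actual vLLM code.

```python
from dataclasses import dataclass


@dataclass
class RankBatch:
    rank: int
    num_tokens: int  # tokens scheduled on this rank for the current step


def gather_num_tokens(batches):
    """Stand-in for the DP collective: every rank learns every rank's count."""
    return [b.num_tokens for b in sorted(batches, key=lambda b: b.rank)]


def padded_num_tokens(batches):
    """Common batch size for all ranks (assumption: pad up to the maximum)."""
    return max(gather_num_tokens(batches))


# Rank 1 has no real work, so it would run a fully padded (dummy) batch.
batches = [RankBatch(0, 7), RankBatch(1, 0), RankBatch(2, 3)]
target = padded_num_tokens(batches)
padding = {b.rank: target - b.num_tokens for b in batches}
```

The point of the sketch is only that every rank must take part in the collective on every forward pass, including ranks that have nothing to compute.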
When using EAGLE and DP>1 with `num_speculative_tokens=k`, and sending a single request, one DP rank handles that request (going through `propose()`) and all other DP ranks do a dummy forward. The dummy forward in the eagle module performs only a single forward pass (so a single DP all-reduce via `set_forward_context`), whereas `propose()` runs `k` eagle forward passes.

This leads to a "phase shift" between the DP rank doing `propose()` (group A) and the DP ranks doing the dummy forward (group B): during its first step, group A issues `k` DP all-reduces, while group B issues a single DP all-reduce per step and therefore advances `k` steps over the same `k` all-reduces. If the finish-sync happens every `k` steps (in our case it happens every 32 steps), group B enters a finish-sync all-reduce while group A is still doing eagle DP all-reduces, because it is only entering step 2. This results in a hang.

Test Plan
Add an e2e correctness test with eagle and DP=2.
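The hang this test guards against can be reproduced in a toy model. The sketch below is a simplification under stated assumptions: it tracks only the eagle all-reduces and the finish-sync, treats collectives as matching purely by call order, and uses hypothetical names (`collective_trace`, `first_mismatch`) that do not exist in vLLM.

```python
from itertools import islice


def collective_trace(draft_passes_per_step, sync_every):
    """Yield (step, op) for each DP collective a rank issues, in call order."""
    step = 0
    while True:
        step += 1
        for _ in range(draft_passes_per_step):
            yield step, "draft_all_reduce"
        if step % sync_every == 0:
            yield step, "finish_sync"


def first_mismatch(k, sync_every, horizon=10_000):
    """Find the first collective where group A and group B disagree."""
    group_a = collective_trace(k, sync_every)  # rank running propose()
    group_b = collective_trace(1, sync_every)  # ranks running the dummy forward
    pairs = islice(zip(group_a, group_b), horizon)
    for index, ((step_a, op_a), (step_b, op_b)) in enumerate(pairs):
        if op_a != op_b:
            return {"index": index, "a": (step_a, op_a), "b": (step_b, op_b)}
    return None  # traces stay aligned within the horizon


hang = first_mismatch(k=32, sync_every=32)
aligned = first_mismatch(k=1, sync_every=32, horizon=1_000)
```

With `k=32` and a finish-sync every 32 steps, the first disagreement in this model occurs when group B issues its finish-sync at step 32 while group A is issuing its first eagle all-reduce of step 2. When both groups issue the same number of all-reduces per step, the traces never diverge.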
Test Result