…inline a bunch of the function calls in order to begin OpenACC implementation
… face-initializations). It is slow right now due to too much data transfer. Will optimize this once all portions of the compute are running on the GPU and producing the correct results.
…alculated. Next step is to figure out how to parallelize over energy groups.
… we only launch one kernel per octant iteration instead of 3
… all 5 loops. Also collapsed some of the inner computational loops. Still need to resolve the issue of the spatial loops. Can't collapse when the loop condition uses not-equals...
* Could not parallelize the spatial dimensions due to the unpredictable direction of the sweep.

This change addresses the need by:
* Rewriting the sweep to only sweep in one direction. This allowed me to parallelize all 3 loops.

Related/future task(s):
* Potentially tweaking which loops are collapsed in the gang layer and which ones are collapsed in the vector layer. We are at the tuning/optimization stage now.
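For readers following along, here is a minimal sketch of the idea behind a one-direction sweep. It is not the code from this PR: the function name `fill_sweep_order`, its parameters, and the array layout are all hypothetical. It assumes the direction dependence is moved out of the loop bounds and into the index computation, so the three spatial loops always count upward and can legally take a `collapse(3)` clause. It only illustrates the loop form; it does not model the upstream data dependencies of a real sweep.

```c
#include <stddef.h>

/* Hypothetical illustration, not the Minisweep source.
 * dir_x/dir_y/dir_z are +1 or -1 and give the sweep direction per axis.
 * The loops themselves always run 0..n-1 with a "<" condition; the
 * direction is applied when the grid index is computed.
 * out[cell] receives the position of that cell in the sweep order. */
void fill_sweep_order(int *out, int nx, int ny, int nz,
                      int dir_x, int dir_y, int dir_z)
{
    /* Ignored by compilers that are not building with OpenACC enabled. */
    #pragma acc parallel loop collapse(3) copyout(out[0:nx*ny*nz])
    for (int k = 0; k < nz; ++k) {
        for (int j = 0; j < ny; ++j) {
            for (int i = 0; i < nx; ++i) {
                /* Reflect the counter when the axis is swept backwards. */
                int ix = dir_x > 0 ? i : nx - 1 - i;
                int iy = dir_y > 0 ? j : ny - 1 - j;
                int iz = dir_z > 0 ? k : nz - 1 - k;
                out[(iz * ny + iy) * nx + ix] = (k * ny + j) * nx + i;
            }
        }
    }
}
```

Because every loop has the same shape regardless of octant, the same kernel can serve all sweep directions, which is the property the collapse clause needs.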
…uding the spatial parallelization
…massive data overhead because the local array must increase in size dramatically
…t array access. This should give us better memory coalescing.
…KBA threading pattern
…ll 8 directions asynchronously. Each octant runs a gang-parallel KBA wavefront iteration with vector-parallel in-gridcell computations
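As a rough illustration of the wavefront pattern mentioned in these commits, the sketch below sweeps a small grid in a single octant. It is an assumption-laden toy, not the PR's kernel: `wavefront_sweep`, the recurrence, and the array layout are made up for the example. Each cell depends on its three upstream neighbors, so all cells with the same `ix + iy + iz` are independent and can be handled by separate gangs; the per-cell work here is a single statement, whereas the commits describe vector-parallel work inside each gridcell.

```c
#include <stddef.h>

/* Hypothetical illustration, not the Minisweep source.
 * Each cell becomes 1 + the sum of its upstream neighbors in x, y and z
 * (a missing neighbor at the boundary counts as 0). Cells are stored as
 * v[(iz*ny + iy)*nx + ix]. */
void wavefront_sweep(double *v, int nx, int ny, int nz)
{
    int nwave = nx + ny + nz - 2;   /* wavefronts w = 0 .. nx+ny+nz-3 */

    /* Pragmas are ignored when OpenACC is not enabled. */
    #pragma acc data copy(v[0:nx*ny*nz])
    {
        /* Wavefronts must run in order; cells within one are independent. */
        for (int w = 0; w < nwave; ++w) {
            #pragma acc parallel loop gang collapse(2)
            for (int iz = 0; iz < nz; ++iz) {
                for (int iy = 0; iy < ny; ++iy) {
                    int ix = w - iy - iz;
                    if (ix < 0 || ix >= nx) continue;  /* not on this wavefront */

                    int c = (iz * ny + iy) * nx + ix;
                    double up_x = ix > 0 ? v[c - 1] : 0.0;
                    double up_y = iy > 0 ? v[c - nx] : 0.0;
                    double up_z = iz > 0 ? v[c - nx * ny] : 0.0;
                    v[c] = 1.0 + up_x + up_y + up_z;
                }
            }
        }
    }
}
```

On a 2x2x1 grid this fills the corner cell first, then the two cells on the next diagonal, then the far corner, which is the ordering a sequential sweep would also respect.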
… in your cmake file will enable the OpenACC version of the code
…as well as devices within nodes if enough ranks are used.
…g an issue when building for multicore CPU
…ollide. This is not good for performance
Merging the OpenACC version of Minisweep.