Open
Labels: performance (Issues related to code performance)
Description
The current OpenMP implementation scales poorly: 8 threads give only about a 2x speedup. OpenMP acceleration has two parts: building the interaction matrix (A) and solving the linear system Ax = b.
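To put the 2x figure in numbers: with p threads and a measured speedup S, the parallel efficiency is S/p, and the Karp-Flatt metric e gives the serial fraction that Amdahl's law would need to explain the measurement. Treating the whole run as a single Amdahl-style workload is a simplifying assumption; in practice e also absorbs load imbalance and threading overhead.

$$
E = \frac{S}{p} = \frac{2}{8} = 0.25,
\qquad
e = \frac{1/S - 1/p}{1 - 1/p} = \frac{1/2 - 1/8}{1 - 1/8} = \frac{3/8}{7/8} = \frac{3}{7} \approx 0.43
$$

Under that model the run behaves as if roughly 43% of the work were serial, which would cap the speedup at $1/e = 7/3 \approx 2.3$ regardless of thread count.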
The problem with building A is probably load balancing: the A/B vector spherical harmonic (VSH) translation coefficients are computed with recursion relations that depend on the inter-particle separation, so the work per particle pair varies and some threads finish before others.
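A minimal simulation of why uneven work hurts a static split, under stated assumptions: each task is one row of A, and the per-row costs below are a hypothetical triangular profile (in the real code the variation is suspected to come from the separation-dependent recursions). `static_schedule` models OpenMP's `schedule(static)` as contiguous equal-size chunks; `dynamic_schedule` is an idealized model of `schedule(dynamic, 1)` that ignores scheduling overhead. Both function names are hypothetical.

```python
import heapq

def static_schedule(costs, n_threads):
    """Split tasks into n_threads contiguous, equal-count chunks (a model of
    OpenMP schedule(static)) and return the total cost per thread."""
    n = len(costs)
    loads = []
    for t in range(n_threads):
        start = t * n // n_threads
        stop = (t + 1) * n // n_threads
        loads.append(sum(costs[start:stop]))
    return loads

def dynamic_schedule(costs, n_threads):
    """Hand out tasks one at a time, in loop order, to whichever thread is
    free first (an idealized model of schedule(dynamic, 1))."""
    heap = [(0, t) for t in range(n_threads)]  # (current load, thread id)
    heapq.heapify(heap)
    loads = [0] * n_threads
    for c in costs:
        load, t = heapq.heappop(heap)  # least-loaded thread grabs the next task
        loads[t] = load + c
        heapq.heappush(heap, (loads[t], t))
    return loads

N_ROWS, N_THREADS = 64, 8
# Hypothetical cost profile: row i does N_ROWS - 1 - i units of work.
costs = [N_ROWS - 1 - i for i in range(N_ROWS)]
static = static_schedule(costs, N_THREADS)
dynamic = dynamic_schedule(costs, N_THREADS)
ideal = sum(costs) / N_THREADS
print("static  slowest thread:", max(static), "ideal:", ideal)
print("dynamic slowest thread:", max(dynamic), "ideal:", ideal)
```

The wall time is set by the slowest thread, so the static split here takes almost twice the ideal time. In the C++ loop the corresponding change would be adding `schedule(dynamic)` (or `schedule(guided)`) to the `#pragma omp parallel for` that builds A.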
The cause of the poor scaling when solving Ax = b is less obvious. Since parallel linear solvers are a well-studied problem, it is worth looking into existing software solutions.
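One existing route is to hand the dense solve to LAPACK and let the BLAS/LAPACK backend do the threading. The sketch below assumes a dense complex A and uses `numpy.linalg.solve`, which calls LAPACK's `gesv`; the matrix is a hypothetical, well-conditioned stand-in for the real interaction matrix, and `solve_interaction` is an illustrative name, not part of the project. Passing all sources as columns of B factorizes A once and reuses it for every right-hand side.

```python
import numpy as np

def solve_interaction(A, B):
    """Solve A X = B via LAPACK (through numpy), one column of B per source.

    A single call LU-factorizes A once and reuses the factorization for
    every right-hand side; any multithreading comes from the BLAS/LAPACK
    library numpy is linked against.
    """
    return np.linalg.solve(A, B)

# Hypothetical test system: n unknowns, k independent sources.
rng = np.random.default_rng(0)
n, k = 200, 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A += 4 * n * np.eye(n)  # strong diagonal keeps the toy system well conditioned
B = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))

X = solve_interaction(A, B)
residual = np.linalg.norm(A @ X - B) / np.linalg.norm(B)
print(f"relative residual: {residual:.2e}")
```

Depending on the BLAS build, the thread count is typically controlled by environment variables such as `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, or `MKL_NUM_THREADS`, set before numpy is imported. From C++, calling LAPACK's `zgesv` directly is the analogous route.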
Several other steps can easily be parallelized:
- source decomposition
- cross-section evaluation
- force/torque evaluation
- E/H field evaluation
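Field evaluation is the clearest case, because each evaluation point is independent: split the points into chunks, evaluate each chunk separately, and concatenate. The sketch below assumes a hypothetical `scalar_field` (a sum of scalar outgoing waves exp(ikr)/r) as a stand-in for the real E/H evaluation, and uses a thread pool; numpy releases the GIL in much of its array arithmetic, so the chunks can overlap. In C++ the same idea is a `#pragma omp parallel for` over the points.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def scalar_field(points, sources, k):
    """Hypothetical stand-in for E/H evaluation: at each point, sum the
    scalar outgoing spherical waves exp(ikr)/r from every source."""
    r = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
    return np.sum(np.exp(1j * k * r) / r, axis=1)

def scalar_field_parallel(points, sources, k, n_workers=4):
    """Evaluate independent chunks of points in worker threads; since the
    points do not interact, the chunk results simply concatenate."""
    chunks = np.array_split(points, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda c: scalar_field(c, sources, k), chunks)
        return np.concatenate(list(parts))

rng = np.random.default_rng(1)
sources = rng.uniform(-1.0, 1.0, size=(20, 3))                   # particles in a cube
points = rng.uniform(-1.0, 1.0, size=(1000, 3)) + [0.0, 0.0, 5.0]  # points above the cluster
k = 2 * np.pi
serial = scalar_field(points, sources, k)
parallel = scalar_field_parallel(points, sources, k)
```

Cross-sections and forces/torques follow the same pattern when they are independent across wavelengths or particles.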
Lastly, two algorithmic optimizations are not yet used:
- Using the rotation-translation-rotation algorithm to construct the A matrix
- Using a solver tailored to the physical problem, if one exists; see Xu's papers.
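A rough count shows why rotation-translation-rotation is attractive. Assuming expansions truncated at order L and counting distinct coupling coefficients for one block (A or B), a general translation couples every (n, m) with every (n', m'), while an axial (z-directed) translation only couples equal m and the Wigner-D rotation only mixes m within each n. This is a count of coefficients, not a flop count, and it ignores the per-coefficient cost of the recursions; it also assumes normalized VSH so the back-rotation reuses the same rotation coefficients (conjugate transpose). The function names are hypothetical.

```python
def general_translation_coeffs(L):
    """Couplings in one general translation block truncated at order L:
    every mode couples to every mode."""
    n_modes = L * (L + 2)  # modes with 1 <= n <= L, -n <= m <= n
    return n_modes ** 2

def axial_translation_coeffs(L):
    """Couplings in an axial (z-directed) translation: only equal m couple."""
    total = 0
    for m in range(-L, L + 1):
        count = L - max(1, abs(m)) + 1  # number of n with max(1, |m|) <= n <= L
        total += count ** 2
    return total

def rotation_coeffs(L):
    """Wigner-D rotation coefficients: a rotation only mixes m within each n."""
    return sum((2 * n + 1) ** 2 for n in range(1, L + 1))

for L in (5, 10, 20):
    rtr = axial_translation_coeffs(L) + rotation_coeffs(L)
    print(f"L={L:2d}  general={general_translation_coeffs(L):7d}  rotate+axial={rtr:6d}")
```

The general count grows like L^4 while both rotation and axial counts grow like L^3, so the gap widens with the truncation order.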