Improving OpenMP acceleration  #6

@johnaparker

The current OpenMP implementation is not optimal (only about a 2x speedup with 8 threads). There are two parts to OpenMP acceleration: building the interaction matrix (A), and solving the linear system Ax = b.

The issue for building A is probably load balancing: the A/B VSH translation coefficients involve recursion relations that depend on inter-particle separation, so the work per particle pair is uneven and some threads will finish well before others.
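If uneven work per pair really is the cause, one thing to try is OpenMP's dynamic scheduling, which hands out small chunks of iterations to whichever thread is free. The sketch below is only an illustration under that assumption: `Vec3`, `pair_block`, and `build_matrix` are hypothetical names, and `pair_block` is a dummy stand-in whose cost varies with separation, not the real translation coefficient routine.

```cpp
#include <cmath>
#include <complex>
#include <vector>

struct Vec3 { double x, y, z; };

// Hypothetical stand-in for the A/B translation coefficient evaluation.
// The number of loop iterations depends on the separation, which mimics
// uneven work per particle pair (the real dependence is an assumption).
std::complex<double> pair_block(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    const double d = std::sqrt(dx * dx + dy * dy + dz * dz);
    const int terms = 10 + static_cast<int>(10.0 * d);
    std::complex<double> sum(0.0, 0.0);
    for (int n = 1; n <= terms; ++n) {
        sum += std::polar(1.0 / n, d * n);
    }
    return sum;
}

// Fill an N x N matrix (row-major, one scalar entry per pair for brevity).
std::vector<std::complex<double>> build_matrix(const std::vector<Vec3>& pos) {
    const long N = static_cast<long>(pos.size());
    std::vector<std::complex<double>> A(N * N, std::complex<double>(0.0, 0.0));

    // collapse(2) flattens the pair loops into one iteration space;
    // schedule(dynamic, 4) gives idle threads the next chunk of 4 pairs
    // instead of fixing the assignment up front.
    #pragma omp parallel for collapse(2) schedule(dynamic, 4)
    for (long i = 0; i < N; ++i) {
        for (long j = 0; j < N; ++j) {
            if (i != j) {
                A[i * N + j] = pair_block(pos[i], pos[j]);
            }
        }
    }
    return A;
}
```

Each (i, j) entry is written by exactly one iteration, so no synchronization is needed. The chunk size of 4 is a guess; comparing `schedule(static)`, `schedule(dynamic)`, and `schedule(guided)` timings would show whether load balancing is actually the bottleneck.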

The issue for solving Ax = b is less obvious. Since this is a well-studied problem, it's worth looking into existing software solutions.

There are also a few things that can easily be parallelized: source decomposition, cross-section evaluation, force/torque evaluation, and E/H field evaluation.
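Most of these are sums or independent evaluations over particles, so a `parallel for` with a reduction should be enough. A minimal sketch for a cross-section-like sum, where `particle_contribution` and `total_cross_section` are hypothetical names and the sum of squared coefficients is a placeholder rather than the actual cross-section formula:

```cpp
#include <vector>

// Hypothetical per-particle contribution: a placeholder sum of squared
// expansion coefficients, not the real cross-section expression.
double particle_contribution(const std::vector<double>& coeffs) {
    double s = 0.0;
    for (double c : coeffs) {
        s += c * c;
    }
    return s;
}

// Sum the contributions of all particles in parallel.
double total_cross_section(const std::vector<std::vector<double>>& per_particle) {
    const long N = static_cast<long>(per_particle.size());
    double total = 0.0;

    // reduction(+:total) gives each thread a private partial sum and
    // combines them at the end, so there is no race on `total`.
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < N; ++i) {
        total += particle_contribution(per_particle[i]);
    }
    return total;
}
```

The same pattern would apply to force/torque evaluation with one reduction per component. Field evaluation on a grid needs no reduction if each grid point is written by a single iteration.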

Lastly, there are two algorithmic optimizations not currently being used:

  1. Using the rotation-translation-rotation algorithm to construct the A matrix.
  2. There might exist an optimal solver for the linear system that exploits the structure of the physical problem; see the Xu papers.

Labels: performance (issues related to code performance)
