
pfft 3D data decomposition #37

@octupole


Hi,
I have been testing the scaling of pfft on our cluster (a few thousand Broadwell nodes with 28 cores each). Although the scaling for my problem (a 3D grid of 128^3, quite a small grid indeed!) is satisfactory, I find that its overall performance is poor compared with a code such as fftwpp (from Bowman's group), which uses a 2D data decomposition. Indeed, up to 64 CPUs I find that pfft is consistently 10 times slower than fftwpp. I am not surprised that a 3D data decomposition performs worse than a 2D one because of the extra communication (as explained in the original paper). But the size of the performance loss baffles me, and I frankly think I might be doing something wrong somewhere in compiling pfft or in linking it to the system fftw3-mpi. Could you give me some clue on this?
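
To make the comparison concrete, here is a rough sketch of the per-rank block sizes for the 128^3 grid on 64 ranks. The process meshes (8 x 8 for the 2D case, 4 x 4 x 4 for the 3D case) and the helper `local_block_shape` are assumptions for illustration only; they are not taken from pfft or fftwpp.

```python
from math import ceil, prod

def local_block_shape(grid, mesh):
    """Largest per-rank block when the first len(mesh) axes of `grid`
    are split over the process mesh `mesh` (hypothetical helper)."""
    # Axes that are not split keep a divisor of 1, i.e. they stay local.
    padded = tuple(mesh) + (1,) * (len(grid) - len(mesh))
    return tuple(ceil(n / p) for n, p in zip(grid, padded))

grid = (128, 128, 128)
pencil_mesh = (8, 8)     # assumed 2D decomposition on 64 ranks
block_mesh = (4, 4, 4)   # assumed 3D decomposition on 64 ranks

pencil = local_block_shape(grid, pencil_mesh)
block = local_block_shape(grid, block_mesh)

print("2D decomposition:", pencil, prod(pencil), "points per rank")
print("3D decomposition:", block, prod(block), "points per rank")
```

Both layouts hold the same number of points per rank, but in the 2D case one axis stays entirely local, so the 1D FFTs along it need no communication before the first transpose. In the 3D case no axis is fully local to any rank.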

Thank you in advance
Max.
