
@rainwoodman
Contributor

This is a brief version of an r2c/c2r benchmark program.

There are still issues, but I can already observe that with a 384x384x384 mesh, PFFT is about 10% slower than FFTW when running on a single rank.

I am not quite sure how to fix the large FFTW errors. The program also needs a command-line flag to enable PADDED support.
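For reference, here is a minimal C sketch of two details that an in-place r2c/c2r roundtrip check has to handle in FFTW's layout. It is offered as something to look at, not as a diagnosis of the error = 9.48e+02 values in the log below. First, the last dimension of an in-place real array is padded to 2*(n2/2+1) doubles, so the padding must be skipped when comparing. Second, FFTW's forward and backward transforms are unnormalized, so the roundtrip output must be divided by n0*n1*n2. PFFT presumably selects the same padded layout through its PFFT_PADDED_R2C / PFFT_PADDED_C2R planner flags, which is what a PADDED command-line option would toggle. The helpers padded_last_dim, padded_index and roundtrip_error are hypothetical names and are not part of PFFT or FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Real elements in the padded last dimension of an in-place r2c array:
 * the output holds n2/2+1 complex values per row, i.e. 2*(n2/2+1) doubles. */
static ptrdiff_t padded_last_dim(ptrdiff_t n2)
{
    assert(n2 > 0);
    return 2 * (n2 / 2 + 1);
}

/* Hypothetical helper: linear index of real element (i, j, k) in a
 * row-major n0 x n1 x n2 array whose last dimension is padded. */
static ptrdiff_t padded_index(ptrdiff_t i, ptrdiff_t j, ptrdiff_t k,
                              ptrdiff_t n1, ptrdiff_t n2)
{
    return (i * n1 + j) * padded_last_dim(n2) + k;
}

/* Hypothetical helper: maximum absolute difference between a saved copy of
 * the input (`in`, same padded layout) and the c2r(r2c(input)) output.
 * Only the n2 physical elements of each row are compared, so the padding
 * is skipped, and the output is divided by n0*n1*n2 because the forward
 * and backward transforms are unnormalized. */
static double roundtrip_error(const double *in, const double *out,
                              ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2)
{
    const double scale = 1.0 / ((double)n0 * (double)n1 * (double)n2);
    double err = 0.0;
    for (ptrdiff_t i = 0; i < n0; ++i)
        for (ptrdiff_t j = 0; j < n1; ++j)
            for (ptrdiff_t k = 0; k < n2; ++k) {
                const ptrdiff_t m = padded_index(i, j, k, n1, n2);
                double d = out[m] * scale - in[m];
                if (d < 0.0)
                    d = -d;
                if (d > err)
                    err = d;
            }
    return err;
}
```

For the 384x384x384 mesh used here, each row holds padded_last_dim(384) = 386 doubles, of which only the first 384 carry data.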

[yfeng1@waterfall tests]$ ./bench_r2c -pfft_n 384 384 384 -pfft_cmp_fftw -pfft_inplace -pfft_patience 0 -pfft_destroy_input
******************************************************************************************************
* Computation of loops=1 parallel forward and backward FFTs (change with -pfft_loops *)
* for n[0] x n[1] x n[2] = 384 x 384 x 384 Fourier coefficients (change with -pfft_n * * *)
* on  np[0] x np[1] x np[2] = 1 x 1 x 1 processes (change with -pfft_np * * *)
* with:
*      - non-transposed data layout (change with -pfft_transposed)
*      - non-verbose output (change with -pfft_verbose)
*      - in-place transforms (change with -pfft_inplace)
*      - disabled decomposition comparison (change with -pfft_cmp_decomp)
*      - enabled FFTW comparison (change with -pfft_cmp_fftw)
*      - disabled comparison of all planner flags (change with -pfft_cmp_flags)
*      - disabled output of internal PFFT timer (change with -pfft_timer)
*      - pfft_flags = PFFT_ESTIMATE | PFFT_NO_TUNE | PFFT_DESTROY_INPUT
*        (change with [-pfft_patience  0|1|2|3] [-pfft_tune] [-pfft_destroy_input])
*******************************************************************************************************

!!! Warning: inplace transforms do not support DESTROY_INPUT flag !!!
* PFFT runtimes (1d data decomposition):
Flags: PFFT_NO_TUNE, PFFT_ESTIMATE, PFFT_DESTROY_INPUT, 
tune_forw = 2.58e-03; tune_back = 2.56e-03, exec_forw/loops = 1.34e+00, exec_back/loops = 1.35e+00
error = 6.44e-14

* FFTW_MPI runtimes (1d data decomposition):
Flags: FFTW_ESTIMATE, FFTW_PRESERVE_INPUT
tune_forw = 2.89e-03; tune_back = 1.21e-04, exec_forw/loops = 1.21e+00, exec_back/loops = 1.21e+00
error = 9.48e+02
Flags: FFTW_MEASURE, FFTW_PRESERVE_INPUT
tune_forw = 1.34e+01; tune_back = 1.13e-04, exec_forw/loops = 9.63e-01, exec_back/loops = 9.61e-01
error = 9.48e+02
* serial FFTW runtimes (no data decomposition at all):
Flags: FFTW_ESTIMATE, FFTW_PRESERVE_INPUT
tune_forw = 1.26e-04; tune_back = 7.99e-05, exec_forw/loops = 9.62e-01, exec_back/loops = 9.62e-01
error = 9.48e+02
Flags: FFTW_MEASURE, FFTW_PRESERVE_INPUT
tune_forw = 1.29e-04; tune_back = 8.11e-05, exec_forw/loops = 9.61e-01, exec_back/loops = 9.64e-01
error = 9.48e+02
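The ~10% figure corresponds to the exec_forw/loops column of the log above. It compares PFFT (1.34e+00) with FFTW_MPI using the same FFTW_ESTIMATE planning (1.21e+00). Against the FFTW_MEASURE plan (9.63e-01), the gap is larger:

```math
\frac{1.34 - 1.21}{1.21} \approx 0.107, \qquad \frac{1.34 - 0.963}{0.963} \approx 0.391
```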

The FFTW runs give wrong results. Also, PADDED support is currently added manually.