Copyright 2007. Los Alamos National Security, LLC. This material was
produced under U.S. Government contract DE-AC52-06NA25396 for Los
Alamos National Laboratory (LANL), which is operated by Los Alamos
National Security, LLC for the U.S. Department of Energy. The
U.S. Government has rights to use, reproduce, and distribute this
software. NEITHER THE GOVERNMENT NOR LOS ALAMOS NATIONAL SECURITY,
LLC MAKES ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LIABILITY
FOR THE USE OF THIS SOFTWARE. If software is modified to produce
derivative works, such modified software should be clearly marked, so
as not to confuse it with the version available from LANL.
Additionally, this program is free software; you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version. Accordingly, this
program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
===================================================================
Sandia/LANL DNS code
Mark Taylor
mataylo@sandia.gov
===================================================================
README: updated 3/19/2013:
added 'pseudospectral.pdf' to svn repo
README: updated 2/20/2009:
Added support for the SDSC P3DFFT parallel FFT library.
P3DFFT is up to 2x faster than our internal transpose + FFTW code.
To use P3DFFT, build the "dnsp3" model.
This obsoletes our previous "dnsp" optimized model.
README: updated 2/5/2009:
Added new optimizations to the dnsp code to skip
on-processor transpose_to/from_y operations.
To enable, padding must be "0 2 0" and the ref decomp must be
y-pencil. The vorticity3 trick uses some FFTs with this optimization
and some FFTs without, so one should benchmark with and without "-nov3".
To verify that this optimization is enabled, the code will
print this message to stdout:
"Ref decomp is y-pencil decomp: skipping transpose_to_y calls"
README: updated 9/26/2008:
Updated to note that for the pure x-y slab decomposition
for a problem of size N^3, NCPUS can be as large as N.
(we used to have to switch to a pencil decomposition for
NCPUS>N/2 )
README: updated 7/22/2008:
minor edits
README: updated 1/20/2008:
We now support the exact dealiasing based on phase shifting +
spherical truncation. To measure/test dealiasing error, see
testing/dealias.sh.
README: updated 8/16/2007:
For pencil decompositions, we now (8/15/2006) have a more efficient
model "dnsp" that should be used instead of "dns" (but dnsp does not
yet allow for passive tracers). To compile, type "make dnsp" instead of
"make dns", and the executable will be named "dnsp".
===================================================================
This directory and its subdirectories contain the Sandia/LANL DNS
code. This code solves viscous fluid dynamics equations in a periodic
rectangular domain (2D or 3D) with a pseudo-spectral method or 4th
order finite differences and with the standard RK4 time stepping
scheme.
The code is documented in
[1] Taylor, Kurien and Eyink, Phys. Rev. E 68, 2003.
[2] Kurien and Taylor, Los Alamos Science 29, 2005.
[3] 'pseudospectral.pdf' included with the source code
The code has options to solve the following equations:
1. Navier-Stokes (primitive variables) 2D and 3D
2. Navier-Stokes (stream function/vorticity) 2D only
3. Lagrangian Averaged Navier-Stokes (the Alpha Model) 2D and 3D
4. Shallow water and shallow water alpha 2D only
5. Boussinesq (with stratification) 3D only
It can optionally allow for rotation, arbitrary aspect ratio and an
arbitrary number of passive scalars. It is an MPI code and can use a
3D domain decomposition (allowing for slab, pencil or cube
decompositions). For a grid of size N^3, it can run on up to N^3/8
processors. It has been run on as many as 18432 processors (with
grids up to 4096^3).
This README file contains instructions to compile and run
the purest form of the code: Navier-Stokes with deterministic
low wave number forcing, using RK4 and a pseudo-spectral
method. The equations are solved in a triply periodic cube
of side length 1 with no passive scalars or rotation.
The deterministic low wave number forcing is a simplified
version of Overholt and Pope, Comput. Fluids 27 1998, and
is documented in detail in [1]. Some additional, but incomplete,
documentation for this code is in the files pseudospectral.tex and dns.doc.
Documentation for running the Boussinesq version is in
rotation_doc/rotation.tex.
The steps required to compile and run the code are:
0. Choose the resolution
1. Set up the grid dimensions
2. Compile the code
3. Edit the input file, forcing12.inp or forcing12-bench.inp
4. Run
5. Validate the results
6. Analyze the output
Step 0.
Choose the resolution and resolution requirements.
The parameters set by the user are the grid resolution and the
viscosity coefficient. The forcing we use is such that epsilon=3.58
and KE=1.89, so the eddy turnover time is 1.05. (These values are
from N=1024^3; at other values of N they may change slightly.)
But if we assume these values of KE and epsilon, we can then
determine the correct viscosity to use for a given N:
grid of size: N^3
resolution constraint: G = eta*kmax
Suggested value of G for this problem (forced low wave number
turbulence) G=1.0. (For improved resolution in the viscous regime,
some people use the more restrictive G=1.5)
The code supports several types of dealiasing:
1. partial spherical (the most efficient, not fully dealiased)
2. 2/3 rule (the most efficient in terms of retained modes)
3. phase shifting (the most efficient in terms of maximum spherical
wave number)
With phase shifting or partial spherical, kmax = 2*pi*N*sqrt(2)/3.
With 2/3 dealiasing, kmax = 2*pi*N/3.
(The 2*pi shows up in the wave number because our box is of side
length 1.)
Given the choice of G and N, we can then determine the viscosity.
Using eta = (mu^3/epsilon)**(1/4), we have:
mu = epsilon**(1/3) * (G/kmax)**(4/3)
The resulting Taylor Reynolds number:
Rl = KE sqrt(20/(3*mu*epsilon))
Example 1
1024^3, with 2/3 dealiasing and G=1. The viscosity
coefficient used should be: mu = 5.5e-5.
The expected Rl = 347
Example 2:
1024^3, with spherical dealiasing and G=1. The viscosity
coefficient used should be: mu = 3.5e-5
The expected Rl = 437
Example 3:
64^3, with spherical dealiasing and G=1. The viscosity
coefficient used should be: mu = .0014
The expected Rl = 69
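The viscosity and Taylor Reynolds number relations above can be collected into a short script. This is an illustration only: the function names and the hard-coded epsilon/KE values (taken from the text above) are mine, not part of the DNS code.

```python
import math

# Forcing statistics quoted above (measured at N=1024^3)
EPSILON = 3.58  # energy dissipation rate
KE = 1.89       # kinetic energy

def viscosity(N, G=1.0, dealias="2/3"):
    """Viscosity giving resolution G = eta*kmax on an N^3 grid of side 1."""
    if dealias == "2/3":
        kmax = 2 * math.pi * N / 3
    else:
        # spherical or phase-shifted dealiasing
        kmax = 2 * math.pi * N * math.sqrt(2) / 3
    # from eta = (mu^3/epsilon)**(1/4) and eta = G/kmax
    return EPSILON ** (1 / 3) * (G / kmax) ** (4 / 3)

def taylor_reynolds(mu):
    """Taylor Reynolds number Rl = KE*sqrt(20/(3*mu*epsilon))."""
    return KE * math.sqrt(20 / (3 * mu * EPSILON))

mu = viscosity(1024, dealias="2/3")
print(mu, taylor_reynolds(mu))  # ~5.5e-5 and Rl ~ 347, matching Example 1
```

Example 2 is recovered with `viscosity(1024, dealias="sphere")` and Example 3 with `viscosity(64, dealias="sphere")`.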
Step 1.
Set up the grid dimensions.
I use a python script, 'gridsetup.py', which will create the
needed 'params.h' file. To run an N^3 problem, with a
domain decomposition of nx*ny*nz (so NCPUS=nx*ny*nz):
./gridsetup.py nx ny nz N N N 0 0 2
The 0,0,2 specifies how the arrays are padded in x,y,z directions.
0,0,2 is required when using P3DFFT. When using the internal
parallel fft, 2,0,0 is optimal. P3DFFT requires that nx=1.
Here are some examples:
For P3DFFT, which only supports slab and pencil decompositions:
A 32x32x32 grid, to run using just 1 cpu:
% cd dns/src
% ./gridsetup.py 1 1 1 32 32 32 0 0 2
A 64x64x64 grid, on 4 cpu:
(parallel decomposition: 1x1x4, so 4 hyperslabs in the z-direction)
% cd dns/src
% ./gridsetup.py 1 1 4 64 64 64 0 0 2
A 64x64x64 grid, on 64 cpu:
(parallel decomposition: 2x1x32, so a pencil decomposition)
% cd dns/src
% ./gridsetup.py 1 2 32 64 64 64 0 0 2
The DNS code's internal parallel FFT supports cube decompositions
and also supports slab decompositions up to size N:
% ./gridsetup.py 1 1 4 64 64 64 2 0 0
% ./gridsetup.py 2 2 4 64 64 64 2 0 0
% ./gridsetup.py 1 1 64 64 64 64 2 0 0
For a grid N^3, N must be a product of powers of 2, 3 and 5, and
N/nx, N/ny and N/nz must be even integers (with one exception: the
case nx=1, ny=1, nz=N is allowed).
There are some other restrictions because
of how we wrote our transpose routines. gridsetup.py will issue
warnings if they are violated, and, if the code is run it will print
error messages and stop.
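As a rough illustration of these constraints (assumed from the description above, not extracted from gridsetup.py itself, which performs additional transpose-specific checks), a validity check might look like:

```python
def check_decomp(nx, ny, nz, N):
    """Sketch of the basic decomposition constraints described above."""
    # N may contain only the prime factors 2, 3 and 5
    m = N
    for p in (2, 3, 5):
        while m % p == 0:
            m //= p
    if m != 1:
        return False
    # N/nx, N/ny and N/nz must be even integers ...
    for d in (nx, ny, nz):
        if N % d != 0 or (N // d) % 2 != 0:
            # ... except for the allowed special case nx=1, ny=1, nz=N
            if not (nx == 1 and ny == 1 and nz == N):
                return False
    return True

print(check_decomp(1, 2, 32, 64))  # pencil decomposition: valid
print(check_decomp(1, 1, 64, 64))  # allowed special case nz=N: valid
```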
If you do not have python installed on your system, you can instead
copy the file 'params.h.test' to 'params.h' and then edit params.h by
hand. By default, it is set up to run a 32^3 simulation on 1 cpu.
NOTE: The most efficient configuration for large processor counts
will be with ny=1 and nx<nz. For example, to run a 4096^3 simulation
on 8192 processors, I would modify the instructions below to:
./gridsetup.py 1 4 2048 4096 4096 4096 0 0 2
Other possibilities which might be more efficient:
(we need a performance model :-)
./gridsetup.py 1 8 1024 4096 4096 4096 0 0 2
./gridsetup.py 1 16 512 4096 4096 4096 0 0 2
./gridsetup.py 1 32 256 4096 4096 4096 0 0 2
./gridsetup.py 1 64 128 4096 4096 4096 0 0 2
Step 2.
Compilation
A makefile is included which should run on Linux, SGI, OSF1 (Compaq),
AIX and SunOS, but some editing will probably be required. On Linux,
the makefile by default uses the Intel F90 compiler, but you can edit
the file and switch this to Lahey or PGI. For the other systems, it
uses the vendor-supplied F90 compiler.
P3DFFT: For the best performance with high resolution runs on large
processor counts, use the version of the code that uses the SDSC
P3DFFT parallel FFT. Both P3DFFT and FFTW must be built and
installed in advance. Currently we require that P3DFFT be built
double precision and with the -DSTRIDE1 option. You must also
edit the makefile to build with -DUSE_P3DFFT -DUSE_FFTW and
configure appropriate include and lib paths. Then:
% cd dns/src
% make dnsp3
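The makefile edits described above might look like the following fragment. This is a hypothetical sketch: the variable names P3DFFT_HOME and FFTW_HOME are placeholders for your install paths, not names used by the actual makefile.

```make
# Hypothetical makefile fragment -- adapt names and paths to the real makefile.
FFLAGS  += -DUSE_P3DFFT -DUSE_FFTW
FFLAGS  += -I$(P3DFFT_HOME)/include -I$(FFTW_HOME)/include
LDFLAGS += -L$(P3DFFT_HOME)/lib -lp3dfft -L$(FFTW_HOME)/lib -lfftw3
```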
To use the DNS code's internal parallel FFT:
% cd dns/src
% make dns
There is also an optimized version of dns, "dnsp", which uses the DNS
code's internal parallel FFT but is limited to pencil decompositions
like P3DFFT. "dnsp" is no longer supported since "dnsp3" is faster.
For low resolutions on moderate numbers of processors, there is almost
no difference between dns, dnsp and dnsp3.
Step 3.
Edit the input file
There are two choices for input files for this forced case in
the src directory:
forcing12.inp        runs 1 eddy turnover time, with output
forcing12-bench.inp  runs 5 timesteps, only diagnostic output;
                     reports cpu time per timestep, averaged
                     over the last 4 timesteps
Parameters of interest:
viscosity coefficient mu (line 9):  change to the value computed
                                    above in step 0.
derivative method (line 14):  set to "fft-dealias" for the 2/3 rule
                              (exact dealiasing),
                              "fft-sphere" for spherical dealiasing
                              (partial dealiasing, suitable for
                              k^-5/3 or steeper spectra),
                              or "fft-phase" for phase-shifted +
                              spherical dealiasing (exact dealiasing,
                              2x more expensive)
time to run (line 26):  time is measured in dimensional units.
                        For this problem, 1 eddy turnover time
                        (after the code equilibrates) is close to
                        1 dimensional time, so set this to 1.0.
Step 4.
Run the code:
To run the code on a single processor:
./dnsp3 -i forcing12-bench.inp output_name
using the input file 'forcing12-bench.inp'. This runs a triply
periodic forced turbulence problem. The forcing is in
wave numbers 1 and 2.
The output files are:
output_name0000.0000.u u component of velocity
output_name0000.0000.v v component of velocity
output_name0000.0000.w w component of velocity
output_name0000.0000.spec power spectrum (1D and spherical)
output_name0000.0000.spect transfer spectrum
output_name0000.0000.scalars KE, dissipation rates, etc...
output_name0000.0000.scalars-turb skewness, other scalars...
where 0000.0000 is the time of the snapshot. So for t=0.25,
the filename would be: output_name0000.2500.u.
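The time stamp in these filenames appears to follow a zero-padded format with four digits on each side of the decimal point. As a sketch of that convention (the helper `snapshot_name` is mine, not part of the code):

```python
def snapshot_name(base, t, field):
    """Build a snapshot filename like output_name0000.2500.u.

    Assumes the time is encoded as a zero-padded %09.4f field,
    which matches the examples in this README.
    """
    return f"{base}{t:09.4f}.{field}"

print(snapshot_name("output_name", 0.25, "u"))    # output_name0000.2500.u
print(snapshot_name("output_name", 0.0, "spec"))  # output_name0000.0000.spec
```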
For a parallel run:
mpirun -np X ./dnsp3 -mio -i forcing12.inp output_name
The "-mio" option will turn on MPI-IO. Without MPI-IO, all output
will be funneled through processor 0. With MPI-IO, the code will
still produce identical files, but will have up to M
processors write data with asynchronous non-overlapping
writes. The default value of M is 32, but it can be tweaked by
editing subroutine "mpi_io_init". M can be set to be equal to
the number of processes if your MPI-IO library and/or parallel file
system has good I/O aggregation.
These output files are also used as restart files.
The restart is exact. To do a restart run, copy (or create links from)
the snapshot to be used to restart.u, restart.v and restart.w,
and then add the "-r" option when running the code.
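A restart might be set up as follows. The snapshot time 0000.5000 here is a hypothetical example; the final run line is shown commented out because it depends on your job launcher and processor count.

```shell
# Link the chosen snapshot (hypothetical time t=0.5) to the restart names.
ln -sf output_name0000.5000.u restart.u
ln -sf output_name0000.5000.v restart.v
ln -sf output_name0000.5000.w restart.w
# Then rerun with the -r flag, e.g.:
# mpirun -np X ./dnsp3 -r -mio -i forcing12.inp output_name
```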
Step 5.
Validate the results.
There are several test cases and scripts which run short problems
and make sure the output is identical to the reference solutions.
(These are not yet documented.)
A quick test: run the 64^3 problem, with the forcing12.inp
input file, with mu=.0014, "fft-sphere", and time=1.0.
To run this test on 2 processors:
cd src
./gridsetup.py 1 1 2 64 64 64 2 2 0
make dnsp3
mpirun -np 2 ./dnsp3 -i forcing12.inp
To run on more processors, see scripts/readme.job
At simulation time 1.0, the data (given in stdout) should be approximately:
*** dns code w/ FFT99 ********************************************************
AMD Athlon PGI compiler g95 AMD X2
FC4 FC6 pentium M powerPC F7 gfortran
intel gfortran Linux RHEL4 MAC OS10.4 2.1GHZ DDR2-800
2cpu 1 core 2 core
kmax*eta = .9990 1.0214 .9839 .9626 1.0214 same
KE =1.6800 1.6265 1.6828 1.7197 1.6265
R_lambda =61.47 62.22 59.7 58.4 62.22
run time 20min 22.8 66min 12.65 8.41 4.96
2.4ghz quad core, DDR2-666, gfortran, 4 cores: 1.30min
*** dnsp code w/ FFT99 ******************************************************
intel compiler PGI compiler g95 intel on Thunderbird
AMD Athlon pentium M 2ghz powerPC intel x86_64
Linux FC4 Linux RHEL4 MAC OS10.4 Linux
CPUS = 1cpu 2cpu 1x1x32 2x1x32 4x1x32
kmax*eta = .9626 same .9990 SAME SAME
KE = 1.7197 1.6801
R_lambda = 58.4 61.47
d/dt vis = -4.1263 -3.5569
d/dt f = 4.3034 4.5351
run time = 14.5m 11.4m .42m
Other collected run times:
2.4ghz Xeon Linux, ifort, 1 cpu: 13.81
2.4ghz quad core, DDR2-666, gfortran, 1 cores: 2.97
2.4ghz quad core, DDR2-666, gfortran, 4 cores: 1.26min
*** dnsp3 code (P3DFFT) ************************************************
2.4ghz quad core, DDR2-666, gfortran, 4 cores: 1.23min
kmax*eta = 0.9641
KE = 1.6525
R_lambda = 58.9
d/dt vis = -3.747
d/dt f = 4.122
**************************************************************************
The initial condition has random phases, and so these results are
compiler and OS dependent. I haven't done a detailed sensitivity
study, but you can get a sense of the fluctuations by looking at the
above numbers.
Step 6.
Looking at the data.
Not yet documented - contact Mark Taylor (mataylo@sandia.gov) for help.
There are matlab scripts in the dns/matlab directory for
reading and processing all the output produced by the code.
Some more complex analysis, such as computing structure
functions and PDFs is done with fortran programs in the dns/src
directory.