Merge MPI version with changes in develop by JamieJQuinn · Pull Request #35 · Trovemaster/TROVE

JamieJQuinn · 2021-06-02T16:25:50Z

This will contain all the changes required for MPI with TROVE. More functionality will be added via additional pull requests targeting this branch (merge-develop-mpi).
The comments below reflect the original changes (until 4096f62).

This PR implements optional MPI support in TROVE. This includes:

- initial development by Arjen
- modification of the MPI implementation to allow for compilation and running on non-MPI systems
- merging of changes made to develop since the MPI development started in early 2019

This has been tested against the existing benchmarks using the following combinations:

- non-MPI gfortran
- non-MPI intel
- Intel MPI + ifort (on CSD3)
- OpenMPI + gfortran (on local laptop)

The intensity calculation currently does not work with MPI enabled.

* Start running with Intel compiler
* Use correct compiler command
  I don't like this but it seems there's no way to conditionally set the environment variable.
  See also actions/runner#409
  Will possibly change to conditional steps later.
* Remember Intel-related parameters
  The scripts set the PATH (and maybe other variables?) but those are not preserved across steps. Since the build step is becoming more distinct between the two compilers, let's try splitting it in two conditional ones.
* Try caching Intel installation
* Look for Intel libraries before tests
* Try to cache whole directory
* Allow running tests with MPI
* Fix syntax
* Install MPI compilers when needed
* Try to force use of ifort with MPI
  By default, the Intel version of mpif90 uses gfortran. We can either switch to using mpiifort instead, or try to control the underlying compiler this way.
  See also, for example: https://www.hpc.cineca.it/center_news/important-use-intel-mpi-wrappers-mpif90-mpicc-mpicxx
* Make sure cache names don't clash with/without MPI
* Update variables so MKL and BLACS can be found
* Fix MKL installation
* Show some more info and build faster on CI
* Only use one MPI process on CI for debugging
  See if the error still happens when testing.
* Avoid reading from stdin
* Build faster on CI with gfortran
* Avoid segmentation fault with quick gfortran build
  This reverts commit b8413c6.
* Replicate execution on cluster temporarily
  Adding the mpiio option may be required for now but will eventually be removed.
* Don't test file_intensity with MPI
  It's currently failing when run with MPI and > 1 processes (not just on GitHub Actions, also on CSD3).
* Simplify USE_MPI in Makefile and CI
  Now behaving similar to the other Makefile variables. This also lets us make the GitHub Actions job a bit simpler.
* Don't install MKL twice when using MPI, clean up

Previously there were two versions of this function, `symm_mat_element_vector` and `symm_mat_element_vector_k`, each dealing with a different symmetry in the molecule. These have been refactored into one function. This means molecules with Euler symmetry can be processed with the MPI version of TROVE (at least for e.g. file1 of the CH4 benchmark).

ageorgou · 2021-08-09T14:23:22Z

perturbation.f90

    end do
    !
    ! Collect all pre-calculated hcontr values to MPI root. Non-local values have been initialised to 0 so it's safe to just do
-    ! MPI_SUM.


Was this removed on purpose?

Yep, didn't seem particularly relevant.
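For readers without the surrounding code, the removed comment describes a common reduction pattern: every rank zero-initialises the full array, fills only the elements it owns, and a sum-reduction then assembles the complete array on the root. The following is a minimal sketch of that idea, not TROVE's code. The array names `local` and `total` and the round-robin ownership rule are assumptions made for illustration, and it needs an MPI library to build and run.

```fortran
program reduce_to_root
  use mpi
  implicit none
  integer, parameter :: n = 8
  double precision :: local(n), total(n)
  integer :: rank, nprocs, ierr, i

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Every rank starts from zero and fills only the elements it owns
  ! (round-robin ownership is an assumption made for this sketch).
  local = 0.0d0
  do i = 1, n
    if (mod(i - 1, nprocs) == rank) local(i) = dble(i)
  end do

  ! Each element is non-zero on at most one rank, so summing across
  ! ranks reproduces the full array on the root (rank 0).
  total = 0.0d0
  call MPI_Reduce(local, total, n, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                  MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, total
  call MPI_Finalize(ierr)
end program reduce_to_root
```

If an element were left uninitialised on a non-owning rank, the sum would pick up garbage, which is why the zero-initialisation matters.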

ageorgou · 2021-08-09T14:27:10Z

perturbation.f90

               hcontr(ideg,jdeg,jrow) = 0.0_rk
             else
-               hcontr(ideg,jdeg,jrow) = func(icontr,jcontr,jrot,k_i,k_j,tau_i,tau_j)
+               if (job%rotsym_do) then


Haven't checked but I'm assuming job is in scope here. Even so, is it worth having an explicit extra argument do_rotsym in the subroutine instead of relying on the global?

job seems to be a module-scope variable, so yes, it should be available here. I think it probably is worth having this as an explicit argument, yes.
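The suggestion is to pass the flag in rather than read it from module state. A small sketch of that shape is below. The module `matelem_sketch`, the function `contribution`, and its placeholder arithmetic are hypothetical and stand in for the real matrix-element routine; only the name `do_rotsym` comes from the review comment.

```fortran
module matelem_sketch
  implicit none
contains
  ! The caller decides which branch to take; nothing is read from module state.
  function contribution(x, do_rotsym) result(h)
    real, intent(in)    :: x
    logical, intent(in) :: do_rotsym
    real :: h
    if (do_rotsym) then
      h = 2.0 * x   ! placeholder for the rotational-symmetry branch
    else
      h = x         ! placeholder for the default branch
    end if
  end function contribution
end module matelem_sketch

program explicit_flag
  use matelem_sketch
  implicit none
  ! In the real code the caller would pass job%rotsym_do here.
  print *, contribution(1.5, .true.)
  print *, contribution(1.5, .false.)
end program explicit_flag
```

With the flag in the argument list, the routine's behaviour is visible at the call site and it can be exercised in a test without setting up the global `job` structure.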

ageorgou · 2021-08-09T14:31:29Z

fields.f90

+   if (trim(trove%symmetry)=='C2VN') then
+     if (sym%N<job%bset(0)%range(2)) then


Is this just for readability? Does the .and. not short-circuit?

This fixes a segfault, so I'm guessing either sym%N or job%bset(0)%range(2) is only allocated when trove%symmetry=='C2VN'.

I assumed it wouldn't matter, but apparently short-circuiting is not standard! (even though gfortran with optimisations supports it, for instance)
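The Fortran standard permits a compiler to skip an operand of `.and.` when the result is already known, but does not require it, so a guard written as a single compound condition is not portable. Here is a minimal sketch of the nested form used in the diff. The variables `symmetry`, `n` and `bset_range` are stand-ins invented for this example, not the TROVE data structures.

```fortran
program nested_guard
  implicit none
  character(len=8) :: symmetry
  integer, allocatable :: bset_range(:)   ! stands in for job%bset(0)%range
  integer :: n

  symmetry = 'C3V'
  n = 2
  ! bset_range is deliberately left unallocated, mimicking a non-C2VN run.

  ! Not portable: a compiler may evaluate both operands of .and., so
  ! bset_range(2) could be referenced while unallocated.
  !   if (trim(symmetry) == 'C2VN' .and. n < bset_range(2)) then

  ! Portable: the inner condition is evaluated only if the outer one holds.
  if (trim(symmetry) == 'C2VN') then
    if (n < bset_range(2)) then
      print *, 'N is below the basis-set range'
    end if
  end if

  print *, 'finished without touching bset_range'
end program nested_guard
```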

ageorgou · 2021-08-09T14:35:22Z

fields.f90

         kindex = 0 ; kindex(imode) = 1
         !
-         coordtransform(:,imode) = FLvect_finitediffs(job_is,trove%Ncoords,kindex,q_eq,step,irho)
+         coordtransform(:,imode) = FLvect_finitediffs(job_is,trove%Nmodes,kindex,q_eq,step,irho)


This seems like a big thing to have been missed for so long! Do the benchmark problems have Nmodes == Ncoords?

The changed argument actually sets the size of the output of the function FLvect_finitediffs, which is then assigned to coordtransform. The first dimension of coordtransform has length trove%Nmodes, which is inconsistent with the value previously passed (trove%Ncoords). In this situation, Intel Fortran was assigning only what fit into the output.

Hi Jamie,
Please be careful with this parameter. I can see that there is a conflict with the definition of the array coordtransform. However, trove%Ncoords was correct here. I am pretty sure using trove%Nmodes will break cases where trove%Nmodes/=trove%Ncoords, such as CH4.
I will try to understand the source of this bug. Most likely it is the definition of coordtransform.

Never mind, Jamie. I think you are right and trove%Nmodes should give the correct size for these vectors.
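The underlying issue is that an array-valued function result must have the same shape as the array section it is assigned to. The sketch below illustrates that constraint with a made-up function `fake_finitediffs` and made-up sizes; it is not the real `FLvect_finitediffs` and says nothing about which size is physically correct in TROVE.

```fortran
module fd_sketch
  implicit none
contains
  ! Returns a vector whose length is set by the caller through n,
  ! in the same spirit as the size argument discussed above.
  function fake_finitediffs(n, imode) result(v)
    integer, intent(in) :: n, imode
    real :: v(n)
    integer :: i
    do i = 1, n
      v(i) = real(10*imode + i)
    end do
  end function fake_finitediffs
end module fd_sketch

program conforming_assignment
  use fd_sketch
  implicit none
  integer, parameter :: nmodes = 3, ncoords = 5
  real :: coordtransform(nmodes, nmodes)
  integer :: imode

  do imode = 1, nmodes
    ! The right-hand side must have exactly size(coordtransform, 1) elements.
    ! Passing ncoords here would make the assignment non-conforming, which
    ! the standard does not define and compilers may handle differently.
    coordtransform(:, imode) = fake_finitediffs(nmodes, imode)
  end do

  print *, 'ncoords (not used as a size here):', ncoords
  print *, 'column length:', size(coordtransform, 1)
  print *, coordtransform(:, 1)
end program conforming_assignment
```

Compiling with run-time bounds checking (for example `-fcheck=bounds` in gfortran) is one way to have shape mismatches of this kind reported instead of silently tolerated.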

ArjenTamerus and others added 30 commits February 4, 2019 16:22

More gfortran fixes

5bc1f51

Merge branches + fix some string formats, memory bug

8540f56

gfortran-compatible MPI

146a947

Fix MPI + gfortran, use MPI datatype for data exchange

74a221a

co_write_matrix_distr: don't write out empty buffer columns

f121d38

perturbation.f90 - WIP MPI-IO file formatting work + write fixes

970c162

coarray_aux => mpi_aux -- no longer using coarrays

763a291

Work around slow MPI-IO write on some systems, implement parallel wri…

ac0fd70

…ting of split files, reading of MPI-IO formatted files, some pblas experiments

Remove accidentally committed test code

64e38db

Some MPI documentation + SLURM script + makefile update

3f68ac0

Commented out some unfinished test code

75a7220

Fix issues w/ parallel I/O, avoids double write in tran.f90 and makes…

4cff7f4

… sure data is distributed on read in perturbation.f90

[mpi_aux.f90] Fixed blacs implementation + cleanup

0d17cbc

[perturbation.f90] mpi_aux compatibility update

[mpi_aux.f90] return allocation status in co_block_type_init

08c1c06

[tran.f90] Implement Parallel DGEMM to enable MPI/memory scaling (sin…

727916c

…gle-file only for now)

[tran.f90] Fix x/y dimension confusion, plus minor cleanup

9870c34

Implement SPLIT file i/o for MPI version

6c4d732

Fix MPI/POSIX IO switch, fix calculation of lower matrix in perturbat…

ab20174

…ion.f90, clean output

gfortran fixes

e37d217

Bugfix: only call co_sum if select_gamma(isym) is true

0206dae

Workaround: MKL's scalapack crashes with (correct) input on psi for l…

13af4b4

…arge cases, work around by transposing manually and passing as 'T' instead of 'N'

Clean up redundant statements + minor makefile fix

a8ed85f

[input.f90] output 'echo' statements only on master process

ab66ff7

[perturbation.f90]Fix small mistake in co_sum call (allreduce when re…

2f3edf8

…duce-to-root is fine)

Rough fix to avoid 'Non-diagonal element' issue

88913fd

Improve co_sum workaround by calling reduction operation only once pe…

696463e

…r call to symm_mat_element_vector_k. Old method caused massive (3x) slowdown, now eliminated.

Refactored MPI-IO to be more robust across implementations

aa47a1b

Pretend we're always using MPI

b9060cd

- blacs_ctxt not used elsewhere, no need to be public

Use dummy MPI type in co_block_type_init

7e8b53a

Use correct BLACS process numbering

c5a07b2

JamieJQuinn and others added 12 commits June 2, 2021 16:31

Don't strip comments when running quantum energy comparison

0d64c38

Sort intensity numbers by relevant ID before comparison

b8d19a7

Implement error checking when finding block in chkpoint file

964ce80

Update test/compare_results.py

4da59e1

Co-authored-by: ageorgou <1186102+ageorgou@users.noreply.github.com>

Handle paths maturely

11d0127

Merge pull request #32 from Trovemaster/feature/test-intensity-log

8bfa19a

Refactor comparison script and add testing of intensity log

Cleanup makefile (fixes #33)

3d8d80a

Merge branch 'develop' into merge-develop-mpi

188e27e

Improve error handling in comparison script; it's now much easier to …

042127b

…tell which files are differing and where

TROVE built with gfortran -O0 segfaults; change to -O2 for continuous…

e2e352b

… integration

Incorporate MPI options into makefile

1250f52

Move traceback flag to debug options

4096f62

JamieJQuinn marked this pull request as ready for review June 3, 2021 16:13

JamieJQuinn requested a review from ageorgou June 3, 2021 16:13

ageorgou mentioned this pull request Jun 4, 2021

Unified version with and without MPI #30

Closed

ageorgou and others added 8 commits June 29, 2021 17:01

Properly copy array, don't just re-reference (fixes double free error)

9cdb17f

Fix array bounds error

ec16a55

Fix more array bounds errors

65a9004

Comment out segfaulting deallocation

a1d90ab

Assume openmpi if using gfortran and add more debug compilation flags

313d098

Fix bash errors

c9639d6

ageorgou reviewed Aug 9, 2021

JamieJQuinn added 3 commits August 10, 2021 18:52

Fix unassociated array access

846b278

Disable some runtime checks in debug compiler flags (just don't have …

2e949c9

…time to deal with these errors!)

Fix incorrect filename and fix spacing in error message

3a2b77c

JamieJQuinn mentioned this pull request Aug 11, 2021

Euler symmetry not implemented with MPI #38

Open

JamieJQuinn and others added 2 commits August 11, 2021 11:39

Only compare intensity output if MPI disabled

a754bd9

Merge pull request #39 from Trovemaster/fix/ch4-file2-name

193f693

- fixes incorrect filename `matelem.chk`
- fixes unassociated array access
- fixes unit tests to only check intensity output if MPI disabled

JamieJQuinn wants to merge 83 commits into develop from merge-develop-mpi

JamieJQuinn commented Jun 2, 2021 • edited by ageorgou

4 participants
