Conversation
Also changed the file to show relative error instead of whatever was there before
Uncommented TIMING definition for plots. Added a modified test_manager that can handle any dimensions
…rtion to avoid segfault in case of multiple processes per group
…th minimal number of hierarchizations and dehierarchizations
static MPI_Request *requestAsync;

/** SubspaceSizes for Async Reduce */
static std::vector<int> *subspaceSizes;
I feel like this should not be a static member of Task because it already exists in DistributedSparseGridUniform, and I think that this is where it should be.
As far as I know, this is in Task because the sparse grid gets destroyed frequently, but this information should stay constant over multiple combinations.
But this is just an optimization so that the sparse grid sizes do not need to be communicated again. For the request array, I would need to investigate whether it needs to be in Task.
But one could also put the subspaceSizes as a static member of the SparseGrid class, I think...
Yes, you are right, we can leave it like this here. On third-level, subspacesDataSizes_ is introduced, which makes this redundant. I had not noticed that it wasn't on master.
I am just investigating a merge into https://github.com/SGpp/DisCoTec/tree/third-level , but it seems that -- unsurprisingly -- some of the distributed sparse grid data management has changed for the widely distributed combination technique. This is why I thought it was the right time to leave a comment on the (in my opinion unnecessary) member. Quoting @mhurler123 in case he has an opinion on that ;)
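To make the alternative discussed above concrete, here is a minimal sketch, assuming a simplified stand-in for DistributedSparseGridUniform: the cached subspace sizes live as a static member of the sparse grid class rather than of Task, so they survive the frequent destruction of individual grid instances and only need to be filled once. The class shape, the subspaceSizesCache_ member, and the subspaceSizes() accessor are hypothetical, not DisCoTec's actual interface.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the real sparse grid class (hypothetical sketch).
class DistributedSparseGridUniform {
 public:
  explicit DistributedSparseGridUniform(std::size_t numSubspaces)
      : numSubspaces_(numSubspaces) {}

  // Returns the cached sizes; fills the cache only on the first call, so the
  // sizes do not have to be re-communicated after every combination step.
  const std::vector<int>& subspaceSizes() const {
    if (subspaceSizesCache_.empty()) {
      // In the real code this would involve an MPI exchange of the sizes;
      // here the cache is simply initialized with zeros.
      subspaceSizesCache_.assign(numSubspaces_, 0);
    }
    return subspaceSizesCache_;
  }

 private:
  std::size_t numSubspaces_;
  // Static: outlives individual sparse grid objects between combinations.
  static std::vector<int> subspaceSizesCache_;
};

std::vector<int> DistributedSparseGridUniform::subspaceSizesCache_;
```

As noted above, on the third-level branch this becomes redundant because subspacesDataSizes_ is already kept in the sparse grid.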
Stats::startEvent("combine global reduce init");

Task::bufAsync = new std::vector<CombiDataType>[numGrids];
This could be the case. We could add a delete before it, or switch to smart pointers.
or maybe make this a member of DistributedFullGridUniform? that is also connected to the next question...
The buffer is deleted in line 854
We could also put this as a member of the full grid directly. But I am not sure if this improves things.
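As a concrete illustration of the smart-pointer/container suggestion above, here is a minimal sketch in which the per-grid buffers are owned by a container instead of a raw new[] allocation, so the explicit delete around line 854 is no longer needed and repeated initialization cannot leak. The AsyncReduceBuffers wrapper and the use of double as CombiDataType are assumptions made for this sketch only.

```cpp
#include <cstddef>
#include <vector>

using CombiDataType = double;  // stand-in; the real CombiDataType may differ

// Hypothetical wrapper owning one receive buffer per grid.
struct AsyncReduceBuffers {
  std::vector<std::vector<CombiDataType>> bufAsync;

  void init(std::size_t numGrids, const std::vector<std::size_t>& bufferSizePerGrid) {
    // Dropping any buffers from a previous combination happens automatically;
    // no manual delete[] is required anywhere.
    bufAsync.assign(numGrids, {});
    for (std::size_t g = 0; g < numGrids; ++g) {
      bufAsync[g].resize(bufferSizePerGrid[g]);  // sized once per grid
    }
  }
};
```

Whether such a buffer belongs in Task, in the full grid, or in a separate helper like this is exactly the open design question from the thread above.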
}

for(int g=0; g<numGrids; g++){
  CombiCom::distributedGlobalReduceAsyncInit( *combinedUniDSGVector_[g], Task::subspaceSizes[g], Task::bufAsync[g], Task::requestAsync[g]);
actually, I don't get why this is done once for each task -- a single time should suffice, right?
It is only done a single time. numGrids does not refer to the number of tasks but to the number of grids each task owns. Usually numGrids is 1, but with multiple species in GENE, for example, numGrids = nspecies.
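For context, a rough sketch of what an asynchronous global-reduce initialization along these lines might look like is given below; the function body, the communicator parameter, and the use of double for CombiDataType are assumptions based only on the visible call site, not DisCoTec's actual implementation.

```cpp
#include <mpi.h>
#include <vector>

using CombiDataType = double;  // stand-in; the real CombiDataType may differ

// Hypothetical sketch: start a non-blocking allreduce over one grid's sparse
// grid data and hand the request back to the caller, which completes it
// later (e.g. with MPI_Wait), overlapping the reduction with other work.
void distributedGlobalReduceAsyncInit(std::vector<CombiDataType>& buf,
                                      MPI_Request& request,
                                      MPI_Comm globalReduceComm) {
  // buf is assumed to already hold this process's local sparse grid data.
  MPI_Iallreduce(MPI_IN_PLACE, buf.data(), static_cast<int>(buf.size()),
                 MPI_DOUBLE, MPI_SUM, globalReduceComm, &request);
}
```

The loop over g then starts one such reduction per grid a task owns (one per species in the GENE example), not one per task.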
assert(initialized_);

int lrank;
MPI_Comm_rank(lcomm, &lrank);
=> auto lrank = theMPISystem()->getLocalRank(); -- use the buffered rank
oh, it's unused anyways ;)
oops, this was unintentionally closed by making the