
Performance Analysis

Allen McPherson edited this page Feb 5, 2016 · 4 revisions

Some initial performance numbers (move this section elsewhere later).

Darwin, bigmem nodes, 1/6/2016:

Serial (no MPI, no global):

  • brute force: 20.2998 secs
  • mtree sampling: 820.9340 secs
  • flann sampling: 1043.4309 secs
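To put the serial numbers in perspective, the sketch below expresses each timing as a multiple of the brute-force time. On this run, mtree sampling costs roughly 40x and FLANN sampling roughly 51x the brute-force wall time. The helper name `relative_cost` is hypothetical and used only for illustration.

```python
# Serial (no MPI, no global) timings from the Darwin bigmem runs, in seconds.
SERIAL_SECS = {
    "brute force": 20.2998,
    "mtree sampling": 820.9340,
    "flann sampling": 1043.4309,
}


def relative_cost(times, baseline_key):
    """Return each time divided by the baseline time (1.0 = same as baseline)."""
    baseline = times[baseline_key]
    return {name: secs / baseline for name, secs in times.items()}


ratios = relative_cost(SERIAL_SECS, "brute force")
for name, ratio in ratios.items():
    # A ratio above 1.0 means the method took longer than brute force.
    print(f"{name:15s} {SERIAL_SECS[name]:10.4f} secs  {ratio:6.2f}x brute force")
```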

Parallel (MPI, no global):

  • brute (1 node, 1 core): 20.7532 secs
  • brute (2 nodes, 1 core each): 11.0950 secs
  • brute (4 nodes, 1 core each): 6.2169 secs
  • brute (1 node, 2 cores): 10.9313 secs
  • brute (1 node, 4 cores): 6.2682 secs
  • brute (1 node, 8 cores): 3.9775 secs
  • brute (1 node, 16 cores): 2.1779 secs
  • mtree (4 nodes, 1 core each): 292.9982 secs
  • mtree (1 node, 4 cores): 304.5630 secs
  • mtree (1 node, 8 cores): 196.8076 secs
  • mtree (1 node, 16 cores): 109.3888 secs
  • flann (4 nodes, 1 core each): 367.2816 secs
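The parallel timings are easier to compare as speedup (baseline time / parallel time) and parallel efficiency (speedup / number of MPI ranks). The sketch below uses the 1-node, 1-core MPI run as the brute-force baseline. No 1-rank MPI run was recorded for mtree or FLANN, so their serial (no MPI) times are used as baselines; that choice is an assumption of this sketch. With these baselines, brute force reaches about 9.5x on 16 cores (about 60% efficiency), mtree about 7.5x on 16 cores, and spreading 2 or 4 ranks across nodes performs about the same as packing them onto one node.

```python
def speedup_table(baseline_secs, runs):
    """Return (label, ranks, speedup, efficiency) for each (label, ranks, secs) run.

    speedup    = baseline time / parallel time
    efficiency = speedup / number of MPI ranks
    """
    rows = []
    for label, ranks, secs in runs:
        s = baseline_secs / secs
        rows.append((label, ranks, s, s / ranks))
    return rows


# Brute force: baseline is the MPI run on 1 node, 1 core.
brute = speedup_table(20.7532, [
    ("2 nodes x 1 core", 2, 11.0950),
    ("4 nodes x 1 core", 4, 6.2169),
    ("1 node x 2 cores", 2, 10.9313),
    ("1 node x 4 cores", 4, 6.2682),
    ("1 node x 8 cores", 8, 3.9775),
    ("1 node x 16 cores", 16, 2.1779),
])

# mtree and FLANN: assumed baseline is the serial (no MPI) run.
mtree = speedup_table(820.9340, [
    ("4 nodes x 1 core", 4, 292.9982),
    ("1 node x 4 cores", 4, 304.5630),
    ("1 node x 8 cores", 8, 196.8076),
    ("1 node x 16 cores", 16, 109.3888),
])
flann = speedup_table(1043.4309, [("4 nodes x 1 core", 4, 367.2816)])

for name, rows in (("brute", brute), ("mtree", mtree), ("flann", flann)):
    for label, ranks, s, e in rows:
        print(f"{name:5s} {label:18s} speedup {s:5.2f}  efficiency {e:4.0%}")
```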

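Commands to run with MPI on Darwin (--map-by core places consecutive ranks on the cores of one node, as in the "1 node, N cores" runs; --map-by node places ranks round-robin across the allocated nodes, as in the "N nodes, 1 core each" runs):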

mpirun --map-by core -np 4 ./lulesh
mpirun --map-by node -np 4 ./lulesh
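The difference between the two mapping policies can be illustrated with a deliberately simplified model: `core` fills one node's cores before moving to the next node, while `node` deals ranks out round-robin, one node at a time. This is not Open MPI's implementation; it ignores binding, sockets, hyperthreads, and oversubscription, and the function name `place_ranks` and the 4-node, 16-cores-per-node allocation are hypothetical.

```python
def place_ranks(num_ranks, nodes, cores_per_node, policy):
    """Return a list of (node, core) slots, one per rank.

    Simplified model of mpirun --map-by core / --map-by node placement.
    """
    if num_ranks > nodes * cores_per_node:
        raise ValueError("not enough slots for the requested ranks")
    placement = []
    next_core = [0] * nodes  # next free core index on each node
    for rank in range(num_ranks):
        if policy == "core":
            node = rank // cores_per_node  # fill node 0 before node 1, etc.
        elif policy == "node":
            node = rank % nodes  # round-robin across nodes
        else:
            raise ValueError(f"unknown policy: {policy}")
        placement.append((node, next_core[node]))
        next_core[node] += 1
    return placement


# Hypothetical allocation: 4 nodes with 16 cores each.
print(place_ranks(4, nodes=4, cores_per_node=16, policy="core"))
print(place_ranks(4, nodes=4, cores_per_node=16, policy="node"))
```

Under this model, `-np 4` with `core` puts all four ranks on node 0, and with `node` puts one rank on each of nodes 0 to 3.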

Output from the mtree sampling run on 1 core:

Using adaptive sampling...
Number of Kriging models: 296 (average), 58 (min), 817 (max)
Number of (query,value) pairs: 635 (average), 205 (min), 1513 (max)
Scaled query average = 0.799788, max = 7.41582
Scaled value average = 1.15282e+11, max = 2.68319e+17
Total time: 820.9340 secs

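Output of mtree sampling on 4 ranks (4 nodes, 1 core each; the total time matches the 292.9982 secs run above). Each rank prints its own statistics, which shows the imbalance in sampling across ranks: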

Number of Kriging models: 194 (average), 62 (min), 582 (max)
Number of (query,value) pairs: 529 (average), 205 (min), 1493 (max)
Number of Kriging models: 254 (average), 122 (min), 455 (max)
Number of (query,value) pairs: 662 (average), 345 (min), 1152 (max)
Number of Kriging models: 213 (average), 98 (min), 529 (max)
Number of (query,value) pairs: 591 (average), 307 (min), 1405 (max)
Number of Kriging models: 512 (average), 270 (min), 823 (max)
Number of (query,value) pairs: 767 (average), 659 (min), 1053 (max)
Scaled query average = 0.907076, max = 3.93806
Scaled value average = 1.5597e+06, max = 1.66419e+12
Scaled query average = 0.43438, max = 7.41556
Scaled value average = 4.92135e+11, max = 2.68119e+17
Scaled query average = 0.62595, max = 5.389
Scaled value average = 1.22014e+09, max = 1.17537e+15
Scaled query average = 1.1662, max = 1.6102
Scaled value average = 174.664, max = 19471.3
Total time: 292.9982 secs
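One common way to summarize the imbalance in the 4-rank output is the max/mean ratio: 1.0 means perfectly balanced, and larger values mean the busiest rank carries proportionally more work than the average rank. The sketch below treats each rank's reported "(average)" count as a proxy for its share of the work; that is an assumption, since the output does not say what the averages are taken over. It gives about 1.75 for Kriging models and about 1.20 for (query,value) pairs. The helper name `imbalance` is hypothetical.

```python
def imbalance(per_rank):
    """Load-imbalance factor: max over ranks divided by the mean over ranks."""
    mean = sum(per_rank) / len(per_rank)
    return max(per_rank) / mean


# Per-rank "(average)" values copied from the 4-rank mtree output above.
kriging_models = [194, 254, 213, 512]
query_value_pairs = [529, 662, 591, 767]

print(f"Kriging models:      imbalance {imbalance(kriging_models):.2f}")
print(f"(query,value) pairs: imbalance {imbalance(query_value_pairs):.2f}")
```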

Output from the non-global FLANN run on 1 core:

Using adaptive sampling...
Using FLANN library...
   flann_n_trees: 1
   flann_n_checks: 20
Number of Kriging models: 218 (average), 4 (min), 797 (max)
Number of (query,value) pairs: 444 (average), 16 (min), 1108 (max)
Scaled query average = 0.780871, max = 4.68305
Scaled value average = 1.24271e+08, max = 1.31932e+14
Total time: 1043.4309 secs

In related testing, all runs were made without MPI using revision 709f5d9874101da12ebe62dd6fc9581c7043e30a (the latest at the time):

For FLANN, non-global (-f -s) ran for 515 steps. Global (-f -g -s -r) segfaulted after step 121.

For mtree, non-global (-s) ran for 526 steps. Global (-s -r -g) slowed down drastically around the same point (after step 120) and segfaulted after step 211.
