corrupted size vs. prev_size error when using -mutsel #23

@berkalpay

Running the command mpirun -n 15 pb_mpi -d ../aligned_RNA_seqs_postprocessed.phylip -cat -gtr -mutsel run02 produces the following error:

--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   compute-a-16-46
  Local device: mlx4_0
--------------------------------------------------------------------------

model:
stick-breaking Dirichlet process mixture (cat)

read data from file : ../aligned_RNA_seqs_postprocessed.phylip
number of taxa  : 1139
number of sites : 711
number of states: 4

chain name : run02
run started

[compute-a-16-46.o2.rc.hms.harvard.edu:11425] 14 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[compute-a-16-46.o2.rc.hms.harvard.edu:11425] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
*** Error in `pb_mpi': corrupted size vs. prev_size: 0x0000000004e03120 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f7c4)[0x7fca5756c7c4]
/lib64/libc.so.6(+0x82fd4)[0x7fca5756ffd4]
/lib64/libc.so.6(__libc_malloc+0x4c)[0x7fca57572adc]
/n/app/gcc/6.2.0/lib64/libstdc++.so.6(_Znwm+0x18)[0x7fca5807ecd8]
pb_mpi[0x4de838]
pb_mpi[0x49c180]
pb_mpi[0x4e66a5]
pb_mpi[0x488cb9]
pb_mpi[0x404d9b]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fca5750f505]
pb_mpi[0x421927]

followed by a memory map.

The error occurs before the first MCMC iteration, but after the 0th iteration has been written to the .trace file. Strangely, it occurs on most runs of this command but not on all of them. It also occurs with a variety of values for -n.
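
Since the failure is intermittent, one way to put a number on "very frequently but not always" is to repeat the run and tally the outcomes. Below is a minimal sketch in Python; it is not part of PhyloBayes, and the names classify_run, tally and MARKER are made up for illustration. It assumes that a run still alive after the timeout got past the point where the crash happens, and that setting LIBC_FATAL_STDERR_ makes glibc print the abort message to stderr (whether that variable reaches every MPI rank depends on the MPI setup, so a nonzero exit status is counted separately as a fallback).

```python
import os
import subprocess

# Text glibc prints when it detects this kind of heap corruption.
MARKER = "corrupted size vs. prev_size"


def classify_run(cmd, timeout_s):
    """Run cmd once and classify what happened.

    Returns one of:
      "crash"         - the glibc corruption message appeared in the output
      "other_failure" - nonzero exit status without the message
      "ok"            - exit status 0
      "survived"      - still running when the timeout expired (then killed)
    """
    # Assumption: older glibc versions write the abort message to the
    # terminal unless this variable is set, so ask for stderr instead.
    env = dict(os.environ, LIBC_FATAL_STDERR_="1")
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            text=True,
            errors="replace",
            env=env,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "survived"
    if MARKER in proc.stdout:
        return "crash"
    return "ok" if proc.returncode == 0 else "other_failure"


def tally(make_cmd, attempts, timeout_s):
    """Call make_cmd(i) to build the command for attempt i; count outcomes."""
    counts = {}
    for i in range(attempts):
        outcome = classify_run(make_cmd(i), timeout_s)
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts


# Example with the command from this report. The chain names "crashtestNN"
# are hypothetical, chosen so that each attempt starts a fresh chain:
#
#   tally(lambda i: ["mpirun", "-n", "15", "pb_mpi",
#                    "-d", "../aligned_RNA_seqs_postprocessed.phylip",
#                    "-cat", "-gtr", "-mutsel", f"crashtest{i:02d}"],
#         attempts=20, timeout_s=120)
```

The timeout only kills the process that was launched directly (here mpirun), so it is worth checking that no stray pb_mpi ranks are left behind between attempts. Running the same tally with and without -mutsel, or with different values of -n, would show whether the crash rate actually depends on those settings.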
