MPI ERROR MESSAGE - MPI_ABORT causes Open MPI to kill all MPI processes.

While running the HPC application nwchem in parallel, I got the following error messages.

ERROR 1
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
0:0:nwchem: rtdb_close failed:: -1
(rank:0 hostname:cn0774 pid:46607):ARMCI DASSERT fail. ../../ga-5-3/armci/src/common/armci.c:ARMCI_Error():208 cond:0
0:0:nwchem: rtdb_close failed:: -1
(rank:0 hostname:cn0774 pid:46615):ARMCI DASSERT fail. ../../ga-5-3/armci/src/common/armci.c:ARMCI_Error():208 cond:0
0:0:nwchem: rtdb_close failed:: -1
(rank:0 hostname:cn0774 pid:46610):ARMCI DASSERT fail. ../../ga-5-3/armci/src/common/armci.c:ARMCI_Error():208 cond:0
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)


ERROR 2
[mpiexec@cn1071] HYD_pmcd_pmi_alloc_pg_scratch (./pm/pmiserv/pmiserv_utils.c:755): failed to allocate -192 bytes
[mpiexec@cn1071] HYD_pmci_launch_procs (./pm/pmiserv/pmiserv_pmci.c:172): error allocating pg scratch space
[mpiexec@cn1071] main (./ui/mpich/mpiexec.c:697): process manager returned error launching processes.

Reason for the Error Message
I had loaded the following modules:
module load Nwchem-6.5
module load openmpi-1.6.4
module load openmpi-1.6.4_intel
module load openmpi-1.6.4_scratch
module load intel-cluster-studio-2013
In this case I had installed nwchem with Open MPI, but for the parallel run I used Intel MPI. The binary was built against one MPI implementation and launched with another, so the two most likely conflicted when the processes tried to communicate. The fix was to load only the plain Open MPI module alongside nwchem:
module load Nwchem-6.5
module load openmpi-1.6.4
After that, the problem was solved.
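
To catch this kind of mismatch before submitting a job, it can help to check which launcher is first on PATH and which MPI library the nwchem binary resolves to. The following is only a rough sketch: the helper name classify_launcher is made up for this post, and the banner strings it looks for ("Open MPI" for Open MPI, "HYDRA" or "Intel(R) MPI" for Hydra-based launchers such as MPICH and Intel MPI) are assumptions about what `mpirun --version` prints on your system.

```shell
#!/bin/sh
# Hypothetical helper: guess the MPI launcher family from its version banner.
# Assumption: Open MPI banners contain "Open MPI"; Hydra-based launchers
# (MPICH, Intel MPI) mention "HYDRA" or "Intel(R) MPI".
classify_launcher() {
    case "$1" in
        *"Open MPI"*) echo "openmpi" ;;
        *HYDRA*|*Hydra*|*"Intel(R) MPI"*) echo "hydra" ;;
        *) echo "unknown" ;;
    esac
}

# Report the launcher that the currently loaded modules put first on PATH.
if command -v mpirun >/dev/null 2>&1; then
    banner=$(mpirun --version 2>&1 | head -n 3)
    echo "mpirun path:   $(command -v mpirun)"
    echo "mpirun flavor: $(classify_launcher "$banner")"
fi

# Show which MPI shared library the nwchem binary resolves to, if any.
if command -v nwchem >/dev/null 2>&1; then
    ldd "$(command -v nwchem)" | grep -i mpi \
        || echo "no MPI library in ldd output (possibly a static build)"
fi
```

If the launcher flavor and the library path reported by ldd point to different MPI installations, the module list is the first thing to check.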
