MPI ERROR MESSAGE - MPI_ABORT causes Open MPI to kill all MPI processes.

While running the HPC application NWChem in parallel, I got the following error message.

ERROR 1
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
0:0:nwchem: rtdb_close failed:: -1
(rank:0 hostname:cn0774 pid:46607):ARMCI DASSERT fail. ../../ga-5-3/armci/src/common/armci.c:ARMCI_Error():208 cond:0
0:0:nwchem: rtdb_close failed:: -1
(rank:0 hostname:cn0774 pid:46615):ARMCI DASSERT fail. ../../ga-5-3/armci/src/common/armci.c:ARMCI_Error():208 cond:0
0:0:nwchem: rtdb_close failed:: -1
(rank:0 hostname:cn0774 pid:46610):ARMCI DASSERT fail. ../../ga-5-3/armci/src/common/armci.c:ARMCI_Error():208 cond:0
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

IBM High Performance Computing Cluster Health Check - Important Points


1) An HPC cluster can seem intimidating, but many cluster problems are in fact easily resolved through careful verification steps.
2) Cluster health check tools mostly test individual components, plus a little point-to-point network performance.
3) The purpose of a High Performance Computing (HPC) cluster is to solve scalable problems in a shorter time through parallelism.
4) The goal of the verification (testing) stage is to gain confidence in the hardware and software before introducing it to users.
5) A healthy cluster is built from the bottom to the top. Therefore, you must first make sure that each single device works
as expected before performing the next step.
This approach leads to the pyramid model of verification shown in Figure 3-2 on page 34.
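
The bottom-up idea in point 5 can be sketched as a small Python helper: checks run in order from the lowest layer upward, and verification stops at the first layer that fails. This is only an illustration. The layer names and the pass/fail results below are hypothetical placeholders, not checks taken from the IBM guide.

```python
def verify_bottom_up(checks):
    """Run (name, check) pairs in order and stop at the first failure.

    Returns (passed_names, first_failed_name); the second value is None
    when every check passes.
    """
    passed = []
    for name, check in checks:
        if not check():
            return passed, name
        passed.append(name)
    return passed, None


# Hypothetical layers, ordered from the bottom of the pyramid to the top.
# Each lambda stands in for a real test and simply returns a fixed result.
checks = [
    ("single node components", lambda: True),
    ("point-to-point network", lambda: True),
    ("whole-cluster parallel run", lambda: False),
]

passed, failed = verify_bottom_up(checks)
print("passed:", passed)
print("first failure:", failed)
```

Stopping at the first failure keeps the diagnosis at the lowest broken layer, which is the point of verifying each device before moving up.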

HPC Administration Interview Questions

1) How to create a queue in PBS (and how to remove nodes from PBS).
2) Lustre modules: the components (MGS, MDS, MDT, OSS, OST) and how to create a Lustre file system.
3) HPL benchmark attribute values.
4) The different stages of PXE boot.
5) The different ./configure options (--prefix=, --bindir=DIR, --sbindir=DIR, LDFLAGS="-L/LIBRARYPATH", CPPFLAGS="-I/INCLUDEPATH/").
6) Installing the OFED driver and its commands (ibstat, ibdiagnet).

TopHat Installation Error Message - Undefined Reference

/app/setups/tophat-2.0.12/src/segment_juncs.cpp:4917: undefined reference to `boost::thread::join()'
/app/setups/tophat-2.0.12/src/segment_juncs.cpp:4918: undefined reference to `boost::thread::~thread()'
/app/setups/tophat-2.0.12/src/segment_juncs.cpp:4994: undefined reference to `boost::thread::join()'
/app/setups/tophat-2.0.12/src/segment_juncs.cpp:4995: undefined reference to `boost::thread::~thread()'
segment_juncs.o: In function `thread<SegmentSearchWorker>':
/usr/include/boost/thread/detail/thread.hpp:191: undefined reference to `boost::thread::start_thread()'
/usr/include/boost/thread/detail/thread.hpp:191: undefined reference to `boost::thread::start_thread()'
segment_juncs.o: In function `~thread_data':
/usr/include/boost/thread/detail/thread.hpp:40: undefined reference to `boost::detail::thread_data_base::~thread_data_base()'
/usr/include/boost/thread/detail/thread.hpp:40: undefined reference to `boost::detail::thread_data_base::~thread_data_base()'
/usr/include/boost/thread/detail/thread.hpp:40: undefined reference to `boost::detail::thread_data_base::~thread_data_base()'
/usr/include/boost/thread/detail/thread.hpp:40: undefined reference to `boost::detail::thread_data_base::~thread_data_base()'
segment_juncs.o: In function `thread_data_base':
/usr/include/boost/thread/pthread/thread_data.hpp:65: undefined reference to `vtable for boost::detail::thread_data_base'

IB 100% Non-Blocking Fat-Tree Topology Interconnect Architecture Formula

1) IB non-blocking mode: each source port has a dedicated channel to its destination port, without looping.
2) IB non-blocking operation is achieved through a fat-tree topology.
The fat-tree topology is one of the methods for doing this.
Using 36-port IB switches:

SS    LS (= SS*2)    Max no. of connections (or ports) (= LS*18)    Dedicated connectivity between SS & LS (= 36/LS)
 2     4              72                                            9
 3     6             108                                            6
 4     8             144                                            4.5
 5    10             180                                            3.6
 6    12             216                                            3
 7    14             252                                            2.57
 8    16             288                                            2.25
 9    18             324                                            2
10    20             360                                            1.8
11    22             396                                            1.64
12    24             432                                            1.5
13    26             468                                            1.38
14    28             504                                            1.29
16    32             576                                            1.13
17    34             612                                            1.06
18    36             648                                            1
19    38             684                                            0.95

NOTE 1: SS = Spine Switch, LS = Leaf Switch.
NOTE 2: In the original table, the possible IB switch connectivity options are highlighted in light yellow. A configuration is possible when 36/LS leaves a remainder of 0.
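
The formulas behind the table can be reproduced with a short Python sketch. It assumes, as the column formulas state, that LS = SS*2, that each 36-port leaf switch uses 18 ports for nodes, and that the links per spine/leaf pair are 36/LS. Treating a row as "possible" when 36 divides evenly by LS is my reading of the table's "Possibility" rule, not something the table states outright. The function name `fat_tree_row` is made up for this example.

```python
SWITCH_PORTS = 36                        # ports per IB switch, as in the table
NODE_PORTS_PER_LEAF = SWITCH_PORTS // 2  # 18 ports per leaf go to nodes


def fat_tree_row(spines):
    """Return (leaves, max_ports, links_per_pair, possible) for a spine count."""
    leaves = 2 * spines                       # LS = SS * 2
    max_ports = leaves * NODE_PORTS_PER_LEAF  # max ports = LS * 18
    links_per_pair = SWITCH_PORTS / leaves    # 36 / LS
    # Assumption: a layout is only possible when 36/LS is a whole number.
    possible = SWITCH_PORTS % leaves == 0
    return leaves, max_ports, links_per_pair, possible


for ss in range(2, 20):
    ls, ports, links, ok = fat_tree_row(ss)
    marker = "possible" if ok else ""
    print(f"{ss:>2} {ls:>3} {ports:>4} {links:>6.2f} {marker}")
```

Under this reading, only the rows with 4, 6, 12, 18, or 36 leaf switches give a whole number of links per spine/leaf pair.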

mpiBLAST Program - Error and Solution

1)$ /app/intel/impi/4.1.0.024/intel64/bin/mpirun -np 192 /app/mpiblast-n/bin/mpiblast -p blastx -i /scratch/sbag/K12.fa -d nr_again/nr -o /scratch/sbag/test_7july.txt -e 0.00001 -v 1 -m 9 -b 1
[cn1081:mpi_rank_47][error_sighandler] Caught error: Segmentation fault (signal 11)
[cn1081:mpi_rank_74][error_sighandler] Caught error: Segmentation fault (signal 11)
[cn1081:mpi_rank_114][error_sighandler] Caught error: Segmentation fault (signal 11)
[cn1081:mpi_rank_124][error_sighandler] Caught error: Segmentation fault (signal 11)
[cn1081:mpi_rank_130][error_sighandler] Caught error: Segmentation fault (signal 11)
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
SOLUTION: We were actually trying to run a 192-process job on a single node (12 processors). The job needs to be spread across enough nodes to hold all 192 processes.
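
The arithmetic behind the solution can be sketched in Python: given 192 MPI processes and 12 processors per node (the figures from the run above), it computes how many nodes are needed for one process per processor. The function name `nodes_needed` is hypothetical, and the sketch assumes every node has the same processor count.

```python
def nodes_needed(total_processes, processors_per_node):
    """Smallest number of nodes that gives every process its own processor."""
    # Ceiling division using integers only, so there is no float rounding.
    return -(-total_processes // processors_per_node)


processes = 192          # value passed to mpirun -np in the failing run
processors_per_node = 12 # processors on the single node that was used

print("nodes needed:", nodes_needed(processes, processors_per_node))
print("processes per processor on one node:", processes // processors_per_node)
```

With these figures the job needs 16 nodes. Run on one node, it puts 16 processes on each processor.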