
cosmoMC with MPI error

Posted: June 22 2006
by Ana Vasile

I am running cosmoMC with MPI on 8 nodes, and after it has been running for a while I get this error message:

[cli_2]: aborting job:
Fatal error in MPI_Testall: Other MPI error, error stack:
MPI_Testall(237)..........................: MPI_Testall(count=7, req_array=0xb4e3ff38, flag=0xbff9ca70, status_array=0xbff9ca90) failed
MPIDI_CH3_Progress_test(102)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDU_Socki_handle_read(649)..............: connection failure (set=0,sock=7,errno=104:(strerror() not found))
rank 7 in job 2 wn-1-2.spacescience.ro_34766 caused collective abort of all \
exit status of rank 7: killed by signal 9
rank 2 in job 2 wn-1-2.spacescience.ro_34766 caused collective abort of all \
exit status of rank 2: killed by signal 9

Could you please tell me what the problem is?