Hi!
I am running cosmoMC on 8 nodes, and after a while I get this error message:
[cli_2]: aborting job:
Fatal error in MPI_Testall: Other MPI error, error stack:
MPI_Testall(237)..........................: MPI_Testall(count=7, req_array=0xb4e3ff38, flag=0xbff9ca70, status_array=0xbff9ca90) failed
MPIDI_CH3_Progress_test(102)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(422):
MPIDU_Socki_handle_read(649)..............: connection failure (set=0,sock=7,errno=104:(strerror() not found))
rank 7 in job 2 wn-1-2.spacescience.ro_34766 caused collective abort of all ranks
exit status of rank 7: killed by signal 9
rank 2 in job 2 wn-1-2.spacescience.ro_34766 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
Could you please tell me what the problem is?
cosmoMC with MPI error
- Posts: 25
- Joined: March 26 2006
- Affiliation: Institute for Space Sciences