Hi,
I compiled CosmoMC with the Intel 8.1 EM64T compiler on a cluster with Opteron processors, and it runs fine on a single processor. However, when I try to run it on more than one processor using
mpirun -np 2 ./cosmomc ./param.ini
I get an error:
/data/3/soft1/cosmomc # /data/3/soft2/mpich-1.2.6/bin/mpirun -np 2 ./cosmomc ./params.ini
Error opening parameter file
Total time: 0 ( 0.000000000000000E+000 hours)
Slow proposals: 0
forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
cosmomc 00000000004F3B54 Unknown Unknown Unknown
cosmomc 00000000004F342E Unknown Unknown Unknown
cosmomc 00000000004C5616 Unknown Unknown Unknown
cosmomc 0000000000490516 Unknown Unknown Unknown
cosmomc 0000000000490A7C Unknown Unknown Unknown
cosmomc 00000000004A5B65 Unknown Unknown Unknown
cosmomc 00000000004A51DE Unknown Unknown Unknown
cosmomc 000000000042A78D Unknown Unknown Unknown
cosmomc 0000000000404CB4 Unknown Unknown Unknown
Unknown 0000002A96048C9E Unknown Unknown Unknown
cosmomc 0000000000404BEA Unknown Unknown Unknown
p1_30070: (0.000000) net_recv failed for fd = 3
p1_30070: p4_error: net_recv read, errno = : 104
bm_list_8360: (0.031250) wakeup_slave: unable to interrupt slave 0 pid 8359
From the second line it seems the problem is that the parameter file is not found. However, I have checked that the parameter file is indeed there, so I am very confused by this. I am using MPICH 1.2.6, and its tests ran fine after I compiled it, so I assume this has nothing to do with MPI. I also tried giving an absolute path for the parameter file, and the problem persists. Has anyone seen this before?
Thanks for your help,
Luis
CosmoMC: Problems on a cluster
- Joined: January 13 2005
- Affiliation: Astrophysics Group, Imperial College London
- Joined: September 23 2004
- Affiliation: University of Sussex
Re: CosmoMC: Problems on a cluster
It does look like a problem finding the file (though check param.ini vs params.ini!). I can't think of any reason why a fully qualified path like /home/user/.../params.ini wouldn't work, as long as the path is on a disk accessible from all the nodes.
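One way to test the "accessible from all the nodes" condition is to run a small readability check on each compute node. The following is only a sketch: `check_param_file` is a hypothetical helper, not part of CosmoMC or MPICH, and it reports only whether the given path is readable on the host where it runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of CosmoMC or MPICH): report whether a
# parameter file is readable from the host this function runs on.
check_param_file() {
    file="$1"
    # Fall back to a placeholder name if the hostname command is unavailable.
    host="$(hostname 2>/dev/null || echo unknown-host)"
    if [ -r "$file" ]; then
        echo "$host: OK, can read $file"
        return 0
    else
        echo "$host: cannot read $file" >&2
        return 1
    fi
}
```

Calling `check_param_file` with the same fully qualified path on every node (for example over a remote shell or through the batch system, depending on how the cluster is set up) would show whether any node fails to see the file. If every node can read it, the file system is probably not the cause.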
- Joined: September 24 2004
- Affiliation: Brookhaven National Laboratory
CosmoMC: Problems on a cluster
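I hit the same problem. With MPICH, when CosmoMC is compiled with ifort, only the first process sees the command-line parameters. The following snippet works around this by reading the file name on the first process and broadcasting it to the others: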
....
#ifndef MPI
        ! Serial build: take the parameter file and optional chain number from the command line
        InputFile = GetParam(1)
        if (InputFile == '') call DoStop('No parameter input file')
        numstr = GetParam(2)
        if (numstr /= '') then
          read(numstr,*) instance
          rand_inst = instance
        else
          instance = 0
        end if
#endif
#ifdef MPI
        ! NOTE: instance is only assigned in the non-MPI branch above, so this check relies on it being zero on entry
        if (instance /= 0) call DoStop('With MPI should not have second parameter')
        call mpi_comm_rank(mpi_comm_world,MPIrank,ierror)
        if (ierror/=MPI_SUCCESS) call DoStop('MPI fail')
        instance = MPIrank +1 !start at 1 for chains
        write (numstr,*) instance
        rand_inst = instance
        call mpi_comm_size(mpi_comm_world,MPIchains,ierror)
        if (instance == 1) then
          ! Only the first process (rank 0) reads the file name from the command line
          print *, 'Number of MPI processes:',mpichains
          InputFile=GetParam(1)
        end if
        ! Send the file name from rank 0 to every other process
        call MPI_Bcast(InputFile, LEN(InputFile), MPI_CHARACTER, 0, MPI_COMM_WORLD, ierror)
....