CosmoMC: Problems on a cluster

Posted: February 23 2005
by Luis Mendes
I compiled cosmomc with the Intel 8.1 EM64T compiler on a cluster with Opteron processors, and it runs fine on a single processor. However, when I try to run it on more than one processor using

mpirun -np 2 ./cosmomc ./param.ini

I get an error:

/data/3/soft1/cosmomc # /data/3/soft2/mpich-1.2.6/bin/mpirun -np 2 ./cosmomc ./params.ini
Error opening parameter file
Total time: 0 ( 0.000000000000000E+000 hours)
Slow proposals: 0
forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
cosmomc 00000000004F3B54 Unknown Unknown Unknown
cosmomc 00000000004F342E Unknown Unknown Unknown
cosmomc 00000000004C5616 Unknown Unknown Unknown
cosmomc 0000000000490516 Unknown Unknown Unknown
cosmomc 0000000000490A7C Unknown Unknown Unknown
cosmomc 00000000004A5B65 Unknown Unknown Unknown
cosmomc 00000000004A51DE Unknown Unknown Unknown
cosmomc 000000000042A78D Unknown Unknown Unknown
cosmomc 0000000000404CB4 Unknown Unknown Unknown
Unknown 0000002A96048C9E Unknown Unknown Unknown
cosmomc 0000000000404BEA Unknown Unknown Unknown
p1_30070: (0.000000) net_recv failed for fd = 3
p1_30070: p4_error: net_recv read, errno = : 104
bm_list_8360: (0.031250) wakeup_slave: unable to interrupt slave 0 pid 8359

From the second line it seems the problem is that the parameter file is not found. However, I have checked that the parameter file is indeed there, so I am very confused by this. I am using MPICH 1.2.6, and after compiling it the tests run fine, so I assume this has nothing to do with MPI. I also tried giving an absolute path for the parameter file, and the problem persists. Has anyone seen this before?
Thanks for your help,

Re: CosmoMC: Problems on a cluster

Posted: February 25 2005
by Antony Lewis
It does look like a problem finding the file (though check param.ini vs params.ini!). I can't think of any reason why a fully qualified path like /home/user/.../params.ini wouldn't work, as long as the path is on a disk accessible from all the nodes.
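
One way to test both possibilities (the path not being visible on some node, or the file name not reaching some process) is a small stand-alone diagnostic. The following is only a sketch and is not part of CosmoMC: the program name check_param_file and the buffer length are made up for illustration, and it assumes the MPI installation provides the Fortran "mpi" module and that the compiler supports the Fortran 2003 command-line intrinsics (older compilers may need getarg instead).

```fortran
! Diagnostic sketch (hypothetical, not part of CosmoMC): every MPI process
! reports the first command-line argument it received and whether a file
! of that name is visible from the node it runs on.
program check_param_file
    use mpi
    implicit none
    character(len=512) :: arg        ! hypothetical buffer size
    integer :: rank, ierror, nargs
    logical :: found

    call MPI_Init(ierror)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)

    ! What does this particular process see on its command line?
    nargs = command_argument_count()
    arg = ''
    if (nargs >= 1) call get_command_argument(1, arg)

    ! Only test for the file if an argument was actually received
    found = .false.
    if (len_trim(arg) > 0) inquire(file=trim(arg), exist=found)

    print '(a,i0,a,i0,3a,l1)', 'rank ', rank, ': nargs=', nargs, &
          ', arg1="', trim(arg), '", exists=', found

    call MPI_Finalize(ierror)
end program check_param_file
```

Assuming the usual mpif90 wrapper is available, this could be built with "mpif90 check_param_file.f90 -o check_param_file" and launched exactly like the failing job, e.g. "mpirun -np 2 ./check_param_file ./params.ini". If a rank prints a different argument from rank 0, the command line is not being passed through; if the argument is right but the file is reported missing, the path is not visible from that node.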

CosmoMC: Problems on a cluster

Posted: December 14 2005
by Anze Slosar
I hit the same problem. With MPICH, if compiled with ifort, only the first process sees the command-line parameters. The following snippet hacks around this:

Code:

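#ifndef MPI
        !without MPI the command line can be trusted
        InputFile = GetParam(1)
        if (InputFile == '') call DoStop('No parameter input file')
        numstr = GetParam(2)

        if (numstr /= '') then
         read(numstr,*) instance
         rand_inst = instance
        else
         instance = 0
        end if
#else
        instance = 0 !command line is not parsed here under MPI
#endif
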
#ifdef MPI

        if (instance /= 0) call DoStop('With MPI should not have second parameter')
        call mpi_comm_rank(mpi_comm_world,MPIrank,ierror)

        instance = MPIrank +1 !start at 1 for chains                                              
        write (numstr,*) instance
        rand_inst = instance
        if (ierror/=MPI_SUCCESS) call DoStop('MPI fail')

        call mpi_comm_size(mpi_comm_world,MPIchains,ierror)

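        if (instance.eq.1) then
          print *, 'Number of MPI processes:',mpichains
          InputFile = GetParam(1) !only the first process sees the command line
        end if

        !send the file name from the first process (rank 0) to all the others
        CALL MPI_Bcast(InputFile, LEN(InputFile), MPI_CHARACTER, 0, MPI_COMM_WORLD, ierror)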