CosmoMC: Problems on a cluster

Luis Mendes
Posts: 3
Joined: January 13 2005
Affiliation: Astrophysics Group, Imperial College London

CosmoMC: Problems on a cluster

Post by Luis Mendes » February 23 2005

Hi,
I compiled cosmomc with the Intel 8.1 EM64T compiler on a cluster with Opteron processors, and it runs fine on a single processor. However, when I try to run it on more than one processor using

mpirun -np 2 ./cosmomc ./param.ini

I get an error:

/data/3/soft1/cosmomc # /data/3/soft2/mpich-1.2.6/bin/mpirun -np 2 ./cosmomc ./params.ini
Error opening parameter file
Total time: 0 ( 0.000000000000000E+000 hours)
Slow proposals: 0
forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
cosmomc 00000000004F3B54 Unknown Unknown Unknown
cosmomc 00000000004F342E Unknown Unknown Unknown
cosmomc 00000000004C5616 Unknown Unknown Unknown
cosmomc 0000000000490516 Unknown Unknown Unknown
cosmomc 0000000000490A7C Unknown Unknown Unknown
cosmomc 00000000004A5B65 Unknown Unknown Unknown
cosmomc 00000000004A51DE Unknown Unknown Unknown
cosmomc 000000000042A78D Unknown Unknown Unknown
cosmomc 0000000000404CB4 Unknown Unknown Unknown
Unknown 0000002A96048C9E Unknown Unknown Unknown
cosmomc 0000000000404BEA Unknown Unknown Unknown
p1_30070: (0.000000) net_recv failed for fd = 3
p1_30070: p4_error: net_recv read, errno = : 104
bm_list_8360: (0.031250) wakeup_slave: unable to interrupt slave 0 pid 8359

From the second line it seems the problem is that the parameter file is not found. However, I have checked that the parameter file is indeed there, so I am very confused by this. I am using MPICH 1.2.6, and its tests ran fine after I compiled it, so I assume this has nothing to do with MPI. I also tried giving an absolute path for the parameter file, and the problem persists. Has anyone seen this before?
Thanks for your help,
Luis

Antony Lewis
Posts: 1485
Joined: September 23 2004
Affiliation: University of Sussex

Re: CosmoMC: Problems on a cluster

Post by Antony Lewis » February 25 2005

It does look like a problem finding the file (though check param.ini vs params.ini!). I can't think of any reason why a fully qualified path like /home/user/.../params.ini wouldn't work, as long as the path is on a disk accessible from all the nodes.
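
One quick way to rule out a naming or visibility problem is to test, on each node, whether the exact file name passed to cosmomc is readable there. The following is only a minimal sketch: check_param_file is a hypothetical helper, not part of CosmoMC or MPICH, and it assumes a POSIX shell on every node.

```shell
#!/bin/sh
# Hypothetical helper (not part of CosmoMC or MPICH): report whether a
# parameter file is readable from the host this script runs on.
check_param_file() {
    file="$1"
    host=$(uname -n)    # name of the node doing the check
    if [ -r "$file" ]; then
        echo "$host: readable: $file"
        return 0
    else
        echo "$host: not readable: $file"
        return 1
    fi
}
```

Calling check_param_file ./params.ini from the run directory on each node (for example in a remote shell session) would show both a param.ini/params.ini mix-up and a disk that is not mounted on some nodes. It does not test what the MPI processes themselves receive as command-line arguments.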

Anze Slosar
Posts: 183
Joined: September 24 2004
Affiliation: Brookhaven National Laboratory

CosmoMC: Problems on a cluster

Post by Anze Slosar » December 14 2005

I hit the same problem. With MPICH, if cosmomc is compiled with ifort, only the first process sees the command-line parameters. The following snippet hacks around this by reading the file name on the first process and broadcasting it to the others:


....
#ifndef MPI
        InputFile = GetParam(1)
        if (InputFile == '') call DoStop('No parameter input file')
        numstr = GetParam(2)

        if (numstr /= '') then
         read(numstr,*) instance
         rand_inst = instance
        else
         instance = 0
        end if

#endif

#ifdef MPI

        if (instance /= 0) call DoStop('With MPI should not have second parameter')
        call mpi_comm_rank(mpi_comm_world,MPIrank,ierror)


        instance = MPIrank +1 !start at 1 for chains
        write (numstr,*) instance
        rand_inst = instance
        if (ierror/=MPI_SUCCESS) call DoStop('MPI fail')

        call mpi_comm_size(mpi_comm_world,MPIchains,ierror)

        if (instance.eq.1) then
          print *, 'Number of MPI processes:',mpichains
          InputFile=GetParam(1)
        end if


        CALL MPI_Bcast(InputFile, LEN(InputFile), MPI_CHARACTER, 0, MPI_COMM_WORLD, ierror)
....
