CosmoMC scalability problem

Posted: July 19 2006
by Tom Chan
Hi all,

I'm new to this forum. I'm compiling the CosmoMC code (with some modifications) on our cluster for a physics student.

I am using Intel Fortran Compiler 9.1 with Intel MKL Cluster Edition 8.1 and MPICH 1.2.7p1.

When I run the program with mpirun on 1, 2 or 4 chains, it terminates successfully, but it does not terminate correctly when run on 8 chains. I wonder if anyone has encountered the same problem before.

The command I used to run cosmomc was:
time mpirun -machinefile mpihost -np 8 -nolocal cosmomc cg_params.ini

I'm running a test of 2000 iterations. When I looked at the nodes, I found that the run finishes for 1 or 2 of the chains, while the other chains halt after 1400 to 1600 iterations. I have no idea what happened to those chains. I have tried different optimization settings, including no optimization at all, but I still get the same result.

I would like to know whether most people run CosmoMC on 4 chains or fewer.

Thanks in advance.

Re: CosmoMC scalability problem

Posted: July 19 2006
by Antony Lewis
There shouldn't be any problem with more chains - I usually run on 6 or 8. The MPI version is designed more for running until convergence than for stopping after a fixed number of samples. Stopping that way might give what looks like an error termination even if everything is perfectly OK, so check whether the samples files are fine.
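One rough way to follow the advice to check the samples files is a small script that reads each chain file and flags any that are missing, empty, or end in a truncated or malformed row. This is a minimal sketch, not part of CosmoMC. It assumes the MPI run writes one file per chain named `<file_root>_1.txt` through `<file_root>_8.txt`, and that each row holds whitespace-separated numbers (multiplicity weight, -ln(like), then parameter values). The function names `check_chain_file` and `summarize_chains` are hypothetical.

```python
import os
import sys


def check_chain_file(path):
    """Return (rows, ok) for one chain file.

    Assumption: every non-empty line is whitespace-separated numbers
    (weight, -ln(like), parameters). A non-numeric token or a row with a
    different column count (e.g. a half-written last line) marks the
    file as not OK; rows counts the good rows seen before that point.
    """
    if not os.path.exists(path):
        return 0, False
    rows = 0
    ncols = None
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if not tokens:
                continue  # skip blank lines
            try:
                [float(t) for t in tokens]
            except ValueError:
                return rows, False  # garbled or truncated number
            if ncols is None:
                ncols = len(tokens)  # first row fixes the column count
            elif len(tokens) != ncols:
                return rows, False  # row with missing columns
            rows += 1
    return rows, rows > 0


def summarize_chains(file_root, n_chains):
    """Check <file_root>_1.txt ... <file_root>_<n_chains>.txt (assumed naming)."""
    summary = {}
    for i in range(1, n_chains + 1):
        path = f"{file_root}_{i}.txt"
        summary[path] = check_chain_file(path)
    return summary


if __name__ == "__main__":
    # Hypothetical defaults: replace with the file_root from your .ini file.
    root = sys.argv[1] if len(sys.argv) > 1 else "chains/test"
    n = int(sys.argv[2]) if len(sys.argv) > 2 else 8
    for path, (rows, ok) in summarize_chains(root, n).items():
        status = "OK" if ok else "CHECK"
        print(f"{path}: {rows} rows [{status}]")
```

For example, `python check_chains.py chains/test 8`, where `check_chains.py` is a hypothetical script name and `chains/test` stands in for whatever `file_root` is set to in cg_params.ini. Note that the row count is not the same as the number of samples, since under the assumed format each row carries a multiplicity weight in its first column.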