CosmoMC scalability problem

Tom Chan
Posts: 1
Joined: July 19 2006
Affiliation: CUHK

Post by Tom Chan » July 19 2006

Hi all,

I'm new to this forum. I'm compiling the CosmoMC code (with some modifications) on our cluster for a physics student.

I am using Intel Fortran Compiler 9.1 with Intel MKL Cluster Edition 8.1 and MPICH 1.2.7p1.

When I run the program with mpirun on 1, 2 or 4 chains, it terminates normally, but with 8 chains it does not terminate correctly. Has anyone encountered the same problem?

The command I used to run cosmomc was:
time mpirun -machinefile mpihost -np 8 -nolocal cosmomc cg_params.ini

I'm running a test of 2000 iterations. When I looked at the nodes, I found that the run finished for 1 or 2 of the chains, while the other chains halted after 1400 to 1600 iterations. I have no idea what happened to those chains. I have tried different optimization settings, including no optimization at all, but I still get the same result.
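One quick way to see where each chain actually stopped is to summarize each chain's output file. The Python sketch below assumes the chains are written as <file_root>_1.txt, <file_root>_2.txt, ..., with whitespace-separated numbers and the sample weight (multiplicity) in the first column; the file naming and the helper names summarize_chain and check_chains are assumptions for illustration, not part of CosmoMC.

```python
import glob
import os
import re


def summarize_chain(path):
    """Return (rows, total_weight, column_counts, bad_lines) for one chain file.

    Assumption: each line holds whitespace-separated floats and the first
    column is the sample weight (multiplicity).
    """
    rows = 0
    total_weight = 0.0
    widths = set()
    bad = 0
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            try:
                values = [float(x) for x in fields]
            except ValueError:
                bad += 1  # unparsable line, e.g. truncated when a process died
                continue
            total_weight += values[0]
            widths.add(len(values))
            rows += 1
    return rows, total_weight, widths, bad


def check_chains(file_root):
    """Summarize every <file_root>_<n>.txt file (hypothetical naming pattern)."""
    pattern = re.compile(re.escape(os.path.basename(file_root)) + r"_(\d+)\.txt")
    results = {}
    for path in glob.glob(file_root + "_*.txt"):
        m = pattern.fullmatch(os.path.basename(path))
        if m:  # keep only files whose suffix is a chain number
            results[int(m.group(1))] = summarize_chain(path)
    return dict(sorted(results.items()))
```

For example, check_chains("chains/test") would return a dict mapping each chain number to its summary. A single column count and zero unparsable lines suggest the file was written cleanly; if the first column really is the multiplicity, the total weight roughly tracks how many steps that chain took before it stopped.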

I would like to know whether most people run CosmoMC on 4 chains or fewer.

Thanks in advance.

Antony Lewis
Posts: 1594
Joined: September 23 2004
Affiliation: University of Sussex

Re: CosmoMC scalability problem

Post by Antony Lewis » July 19 2006

There shouldn't be any problem with more chains; I usually run on 6 or 8. The MPI version is designed more for running until convergence than for stopping after a fixed number of samples, which may give what looks like an error termination even when everything is fine. Check whether the samples files are OK.
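"Running until convergence" means the chains are compared against each other and stopped together once they agree, rather than each stopping on its own sample count. As a rough illustration, the sketch below computes the classic univariate Gelman-Rubin ratio R-1 for one parameter and applies a threshold to every parameter. This is an assumption-laden stand-in: CosmoMC's own convergence statistic is computed differently (it works with the parameter covariance across chains), and the function names and threshold here are hypothetical.

```python
from statistics import mean, variance


def gelman_rubin_r_minus_1(chains):
    """Univariate Gelman-Rubin R-1 for one parameter.

    chains: list of equal-length lists of samples, one list per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    W = mean(variance(c) for c in chains)         # mean within-chain variance
    B = n * variance([mean(c) for c in chains])   # between-chain variance
    var_plus = (n - 1) / n * W + B / n            # pooled variance estimate
    return var_plus / W - 1.0


def converged(chains_by_param, threshold):
    """Hypothetical stopping rule: every parameter has R-1 below threshold."""
    return all(gelman_rubin_r_minus_1(ch) < threshold for ch in chains_by_param)
```

With a rule like this, all chains are told to stop at the same moment; when each chain is instead given a fixed sample count, processes can finish at different times, which fits the uneven termination described above.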
