Hi all,
I'm new to this forum. I'm compiling the CosmoMC code (with some modifications) on our cluster for a physics student.
I am using the Intel Fortran compiler 9.1 with Intel MKL Cluster Edition 8.1 and MPICH 1.2.7p1.
When I run the program with mpirun on 1, 2 or 4 chains, it terminates successfully, but it does not terminate correctly when run on 8 chains. I wonder if anyone has encountered the same problem before.
The command I used to run cosmomc was:
time mpirun -machinefile mpihost -np 8 -nolocal cosmomc cg_params.ini
I'm doing a test of 2000 iterations. When I looked at the nodes, I found that the simulation terminates for 1 or 2 of the chains, while the other chains halt after 1400 to 1600 iterations. I have no idea what happened to those chains. I have tried different optimization settings, and even no optimization at all, but I still get the same result.
I would like to know whether most people run CosmoMC on 4 chains or fewer.
Thanks in advance.
CosmoMC scalability problem
- Posts: 1943
- Joined: September 23 2004
- Affiliation: University of Sussex
Re: CosmoMC scalability problem
There shouldn't be any problem with more chains - I usually run on 6 or 8. The MPI version is designed more for running until convergence than for stopping after a fixed number of samples (stopping that way might give what looks like an error termination even if everything is perfectly OK - check whether the samples files are fine).
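
One rough way to check whether the samples files are fine is to count the usable rows in each chain's output. The sketch below is only an illustration, not part of CosmoMC: it assumes the chain files are plain-text files named <file_root>_1.txt, <file_root>_2.txt and so on, with whitespace-separated numeric columns, and the helper names summarize_chain and summarize_chains are made up for this example.

```python
from pathlib import Path


def summarize_chain(path):
    """Return (good_rows, columns, bad_lines) for one samples file.

    Assumption: each row is a whitespace-separated list of numbers and
    every row has the same number of columns as the first good row.
    """
    rows, ncols, bad = 0, None, 0
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            try:
                [float(x) for x in fields]
            except ValueError:
                bad += 1  # non-numeric text, e.g. a truncated write
                continue
            if ncols is None:
                ncols = len(fields)
            elif len(fields) != ncols:
                bad += 1  # wrong column count, e.g. a partial last row
                continue
            rows += 1
    return rows, ncols, bad


def summarize_chains(directory, root):
    """Summarize every file matching <root>_*.txt in the given directory."""
    results = {}
    for path in sorted(Path(directory).glob(f"{root}_*.txt")):
        results[path.name] = summarize_chain(path)
    return results
```

Calling summarize_chains with the output directory and the file root used for the run returns one entry per chain file. Chains that report a similar column count and no bad lines were probably written cleanly, even if the job itself appeared to end with an error.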