CosmoMc: Scaling with openMP & MPI
-
- Posts: 13
- Joined: April 05 2013
- Affiliation: Institute of Astronomy/Kavli Institute for Cosmology Cambridge
CosmoMc: Scaling with openMP & MPI
Dear all,
has anyone tested the scaling of the new CosmoMC with openMP and MPI? In general, is it more efficient to run more chains or to assign more CPUs to a small number of chains?
What would be the most efficient combination of MPI x openMP using up to 12 or 16 cores? 16 is the maximum for a single job on our cluster.
If I want to use more than 16 cores, I have to submit more than one job and the different jobs cannot communicate via MPI. Does it make sense to submit more than one job and merge them "by hand"? Again, what would the optimal combination of MPI x openMP be?
Thanks a lot.
Cheers,
Bjoern
-
- Posts: 1943
- Joined: September 23 2004
- Affiliation: University of Sussex
- Contact:
Re: CosmoMc: Scaling with openMP & MPI
It depends what you mean by efficient. In terms of numerical cost (i.e. energy or total dollars), it's better to use a small number of cores per chain and wait (most chains will still converge in well under a day with, e.g., 2-4 cores per chain). But it scales moderately well to 8-16 cores per chain as long as the likelihoods are fast. Increasing the number of chains above a few does not speed up the burn-in of each chain much, and hence is inefficient when you go to large numbers. I usually recommend running 4-8 chains, each on 2-8 CPUs; there's rarely any point generating more than 8 chains.
If 16 cores are available I'd run 4 chains on 4 cores each if you are using likelihoods that are fast or parallelize well; this is a configuration that works well on many systems. If any likelihoods you are using do not parallelize well and are slow, then run 8 chains on 2 cores each.
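For concreteness, a 4-chains-by-4-threads layout might be submitted like this. This is only a sketch assuming a SLURM cluster; the partition settings, binary name `cosmomc`, and input file `params.ini` are placeholders you would replace with your own:

```shell
#!/bin/bash
# Hypothetical SLURM batch script: 4 MPI chains x 4 OpenMP threads = 16 cores.
#SBATCH --nodes=1
#SBATCH --ntasks=4            # one MPI task (chain) per rank
#SBATCH --cpus-per-task=4     # OpenMP threads available to each chain

# Tell OpenMP how many threads each chain may use.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch one chain per MPI rank.
mpirun -np 4 ./cosmomc params.ini
```

For the slow, poorly parallelizing case, the same sketch with `--ntasks=8` and `--cpus-per-task=2` gives the 8-chains-on-2-cores layout.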
-
- Posts: 13
- Joined: April 05 2013
- Affiliation: Institute of Astronomy/Kavli Institute for Cosmology Cambridge
CosmoMc: Scaling with openMP & MPI
Hi Antony,
thanks for the quick reply. Let me add one more question:
How much do the MPI features like MPI_Learn_Propose speed the code up? Or, to ask the other way round: is it a waste of CPU time to run on, e.g., 2 nodes with 2 chains per node when MPI communication is only possible among chains on the same node?
At the moment I'd rather minimize wall clock time, but of course without wasting a big amount of CPU time.
Thanks a lot.
Cheers,
Bjoern
-
- Posts: 1943
- Joined: September 23 2004
- Affiliation: University of Sussex
- Contact:
Re: CosmoMc: Scaling with openMP & MPI
MPI options will help a lot unless you already have a good .covmat covariance file for your posterior. MPI should work between nodes in a cluster.
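The two situations above correspond roughly to the following input-file settings. This is a hedged sketch: the covmat path is a placeholder, and parameter names should be checked against the `params.ini` shipped with your CosmoMC version:

```ini
# Option 1: let the chains jointly learn the proposal covariance over MPI.
MPI_Learn_Propose = T

# Option 2: if you already have a good covariance matrix from a previous
# run, supply it directly (hypothetical path) -- the MPI proposal learning
# then matters much less.
propose_matrix = chains/previous_run.covmat
```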