Parallelization issue in CosmoMC

vivian sabla
Posts: 4
Joined: July 24 2019
Affiliation: Dartmouth College

Post by vivian sabla » July 25 2019

Hello! I am new to using CosmoMC and am trying to figure out how to properly parallelize the running of chains. I have a computer with 32 CPUs (16 cores, 2 threads per core) and I am trying to figure out the most efficient way to run 8 chains.

I am running

 nohup mpirun -np 8 ./cosmomc planck_test.ini &  
I have set num_threads=2 in the ini file, but only 8 processes seem to be running. I expected that with num_threads=2 each chain would run across 2 threads.

It has also only been updating chain 0 for the past few hours. The output of the top command is below:
[Screenshot: Screen Shot 2019-07-24 at 10.06.03 AM.JPG]
It doesn't seem to actually be running in parallel and I was hoping someone might help me figure out what I am doing wrong! How should I be setting OMP_NUM_THREADS and num_threads in order to run 8 chains across more than one CPU?

Thanks,
Vivian

Antony Lewis
Posts: 1522
Joined: September 23 2004
Affiliation: University of Sussex

Re: Parallelization issue in CosmoMC

Post by Antony Lewis » July 26 2019

If you set OMP_NUM_THREADS, you shouldn't need to set num_threads (set it to zero to use the default). I'm not sure why only the first chain seems to be using 200% CPU as expected.
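
As a rough sketch of that suggestion (not a confirmed fix for the stalled chains): the snippet below picks a thread count per chain, exports it as OMP_NUM_THREADS, and prints the launch command rather than running it. The variable names n_chains, n_cores, threads_per_chain and launch_cmd are made up for illustration, and dividing the 16 physical cores (rather than the 32 hardware threads) among the chains is an assumption.

```shell
#!/bin/sh
# Sketch: choose an OpenMP thread count per chain and print the launch command.
# Assumption: threads are budgeted against the 16 physical cores,
# not the 32 hardware threads.
n_chains=8    # one MPI process per chain, as in the original command
n_cores=16    # physical cores on the machine described in the first post
threads_per_chain=$(( n_cores / n_chains ))

# OpenMP reads this variable at program start; num_threads in the ini
# file would be left at 0 so that this value is the one used.
export OMP_NUM_THREADS="$threads_per_chain"

# Dry run: build and print the command instead of executing it.
launch_cmd="mpirun -np $n_chains ./cosmomc planck_test.ini"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS $launch_cmd"
```

To actually start the chains, run the printed mpirun command in the same shell (with nohup and & as in the first post). Whether the exported variable reaches every MPI process, and whether the launcher pins each process to a single core, depends on the MPI implementation, so both are worth checking in its documentation.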

vivian sabla
Posts: 4
Joined: July 24 2019
Affiliation: Dartmouth College

Re: Parallelization issue in CosmoMC

Post by vivian sabla » July 26 2019

Ok thank you!

Looking at the data files and running ltrace and strace on the chains shows that, for about a day and a half, only the first chain has been communicating with any other part of the system. The chains should be independent of each other, but something about the first chain seems to be stifling progress on the rest. Is that a possibility?
