Hi,
I am trying to set up a Python script based on Cobaya to run parallel MCMC sampling with multiple chains, and I would like to use several cores per chain.
I am not sure I set things up correctly to speed up sampling, so I am posting the command here. (I could simply time a short test script, but the command may also be useful to someone else if it is correct.)
On NERSC Cori, I use
srun --ntasks=8 --cpus-per-task=8 python cobaya_test.py
--ntasks specifies the number of chains
--cpus-per-task specifies the number of CPUs per chain
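One way to check that the srun options above are picked up is to print, from inside the Python script, what Slurm reports for each task. The following is a minimal sketch, assuming the job runs under Slurm and that the standard environment variables SLURM_PROCID, SLURM_NTASKS and SLURM_CPUS_PER_TASK are set. The helper name slurm_layout is hypothetical and not part of Cobaya.

```python
import os


def slurm_layout(env=None):
    """Return (rank, n_tasks, cpus_per_task) as reported by Slurm.

    Hypothetical helper: falls back to (0, 1, 1) when the variables
    are missing, e.g. when the script is run outside of Slurm.
    """
    env = os.environ if env is None else env
    rank = int(env.get("SLURM_PROCID", 0))
    n_tasks = int(env.get("SLURM_NTASKS", 1))
    cpus = int(env.get("SLURM_CPUS_PER_TASK", 1))
    return rank, n_tasks, cpus


if __name__ == "__main__":
    rank, n_tasks, cpus = slurm_layout()
    print(f"task {rank} of {n_tasks}: {cpus} CPUs allocated")
```

This only shows what Slurm allocated. Whether the likelihood code actually uses those CPUs depends on its own threading (typically OpenMP, controlled by OMP_NUM_THREADS), which this sketch does not check.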
Does this make sense?
Best,
Omar
Cobaya multiple cpus per chain
- Posts: 11
- Joined: October 01 2019
- Affiliation: unige
- Posts: 37
- Joined: April 15 2013
- Affiliation: RWTH Aachen
Re: Cobaya multiple cpus per chain
Hi Omar,
Looks OK at first sight. Does it work as intended? That is, do the lines printed by Cobaya start with "[#]", where "#" is the rank of the process? (If all # = 0, then MPI is not configured correctly.)
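The rank check described above can be automated on a saved log. This is a rough sketch, assuming each relevant line begins with "[" followed by the integer rank (for example "[3 : mcmc] ..."); the exact prefix format and the helper name ranks_seen are assumptions, not taken from the Cobaya documentation.

```python
import re

# Assumed prefix: "[" followed by the integer rank at the start of a line.
RANK_PREFIX = re.compile(r"^\[(\d+)\b")


def ranks_seen(lines):
    """Collect the distinct integer ranks found in log-line prefixes."""
    ranks = set()
    for line in lines:
        match = RANK_PREFIX.match(line)
        if match:
            ranks.add(int(match.group(1)))
    return ranks


if __name__ == "__main__":
    # Made-up sample lines, only to illustrate the assumed format.
    sample = [
        "[0 : mcmc] Initial point: ...",
        "[1 : mcmc] Initial point: ...",
        "[mcmc] line without a rank",
    ]
    print(sorted(ranks_seen(sample)))
```

If the result contains only rank 0 for a job launched with 8 tasks, the processes are probably running as independent copies rather than as one MPI job.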
- Posts: 11
- Joined: October 01 2019
- Affiliation: unige
Re: Cobaya multiple cpus per chain
Hi Jesus,
Yes, it was giving a rank for each chain. In the end I also used:
srun -n 8 -c 64 --cpu_bind=cores python .....
Now I have a problem: after reaching R-1 ~ 0.013, the chains start to 'diverge' again, with R-1 increasing to 0.03, then coming back to 0.014, and so on. I thought the proposal matrix was not very important for convergence itself (only for convergence speed). I will probably have to read the paper to understand what is going on.
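To get a feeling for what an R-1 value measures, here is a simplified one-parameter sketch of a Gelman-Rubin-style statistic: the variance of the chain means divided by the mean of the within-chain variances. This is an illustrative assumption and not Cobaya's actual implementation, which works with all sampled parameters at once; the function name r_minus_one is hypothetical.

```python
from statistics import mean, variance


def r_minus_one(chains):
    """Simplified one-parameter R-1 estimate.

    chains: list of chains, each a list of samples of a single parameter.
    Returns (variance of the chain means) / (mean within-chain variance).
    """
    chain_means = [mean(chain) for chain in chains]
    chain_vars = [variance(chain) for chain in chains]
    return variance(chain_means) / mean(chain_vars)


if __name__ == "__main__":
    # Made-up toy chains, only to show the call.
    agreeing = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
    shifted = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]
    print(r_minus_one(agreeing), r_minus_one(shifted))
```

Since the statistic is estimated from a finite number of samples per chain, it is itself noisy, so some fluctuation from one check to the next is plausible even when the chains are sampling correctly.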