CLASS issues in COBAYA

Ali Rida Khalife
Posts: 18
Joined: October 31 2022
Affiliation: Institut d'Astrophysique de Paris

CLASS issues in COBAYA

Post by Ali Rida Khalife » February 27 2023

Hello!
I'm having two problems with CLASS in COBAYA that are similar to ones mentioned here before but never solved (viewtopic.php?f=11&t=3522&p=9650#p9650 and viewtopic.php?f=11&t=2665&p=7334#p7334):
1- CLASS is not parallelizing properly.
This might be why it's running much slower than CAMB (I noticed this by setting timing: True in the yaml file). I ran top on the node where the chains are running, and the %CPU never goes above 100%. I tried recompiling CLASS with different optimization flags and parallelization options in its Makefile, but nothing changed. It was also mentioned on GitHub that HMcode in CLASS does not parallelize properly, so I switched to halofit, but that made no difference either. Has anyone faced a similar problem and managed to solve it?
2- COBAYA crashes with a segmentation fault when using CLASS.
This happens irrespective of the cosmological model, and it happens quite randomly (sometimes after 7 hrs, sometimes after 24... there doesn't seem to be a clear pattern). I tried to check whether the last point reached by the chain produces NaNs anywhere (as was previously reported on GitHub), but everything seemed normal. I included a
self.classy.struct_cleanup()
in cobaya/cobaya/theories/classy.py after line 579, but that still didn't work.
I'm using 80 GB of memory for each run, with 4 chains and 40 ppn (which I think is a lot).
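To see whether the crash actually coincides with growing memory use, one simple option is to log the chains' resident memory while they run. This is just a rough sketch; the grep pattern assumes the chains were started with cobaya-run, and the process name may differ on your system:

Code: Select all

# log the resident memory (RSS, second column, in kB) of the running chains
# once a minute, to see whether it grows steadily before the segmentation fault
while true; do
    date >> mem_log.txt
    ps -eo pid,rss,cmd | grep "[c]obaya-run" >> mem_log.txt
    sleep 60
done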
Any help is really appreciated!
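P.S. On the first problem: a quick way to check whether the compiled classy extension is linked against an OpenMP runtime at all (libgomp for GCC, libiomp5 for Intel compilers) is something like the following; the path is only an example and depends on where CLASS/classy was built:

Code: Select all

# find the built classy shared library and list its OpenMP runtime dependencies
find /path/to/class/python -name "classy*.so" -exec ldd {} \; | grep -i -E "gomp|iomp"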

Ali Rida Khalife
Posts: 18
Joined: October 31 2022
Affiliation: Institut d'Astrophysique de Paris

Re: CLASS issues in COBAYA

Post by Ali Rida Khalife » March 08 2023

I managed to solve the first problem mentioned above, which could also be related to the second one. I'll keep an eye on this and post an update.
If someone is having problems with CLASS parallelization, here's how I solved it:
1- Make sure that you are loading the most up-to-date parallel-computing modules on your computer cluster. In my case, I had to load updated versions of:

Code: Select all

module load openmpi/4.1.5-intel
module load intelpython/3-2023.0.0
module load cfitsio/4.2.0
2- Compile CLASS with different optimization and OpenMP flags. Note that you might need to try all possible combinations of the two (i.e. for each optimization flag, check all possible OpenMP flags until you get maximum parallelization). In my case, it was:

Code: Select all

OPTFLAG = -Ofast
OMPFLAG   = -fopenmp
3- Even though the Cobaya job submission documentation suggests them, you might not need the following exports (see the job-script sketch after this list):

Code: Select all

export OMP_PLACES=threads
export OMP_PROC_BIND=spread
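For reference, here is a minimal job-script sketch along the lines of points 1-3, assuming a SLURM cluster and 4 MPI chains with 10 OpenMP threads each on a 40-core node; the scheduler directives, module names and input file name are only placeholders and should be adapted to your own setup:

Code: Select all

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4            # one MPI process per chain
#SBATCH --cpus-per-task=10    # OpenMP threads available to each chain

module load openmpi/4.1.5-intel
module load intelpython/3-2023.0.0
module load cfitsio/4.2.0

# threads per chain; OMP_PLACES / OMP_PROC_BIND are deliberately left unset
export OMP_NUM_THREADS=10

mpirun -n 4 cobaya-run input.yaml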
In order to check if CLASS is parallelizing properly:
1- Go to a computing node in your cluster.
2- Open a screen session by typing:

Code: Select all

$ screen -S NameOfScreen
3- Submit a job normally.
4- Detach from the screen (Ctrl-a then Ctrl-d).
5- Run:

Code: Select all

$ top 
on the node and check that the %CPU regularly goes well above 100, up to around OMP_NUM_THREADS x 100 (it might fluctuate below or above this number).
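Another quick sanity check, independent of Cobaya, is to time CLASS's own command-line executable with different thread counts and compare the wall-clock times (the path and thread counts below are only illustrative):

Code: Select all

# from the CLASS directory, run the same input file with 1 and 8 threads;
# if OpenMP is working, the 8-thread run should be noticeably faster
cd /path/to/class
export OMP_NUM_THREADS=1
time ./class explanatory.ini
export OMP_NUM_THREADS=8
time ./class explanatory.ini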
Hope this is useful!
