.chk files not created

Sunny Vagnozzi
Posts: 43
Joined: August 15 2016
Affiliation: Oskar Klein Centre, Stockholm University

.chk files not created

Post by Sunny Vagnozzi » July 20 2017

Hi Antony et al.,

I recently switched clusters (from Edison to Cori, both at NERSC, in case it helps), and managed to install CosmoMC successfully.

However, when I launch chains now, .chk files are no longer created, even though I am setting

Code:

checkpoint=T

in my .ini file (the same .ini file produced .chk files on the previous cluster without any problem). Also, the chains have been running for quite some time and have reached decent convergence (~4000 points sampled and ~0.08 convergence), so the problem cannot be that the chains have not run long enough yet.
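
For anyone wanting to confirm the symptom quickly, here is a minimal sketch of a check for checkpoint files. It assumes the .chk files are written next to the chain output and that their names start with the chain root; the function name, the directory and the root are placeholders, not anything defined by CosmoMC.

```shell
#!/usr/bin/env bash
# Count checkpoint files for one chain root.
# Assumption: checkpoint files sit in the chains directory, start with the
# chain root and end in ".chk". Directory and root are hypothetical inputs.
count_chk_files() {
    local dir="$1" root="$2" n=0 f
    for f in "$dir"/"$root"*.chk; do
        # An unmatched glob stays literal, so only count files that exist.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Calling it as `count_chk_files chains test` prints the number of matching files; 0 after a long run with checkpoint=T set would reproduce what is described above.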

Any ideas what the problem might be? Thanks!
Sunny

Yutong Wang
Posts: 9
Joined: May 06 2014
Affiliation: UCAS

Re: .chk files not created

Post by Yutong Wang » August 16 2017

I ran into the same problem: after restarting CosmoMC, there is no .chk file in the chains directory. What could be causing this?

Sunny Vagnozzi
Posts: 43
Joined: August 15 2016
Affiliation: Oskar Klein Centre, Stockholm University

.chk files not created

Post by Sunny Vagnozzi » August 16 2017

Hi Yutong,

Are you also working on NERSC? In my case, the architecture of my cluster was restructured, and although I can compile CosmoMC successfully, the .chk files are not created when I run it. I suspect that MPI communication among the chains is not working properly, probably as a result of the restructuring, but I still haven't managed to solve the problem.

Antony, any suggestions?

Cheers,
Sunny

Yutong Wang
Posts: 9
Joined: May 06 2014
Affiliation: UCAS

Re: .chk files not created

Post by Yutong Wang » August 31 2017

I use openmpi-2.1.1 and ifort 17.0.4.196, and I strongly suspect this problem is caused by the latest MPI or ifort compiler. I also can't obtain the root .converge_stat file. At the end of last year, before I updated MPI and ifort, I could get both the .chk and .converge_stat files.

Sunny Vagnozzi
Posts: 43
Joined: August 15 2016
Affiliation: Oskar Klein Centre, Stockholm University

.chk files not created

Post by Sunny Vagnozzi » October 06 2017

Hi,

I managed to solve the problem specifically for my cluster (NERSC). Since I expect many to use this cluster anyway, I'm posting the solution here in case it might help someone with the same problem (I did meet others who had the same problem):

It's a problem with the latest version of the Intel compiler, so you want to unload it and load the most recent version that works:

Code:

module unload intel/17.0.2.174
module load intel/16.0.3.210

Then you will have to recompile the Planck likelihood, and then recompile CosmoMC. At this point .chk files will be created.

Note that this also solves the problem with the fortran version of getdist I raised in http://cosmocoffee.info/viewtopic.php?t=2866&highlight= .

Cheers,
Sunny

Yutong Wang
Posts: 9
Joined: May 06 2014
Affiliation: UCAS

.chk files not created

Post by Yutong Wang » October 06 2017

The latest ifort version, 2018.0.128, also solves this problem. I use ifort 2018.0.128 with openmpi 3.0.0 on my workstation, and there is no problem: I successfully get the .chk and .converge_stat files.
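
To summarise the versions mentioned in this thread, here is a small sketch that classifies an Intel compiler version string. It encodes only what was reported above (17.0.4.196 was suspected rather than confirmed), and the function name and status labels are made up for illustration; any version not listed here is simply untested as far as this thread goes.

```shell
#!/usr/bin/env bash
# Classify an Intel compiler version using only the reports in this thread.
# The function name and the status strings are hypothetical.
intel_version_status() {
    case "$1" in
        # No .chk files reported (17.0.2.174) or suspected (17.0.4.196).
        17.0.2.174|17.0.4.196) echo "reported-broken" ;;
        # .chk files reported to be written normally.
        16.0.3.210|2018.0.128) echo "reported-working" ;;
        # Not mentioned in the thread, so nothing can be said.
        *) echo "unknown" ;;
    esac
}
```

For example, `intel_version_status 17.0.2.174` prints `reported-broken`, which would be the cue to switch modules and rebuild as described earlier in the thread.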
