.chk files not created

Posted: July 20 2017
by Sunny Vagnozzi
Hi Antony et al.,

I recently switched clusters (in case it helps, from Edison to Cori, both at NERSC) and managed to install CosmoMC successfully.

However, when I launch chains now, .chk files are no longer created, even though I am setting

checkpoint=T
in my .ini file (the same .ini file on the previous cluster produced .chk files without any problem). Also, the chains have been running for quite some time and have decent convergence (~4000 points sampled and ~0.08 convergence), so it cannot be a matter of the chains not having run long enough yet.
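For anyone who wants to check this quickly on their own run, here is a minimal sketch that lists the checkpoint files for a given chain root. It assumes the checkpoints sit next to the chain outputs and end in .chk; the function name `list_chk`, the directory `./chains` and the root `test` are only placeholders for illustration.

```shell
#!/usr/bin/env bash
# List checkpoint files for one chain root.
# Assumption: checkpoints live in the same directory as the chain
# outputs and have names of the form <root>*.chk.
list_chk() {
    local chain_dir="$1" root="$2"
    # find prints nothing when no file matches, so empty output
    # means no checkpoint has been written for this root.
    find "$chain_dir" -maxdepth 1 -type f -name "${root}*.chk" | sort
}

# Example (hypothetical directory and root):
#   list_chk ./chains test
```

Empty output after the chains have been running for a while would point to the symptom described here.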

Any ideas what the problem might be? Thanks!
Sunny

Re: .chk files not created

Posted: August 16 2017
by Yutong Wang
I ran into the same problem: after restarting CosmoMC, there is no .chk file in the chains folder. What could be causing this?

.chk files not created

Posted: August 16 2017
by Sunny Vagnozzi
Hi Yutong,

Are you also working on NERSC? Anyway, in my case the architecture of the cluster was restructured, and although I can compile CosmoMC successfully, the .chk files don't get created when I run it. I suspect that MPI communication among the chains is not functioning properly, possibly as a result of the restructuring, but I still haven't managed to solve the problem.

Antony, any suggestions?

Cheers,
Sunny

Re: .chk files not created

Posted: August 31 2017
by Yutong Wang
I use openmpi-2.1.1 and ifort 17.0.4.196, and I strongly suspect that this problem is caused by the latest MPI or ifort compiler. I also can't obtain the converge_stat file for my chain root. At the end of last year, before I updated MPI and ifort, I could get both the .chk and converge_stat files.
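Since the suspicion here falls on the compiler and MPI versions, a small sketch like the following can record which versions are actually picked up in the current environment. The helper name `tool_version` is made up for illustration, and it assumes each tool accepts `--version`, which ifort and Open MPI's mpirun do.

```shell
#!/usr/bin/env bash
# Print the first line of "<tool> --version", or a note if the tool
# is not on PATH. The function name is hypothetical.
tool_version() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
        printf '%s: not found\n' "$tool"
    fi
}

# Example:
#   for t in ifort mpirun; do tool_version "$t"; done
```

Comparing this output between a setup that writes .chk files and one that does not may help narrow down which component changed.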

.chk files not created

Posted: October 06 2017
by Sunny Vagnozzi
Hi,

I managed to solve the problem specifically for my cluster (NERSC). Since I expect many to use this cluster anyway, I'm posting the solution here in case it might help someone with the same problem (I did meet others who had the same problem):

It's a problem with the latest version of the Intel compiler, so you want to unload that version and load the most recent working one:

module unload intel/17.0.2.174
module load intel/16.0.3.210
Then you will have to recompile the Planck likelihood and then recompile CosmoMC. At this point .chk files will be created.
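If you want to script the module swap, a hedged sketch is below. The helper names `run` and `swap_intel` and the `DRY_RUN` variable are my own inventions for illustration; by default the commands are only printed, so you can check them before running anything. The rebuild commands for the Planck likelihood and CosmoMC depend on your installation and are deliberately left out.

```shell
#!/usr/bin/env bash
# Print a command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Swap one compiler module for another.
# Arguments: <module to unload> <module to load>
swap_intel() {
    local old="$1" new="$2"
    run module unload "$old"
    run module load "$new"
}

# Example with the versions mentioned above:
#   swap_intel intel/17.0.2.174 intel/16.0.3.210
# Afterwards, rebuild the Planck likelihood and then CosmoMC using
# whatever build commands your installation uses.
```

Setting `DRY_RUN=0` in a shell where the `module` command is available performs the swap for real.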

Note that this also solves the problem with the fortran version of getdist I raised in http://cosmocoffee.info/viewtopic.php?t=2866&highlight= .

Cheers,
Sunny

.chk files not created

Posted: October 06 2017
by Yutong Wang
The latest ifort version, 2018.0.128, also solves this problem. I use ifort 2018.0.128 with openmpi 3.0.0 on my workstation and there is no problem: I successfully get the .chk and .converge_stat files.