issue with CosmoMC chains when compiled with gfortran
Posted: May 04 2017
by Elizabeth Gould
When I run CosmoMC with checkpointing, stop it, and later restart it from the checkpoint, the code runs into problems and stops. In addition, some of the chain files stop growing at around 20 lines, and GetDist fails to read them.
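Because the symptom is chain files that stop after about 20 lines and that GetDist then fails to read, it can help to list which chain files look truncated before rerunning. This is a minimal sketch of a hypothetical helper (`check_chains` is not part of CosmoMC or GetDist). It assumes the chains are whitespace-separated text files with one sample per row, such as `chains/CDM01_1.txt`. It flags files with too few rows, or with a row whose column count differs from the first row, which is one plausible sign of a partially written line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CosmoMC or GetDist): report chain files
# that look truncated. Assumes whitespace-separated text chains with one
# sample per row, e.g. chains/CDM01_1.txt.
#
# Usage: check_chains MIN_ROWS FILE...
# Prints one line per problem; prints nothing for files that pass.
check_chains() {
  local min_rows=$1
  shift
  local f
  for f in "$@"; do
    awk -v min="$min_rows" -v name="$f" '
      NF == 0 { next }            # skip blank lines
      ncol == 0 { ncol = NF }     # column count of the first data row
      NF != ncol { bad = NR }     # remember the last row with a different count
      { rows++ }
      END {
        if (bad) printf "%s: line %d has a different column count\n", name, bad
        if (rows + 0 < min + 0) printf "%s: only %d rows\n", name, rows
      }' "$f"
  done
}
```

For example, `check_chains 100 chains/CDM01_*.txt` reports only the files that look suspicious. The threshold of 100 rows is arbitrary.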
I currently have both the Planck Likelihood Code (PLC) and CosmoMC compiled with gfortran. I did not have this issue when I used ifort. However, a problem with the PLC meant it had to be reinstalled, and it was reinstalled with gfortran.
Here are two samples of the last output CosmoMC wrote to the screen:
Fast divided into 1 blocks
35 parameters ( 9 slow ( 0 semi-slow), 26 fast ( 0 semi-fast))
1 Reading checkpoint from chains/CDM01_1.chk
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 3.89829e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
starting Monte-Carlo
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 3.89829e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
Chain:11 drag accpt: 0.379746825 fast/slow 121.696205 slow: 79
Chain:3 drag accpt: 0.333333343 fast/slow 147.111115 slow: 90
Chain:7 drag accpt: 0.326086968 fast/slow 146.782608 slow: 92
Chain:5 drag accpt: 0.288461536 fast/slow 138.500000 slow: 104
Chain:1 drag accpt: 0.300000012 fast/slow 149.020004 slow: 100
Chain:0 drag accpt: 0.441176474 fast/slow 133.507462 slow: 67
Chain:9 drag accpt: 0.229007632 fast/slow 153.236649 slow: 131
TObjectList: Unknown type to save
Fast divided into 1 blocks
36 parameters (10 slow ( 0 semi-slow), 26 fast ( 0 semi-fast))
1 Reading checkpoint from chains/CDM03_1.chk
info = 0
info = 0
starting Monte-Carlo
----
clik version 723c1a4b0580 MAKEFILE
bflike_smw
----
clik version 723c1a4b0580 MAKEFILE
bflike_smw
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 3.83458e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 3.83458e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
Chain:9 drag accpt: 0.454545468 fast/slow 147.363632 slow: 66
Chain:0 drag accpt: 0.348837197 fast/slow 126.522385 slow: 67
Chain:10 drag accpt: 0.434782594 fast/slow 133.014496 slow: 69
TObjectList: Unknown type to save
Re: issue with CosmoMC chains when compiled with gfortran
Posted: May 08 2017
by Antony Lewis
Which gfortran version is this? (I'm not sure I've ever tested checkpointing with gfortran myself.)
issue with CosmoMC chains when compiled with gfortran
Posted: May 08 2017
by Elizabeth Gould
This is the version listed:
gfortran --version
GNU Fortran (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4
Re: issue with CosmoMC chains when compiled with gfortran
Posted: May 08 2017
by Antony Lewis
You need at least version 6 (I'm amazed it even compiles with v4.9!)
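Since at least gfortran 6 is needed, a build script can check the compiler before compiling anything. This is a minimal sketch with hypothetical helper names that are not part of CosmoMC, and it assumes bash. It reads the version with `gfortran -dumpversion` and compares only the part before the first dot, because some GCC builds print just the major version there.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of CosmoMC) for checking the gfortran version.

# Print the major version of a gfortran executable (default: gfortran on PATH).
# -dumpversion may print a full version such as "6.2.0" or only the major
# version such as "7", so only the part before the first dot is kept.
gfortran_major() {
  local v
  v=$("${1:-gfortran}" -dumpversion) || return 1
  printf '%s\n' "${v%%.*}"
}

# Succeed only if the compiler's major version is at least MIN (default 6).
# Usage: require_gfortran_major [COMPILER] [MIN]
require_gfortran_major() {
  local major
  major=$(gfortran_major "$1") || return 1
  [ "$major" -ge "${2:-6}" ]
}
```

For example, `require_gfortran_major gfortran 6 || { echo 'gfortran >= 6 required' >&2; exit 1; }`. You can pass an explicit path instead of `gfortran` to check the exact executable a build will use. This matters when several versions are installed side by side.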
issue with CosmoMC chains when compiled with gfortran
Posted: May 18 2017
by Elizabeth Gould
Thanks.
It appears that I did use gfortran 6.2. However, since both versions are installed, parts of the code may have been compiled with gfortran 4.9. I have recompiled everything, making sure the newer gfortran is used throughout, and am now rerunning to see whether that fixes the issue.
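When two gfortran versions are installed, you can check whether a build mixed them by looking at the `.comment` section. GCC writes its version string into this section of each ELF object, for example `GCC: (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4`. This sketch uses hypothetical function names and assumes binutils' `readelf` is available. If files that should all come from one compiler produce more than one distinct line, the build is probably mixed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of CosmoMC): list the distinct GCC version
# strings that GCC recorded in the .comment section of ELF files.
# Requires readelf from GNU binutils.

# Read `readelf -p .comment` output on stdin; print each distinct "GCC: ..." string once.
distinct_gcc_strings() {
  grep -o 'GCC: .*' | sort -u
}

# Usage: gcc_versions_in FILE...   (object files, static or shared libraries)
gcc_versions_in() {
  local f
  for f in "$@"; do
    readelf -p .comment "$f" 2>/dev/null   # readelf errors (e.g. no .comment section) are discarded
  done | distinct_gcc_strings
}
```

For example, `gcc_versions_in path/to/build/*.o path/to/plc/lib/*.so` (both paths are hypothetical). Running the check on the final linked executable is less conclusive. Startup objects from the system C library can add the system compiler's string.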
issue with CosmoMC chains when compiled with gfortran
Posted: June 08 2017
by Elizabeth Gould
I have rerun CosmoMC, making sure gfortran 6.2 is used, and the issue seems to persist. For one run, I get this when it tries to restart from the checkpoint files:
....
12 Reading checkpoint from chains/CDM22_12.chk
info = 0
----
clik version 723c1a4b0580 MAKEFILE
bflike_smw
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 5.67619e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
11 Reading checkpoint from chains/CDM22_11.chk
info = 0
----
clik version 723c1a4b0580 MAKEFILE
bflike_smw
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 5.67619e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
5 Reading checkpoint from chains/CDM22_5.chk
Chain:2 drag accpt: 0.566037714 fast/slow 140.622223 slow: 45
Chain:7 drag accpt: 0.555555582 fast/slow 138.679993 slow: 50
TObjectList: Unknown type to save
From another run I get:
....
reading BAO data set: DR11LOWZ
Doing non-linear Pk: F
Doing CMB lensing: T
Doing non-linear lensing: T
TT lmax = 2508
EE lmax = 2508
ET lmax = 2508
BB lmax = 2500
PP lmax = 2500
lmax_computed_cl = 2508
Computing tensors: F
max_eta_k = 14000.0000
transfer kmax = 5.00000000
adding parameters for: lowl_SMW_70_dx11d_2014_10_03_v5c_Ap
adding parameters for: smica_g30_ftl_full_pp
adding parameters for: BKPlanck_detset_comb_dust
adding parameters for: DR11CMASS
adding parameters for: DR11LOWZ
adding parameters for: MGS
adding parameters for: 6DF
adding parameters for: plik_dx11dr2_HM_v18_TTTEEE
Fast divided into 1 blocks
36 parameters (10 slow ( 0 semi-slow), 26 fast ( 0 semi-fast))
5 Reading checkpoint from chains/CDM23_5.chk
1 Reading checkpoint from chains/CDM23_1.chk
starting Monte-Carlo
Chain:7 drag accpt: 0.379746825 fast/slow 147.620255 slow: 79
TObjectList: Unknown type to save
Re: issue with CosmoMC chains when compiled with gfortran
Posted: October 10 2018
by Antony Lewis
This gfortran issue is hopefully now fixed on the GitHub master branch.
Re: issue with CosmoMC chains when compiled with gfortran
Posted: January 05 2019
by Shouvik Roychoudhury
I am using a fresh download of CosmoMC from GitHub, built with gcc 7.3.0 (and the corresponding gfortran) and openmpi-3.0.0. I can confirm that the problem still exists with gcc 7.3.0.
Here is the output file:
Number of MPI processes: 4
file_root:trial_vanilla
Random seeds: 17418, 24775 rand_inst: 1
Using clik with likelihood file ./data/clik/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik
Random seeds: 28792, 24784 rand_inst: 4
Random seeds: 1361, 24783 rand_inst: 3
Random seeds: 28456, 24785 rand_inst: 2
----
clik version 6dc2a8cf3965
smica
----
clik version 6dc2a8cf3965
smica
----
clik version 6dc2a8cf3965
smica
----
clik version 6dc2a8cf3965
smica
Checking likelihood './data/clik/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik' on test data. got -380.979 expected -380.979 (diff -8.6809e-09)
----
TT from l=0 to l= 2508
Checking likelihood './data/clik/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik' on test data. got -380.979 expected -380.979 (diff -8.6809e-09)
----
Checking likelihood './data/clik/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik' on test data. got -380.979 expected -380.979 (diff -8.6809e-09)
----
TT from l=0 to l= 2508
TT from l=0 to l= 2508
Checking likelihood './data/clik/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik' on test data. got -380.979 expected -380.979 (diff -8.6809e-09)
----
TT from l=0 to l= 2508
Clik will run with the following nuisance parameters:
A_cib_217
cib_index
xi_sz_cib
A_sz
ps_A_100_100
ps_A_143_143
ps_A_143_217
ps_A_217_217
ksz_norm
gal545_A_100
gal545_A_143
gal545_A_143_217
gal545_A_217
calib_100T
calib_217T
A_planck
Using clik with likelihood file ./data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik
BFLike Ntemp = 2876
BFLike Nq = 1407
BFLike Nu = 1407
BFLike Nside = 16
BFLike Nwrite = 32393560
BFLike Ntemp = 2876
BFLike Nq = 1407
BFLike Nu = 1407
BFLike Nside = 16
BFLike Nwrite = 32393560
BFLike Ntemp = 2876
BFLike Nq = 1407
BFLike Nu = 1407
BFLike Nside = 16
BFLike Nwrite = 32393560
BFLike Ntemp = 2876
BFLike Nq = 1407
BFLike Nu = 1407
BFLike Nside = 16
BFLike Nwrite = 32393560
cls file appears to have 5+ columns
assuming it is a CAMB file with l, TT, EE, BB, TE
cls file appears to have 5+ columns
assuming it is a CAMB file with l, TT, EE, BB, TE
cls file appears to have 5+ columns
assuming it is a CAMB file with l, TT, EE, BB, TE
cls file appears to have 5+ columns
assuming it is a CAMB file with l, TT, EE, BB, TE
info = 0
info = 0
info = 0
info = 0
----
clik version 6dc2a8cf3965
bflike_smw
----
clik version 6dc2a8cf3965
bflike_smw
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 4.20627e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
Clik will run with the following nuisance parameters:
A_planck
Doing non-linear Pk: F
Doing CMB lensing: T
Doing non-linear lensing: T
TT lmax = 2508
EE lmax = 2500
ET lmax = 2500
BB lmax = 2500
PP lmax = 2500
lmax_computed_cl = 2508
Computing tensors: F
max_eta_k = 14000.0000
transfer kmax = 5.00000000
----
clik version 6dc2a8cf3965
bflike_smw
adding parameters for: lowl_SMW_70_dx11d_2014_10_03_v5c_Ap
adding parameters for: plik_dx11dr2_HM_v18_TT
Fast divided into 1 blocks
21 parameters ( 7 slow ( 0 semi-slow), 14 fast ( 0 semi-fast))
----
clik version 6dc2a8cf3965
bflike_smw
1 Reading checkpoint from chains/trial_vanilla/trial_vanilla_1.chk
starting Monte-Carlo
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 4.20627e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 4.20627e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 4.20627e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
4 Reading checkpoint from chains/trial_vanilla/trial_vanilla_4.chk
2 Reading checkpoint from chains/trial_vanilla/trial_vanilla_2.chk
3 Reading checkpoint from chains/trial_vanilla/trial_vanilla_3.chk
Chain:1 drag accpt: 0.447761208 fast/slow 67.0677948 slow: 59
Chain:3 drag accpt: 0.447761208 fast/slow 66.8793106 slow: 58
TObjectList: Unknown type to save
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 24516 on
node s83n34 exiting improperly. There are three reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
You can avoid this message by specifying -quiet on the mpirun command line.
Re: issue with CosmoMC chains when compiled with gfortran
Posted: January 05 2019
by Antony Lewis
I think this is a gfortran compiler bug. I used this test code in driver.F90:
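! Test snippet for driver.F90: save a list holding one real(mcp) array,
! read it back, then save it again (mimicking a restart from a checkpoint).
Type(TObjectList) :: L
Type(TBinaryFile) :: F
real(mcp) :: arr(9)
arr = 1
call L%Add(arr)                  ! store the array in the list
call F%CreateFile('tester.bin')
call L%SaveBinary(F%unit)        ! save the freshly added array
call F%Close()
call L%Clear()
call F%Open('tester.bin')
call L%ReadBinary(F%unit)        ! read the array back from the file
call F%Close()
call F%CreateFile('tester.bin')
call L%SaveBinary(F%unit)        ! save the read-back list again
stop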
I can reproduce the issue with gfortran 6.4.1. However, it appears to be fixed in gfortran 8.2.1 and 7.3.1 (e.g. try the default cmbant/cosmobox Docker container).
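This check is based only on the versions reported in this thread. Versions 6.2, 6.4.1 and 7.3.0 showed the `TObjectList: Unknown type to save` failure, while 7.3.1 and 8.2.1 appear to work. A job script could therefore warn before restarting from checkpoints with a compiler in the affected range. This is a minimal sketch with hypothetical helper names, and it assumes GNU `sort -V` for version ordering. Treating every version from 7.3.1 upward as unaffected is an assumption, not something tested here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of CosmoMC).

# Succeed if version $1 >= version $2, using GNU sort's version ordering (-V).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Versions reported in the thread: 6.2, 6.4.1 and 7.3.0 hit
# "TObjectList: Unknown type to save"; 7.3.1 and 8.2.1 appear fixed.
# Treating everything >= 7.3.1 as unaffected is an assumption.
# Usage: checkpoint_safe_gfortran FULL_VERSION
checkpoint_safe_gfortran() {
  version_ge "$1" 7.3.1
}
```

Pass the full version, e.g. `checkpoint_safe_gfortran "$(gfortran -dumpfullversion)"`. The `-dumpfullversion` option exists in GCC 7 and later. A major-only string such as `7` sorts below `7.3.1`, so the check treats it as possibly affected.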