issue with CosmoMC chains when compiled with gfortran

Elizabeth Gould
Posts: 13
Joined: January 30 2015
Affiliation: Queen's University

Post by Elizabeth Gould » May 04 2017

When I run CosmoMC with checkpointing, stop it, and later restart from the checkpoint, the code runs into problems and stops. In addition, some of the chain files end after about 20 lines, and GetDist fails to read them.
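
A quick way to spot such truncated chain files is sketched below. It is only an illustration: the function name `short_chains`, the default threshold of 25 lines, and the assumption that the chain files are the `.txt` files in the given directory are all choices made for this example, not something CosmoMC provides.

```shell
# List .txt files in a directory that have fewer than MIN_LINES lines.
# Usage: short_chains DIR [MIN_LINES]
# (hypothetical helper; the default threshold of 25 is an assumption)
short_chains() {
    dir=$1
    min_lines=${2:-25}
    for f in "$dir"/*.txt; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] || continue
        # Strip the padding some wc implementations add.
        n=$(wc -l < "$f" | tr -d ' ')
        if [ "$n" -lt "$min_lines" ]; then
            echo "$f: $n lines"
        fi
    done
}
```

For example, `short_chains chains 25` would print one line for each suspiciously short file under `chains/`, and nothing if every file is longer than the threshold.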

I currently have the Planck Likelihood Code (PLC) installed with gfortran and CosmoMC compiled with gfortran. I did not have this issue when I used ifort. The PLC had to be reinstalled because of a separate problem, however, and it was reinstalled with gfortran.

Here are two samples of the last things written to the screen by CosmoMC:
Fast divided into 1 blocks
35 parameters ( 9 slow ( 0 semi-slow), 26 fast ( 0 semi-fast))
1 Reading checkpoint from chains/CDM01_1.chk
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 3.89829e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
starting Monte-Carlo
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 3.89829e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
Chain:11 drag accpt: 0.379746825 fast/slow 121.696205 slow: 79
Chain:3 drag accpt: 0.333333343 fast/slow 147.111115 slow: 90
Chain:7 drag accpt: 0.326086968 fast/slow 146.782608 slow: 92
Chain:5 drag accpt: 0.288461536 fast/slow 138.500000 slow: 104
Chain:1 drag accpt: 0.300000012 fast/slow 149.020004 slow: 100
Chain:0 drag accpt: 0.441176474 fast/slow 133.507462 slow: 67
Chain:9 drag accpt: 0.229007632 fast/slow 153.236649 slow: 131
TObjectList: Unknown type to save

Fast divided into 1 blocks
36 parameters (10 slow ( 0 semi-slow), 26 fast ( 0 semi-fast))
1 Reading checkpoint from chains/CDM03_1.chk
info = 0
info = 0
starting Monte-Carlo
----
clik version 723c1a4b0580 MAKEFILE
bflike_smw
----
clik version 723c1a4b0580 MAKEFILE
bflike_smw
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 3.83458e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 3.83458e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
Chain:9 drag accpt: 0.454545468 fast/slow 147.363632 slow: 66
Chain:0 drag accpt: 0.348837197 fast/slow 126.522385 slow: 67
Chain:10 drag accpt: 0.434782594 fast/slow 133.014496 slow: 69
TObjectList: Unknown type to save

Antony Lewis
Posts: 1936
Joined: September 23 2004
Affiliation: University of Sussex

Post by Antony Lewis » May 08 2017

Which gfortran version is this? (I'm not sure I've ever tested checkpointing with gfortran myself.)

Post by Elizabeth Gould » May 08 2017

This is the version listed:
gfortran --version
GNU Fortran (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4

Post by Antony Lewis » May 08 2017

You need at least version 6 (I'm amazed it even compiles with v4.9...)

Post by Elizabeth Gould » May 18 2017

Thanks.

It appears that I did use gfortran 6.2. However, since both versions are installed, it may have been compiled partly with gfortran 4.9. I have recompiled everything, ensuring that the later version of gfortran is used throughout. I am currently rerunning it to see if that fixes the issue.
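
With two compilers installed side by side, one way to check which gfortran a build will pick up is sketched below. It relies on the standard `command -v` lookup and the GCC `-dumpversion` option. The helper names `major_of` and `check_gfortran` are made up for this example, and the minimum major version of 6 is taken from the advice earlier in this thread.

```shell
# Print the major part of a dotted version string, e.g. "4.9.4" -> "4".
major_of() {
    printf '%s\n' "$1" | cut -d. -f1
}

# Report which gfortran is first on PATH and whether its major version
# is at least 6 (hypothetical helper, not part of CosmoMC).
check_gfortran() {
    min_major=6
    if ! command -v gfortran >/dev/null 2>&1; then
        echo "gfortran not found on PATH"
        return 1
    fi
    echo "gfortran resolves to: $(command -v gfortran)"
    major=$(major_of "$(gfortran -dumpversion)")
    if [ "$major" -ge "$min_major" ]; then
        echo "major version $major: OK"
    else
        echo "major version $major: too old, need >= $min_major"
        return 1
    fi
}
```

Running `check_gfortran` in the same shell used for the build, before compiling both the PLC and CosmoMC, should show whether the older compiler comes first on the PATH. If the build goes through an MPI wrapper, the wrapper may be configured to call a different gfortran, so that is worth checking separately.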

Post by Elizabeth Gould » June 08 2017

I have tried rerunning CosmoMC, ensuring that I am using gfortran 6.2, and the issue seems to persist. For one run, when it tries to restart from the checkpoint files, I get this:
....

12 Reading checkpoint from chains/CDM22_12.chk
info = 0
----
clik version 723c1a4b0580 MAKEFILE
bflike_smw
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 5.67619e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
11 Reading checkpoint from chains/CDM22_11.chk
info = 0
----
clik version 723c1a4b0580 MAKEFILE
bflike_smw
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 5.67619e-07)
----
TT from l=0 to l= 29
EE from l=0 to l= 29
BB from l=0 to l= 29
TE from l=0 to l= 29
5 Reading checkpoint from chains/CDM22_5.chk
Chain:2 drag accpt: 0.566037714 fast/slow 140.622223 slow: 45
Chain:7 drag accpt: 0.555555582 fast/slow 138.679993 slow: 50
TObjectList: Unknown type to save

From another run I get:
....

reading BAO data set: DR11LOWZ
Doing non-linear Pk: F
Doing CMB lensing: T
Doing non-linear lensing: T
TT lmax = 2508
EE lmax = 2508
ET lmax = 2508
BB lmax = 2500
PP lmax = 2500
lmax_computed_cl = 2508
Computing tensors: F
max_eta_k = 14000.0000
transfer kmax = 5.00000000
adding parameters for: lowl_SMW_70_dx11d_2014_10_03_v5c_Ap
adding parameters for: smica_g30_ftl_full_pp
adding parameters for: BKPlanck_detset_comb_dust
adding parameters for: DR11CMASS
adding parameters for: DR11LOWZ
adding parameters for: MGS
adding parameters for: 6DF
adding parameters for: plik_dx11dr2_HM_v18_TTTEEE
Fast divided into 1 blocks
36 parameters (10 slow ( 0 semi-slow), 26 fast ( 0 semi-fast))
5 Reading checkpoint from chains/CDM23_5.chk
1 Reading checkpoint from chains/CDM23_1.chk
starting Monte-Carlo
Chain:7 drag accpt: 0.379746825 fast/slow 147.620255 slow: 79
TObjectList: Unknown type to save

Post by Antony Lewis » October 10 2018

This gfortran issue is now hopefully fixed on the GitHub master branch.

Shouvik Roychoudhury
Posts: 31
Joined: August 14 2016
Affiliation: IIT Bombay

Post by Shouvik Roychoudhury » January 05 2019

I am using CosmoMC freshly downloaded from GitHub, with gcc 7.3.0 and the associated gfortran, and openmpi-3.0.0. I can confirm that the problem still exists with gcc 7.3.0.

Here is the output file:

Number of MPI processes:           4
 file_root:trial_vanilla
 Random seeds: 17418, 24775 rand_inst:   1
 Using clik with likelihood file ./data/clik/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik
 Random seeds: 28792, 24784 rand_inst:   4
 Random seeds:  1361, 24783 rand_inst:   3
 Random seeds: 28456, 24785 rand_inst:   2
----
clik version 6dc2a8cf3965
  smica
----
clik version 6dc2a8cf3965
  smica
----
clik version 6dc2a8cf3965
  smica
----
clik version 6dc2a8cf3965
  smica
Checking likelihood './data/clik/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik' on test data. got -380.979 expected -380.979 (diff -8.6809e-09)
----
   TT from l=0 to l=        2508
Checking likelihood './data/clik/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik' on test data. got -380.979 expected -380.979 (diff -8.6809e-09)
----
Checking likelihood './data/clik/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik' on test data. got -380.979 expected -380.979 (diff -8.6809e-09)
----
   TT from l=0 to l=        2508
   TT from l=0 to l=        2508
Checking likelihood './data/clik/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik' on test data. got -380.979 expected -380.979 (diff -8.6809e-09)
----
   TT from l=0 to l=        2508
 Clik will run with the following nuisance parameters:
 A_cib_217^@
 cib_index^@
 xi_sz_cib^@
 A_sz^@
 ps_A_100_100^@
 ps_A_143_143^@
 ps_A_143_217^@
 ps_A_217_217^@
 ksz_norm^@
 gal545_A_100^@
 gal545_A_143^@
 gal545_A_143_217^@
 gal545_A_217^@
 calib_100T^@
 calib_217T^@
 A_planck^@
 Using clik with likelihood file ./data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik
 BFLike Ntemp  =        2876
 BFLike Nq     =        1407
 BFLike Nu     =        1407
 BFLike Nside  =          16
 BFLike Nwrite =    32393560
 BFLike Ntemp  =        2876
 BFLike Nq     =        1407
 BFLike Nu     =        1407
 BFLike Nside  =          16
 BFLike Nwrite =    32393560
 BFLike Ntemp  =        2876
 BFLike Nq     =        1407
 BFLike Nu     =        1407
 BFLike Nside  =          16
 BFLike Nwrite =    32393560
 BFLike Ntemp  =        2876
 BFLike Nq     =        1407
 BFLike Nu     =        1407
 BFLike Nside  =          16
 BFLike Nwrite =    32393560
 cls file appears to have 5+ columns
 assuming it is a CAMB file with l, TT, EE, BB, TE
 cls file appears to have 5+ columns
 assuming it is a CAMB file with l, TT, EE, BB, TE
 cls file appears to have 5+ columns
 assuming it is a CAMB file with l, TT, EE, BB, TE
 cls file appears to have 5+ columns
 assuming it is a CAMB file with l, TT, EE, BB, TE
 info =            0
 info =            0
 info =            0
 info =            0
----
clik version 6dc2a8cf3965
  bflike_smw
----
clik version 6dc2a8cf3965
  bflike_smw
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 4.20627e-07)
----
   TT from l=0 to l=          29
   EE from l=0 to l=          29
   BB from l=0 to l=          29
   TE from l=0 to l=          29
 Clik will run with the following nuisance parameters:
 A_planck^@
 Doing non-linear Pk: F
 Doing CMB lensing: T
 Doing non-linear lensing: T
 TT lmax =  2508
 EE lmax =  2500
 ET lmax =  2500
 BB lmax =  2500
 PP lmax =  2500
 lmax_computed_cl  =  2508
 Computing tensors: F
 max_eta_k         =    14000.0000
 transfer kmax     =    5.00000000
----
clik version 6dc2a8cf3965
  bflike_smw
 adding parameters for: lowl_SMW_70_dx11d_2014_10_03_v5c_Ap
 adding parameters for: plik_dx11dr2_HM_v18_TT
 Fast divided into            1  blocks
 21 parameters ( 7 slow ( 0 semi-slow), 14 fast ( 0 semi-fast))
----
clik version 6dc2a8cf3965
  bflike_smw
           1 Reading checkpoint from chains/trial_vanilla/trial_vanilla_1.chk
 starting Monte-Carlo
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 4.20627e-07)
----
   TT from l=0 to l=          29
   EE from l=0 to l=          29
   BB from l=0 to l=          29
   TE from l=0 to l=          29
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 4.20627e-07)
----
   TT from l=0 to l=          29
   EE from l=0 to l=          29
   BB from l=0 to l=          29
   TE from l=0 to l=          29
Checking likelihood './data/clik/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 4.20627e-07)
----
   TT from l=0 to l=          29
   EE from l=0 to l=          29
   BB from l=0 to l=          29
   TE from l=0 to l=          29
           4 Reading checkpoint from chains/trial_vanilla/trial_vanilla_4.chk
           2 Reading checkpoint from chains/trial_vanilla/trial_vanilla_2.chk
           3 Reading checkpoint from chains/trial_vanilla/trial_vanilla_3.chk
 Chain:1 drag accpt:  0.447761208     fast/slow   67.0677948     slow:          59
 Chain:3 drag accpt:  0.447761208     fast/slow   66.8793106     slow:          58
 TObjectList: Unknown type to save
 --------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 24516 on
node s83n34 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

Post by Antony Lewis » January 05 2019

I think this is a gfortran compiler bug. Using this test code in driver.F90

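    Type(TObjectList) :: L
    Type(TBinaryFile) :: F
    real(mcp) :: arr(9)

    arr = 1                            ! fill the test array
    call L%Add(arr)                    ! store it in the object list
    call F%CreateFile('tester.bin')
    call L%SaveBinary(F%unit)          ! first save, straight after Add
    call F%Close()
    call L%Clear()
    call F%Open('tester.bin')
    call L%ReadBinary(F%unit)          ! read the list back from disk
    call F%Close()
    call F%CreateFile('tester.bin')
    call L%SaveBinary(F%unit)          ! save again after the round trip
    stop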
I can reproduce the issue with gfortran 6.4.1. However, it appears to be fixed in gfortran 8.2.1 and 7.3.1 (e.g. try the default cmbant/cosmobox Docker container).
