cosmomc 2013

Use of Cobaya. camb, CLASS, cosmomc, compilers, etc.
daan meerburg
Posts: 27
Joined: November 20 2007
Affiliation: Cambridge

cosmomc 2013

Post by daan meerburg » March 23 2013

Hi Antony
I installed CosmoMC and the Planck likelihood, and they run. However, after a few minutes I encounter a segmentation fault:


[meerburg@della-068 cosmomc]$ ./cosmomc test.ini
Number of MPI processes: 1
Random seeds: 14404, 4590 rand_inst: 1
WMAP options (beam TE TT) T T T
Using clik with likelihood file ./data/clik/CAMspec_v6.2TN_2013_02_26_dist.clik
----
clik version 5887
CAMspec e61cec87-3a37-43ca-8ed1-edcfcaf5c00a
Checking likelihood './data/clik/CAMspec_v6.2TN_2013_02_26_dist.clik' on test data. got -3908.71 expected -3908.71 (diff -3.71156e-08)
----
TT from l=0 to l= 2500
Clik will run with the following nuisance parameters:
A_ps_100
A_ps_143
A_ps_217
A_cib_143
A_cib_217
A_sz
r_ps
r_cib
n_Dl_cib
cal_100
cal_217
xi_sz_cib
A_ksz
Bm_1_1
Using clik with likelihood file ./data/clik/commander_v4.1_lm49.clik
----
clik version 5887
gibbs d462e865-e178-449a-ac29-5c16ab9b38f5
Checking likelihood './data/clik/commander_v4.1_lm49.clik' on test data. got 3.2784 expected 3.2784 (diff -2.55579e-10)
----
TT from l=0 to l= 49
Using clik with likelihood file ./data/clik/lowlike_v222.clik
Initializing Planck low-likelihood, version v2.1
----
clik version 5887
lowlike "lowlike v222"
Checking likelihood './data/clik/lowlike_v222.clik' on test data. got -1007.04 expected -1007.04 (diff -1.87824e-07)
----
TT from l=0 to l= 32
EE from l=0 to l= 32
BB from l=0 to l= 32
TE from l=0 to l= 32
adding parameters for: lowlike_v222.clik
adding parameters for: commander_v4.1_lm49.clik
adding parameters for: CAMspec_v6.2TN_2013_02_26_dist.clik
WARNING: zero padding ext cls in LoadFiducialHighLTemplate
Computing tensors: F
Doing CMB lensing: T
Doing non-linear Pk: 0
lmax = 6500
lmax_computed_cl = 2500
max_eta_k = 6625.00000000000
transfer kmax = 0.800000011920929
Number of C_ls = 4
Fast divided into 1 blocks
Varying 20 parameters ( 6 slow ( 2 semi-slow), 14 fast ( 0 semi-fast))
starting Monte-Carlo
Initialising BBN Helium data...
Chain:0 mult:1 accept drag, accpt: 1.000000 fast/slow 26.00000
Chain:0 mult:5 accept drag, accpt: 0.4285714 fast/slow 38.60000
Chain:0 mult:1 accept drag, accpt: 0.5000000 fast/slow 41.83333
Chain:0 mult:1 accept drag, accpt: 0.5555556 fast/slow 42.57143
Chain:0 mult:6 accept drag, accpt: 0.4000000 fast/slow 48.08333
Chain:0 mult:4 accept drag, accpt: 0.3684210 fast/slow 48.75000
Chain:0 mult:2 accept drag, accpt: 0.3809524 fast/slow 48.82353
Chain:0 mult:3 accept drag, accpt: 0.3750000 fast/slow 49.40000
Chain:0 mult:1 accept drag, accpt: 0.4000000 fast/slow 49.14286
Chain:0 mult:3 accept drag, accpt: 0.3928571 fast/slow 48.70833
Chain:0 mult:3 accept drag, accpt: 0.3870968 fast/slow 48.66667
Chain:0 mult:5 accept drag, accpt: 0.3611111 fast/slow 49.18750
Chain:0 mult:1 accept drag, accpt: 0.3783784 fast/slow 48.60606
Chain:0 mult:1 accept drag, accpt: 0.3947369 fast/slow 48.14706
Chain:0 mult:1 accept drag, accpt: 0.4102564 fast/slow 48.42857
Chain:0 mult:5 accept drag, accpt: 0.3863636 fast/slow 47.75000
Chain:0 mult:2 accept drag, accpt: 0.3913043 fast/slow 48.02381
Chain:0 mult:1 accept drag, accpt: 0.4042553 fast/slow 48.02325
Chain:0 mult:1 accept drag, accpt: 0.4166667 fast/slow 48.15909
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cosmomc 0000000000426178 Unknown Unknown Unknown
cosmomc 00000000004C5614 Unknown Unknown Unknown
cosmomc 00000000004F787C Unknown Unknown Unknown
cosmomc 00000000004FDBC6 Unknown Unknown Unknown
cosmomc 000000000042272C Unknown Unknown Unknown
libc.so.6 00002ABF654099C4 Unknown Unknown Unknown
cosmomc 0000000000422639 Unknown Unknown Unknown


Any idea what might cause this segfault? Everything compiled correctly. The only thing I was confused about is the Planck data directory: according to your notes you had this stored in the plc folder. I also do not completely understand how the Planck likelihood knows the locations of the data (with WMAP you had to specify this in the code), although I guess this is dealt with by the linker.

I am adding some debugging flags to see if I can trace the source. But maybe you know more already.
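For reference, a common set of ifort debugging flags can be added to the Makefile's FFLAGS; this exact set is an assumption, not necessarily what was used here:

```shell
# Illustrative ifort debug flags (assumed, not the exact set used in this thread):
#   -g             keep symbols so tracebacks show file/line info
#   -traceback     print a symbolic traceback on severe errors
#   -check bounds  catch out-of-bounds array accesses
#   -fpe0          trap floating-point exceptions instead of continuing
DEBUGFLAGS="-g -traceback -check bounds -fpe0"
echo "FFLAGS += $DEBUGFLAGS"   # the line you would add to the CosmoMC Makefile
```

With `-g -traceback`, the `Unknown` frames in the traceback are replaced by routine names and line numbers.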

Thanks in advance
Daan

daan meerburg
Posts: 27
Joined: November 20 2007
Affiliation: Cambridge

cosmomc 2013

Post by daan meerburg » March 23 2013

I added some flags and now I get:
....
Chain:0 mult:2 accept drag, accpt: 0.3725490 fast/slow 48.25000
Chain:0 mult:1 accept drag, accpt: 0.3846154 fast/slow 48.65854
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cosmomc 000000000042B08F objectlists_mp_va 479 ObjectLists.f90
cosmomc 00000000004F3D97 paramdef_mp_addmp 547 paramdef.F90
cosmomc 000000000052A26B montecarlo_mp_mcm 737 MCMC.f90
cosmomc 000000000052EE2D MAIN__ 346 driver.F90
cosmomc 000000000042255C Unknown Unknown Unknown
libc.so.6 00002AF494BB89C4 Unknown Unknown Unknown
cosmomc 0000000000422469 Unknown Unknown Unknown

Any idea? Could it be due to the compiler I used? I'll check again and make sure it is Fortran 2003 compliant.
best
daan

Antony Lewis
Posts: 1943
Joined: September 23 2004
Affiliation: University of Sussex
Contact:

Re: cosmomc 2013

Post by Antony Lewis » March 23 2013

Are you not running with MPI? If so, I have to say that is completely untested (I always recommend running 4-8 chains).

Which version of ifort are you using? I haven't seen any crashes in object lists with v13 (but it's also possible I messed something up making the public release).

The data file locations are set in e.g. batch1/CAMspec_defaults.ini (for CAMspec). They are fixed to be in ./data/clik, which is why the notes say to make a symlink from there to the actual location. The Planck runs are configured by including a separate .ini file for each likelihood being used.
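For anyone setting this up, the symlink step can be sketched like this (the directory names are illustrative stand-ins, not the actual location of your Planck likelihood download):

```shell
# Stand-in layout: 'plc/data' plays the role of the unpacked Planck data,
# 'cosmomc' is the CosmoMC working directory.
mkdir -p /tmp/plc_demo/plc/data
mkdir -p /tmp/plc_demo/cosmomc/data
cd /tmp/plc_demo/cosmomc

# The batch1/*.ini files reference ./data/clik, so point that at the data:
ln -sfn /tmp/plc_demo/plc/data data/clik
readlink data/clik   # -> /tmp/plc_demo/plc/data
```

With the symlink in place, paths like ./data/clik/CAMspec_v6.2TN_2013_02_26_dist.clik resolve without editing any source code.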

daan meerburg
Posts: 27
Joined: November 20 2007
Affiliation: Cambridge

cosmomc 2013

Post by daan meerburg » March 23 2013

The reason I was running only a single chain was that I was ldd-ing my process on one of the nodes. I could have run several chains, but I was using an interactive session and wanted to keep my time in the queue short. I ldd-ed in the first place because I got the same crash when running 4 chains.

That being said, I contacted the cluster help-desk, and mpif90 wraps ifort v12.1, not v13, so I assume that is the issue. Unfortunately, he also told me that the system here (at Princeton) does not have v13 installed. On the other hand, I did not get any compilation errors.
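A quick way to confirm what an MPI wrapper is hiding, and to gate on the version, can be sketched as follows (wrapper behaviour varies by MPI distribution, and the version string below is just an example, not output from this cluster):

```shell
# Sanity checks (MPICH/Open MPI-style wrappers):
#   mpif90 -show       # prints the underlying compile command
#   ifort --version    # reports the Intel Fortran version
# A small version gate on an example version string:
ver="12.1.0"          # e.g. parsed from 'ifort --version' output
major=${ver%%.*}
if [ "$major" -ge 13 ]; then
  echo "ok: ifort $ver"
else
  echo "too old for CosmoMC 2013: ifort $ver"
fi
```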

Besides ifort v13, have you tried any other compilers?

best
daan

Antony Lewis
Posts: 1943
Joined: September 23 2004
Affiliation: University of Sussex
Contact:

Re: cosmomc 2013

Post by Antony Lewis » March 23 2013

No, but I know ifort 12.x will give lots of crashes and compiler errors (depending on exactly which version is used).

Hopefully they can install version 13 for you.

daan meerburg
Posts: 27
Joined: November 20 2007
Affiliation: Cambridge

cosmomc 2013

Post by daan meerburg » March 24 2013

Thanks Antony,
The cluster help-desk will get back to me on Monday to see if they can install ifort v13.
Best
daan

Richard Easther
Posts: 16
Joined: April 14 2005
Affiliation: Yale

cosmomc 2013

Post by Richard Easther » March 31 2013

We see the same bug, also under 12.xx. We have 13 on our cluster, but it is a test install (thanks to our excellent cluster people); whoever tries it first should post the results here :-)

Sheng Li
Posts: 57
Joined: May 26 2009
Affiliation: University of Sussex
Contact:

Re: cosmomc 2013

Post by Sheng Li » April 03 2013

daan meerburg wrote: […] Besides ifort v13, have you tried any other compilers?
I can also confirm that Intel v12.1 will NOT work with the class-based data types in the new version of CosmoMC.

You can also, I think, install Intel Fortran 2013 in your home directory, if you would like to do that. It is not complicated, unless you are not allowed to install it even there (I cannot remember whether it requires root privileges).

Also, if I remember correctly, gfortran will NOT work. The other well-known compiler, PGI Fortran, should work with Fortran 2003, BUT it is commercial software.

So you may want to try installing Intel Fortran on your own.

Cheers,

Sheng Li
Posts: 57
Joined: May 26 2009
Affiliation: University of Sussex
Contact:

Re: cosmomc 2013

Post by Sheng Li » April 03 2013

Richard Easther wrote:We see the same bug, also under 12.xx -- we have 13 on our cluster but it is a test install (thanks to our excellent cluster people); whoever tries it first should post the results here :-)
If you mean v13: there is a test report, with some indications of how to install the whole CosmoMC 2013.

Best,
