error at recombination during run with cosmoMC

Wallemacq Quentin
Posts: 2
Joined: October 07 2011
Affiliation: Universite de Liege

error at recombination during run with cosmoMC

Post by Wallemacq Quentin » April 26 2012

Dear all,

We have modified CAMB to include a new matter sector, called mirror
matter, which is described by two parameters. Since mirror matter has
the same physics as ordinary matter, we use the same subroutines for
the recombination, but with separate calls.
Run on its own, the modified CAMB works without any problem over a
wide range of values of the two new parameters.
We therefore added the two new parameters to CosmoMC (as genuinely new
parameters, not by redefining unused CosmoMC parameters). I
checked that the parameters were passed correctly to CAMB and started
to run CosmoMC. Everything seems to work well (output files are
written, the local acceptance ratio is calculated, etc.) for several
thousand(!) calls to CAMB, but the run eventually stops with an error
at mirror recombination: "CAMB error inithermo: failed to find end of
mirror recombination". At that point, all the parameters have typical
values, with nothing unusual compared with those already used in the
several thousand previous calls. In fact, it seems that at some point
the program gets NaNs for y(1), y(2) and y(3) in recfast.f90 when it
integrates the recombination equations with dverk. The NaNs then
propagate up to inithermo in modules.f90, and in particular to the
variables iv and vfi that control the end of recombination.
We don't understand why these NaNs appear after so many runs of CAMB.
Has anybody already been confronted with this kind of problem, or does
someone have an idea or suggestion about what is happening?
Any remarks, comments, or ideas are welcome and will be very useful.

Below are a few remarks that could help:
- we only modified the parts of the program corresponding to adiabatic
initial conditions, flat model, scalar perturbations, and no massive
neutrinos;
- calls to CAMB are sometimes missing: if I print the iteration number
in CosmoMC and the parameters in CAMB, several iteration numbers
sometimes appear one after the other without any CAMB parameters
printed in between, meaning that CAMB, or at least recombination, was
not run;
- for mirror recombination, we had to rescale the redshifts that
define the different regimes, and also the initial redshift zinitial
in recfast.f90. This caused a problem: we realized that CosmoMC
"remembered" the change from one call to CAMB to the next, which
eventually caused an overflow. The problem was invisible in a single
run and was solved simply by reinitializing zinitial at each call of
CAMB, at the beginning of Recombination_init in recfast.f90. There may
be another variable behaving the same way, or maybe we don't
understand well what CosmoMC does, so any explanation of how CosmoMC
works is also welcome.
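
To make the zinitial remark concrete, here is a minimal sketch of the pattern, written in Python only as an analogue of the Fortran behaviour. It is not the CAMB code: the class, the function names, the default value 1.0e4 and the scale factor are all hypothetical. It assumes the value lives in state that survives between calls, so that a rescaling applied in place compounds on every call unless it is reset first.

```python
class RecombinationState:
    """Hypothetical stand-in for state that persists between calls."""

    def __init__(self, zinitial=1.0e4):
        # 1.0e4 is an illustrative starting redshift, not a value from the post
        self.zinitial = zinitial


def init_buggy(state, scale):
    # Rescales the persistent value in place: the factor compounds per call.
    state.zinitial = state.zinitial * scale
    return state.zinitial


def init_fixed(state, scale, default=1.0e4):
    # Reset to the default first, then rescale: every call starts clean.
    state.zinitial = default
    state.zinitial = state.zinitial * scale
    return state.zinitial


if __name__ == "__main__":
    buggy, fixed = RecombinationState(), RecombinationState()
    for call in range(1, 4):
        print(call, init_buggy(buggy, 2.0), init_fixed(fixed, 2.0))
```

A single call gives the same answer in both versions, which is consistent with the problem being invisible in a single run. Any other quantity that is modified in place and not reset at the start of each call would behave the same way.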

Thank you in advance,

Quentin W.
Paolo C.

Antony Lewis
Posts: 1945
Joined: September 23 2004
Affiliation: University of Sussex

Re: error at recombination during run with cosmoMC

Post by Antony Lewis » April 26 2012

You may be able to compile with -fpe0 (in ifort) so that the code crashes as soon as NaNs appear rather than producing them; together with -traceback and other debugging options, this may let you figure out exactly where they come from. (Or it may not: ifort seems to be a bit temperamental about actually giving traceback and crash information reliably.)
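
If the compiler traceback turns out to be unreliable, a software check on the state vector after each integrator step is a possible complement. The following is only a sketch of that idea in Python, not CAMB code: the function name and the sample data are hypothetical, and the three-component vectors simply mirror y(1), y(2), y(3) from the first post. In Fortran the equivalent test could use ieee_is_nan from the intrinsic ieee_arithmetic module.

```python
import math


def first_nonfinite_step(states):
    """Return the index of the first state vector containing NaN or inf.

    `states` is a sequence of state vectors, one per integration step.
    Returns None if every component of every vector is finite.
    """
    for step, y in enumerate(states):
        if not all(math.isfinite(v) for v in y):
            return step
    return None


if __name__ == "__main__":
    # Hypothetical history: the second step has already gone bad.
    history = [[1.0, 0.9, 2700.0], [1.0, float("nan"), 2650.0]]
    print(first_nonfinite_step(history))
```

Logging the input parameters together with the first bad step makes it possible to rerun that one case in isolation.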

Wallemacq Quentin
Posts: 2
Joined: October 07 2011
Affiliation: Universite de Liege

error at recombination during run with cosmoMC

Post by Wallemacq Quentin » May 31 2012

I compiled as you suggested with ifort -fpe0 -traceback, and indeed it crashed after a few calls to CAMB with a division by zero. That is OK, but now it runs, with the same compilation options, for tens of thousands of calls without any problem until it crashes with this error message:

forrtl: error (65): floating invalid
Image PC Routine Line Source
cosmomc 00000000006FD1C5 Unknown Unknown Unknown
cosmomc 000000000051E63A recombination_mp_ 967 recfast.f90
cosmomc 00000000004C135F dverk_ 976 subroutines.f90
cosmomc 0000000000521754 recombination_mp_ 786 recfast.f90
cosmomc 00000000004D9A2C thermodata_mp_ini 2587 modules.f90
cosmomc 000000000051B761 cambmain_mp_initv 667 cmbmain.f90
cosmomc 00000000005056E1 cambmain_mp_cmbma 173 cmbmain.f90
cosmomc 000000000051C5F5 camb_mp_camb_gett 41 camb.f90
cosmomc 000000000047517E cmb_cls_mp_getcls 109 CMB_Cls_simple.f90
cosmomc 000000000047FBAF calclike_mp_getlo 120 calclike.f90
cosmomc 000000000047F8F6 calclike_mp_getlo 86 calclike.f90
cosmomc 000000000048D123 montecarlo_mp_mcm 537 MCMC.f90
cosmomc 0000000000493EAD MAIN__ 370 driver.F90
cosmomc 000000000040800C Unknown Unknown Unknown
libc.so.6 00007F5C8AA58C4D Unknown Unknown Unknown
cosmomc 0000000000407F09 Unknown Unknown Unknown
Aborted

The first frame is unknown, which doesn't help me find the problem. It seems that, in the subroutine ION in recfast.f90, Tmat becomes negative (after taking an abnormally large value just before), which causes the error when it is raised to the power b_PPB = -0.6166. However, I am not able to trace where the problem comes from upstream of Tmat.
Any ideas, suggestions, or comments would certainly help me fix this problem.
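
For reference, the failure mode described above (a negative base raised to a non-integer real power) can be reproduced and guarded against as in the following Python sketch. It is an illustration only, not the recfast.f90 code: checked_power is a hypothetical helper, and only the exponent value is taken from this post. The idea is to stop with an informative message at the first non-positive or non-finite Tmat, which is one step earlier than the invalid power itself.

```python
import math

B_PPB = -0.6166  # exponent quoted in the post


def checked_power(tmat, exponent=B_PPB):
    """Raise a descriptive error instead of evaluating an invalid power."""
    if not math.isfinite(tmat) or tmat <= 0.0:
        raise ValueError(f"Tmat={tmat!r} is not a positive finite number")
    return math.pow(tmat, exponent)


if __name__ == "__main__":
    print(checked_power(3000.0))
    try:
        checked_power(-5.0)
    except ValueError as err:
        print("caught:", err)
```

A similar check placed on the abnormally large value that precedes the sign change might point closer to the origin of the problem.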

With the best regards,

Quentin W.
