Trouble shooting cosmomc

Use of Cobaya, CAMB, CLASS, cosmomc, compilers, etc.
Tristan L Smith
Posts: 25
Joined: November 14 2005
Affiliation: Swarthmore

Trouble shooting cosmomc

Post by Tristan L Smith » August 10 2021

Hello,

I am running cosmomc with a version of CAMB which includes early dark energy. When I run cosmomc all of the output looks normal (it shows the chi2 for the various likelihoods) but after 10-15 hours the chain stops running. The output does not show any error message, but the error file indicates that there was a seg fault. I am able to re-start the chains and run them again for another 10-15 hours before they stop running.

What can I do to get cosmomc to give more information about the runs so that I can pinpoint why the code is intermittently seg-faulting?

I have set:

feedback=20
DebugLevel = 5
but these settings haven't produced any output that helps with a diagnosis.

Thank you for any advice.

Best,

Tristan

Antony Lewis
Posts: 1936
Joined: September 23 2004
Affiliation: University of Sussex

Re: Trouble shooting cosmomc

Post by Antony Lewis » August 10 2021

You can make and run cosmomc_debug, but seg faults don't always generate useful tracebacks. Are you using the early dark energy model built into CAMB, or some other custom modification? Note that debug builds of CAMB can be very slow.

Tristan L Smith

Re: Trouble shooting cosmomc

Post by Tristan L Smith » August 11 2021

Thanks for the suggestion! I am using the built-in dark energy model with some modifications to the shooting. Is there a possibility that something isn’t being properly de-allocated after each CAMB call when using the dark energy model?

Antony Lewis

Re: Trouble shooting cosmomc

Post by Antony Lewis » August 11 2021

If there were a deallocation problem you'd likely run out of memory, but you can check whether memory use is increasing over the course of the run. With allocatable arrays, though, it's not easy to leak memory.
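One way to check whether memory is growing is to sample the resident set size of a running chain from outside the process. The following is a minimal sketch, assuming a Linux node where /proc/<pid>/status is readable; the function names `rss_kib` and `watch` are made up for illustration and are not part of cosmomc or CAMB.

```python
import re
import time
from pathlib import Path


def rss_kib(status_text):
    """Return VmRSS in KiB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def watch(pid, interval_s=60.0, samples=10):
    """Print a timestamped RSS reading for process `pid` every `interval_s` seconds."""
    status = Path(f"/proc/{pid}/status")
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), rss_kib(status.read_text()), "KiB")
        time.sleep(interval_s)
```

Calling `watch(pid)` with the process ID of one of the cosmomc MPI processes, and redirecting the output to a file, would show whether the footprint creeps upward over the 10-15 hours before a crash or stays flat.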

Tristan L Smith

Re: Trouble shooting cosmomc

Post by Tristan L Smith » August 25 2021

I finally got around to running the code with cosmomc_debug and this is what it returned:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cosmomc_debug 0000000000DCBFF3 Unknown Unknown Unknown
libpthread-2.28.s 0000152C9ED52B20 Unknown Unknown Unknown
cosmomc_debug 00000000009B3E53 gaugeinterface_mp 242 equations.f90
cosmomc_debug 00000000009BA2AE gaugeinterface_mp 424 equations.f90
the last line was then repeated several times and then it lists
cosmomc_debug 0000000000AC1E8A cambmain_mp_gettr 1144 cmbmain.f90
cosmomc_debug 0000000000AC14EC cambmain_mp_trans 1123 cmbmain.f90
libiomp5.so 0000152C9FDCFD43 __kmp_invoke_micr Unknown Unknown
libiomp5.so 0000152C9FD5F63F Unknown Unknown Unknown
libiomp5.so 0000152C9FD5E65C Unknown Unknown Unknown
libiomp5.so 0000152C9FDD02FB Unknown Unknown Unknown
libpthread-2.28.s 0000152C9ED4814A Unknown Unknown Unknown
libc-2.28.so 0000152C9E875F23 clone Unknown Unknown
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
forrtl: error (78): process killed (SIGTERM)
I think the rest of the .err output is just related to the rest of the processes being canceled prematurely.

If you have any thoughts on what information this might contain, that would be extremely helpful.

If it is useful, I have set stop_on_error = F.

It is running again so that I can see if these same tracebacks appear when it crashes next time...

In the meantime, is there a way to get output that shows the cosmological parameters at each point of each chain and whether or not the Cls were successfully computed? If I could determine if there is some combination of parameters for which CAMB is failing that would be very helpful.

Antony Lewis

Re: Trouble shooting cosmomc

Post by Antony Lewis » August 25 2021

You did at least get some source line numbers; can you see what is at those lines in your source? You could add some write statements if you want to see the parameters. (Cobaya has much more detailed debug options built in.)
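To go from a traceback like the one above to the relevant source lines, a small script can pull out the rows that carry a file name and line number. This is only a sketch, assuming the five-column layout shown in the post (image, PC, routine, line, source); `source_frames` and `context` are illustrative helper names, not part of any existing tool.

```python
import re

# One traceback row: image, PC (hex address), routine, line number, source file.
ROW = re.compile(r"^(\S+)\s+([0-9A-Fa-f]{8,})\s+(\S+)\s+(\d+)\s+(\S+)\s*$")


def source_frames(traceback_text):
    """Return unique (source_file, line_number, routine) tuples in order of first
    appearance, skipping rows whose line number is 'Unknown'."""
    frames = []
    for row in traceback_text.splitlines():
        m = ROW.match(row.strip())
        if m:
            frame = (m.group(5), int(m.group(4)), m.group(3))
            if frame not in frames:
                frames.append(frame)
    return frames


def context(source_lines, line_number, radius=3):
    """Return the lines within `radius` of the 1-based `line_number`."""
    start = max(line_number - 1 - radius, 0)
    return source_lines[start:line_number + radius]
```

Feeding the .err file to `source_frames` and then passing the lines of your local equations.f90 and cmbmain.f90 to `context` would show the statements at each reported frame. The line numbers only match the exact source the debug binary was built from.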
