Checkpoint segmentation fault
 
CosmoCoffee Forum Index -> Computers and software
Pablo Lemos



Joined: 07 Dec 2015
Posts: 9
Affiliation: Cambridge University

Posted: November 03 2016

Hello,

I am trying to restart a previous CosmoMC run from the checkpoint files. The code was running fine the first time, but when I try to restart it I get the following error message:


2 Reading checkpoint from chains/hm_2.chk
3 Reading checkpoint from chains/hm_3.chk
4 Reading checkpoint from chains/hm_4.chk
1 Reading checkpoint from chains/hm_1.chk
starting Monte-Carlo
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cosmomc 00000000007E1239 Unknown Unknown Unknown
cosmomc 00000000007DFB0E Unknown Unknown Unknown
cosmomc 0000000000771C52 Unknown Unknown Unknown
cosmomc 000000000070A1F8 Unknown Unknown Unknown
cosmomc 0000000000712C5B Unknown Unknown Unknown
libpthread.so.0 00007F5DF735C7E0 Unknown Unknown Unknown
cosmomc 000000000045EBD6 Unknown Unknown Unknown
cosmomc 000000000045ECE6 Unknown Unknown Unknown
cosmomc 0000000000554FDC Unknown Unknown Unknown
cosmomc 0000000000526AEE Unknown Unknown Unknown
cosmomc 000000000061791A Unknown Unknown Unknown
cosmomc 0000000000616CF7 Unknown Unknown Unknown
cosmomc 0000000000614249 Unknown Unknown Unknown
cosmomc 00000000004FEE8B Unknown Unknown Unknown
cosmomc 00000000005001CF Unknown Unknown Unknown
cosmomc 00000000005012BF Unknown Unknown Unknown
cosmomc 00000000004FF30A Unknown Unknown Unknown
cosmomc 0000000000519544 Unknown Unknown Unknown
cosmomc 000000000061F303 Unknown Unknown Unknown
cosmomc 0000000000439996 Unknown Unknown Unknown
libc.so.6 00007F5DF6FD7D1D Unknown Unknown Unknown
cosmomc 0000000000439889 Unknown Unknown Unknown
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 30538 on
node calx024.ast.cam.ac.uk exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------


This has happened with two different sets of checkpoint files. Any idea what could be causing it? It would be really useful if I could continue from those checkpoint files.

Best
Pablo
Antony Lewis



Joined: 23 Sep 2004
Posts: 1301
Affiliation: University of Sussex

Posted: November 03 2016

Not really. Compiling cosmomc_debug instead may give a more helpful traceback.
Pablo Lemos




Posted: November 03 2016

Thanks for your help, Antony. When I run it in debug mode, this is what I get. Do you know what it means?

3 Reading checkpoint from chains/hm_3.chk
4 Reading checkpoint from chains/hm_4.chk
1 Reading checkpoint from chains/hm_1.chk
2 Reading checkpoint from chains/hm_2.chk
starting Monte-Carlo
forrtl: error (65): floating invalid
Image PC Routine Line Source
cosmomc_debug 0000000000E0BD39 Unknown Unknown Unknown
cosmomc_debug 0000000000E0A60E Unknown Unknown Unknown
cosmomc_debug 0000000000DA6F12 Unknown Unknown Unknown
cosmomc_debug 0000000000D3F128 Unknown Unknown Unknown
cosmomc_debug 0000000000D47D61 Unknown Unknown Unknown
libpthread.so.0 00007FC1BCBCD7E0 Unknown Unknown Unknown
cosmomc_debug 0000000000E0263B Unknown Unknown Unknown
cosmomc_debug 0000000000BA15A5 nonlinear_mp_tcb_ 747 halofit_ppf.f90
cosmomc_debug 0000000000B9FD59 nonlinear_mp_fill 680 halofit_ppf.f90
cosmomc_debug 0000000000BA1FCE nonlinear_mp_init 797 halofit_ppf.f90
cosmomc_debug 0000000000B99B33 nonlinear_mp_hmco 384 halofit_ppf.f90
cosmomc_debug 0000000000B95965 nonlinear_mp_nonl 98 halofit_ppf.f90
cosmomc_debug 0000000000C4AE24 cambmain_mp_maken 1157 cmbmain.f90
cosmomc_debug 0000000000C26E8F cambmain_mp_cmbma 224 cmbmain.f90
cosmomc_debug 0000000000CB4215 camb_mp_camb_getr 137 camb.f90
cosmomc_debug 0000000000CB3AA4 camb_mp_camb_gett 46 camb.f90
cosmomc_debug 0000000000767B97 calculator_camb_m 191 Calculator_CAMB.f90
cosmomc_debug 000000000075ED11 calclike_cosmolog 77 CalcLike_Cosmology.f90
cosmomc_debug 0000000000A30BA5 calclike_mp_theor 308 calclike.f90
cosmomc_debug 0000000000A2856B calclike_mp_getlo 146 calclike.f90
cosmomc_debug 00000000006CBF7F montecarlo_mp_tsa 94 MCMC.f90
cosmomc_debug 00000000006D161C montecarlo_mp_tme 279 MCMC.f90
cosmomc_debug 00000000006D42CF montecarlo_mp_tfa 353 MCMC.f90
cosmomc_debug 00000000006CD0F7 montecarlo_mp_tch 144 MCMC.f90
cosmomc_debug 0000000000729193 generalsetup_mp_t 137 GeneralSetup.f90
cosmomc_debug 0000000000A47E97 MAIN__ 268 driver.F90
cosmomc_debug 00000000004392A6 Unknown Unknown Unknown
libc.so.6 00007FC1BC848D1D Unknown Unknown Unknown
cosmomc_debug 0000000000439199 Unknown Unknown Unknown
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 11697 on node calx024.ast.cam.ac.uk exited on signal 6 (Aborted).
--------------------------------------------------------------------------


Best
Pablo
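
For readers working through a traceback like the one above, the following is a minimal Python sketch of one way to pull out the frames that carry source information. It assumes the five-column layout seen in the log (image, program counter, routine, line, source); the names `FRAME` and `resolved_frames` are hypothetical and not part of CosmoMC or the Intel runtime.

```python
import re

# One traceback frame: image, 16-digit hex program counter, routine, line, source.
# The layout is assumed from the forrtl output quoted in this thread.
FRAME = re.compile(r"^(\S+)\s+([0-9A-Fa-f]{16})\s+(\S+)\s+(\S+)\s+(\S+)$")


def resolved_frames(traceback_text):
    """Return (routine, line, source) for frames whose line number is known."""
    frames = []
    for raw in traceback_text.splitlines():
        match = FRAME.match(raw.strip())
        if match is None:
            continue  # header, separator, or unrelated log line
        _image, _pc, routine, line, source = match.groups()
        if line.isdigit():  # frames without debug info report "Unknown"
            frames.append((routine, int(line), source))
    return frames


# A few frames copied from the debug traceback above.
sample = """\
Image PC Routine Line Source
cosmomc_debug 0000000000E0263B Unknown Unknown Unknown
cosmomc_debug 0000000000BA15A5 nonlinear_mp_tcb_ 747 halofit_ppf.f90
cosmomc_debug 0000000000B9FD59 nonlinear_mp_fill 680 halofit_ppf.f90
cosmomc_debug 0000000000A47E97 MAIN__ 268 driver.F90
"""

frames = resolved_frames(sample)
innermost = frames[0]  # first resolved frame, closest to the failure
print(innermost)
```

The first resolved frame is the one closest to where the floating-point exception was raised, here line 747 of halofit_ppf.f90; the frames below it show the call path from the sampler down into CAMB.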
Antony Lewis




Posted: November 08 2016

Not very helpful. The error in halofit could be a symptom of something more generally wrong in the power spectra.
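
One way to test that suggestion is to check the values handed to the nonlinear correction before they are used. The snippet below is only a Python sketch of that idea, not CAMB's own code: `first_invalid` is a hypothetical helper and the sample numbers are made up for illustration. It assumes a sampled matter power spectrum should be finite and positive.

```python
import math


def first_invalid(values):
    """Return (index, value) of the first NaN, infinite, or non-positive
    entry, or None if every entry is finite and positive."""
    for index, value in enumerate(values):
        if not math.isfinite(value) or value <= 0.0:
            return index, value
    return None


# Illustrative values only; these are not taken from any real run.
good = [1.2e4, 8.5e3, 2.1e3]
bad = [1.2e4, float("nan"), 2.1e3]

print(first_invalid(good))
print(first_invalid(bad))
```

If a check of this kind fails before halofit is reached, the problem lies upstream, for example in the parameters restored from the checkpoint, rather than in halofit itself.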