CosmoCoffee Forum Index -> Computers and software

SegFault in CosmoMC

Vinicius Miranda
Joined: 20 Aug 2014
Posts: 9
Affiliation: Upenn

Posted: October 13 2016

Dear Prof. Lewis,

My name is Vinicius Miranda, and I am running into the following error in CosmoMC:

Code:
forrtl: severe (173): A pointer passed to DEALLOCATE points to an object that cannot be deallocated
Image              PC                Routine            Line        Source             
cosmomc_debug      0000000000D702E2  Unknown               Unknown  Unknown
cosmomc_debug      00000000004749A4  objectlists_mp_fr         272  ObjectLists.f90
cosmomc_debug      0000000000483579  objectlists_mp_th         494  ObjectLists.f90
cosmomc_debug      00000000007192C3  samplecollector_m         302  SampleCollector.f90
cosmomc_debug      000000000072029A  samplecollector_m         446  SampleCollector.f90
cosmomc_debug      00000000006CF30D  montecarlo_mp_tch         148  MCMC.f90
cosmomc_debug      000000000072F763  generalsetup_mp_t         137  GeneralSetup.f90
cosmomc_debug      00000000009E945E  MAIN__                    292  driver.F90
cosmomc_debug      00000000004129DE  Unknown               Unknown  Unknown
libc.so.6          00007F0FB9FCDD5D  Unknown               Unknown  Unknown
cosmomc_debug      0000000000412869  Unknown               Unknown  Unknown


I am using the latest CosmoMC, modified to include extra reionization and inflation parameters, so it is possible that I introduced a bug somewhere that caused this. However, I have never faced a similar problem before, so I created this thread to ask for some guidance. My chains are very long, with a very large number of parameters, and it took almost two weeks of continuous running for this error to appear for the first time. Now, when I restart the same chain, the error appears within minutes. Non-linear lensing is off and the semi-fast sampler is on. The likelihood is the unbinned Planck 2015 TT-only likelihood at both high-l and low-l. (Chains with other Planck likelihood choices, including polarization, have not had this problem, but they have not been running for as long.) I can send more information if you need it.

Thanks a lot in advance.
Best Regards
Vinicius Miranda

Vinicius Miranda
Posted: October 14 2016

The problem seems to be related to this piece of code in SampleCollector.f90, in the subroutine TMpiChainCollector_UpdateCovAndCheckConverge. All my chains failed at around the same point in their execution, which is consistent with the shared condition Count > 500000 being reached. As a workaround I increased this number, but since my chains will run for quite some time, at some point I will run out of memory.

Code:
 if (this%Samples%Count > 500000) then
   !Try not to blow memory by storing too many samples
   call this%Samples%Thin(2)
   this%Mpi%MPI_thin_fac = this%Mpi%MPI_thin_fac*2
 end if
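A minimal sketch of what a thinning step like `Thin(2)` plus the `MPI_thin_fac` update does, assuming for illustration that the samples are held in a plain allocatable `real(dp)` array (one column per sample) rather than CosmoMC's actual `ObjectLists` storage. The module and routine names `thin_sketch` and `thin_samples` are hypothetical. The thinned copy is built in a fresh allocation and handed over with the intrinsic `move_alloc`, so no DEALLOCATE is ever applied to a pointer; this is a general Fortran pattern, not a confirmed fix for the error above.

```fortran
module thin_sketch
  ! Hypothetical illustration only: this is NOT CosmoMC's ObjectLists code.
  ! Samples are assumed to live in a plain allocatable array, one column per sample.
  implicit none
  private
  public :: thin_samples

  integer, parameter :: dp = kind(1.d0)

contains

  subroutine thin_samples(samples, count, fac, thin_fac)
    real(dp), allocatable, intent(inout) :: samples(:,:) ! samples(:, i) is the i-th stored sample
    integer, intent(inout) :: count     ! number of columns currently in use
    integer, intent(in)    :: fac       ! keep one sample in every fac
    integer, intent(inout) :: thin_fac  ! cumulative thinning factor applied so far
    real(dp), allocatable :: kept(:,:)
    integer :: i, n_kept

    n_kept = count / fac
    ! New buffer with the same capacity, so the freed columns can be refilled later.
    allocate(kept(size(samples, 1), size(samples, 2)))
    do i = 1, n_kept
      kept(:, i) = samples(:, i*fac)   ! keep samples fac, 2*fac, 3*fac, ...
    end do
    ! move_alloc deallocates the old storage of samples and transfers kept's
    ! allocation to it, so there is no explicit DEALLOCATE of a pointer anywhere.
    call move_alloc(kept, samples)
    count = n_kept
    thin_fac = thin_fac * fac   ! the stored chain is now thinned fac times more
  end subroutine thin_samples

end module thin_sketch
```

Calling `thin_samples(samples, count, 2, thin_fac)` on 10 stored samples leaves 5 of them (originally the 2nd, 4th, 6th, 8th and 10th) and doubles `thin_fac`, which is the same bookkeeping the `Count > 500000` block performs.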

Antony Lewis
Joined: 23 Sep 2004
Posts: 1237
Affiliation: University of Sussex

Posted: October 15 2016

This may be related to some ifort quirks, e.g.

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/390944

(Note that you should rarely need that many samples; if a chain doesn't converge with many fewer, there is usually an issue.)

Vinicius Miranda
Posted: October 15 2016

Thank you, Antony.

There is a good reason why I need that many samples: it is related to the principal-component method I used here (https://arxiv.org/abs/1609.04788) and here (https://arxiv.org/abs/1411.5956).

So is the answer that we need Intel to fix the bug?

Antony Lewis
Posted: October 15 2016

You can just increase the 500000 number. In practice you're not likely to run out of memory.
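A rough estimate of what raising the threshold costs, assuming (hypothetically) that each stored sample amounts to about P double-precision numbers (parameters plus weight and likelihood) and ignoring any per-object overhead in the list. Because the collector thins by 2 once the count exceeds the threshold N_max, roughly N_max samples at most are held at any time, so for the current threshold and an illustrative P = 100:

```latex
M \approx N_{\max} \times P \times 8\,\text{bytes}
  = 5\times10^{5} \times 100 \times 8\,\text{bytes}
  = 4\times10^{8}\,\text{bytes} \approx 400\,\text{MB}
```

M grows linearly in both N_max and P, so doubling the threshold to 10^6 would roughly double this estimate to about 800 MB for each collector holding samples.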

Vinicius Miranda
Posted: October 17 2016

Thank you.

All times are GMT + 5 Hours