CosmoCoffee Forum Index CosmoCoffee

 
 FAQFAQ   SearchSearch  MemberlistSmartFeed   MemberlistMemberlist    RegisterRegister 
   ProfileProfile   Log inLog in 
Arxiv New Filter | Bookmarks & clubs | Arxiv ref/author:

[astro-ph/0508461] A Nested Sampling Algorithm for Cosmological Model Selection
 
Authors:Pia Mukherjee, David Parkinson, Andrew R. Liddle
Abstract:The abundance of new cosmological data becoming available means that a wider range of cosmological models are testable than ever before. However, an important distinction must be made between parameter fitting and model selection. While parameter fitting simply determines how well a model fits the data, model selection statistics, such as the Bayesian Evidence, are now necessary to choose between these different models, and in particular to assess the need for new parameters. We implement a new evidence algorithm known as nested sampling, which combines accuracy, generality of application and computational feasibility, and apply it to some cosmological datasets and models. We find that a five-parameter model with Harrison-Zel'dovich initial spectrum is currently preferred.
[PDF] [PS] [BibTex] [Bookmark]

Post new topic   Reply to topic    CosmoCoffee Forum Index -> arXiv papers
View previous topic :: View next topic  
Author Message
Antony Lewis



Joined: 23 Sep 2004
Posts: 506
Affiliation: University of Sussex

PostPosted: June 02 2006  Reply with quote

This paper discusses the use of Nested Sampling for cosmological model selection and parameter analysis. This is certainly a neat method, and nice to see it being put to use.

I hope to discuss some of these things at the Sussex conference next week. A couple of things to consider:

A. Intermediate steps in this method require sampling from the prior P(x) conditional on the likelihood L(x) being greater than a particular value Lj. To implement this sampling the authors seem to

1. Construct an ellipse than encloses the current set of samples
2. Expand the ellipse by some constant to allow for the fact that true likelihood contours are not elliptical
3. Importance sample within this expanded ellipse (i.e. take samples until one sample lies in the L(x) > Lj region).

This is in contrast to the original Skilling method where he suggests using MCMC for this sampling step. My worry about the method used here as a general method is the following. In n-dimensions, the volume of a sphere of radius r is ~ rn. If the sphere is expanded by a factor f, the fraction of the new volume contained in the target volume is (1/f)n, i.e. the acceptance rate is approximately 1/fn and the number of likelihood calculations to get the new sample is ~ fn. For n=5, f=1.5 this gives an acceptance rate ~ 0.13, similar to the 20% they report. For n=7, f=1.8 it gives a rate of 0.016. Since fn grows exponentially with dimension, this sampling method would seem to get exponentially bad in high dimensions. i.e. perhaps for n>~8 using MCMC would be a much better idea (though maybe fine for n<=8 considered in this paper). Also expanding the elliptical region is of course not guaranteed to enclose all points in the L(x) > Lj region, so the method is not formally correct, and can only be tested in each case by expanding my a much larger factor (very slow!) and checking for stability of the result.

B. Is it useful to use different 'priors'? e.g.

\int d\theta L(\theta)P(\theta) =\int d\theta \frac{L(\theta) P(\theta)}{P'(\theta)} P'(\theta),

so one could perhaps apply this method sampling from P'(θ) rather than the original P(θ) [and using the appropriately modified likelihood]. i.e. perhaps one could use some approximation to Lβ that can be quickly normalized.
Back to top
View user's profile [ Hidden ] Visit poster's website
Hans Kristian Eriksen



Joined: 25 Sep 2004
Posts: 56
Affiliation: ITA, University of Oslo

PostPosted: June 04 2006  Reply with quote

Antony Lewis wrote:
This is in contrast to the original Skilling method where he suggests using MCMC for this sampling step.


Do you know of any reference on how this actually is done? I've been looking through Skilling's page, and there are a number of papers/tutorials/manuals there describing the idea (which is neatly summarized by Mukherjee et al.), but for some reason I can't find any explicit description of a good sampling algorithm, just general suggestions. And the BayeSys program is a little too massive that I want to spend too much time on figuring out how it's done there. Multi-dimensional Hilbert curves, atom creation and bit swapping sounds complicated.. :-)

Thanks!
Back to top
View user's profile [ Hidden ] Visit poster's website
Antony Lewis



Joined: 23 Sep 2004
Posts: 506
Affiliation: University of Sussex

PostPosted: June 04 2006  Reply with quote

I should think you could use almost anything for simple cases, e.g. just the standard CosmoMC sampler (see the CosmoMC paper, notes and readme for references; basically Metropolis-Hastings with Gaussian jumps in random 2D subspaces).

On the other hand I'm not at all sure how well nested sampling would work on a general distribution. I'd have thought sampling from the sharply-truncated prior could give a very odd disjoint shape to sample from even if the posterior distribution is only weakly multimodal itself - in general very difficult. An advantage of using theromodynamic integration is that it least it is pretty robust to odd distribution shapes.
Back to top
View user's profile [ Hidden ] Visit poster's website
Hans Kristian Eriksen



Joined: 25 Sep 2004
Posts: 56
Affiliation: ITA, University of Oslo

PostPosted: June 05 2006  Reply with quote

Antony Lewis wrote:
I should think you could use almost anything for simple cases, e.g. just the standard CosmoMC sampler (see the CosmoMC paper, notes and readme for references; basically Metropolis-Hastings with Gaussian jumps in random 2D subspaces).


This is the point I guess I don't understand. The proposals are supposed to be drawn from the prior, right? So with a uniform prior, an MCMC chain is basically just a random walk – with the additional constraint that you never go below Llow (= least likely of the N live points). My problem with this is that it seems you would have to make many MC steps to diffuse sufficiently far away from the original point, and for each of these steps you would have to compute a (possibly expensive) likelihood just to make sure you haven't gone outside the allowed region. So my problem is that I don't quite see how an MCMC method is going to be effective..? And as you say, the multi-dimensional spheroid approach of Mukherjee et al. also have problems in high dimensions because of the expansion factor..

Or am I missing something fundamental here..?
Back to top
View user's profile [ Hidden ] Visit poster's website
Antony Lewis



Joined: 23 Sep 2004
Posts: 506
Affiliation: University of Sussex

PostPosted: June 07 2006  Reply with quote

I think that is right - MCMC will certainly involve lots of likelihood evaluations. One could however hope that it would scale with dimension less badly than exponentially.

On the other hand David Mackay pointed out that for a n-D ellipse, almost all the probability volume is around the surface, so it is indeed also very hard for MCMC to find a new sample. He suggested starting chains at one of the just-rejected points - which would probably be valid for some class of unimodal distributions, though certainly not generally.

In general it seems to me that nested sampling is difficult in high dimensions, and may indeed not be better than thermodynamic integration - especially for odd-shaped distributions.
Back to top
View user's profile [ Hidden ] Visit poster's website
Pia Mukherjee



Joined: 02 May 2006
Posts: 3
Affiliation: University of Sussex

PostPosted: June 07 2006  Reply with quote

We tried MCMC-ing (from any one of the remaining points, or from the last rejected point) and found that in general there remained a bias in the answer. In comparison, the sampling from the spheroid scheme was efficient and gave a very accurate and unbiased answer. True, this scheme for finding new points will become less efficient with increasing dimentionality.

As regards comparison with thermodynamic integration (TI), we seemed to require orders of magnitude fewer likelihood evaluations to reach a given final accuracy (based on TI requiring ~107 likelihood evaluations to get to δ(ln E) of 0.1 for a 5 or 6 parameter model. This is based on estimates from Beltran et al. and private communications. Whether TI can be made more efficient, through more tuned thermodynamic scheduling, I dont know.
Back to top
View user's profile  
Andrew Liddle



Joined: 28 Sep 2004
Posts: 18
Affiliation: University of Sussex

PostPosted: June 08 2006  Reply with quote

Hi there,

I think it is pretty generic that MCMC won't work to generate new live points, which is what we found in testing various methods.

The challenge is to sample **uniformly** from the remaining prior, which becomes a small part of the parameter volume as the code progresses. With an MCMC method, the samples will only find (or remain in) the high likelihood region if the sampler is given knowledge of what the likelihood is, but as soon as you do that the sampling stops being uniform and biases the answer.

It would be interesting to explore thermodynamic integration more extensively for large-dimensional problems, where I agree that nested sampling may scale badly. In Beltran et al astro-ph/0501477 we found that Therm Int was very inefficient for cosmological problems, and we subsequently found the same even for simple gaussian likelihoods. But we focussed on one particular annealing schedule and significant optimization may be possible.

best,

Andrew
Back to top
View user's profile   Visit poster's website
Antony Lewis



Joined: 23 Sep 2004
Posts: 506
Affiliation: University of Sussex

PostPosted: June 08 2006  Reply with quote

Andrew Liddle wrote:

The challenge is to sample **uniformly** from the remaining prior, which becomes a small part of the parameter volume as the code progresses. With an MCMC method, the samples will only find (or remain in) the high likelihood region if the sampler is given knowledge of what the likelihood is, but as soon as you do that the sampling stops being uniform and biases the answer.


I'm not sure I understand this. The task is to sample from P*(x), which is equal to P(x) for L(x) > Lj and equal to zero for L(x)<Lj. This is a perfectly well defined function that is flat in the region you want and zero outside (so the chain never moves outside). The likelihood has to be evalutated at each step to see if L(x)>Lj, but the value of the likelihood is not used for anything else. I don't see how this cannot work as long as you have enough MCMC steps that it converges? (which, admittedly, may be a lot!)

David MacKay also suggested reading about multicanonical sampling - this looks potentially interesting (e.g. I found cond-mat/9909236), esp. for multi-modal distributions.
Back to top
View user's profile [ Hidden ] Visit poster's website
Andrew Liddle



Joined: 28 Sep 2004
Posts: 18
Affiliation: University of Sussex

PostPosted: June 11 2006  Reply with quote

Dear Antony,

Yes, you're right. That should work but the number of MCMC steps needed to converge might indeed be quite large. Worth testing.
Your proposal is somewhat different from the failed MCMC attempts we tried.

best,

Andrew
Back to top
View user's profile   Visit poster's website
Display posts from previous:   
Post new topic   Reply to topic    CosmoCoffee Forum Index -> arXiv papers All times are GMT + 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group. Sponsored by WordWeb online dictionary and dictionary software.