[astroph/0508461] A Nested Sampling Algorithm for Cosmological Model Selection
Authors:  Pia Mukherjee, David Parkinson, Andrew R. Liddle 
Abstract:  The abundance of new cosmological data becoming available means that a wider range of cosmological models are testable than ever before. However, an important distinction must be made between parameter fitting and model selection. While parameter fitting simply determines how well a model fits the data, model selection statistics, such as the Bayesian evidence, are now necessary to choose between these different models, and in particular to assess the need for new parameters. We implement a new evidence algorithm known as nested sampling, which combines accuracy, generality of application and computational feasibility, and apply it to some cosmological datasets and models. We find that a five-parameter model with Harrison-Zel'dovich initial spectrum is currently preferred. 

 Posts: 1609
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
[astroph/0508461] A Nested Sampling Algorithm for Cosmologi
This paper discusses the use of Nested Sampling for cosmological model selection and parameter analysis. This is certainly a neat method, and nice to see it being put to use.
I hope to discuss some of these things at the Sussex conference next week. A couple of things to consider:
A. Intermediate steps in this method require sampling from the prior P(x) conditional on the likelihood L(x) being greater than a particular value L_j. To implement this sampling the authors seem to
1. Construct an ellipse that encloses the current set of samples
2. Expand the ellipse by some constant to allow for the fact that true likelihood contours are not elliptical
3. Importance sample within this expanded ellipse (i.e. take samples until one sample lies in the L(x) > L_j region).
This is in contrast to Skilling's original method, where he suggests using MCMC for this sampling step. My worry about the method used here as a general method is the following. In n dimensions, the volume of a sphere of radius r scales as r^n. If the sphere is expanded by a factor f, the fraction of the new volume contained in the target volume is (1/f)^n, i.e. the acceptance rate is approximately 1/f^n and the number of likelihood calculations needed to get a new sample is ~ f^n. For n=5, f=1.5 this gives an acceptance rate of ~0.13, similar to the 20% they report. For n=7, f=1.8 it gives a rate of 0.016. Since f^n grows exponentially with dimension, this sampling method would seem to become exponentially bad in high dimensions; perhaps for n >~ 8 using MCMC would be a much better idea (though the ellipse may be fine for the n <= 8 considered in this paper). Also, the expanded elliptical region is of course not guaranteed to enclose all points in the L(x) > L_j region, so the method is not formally correct; it can only be tested in each case by expanding by a much larger factor (very slow!) and checking for stability of the result.
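As a sanity check on the (1/f)^n scaling, here is a small Monte Carlo sketch (my own toy code, not the authors' implementation). The radius of a point drawn uniformly in an n-ball of radius f is distributed as f * U^(1/n), so the fraction landing inside the original (unit) ball, standing in for the L(x) > L_j region, is exactly (1/f)^n:

```python
import random

def acceptance_rate(n, f, trials=200_000, seed=0):
    """Fraction of points drawn uniformly from the radius-f n-ball
    that fall inside the unit n-ball (a stand-in for the target
    L > L_j region). Expected value: (1/f)**n."""
    rng = random.Random(seed)
    # Radius of a uniform draw in an n-ball of radius f: f * U**(1/n)
    hits = sum(f * rng.random() ** (1.0 / n) <= 1.0 for _ in range(trials))
    return hits / trials

print(acceptance_rate(5, 1.5), (1 / 1.5) ** 5)  # both ~ 0.13
print(acceptance_rate(7, 1.8), (1 / 1.8) ** 7)  # both ~ 0.016
```

This reproduces the numbers quoted above: ~13% acceptance at n=5, f=1.5, dropping to ~1.6% at n=7, f=1.8.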
B. Is it useful to use different 'priors'? e.g.
[tex]
\int d\theta L(\theta)P(\theta) =\int d\theta \frac{L(\theta) P(\theta)}{P'(\theta)} P'(\theta),
[/tex]
so one could perhaps apply this method sampling from P'(\theta) rather than the original P(\theta) [and using the appropriately modified likelihood]. i.e. perhaps one could use some approximation to L^{\beta} that can be quickly normalized.
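The reweighting identity is easy to check numerically. A toy sketch (my own example, with an assumed 1-D Gaussian likelihood, a flat prior P on [-5, 5], and P' a wider Gaussian): the evidence comes out the same whether one samples from P directly or samples from P' and weights by L P / P':

```python
import math
import random

rng = random.Random(1)

def L(t):   # toy Gaussian likelihood
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def P(t):   # original flat prior on [-5, 5]
    return 0.1 if -5.0 <= t <= 5.0 else 0.0

def Pp(t):  # alternative sampling density P': N(0, 2^2)
    return math.exp(-t * t / 8.0) / math.sqrt(8.0 * math.pi)

N = 200_000
# Direct: E = E_P[L], sampling theta from the flat prior
e_direct = sum(L(rng.uniform(-5.0, 5.0)) for _ in range(N)) / N
# Reweighted: E = E_{P'}[L * P / P'], sampling theta from P'
e_reweighted = 0.0
for _ in range(N):
    t = rng.gauss(0.0, 2.0)
    e_reweighted += L(t) * P(t) / Pp(t)
e_reweighted /= N
print(e_direct, e_reweighted)  # both ~ 0.1, the true evidence
```

Whether a choice of P' resembling a normalized approximation to L^beta actually reduces the cost inside nested sampling is the open question; this only illustrates that the identity holds.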

 Posts: 60
 Joined: September 25 2004
 Affiliation: ITA, University of Oslo
 Contact:
Re: [astroph/0508461] A Nested Sampling Algorithm for Cosmo
Antony Lewis wrote: This is in contrast to the original Skilling method where he suggests using MCMC for this sampling step.
Do you know of any reference on how this actually is done? I've been looking through Skilling's page, and there are a number of papers/tutorials/manuals there describing the idea (which is neatly summarized by Mukherjee et al.), but for some reason I can't find any explicit description of a good sampling algorithm, just general suggestions. And the BayeSys program is a little too massive for me to want to spend much time figuring out how it's done there. Multidimensional Hilbert curves, atom creation and bit swapping sound complicated.. :)
Thanks!

 Posts: 1609
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
Re: [astroph/0508461] A Nested Sampling Algorithm for Cosmo
I should think you could use almost anything for simple cases, e.g. just the standard CosmoMC sampler (see the CosmoMC paper, notes and readme for references; basically Metropolis-Hastings with Gaussian jumps in random 2D subspaces).
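For concreteness, a minimal sketch of that style of move (my own toy code, not CosmoMC itself): Metropolis-Hastings where each proposal is a Gaussian jump in a randomly chosen pair of coordinates, run here on an assumed 4-D Gaussian target:

```python
import math
import random

def metropolis_2d_subspace(logp, x0, steps, scale=0.5, seed=0):
    """Metropolis-Hastings in which each proposal perturbs a random
    pair of coordinates with Gaussian jumps (symmetric proposal)."""
    rng = random.Random(seed)
    x = list(x0)
    lp = logp(x)
    chain = []
    for _ in range(steps):
        i, j = rng.sample(range(len(x)), 2)  # random 2D subspace
        y = list(x)
        y[i] += rng.gauss(0.0, scale)
        y[j] += rng.gauss(0.0, scale)
        lq = logp(y)
        if math.log(rng.random() + 1e-300) < lq - lp:  # accept/reject
            x, lp = y, lq
        chain.append(list(x))
    return chain

# Toy target: standard 4-D Gaussian; the chain mean should approach 0
logp = lambda x: -0.5 * sum(t * t for t in x)
chain = metropolis_2d_subspace(logp, [3.0, 3.0, 3.0, 3.0], 20_000)
mean0 = sum(s[0] for s in chain[5000:]) / len(chain[5000:])
print(mean0)  # close to 0 after burn-in
```

For the nested sampling application, logp would be replaced by the hard-truncated prior discussed below.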
On the other hand I'm not at all sure how well nested sampling would work on a general distribution. I'd have thought sampling from the sharply-truncated prior could give a very odd disjoint shape to sample from even if the posterior distribution is only weakly multimodal itself - in general very difficult. An advantage of using thermodynamic integration is that at least it is pretty robust to odd distribution shapes.

 Posts: 60
 Joined: September 25 2004
 Affiliation: ITA, University of Oslo
 Contact:
Re: [astroph/0508461] A Nested Sampling Algorithm for Cosmo
Antony Lewis wrote: I should think you could use almost anything for simple cases, e.g. just the standard CosmoMC sampler (see the CosmoMC paper, notes and readme for references; basically Metropolis-Hastings with Gaussian jumps in random 2D subspaces).
This is the point I guess I don't understand. The proposals are supposed to be drawn from the prior, right? So with a uniform prior, an MCMC chain is basically just a random walk - with the additional constraint that you never go below L_low (= least likely of the N live points). My problem with this is that it seems you would have to make many MC steps to diffuse sufficiently far away from the original point, and for each of these steps you would have to compute a (possibly expensive) likelihood just to make sure you haven't gone outside the allowed region. So my problem is that I don't quite see how an MCMC method is going to be effective..? And as you say, the multidimensional spheroid approach of Mukherjee et al. also has problems in high dimensions because of the expansion factor..
Or am I missing something fundamental here..?

 Posts: 1609
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
Re: [astroph/0508461] A Nested Sampling Algorithm for Cosmo
I think that is right - MCMC will certainly involve lots of likelihood evaluations. One could however hope that it would scale with dimension less badly than exponentially.
On the other hand David MacKay pointed out that for an n-D ellipse, almost all the probability volume is near the surface, so it is indeed also very hard for MCMC to find a new sample. He suggested starting chains at one of the just-rejected points - which would probably be valid for some class of unimodal distributions, though certainly not generally.
In general it seems to me that nested sampling is difficult in high dimensions, and may indeed not be better than thermodynamic integration - especially for odd-shaped distributions.
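MacKay's surface-concentration point is easy to quantify: because volume scales as r^n, the fraction of an n-ball's volume lying within fractional depth eps of the surface is 1 - (1 - eps)^n, which rushes toward 1 as n grows. A one-liner to illustrate:

```python
def shell_fraction(n, eps=0.05):
    """Fraction of an n-ball's volume within the outer shell of
    fractional thickness eps; volume scales as r**n, so this is
    1 - (1 - eps)**n."""
    return 1.0 - (1.0 - eps) ** n

# Even a 5%-thick shell dominates the volume in high dimensions
for n in (2, 5, 10, 50):
    print(n, shell_fraction(n))
```

At n=50 over 90% of the volume sits in the outer 5% shell, so a proposal from anywhere deep inside the ball almost never lands in the bulk of the allowed region.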

 Posts: 3
 Joined: May 02 2006
 Affiliation: University of Sussex
[astroph/0508461] A Nested Sampling Algorithm for Cosmologi
We tried MCMCing (from any one of the remaining points, or from the last rejected point) and found that in general there remained a bias in the answer. In comparison, sampling from the spheroid scheme was efficient and gave a very accurate and unbiased answer. True, this scheme for finding new points will become less efficient with increasing dimensionality.
As regards comparison with thermodynamic integration (TI), we seemed to require orders of magnitude fewer likelihood evaluations to reach a given final accuracy (based on TI requiring ~10^7 likelihood evaluations to get to \delta(ln E) of 0.1 for a 5 or 6 parameter model; this is based on estimates from Beltran et al. and private communications). Whether TI can be made more efficient, through more tuned thermodynamic scheduling, I don't know.
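To make the likelihood-evaluation accounting concrete, here is a toy nested-sampling loop (my own sketch, not the paper's code) on a problem with a known answer: a 2-D Gaussian likelihood of width 0.1 under a flat prior on the unit square, so ln E ≈ ln(2 pi 0.01) ≈ -2.77. New live points are found by brute-force rejection from the prior, which is precisely the step whose cost the thread is debating:

```python
import math
import random

rng = random.Random(2)
SIG = 0.1

def logaddexp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def loglike(t):
    """2-D Gaussian log-likelihood centred in the unit square."""
    return -((t[0] - 0.5) ** 2 + (t[1] - 0.5) ** 2) / (2.0 * SIG ** 2)

def nested_sampling(nlive=100, iters=700):
    live = [[rng.random(), rng.random()] for _ in range(nlive)]
    logL = [loglike(t) for t in live]
    logZ = -1e300
    logX_prev = 0.0  # log of remaining prior volume
    for i in range(1, iters + 1):
        worst = min(range(nlive), key=lambda k: logL[k])
        logX = -i / nlive  # expected shrinkage per iteration
        logw = logX_prev + math.log1p(-math.exp(logX - logX_prev))
        logZ = logaddexp(logZ, logL[worst] + logw)
        logX_prev = logX
        # Replace the worst point: rejection-sample the prior until a
        # draw beats L_min -- increasingly wasteful as X shrinks.
        Lmin = logL[worst]
        while True:
            t = [rng.random(), rng.random()]
            if loglike(t) > Lmin:
                break
        live[worst], logL[worst] = t, loglike(t)
    # Remaining live points carry the leftover prior volume
    for k in range(nlive):
        logZ = logaddexp(logZ, logL[k] + logX_prev - math.log(nlive))
    return logZ

print(nested_sampling())  # ~ ln(2*pi*SIG**2) = -2.77
```

The rejection step already needs ~1/X tries once the constrained volume has shrunk to X, which is why a smarter scheme (spheroid or MCMC) is needed for anything beyond a toy.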

 Posts: 21
 Joined: September 28 2004
 Affiliation: University of Edinburgh
 Contact:
[astroph/0508461] A Nested Sampling Algorithm for Cosmologi
Hi there,
I think it is pretty generic that MCMC won't work to generate new live points, which is what we found in testing various methods.
The challenge is to sample **uniformly** from the remaining prior, which becomes a small part of the parameter volume as the code progresses. With an MCMC method, the samples will only find (or remain in) the high likelihood region if the sampler is given knowledge of what the likelihood is, but as soon as you do that the sampling stops being uniform and biases the answer.
It would be interesting to explore thermodynamic integration more extensively for large-dimensional problems, where I agree that nested sampling may scale badly. In Beltran et al. (astro-ph/0501477) we found that Therm Int was very inefficient for cosmological problems, and we subsequently found the same even for simple Gaussian likelihoods. But we focussed on one particular annealing schedule and significant optimization may be possible.
best,
Andrew

 Posts: 1609
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
Re: [astroph/0508461] A Nested Sampling Algorithm for Cosmo
Andrew Liddle wrote: The challenge is to sample **uniformly** from the remaining prior, which becomes a small part of the parameter volume as the code progresses. With an MCMC method, the samples will only find (or remain in) the high likelihood region if the sampler is given knowledge of what the likelihood is, but as soon as you do that the sampling stops being uniform and biases the answer.
I'm not sure I understand this. The task is to sample from P*(x), which is equal to P(x) for L(x) > L_j and equal to zero for L(x) < L_j. This is a perfectly well defined function that is flat in the region you want and zero outside (so the chain never moves outside). The likelihood has to be evaluated at each step to see if L(x) > L_j, but the value of the likelihood is not used for anything else. I don't see how this cannot work as long as you have enough MCMC steps that it converges? (which, admittedly, may be a lot!)
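A minimal sketch of that sampler (my own toy code): Metropolis steps targeting a flat prior on the unit square, where the likelihood is consulted only for the L > L_j test, never as a weight. In this toy the constrained region is a disc, and the retained chain should be uniform over it:

```python
import random

def constrained_step(x, logL, Lmin, scale, rng):
    """One Metropolis step targeting P*(x): the flat prior on [0,1]^2
    restricted to logL(x) > Lmin. Proposal is symmetric and the
    target is flat, so a move is accepted iff it stays inside the
    prior box and satisfies the likelihood constraint."""
    y = [xi + rng.gauss(0.0, scale) for xi in x]
    if all(0.0 <= yi <= 1.0 for yi in y) and logL(y) > Lmin:
        return y   # accept
    return x       # reject: chain stays put

# Toy constraint: a disc of radius 0.2 around the centre of the square
logL = lambda t: -((t[0] - 0.5) ** 2 + (t[1] - 0.5) ** 2)
Lmin = -0.04
rng = random.Random(3)
x = [0.5, 0.5]
xs = []
for _ in range(20_000):
    x = constrained_step(x, logL, Lmin, 0.15, rng)
    xs.append(x[0])
print(sum(xs) / len(xs))  # ~0.5: uniform over the constrained disc
```

Note each step costs one likelihood evaluation whether or not it is accepted, which is the cost concern raised earlier in the thread.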
David MacKay also suggested reading about multicanonical sampling  this looks potentially interesting (e.g. I found condmat/9909236), esp. for multimodal distributions.

 Posts: 21
 Joined: September 28 2004
 Affiliation: University of Edinburgh
 Contact:
[astroph/0508461] A Nested Sampling Algorithm for Cosmologi
Dear Antony,
Yes, you're right. That should work but the number of MCMC steps needed to converge might indeed be quite large. Worth testing.
Your proposal is somewhat different from the failed MCMC attempts we tried.
best,
Andrew