[astro-ph/0508461] A Nested Sampling Algorithm for Cosmological Model Selection
Posted: June 02 2006
This paper discusses the use of Nested Sampling for cosmological model selection and parameter analysis. This is certainly a neat method, and nice to see it being put to use.
I hope to discuss some of these things at the Sussex conference next week. A couple of things to consider:
A. Intermediate steps in this method require sampling from the prior P(x) conditional on the likelihood L(x) being greater than a particular value L_j. To implement this sampling the authors seem to:
1. Construct an ellipse that encloses the current set of samples
2. Expand the ellipse by some constant factor to allow for the fact that true likelihood contours are not elliptical
3. Sample uniformly within this expanded ellipse, rejecting draws until one lies in the L(x) > L_j region (a toy sketch of these steps is given below the list).
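For concreteness, here is a minimal Python sketch of how I read steps 1-3. The covariance-based enclosing ellipsoid, the toy Gaussian likelihood and the choice of L_j in the usage lines are my own illustrative assumptions, not the authors' implementation:
[code]
import numpy as np

def enclosing_ellipsoid(points):
    """Fit an ellipsoid (centre, shape matrix) that just encloses all the points."""
    mean = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    inv = np.linalg.inv(cov)
    # Scale the covariance ellipsoid so the farthest point sits on its boundary.
    d2max = max(float((p - mean) @ inv @ (p - mean)) for p in points)
    return mean, cov * d2max

def sample_in_ellipsoid(mean, shape, f, rng):
    """Draw uniformly from the ellipsoid after enlarging it by a linear factor f."""
    n = len(mean)
    u = rng.standard_normal(n)
    u *= rng.random() ** (1.0 / n) / np.linalg.norm(u)   # uniform point in the unit n-ball
    return mean + f * np.linalg.cholesky(shape) @ u      # map the ball onto the ellipsoid

def replace_live_point(live, log_L, log_Lj, f, rng):
    """Steps 1-3: reject draws from the enlarged ellipsoid until L(x) > L_j."""
    mean, shape = enclosing_ellipsoid(live)
    n_calls = 0
    while True:
        x = sample_in_ellipsoid(mean, shape, f, rng)
        n_calls += 1
        if log_L(x) > log_Lj:
            return x, n_calls

# Toy usage: Gaussian log-likelihood in n = 5 dimensions, enlargement f = 1.5.
rng = np.random.default_rng(0)
log_L = lambda x: -0.5 * float(x @ x)
live = rng.standard_normal((100, 5))
log_Lj = min(log_L(p) for p in live)    # likelihood of the point being replaced
x_new, n_calls = replace_live_point(live, log_L, log_Lj, 1.5, rng)
print(n_calls, "likelihood calls to find the replacement point")
[/code]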
This is in contrast to Skilling's original method, where he suggests using MCMC for this sampling step. My worry about the method used here as a general method is the following. In n dimensions the volume of a sphere of radius r scales as r^n, so if the sphere is expanded by a factor f, the fraction of the new volume contained in the target volume is (1/f)^n. The acceptance rate is therefore approximately 1/f^n, and the number of likelihood calculations needed to obtain each new sample is ~ f^n. For n=5, f=1.5 this gives an acceptance rate of ~0.13, similar to the 20% they report; for n=7, f=1.8 it gives ~0.016. Since f^n grows exponentially with dimension, this sampling method would seem to become exponentially bad in high dimensions, so perhaps for n >~ 8 using MCMC would be a much better idea (though the ellipse may be fine for the n <= 8 cases considered in this paper).
Also, the expanded elliptical region is of course not guaranteed to enclose all points in the L(x) > L_j region, so the method is not formally correct, and can only be tested in each case by expanding by a much larger factor (very slow!) and checking for stability of the result.
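For what it's worth, the (1/f)^n scaling argument above is easy to tabulate; this is just the back-of-envelope estimate, nothing taken from the paper (the extra n values are my own extrapolation):
[code]
# Rough acceptance rate (1/f)^n and cost f^n per new sample, from the argument above.
for n, f in [(5, 1.5), (7, 1.8), (10, 1.5), (15, 1.5)]:
    accept = f ** (-n)
    print(f"n={n:2d}  f={f:.1f}  acceptance ~ {accept:.4f}  likelihood calls per sample ~ {1/accept:.0f}")
[/code]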
B. Is it useful to use different 'priors'? e.g.
[tex]
\int d\theta L(\theta)P(\theta) =\int d\theta \frac{L(\theta) P(\theta)}{P'(\theta)} P'(\theta),
[/tex]
so one could perhaps apply this method sampling from P'(\theta) rather than the original P(\theta) [and using the appropriately modified likelihood]. For example, P' could be some approximation to L^{\beta} that can be quickly normalized. (A toy numerical check of the reweighting identity is sketched below.)
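A quick 1-D sanity check of the identity: sampling from a different prior P' and reweighting by P/P' leaves the evidence unchanged. The Gaussian forms chosen for P, P' and L here are purely illustrative assumptions:
[code]
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)

# Original prior P, modified 'prior' P', and likelihood L (all toy Gaussians).
P  = lambda t: gauss_pdf(t, 0.0, 1.0)
Pp = lambda t: gauss_pdf(t, 0.5, 0.5)
L  = lambda t: gauss_pdf(t, 1.0, 0.3)

theta_P  = rng.normal(0.0, 1.0, 200_000)   # samples from P
theta_Pp = rng.normal(0.5, 0.5, 200_000)   # samples from P'

Z_direct     = np.mean(L(theta_P))                                # E_P[L]
Z_reweighted = np.mean(L(theta_Pp) * P(theta_Pp) / Pp(theta_Pp))  # E_P'[L P / P']
print(Z_direct, Z_reweighted)   # both estimate the same evidence integral
[/code]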