CosmoMC: On the minimun \Chi^2 and priors etc.

 Posts: 70
 Joined: January 04 2005
 Affiliation: ICG, Portsmouth
 Contact:
CosmoMC: On the minimun \\Chi^2 and priors etc.
Hi:
I am a beginner of MCMC and encounter several questions in fitting Dark Energy to current data as follows:
1) On the minimun \chi^2 , I have calculated it using action =0 and 2 in the params.ini of CosmoMC and set others as default . Using WMAP and SN data (Age and HST priors included) , parametrizing DE equation of state (EoS) as astroph/0407372 : w=w0+w1*(a1)+w2*(a1)^2 , I got the minimun \chi^2 /2= 805 and 810 respectively .Given MCMC cannot propose accurate \Chi^2 for best fit sample , the fact that the minimun \chi^2 /2 (action =0) being smaller than minimun \chi^2 /2 (action =2) is still puzzling for me . (I had supposed the former should be larger whatever)
2) On the prior , as for large EoS of DE in the early epoch it is unphysical since DE would dominate the universe .I hence used such a prior as :
When the total EOS of DE is larger than 0.05 today I will set the whole EOS as a constant 0.05 instead(this is somewhat optimistic but maybe applicable), my question is that would such a prior spoil the whole sampling of MCMC due to some unusual samplings around this point (w=0.05) and give an utterly different and wrong result? And which prior is more appropriate?
3) On the convergence level. Is it that for R1<0.05 the (multi)chains must have converged and there are anyway no exceptions?
4)On the likelihood contour. There seems to be different claims on the likelihood contour of MCMC and I suppose two of them are very typical: one is astroph/0310723 where the way of using the contours out are exactly like(at least the 2\sigma region in this way) those of marginalization with the conventional grids. The other is astroph/0406608 in page 5 of the eprint. The authors seemed to have used something like the coverage probability to work the 1\sigma and 2\sigma regions out where the \delta\chi^2 is very strange and this often corresponds to some nonintegral degrees of freedom. So what a method has COSMOMC used? The latter?
Thanks a lot.
Best,
GongBo Zhao
I am a beginner of MCMC and encounter several questions in fitting Dark Energy to current data as follows:
1) On the minimun \chi^2 , I have calculated it using action =0 and 2 in the params.ini of CosmoMC and set others as default . Using WMAP and SN data (Age and HST priors included) , parametrizing DE equation of state (EoS) as astroph/0407372 : w=w0+w1*(a1)+w2*(a1)^2 , I got the minimun \chi^2 /2= 805 and 810 respectively .Given MCMC cannot propose accurate \Chi^2 for best fit sample , the fact that the minimun \chi^2 /2 (action =0) being smaller than minimun \chi^2 /2 (action =2) is still puzzling for me . (I had supposed the former should be larger whatever)
2) On the prior , as for large EoS of DE in the early epoch it is unphysical since DE would dominate the universe .I hence used such a prior as :
When the total EOS of DE is larger than 0.05 today I will set the whole EOS as a constant 0.05 instead(this is somewhat optimistic but maybe applicable), my question is that would such a prior spoil the whole sampling of MCMC due to some unusual samplings around this point (w=0.05) and give an utterly different and wrong result? And which prior is more appropriate?
3) On the convergence level. Is it that for R1<0.05 the (multi)chains must have converged and there are anyway no exceptions?
4)On the likelihood contour. There seems to be different claims on the likelihood contour of MCMC and I suppose two of them are very typical: one is astroph/0310723 where the way of using the contours out are exactly like(at least the 2\sigma region in this way) those of marginalization with the conventional grids. The other is astroph/0406608 in page 5 of the eprint. The authors seemed to have used something like the coverage probability to work the 1\sigma and 2\sigma regions out where the \delta\chi^2 is very strange and this often corresponds to some nonintegral degrees of freedom. So what a method has COSMOMC used? The latter?
Thanks a lot.
Best,
GongBo Zhao

 Posts: 6
 Joined: September 01 2005
 Affiliation: University of Sydney
CosmoMC: On the minimun \\Chi^2 and priors etc.
I'd be interested in how you went about modifying CosmoMC to allow variation in the EOS. Did you modify the functions in CAMB such as dtauda and rombint to allow a varying w, or did you replace them with your own functions?

 Posts: 1618
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
Re: CosmoMC: On the minimun \\\\Chi^2 and priors etc.
I'm not sure what's going on in your first question.
Imposing a prior should be OK. You can set the limitsxxx= setting in the distparams.ini file so that GetDist knows about the prior (see the ReadMe). The chains will probably avoid this region of parameter space anyway.gongbo zhao wrote:2) On the prior , as for large EoS of DE in the early epoch it is unphysical since DE would dominate the universe .I hence used such a prior as :
When the total EOS of DE is larger than 0.05 today I will set the whole EOS as a constant 0.05 instead(this is somewhat optimistic but maybe applicable), my question is that would such a prior spoil the whole sampling of MCMC due to some unusual samplings around this point (w=0.05) and give an utterly different and wrong result? And which prior is more appropriate?
It is neccessary but not sufficient. In general there are no sufficient tests for convergence. In practice chains with R1<0.05 are usually fine as long as the posterior is unimodal (or the chains have found all the modes if it is multimodal).gongbo zhao wrote: 3) On the convergence level. Is it that for R1<0.05 the (multi)chains must have converged and there are anyway no exceptions?
2D solid contours plotted by GetDist enclose a given about of maginalized probability. 1D solid contours are marginalized probability. You can also plot 1D dotted contours: these are mean likelihoods (see the cosmomc paper appendices for discussion). The .margestats file produced contains the means, standard deviations, and marginalized confidence limits.gongbo zhao wrote: 4)On the likelihood contour. There seems to be different claims on the likelihood contour of MCMC and I suppose two of them are very typical: one is astroph/0310723 where the way of using the contours out are exactly like(at least the 2\sigma region in this way) those of marginalization with the conventional grids. The other is astroph/0406608 in page 5 of the eprint. The authors seemed to have used something like the coverage probability to work the 1\sigma and 2\sigma regions out where the \delta\chi^2 is very strange and this often corresponds to some nonintegral degrees of freedom. So what a method has COSMOMC used? The latter?

 Posts: 70
 Joined: January 04 2005
 Affiliation: ICG, Portsmouth
 Contact:
CosmoMC: On the minimun \\Chi^2 and priors etc.
Hi,Antony:
On the contour plot of Dark Energy parameters , say , the w0 , w1 of the Linderlike parametrization , I just want to enlarge the parameter space to unphysical area , ie , w(a) > 0.05 at early epoch , to make the plot nicer . To do this , I have to modify the equations.f90 in CAMB to allow the calculation when w(a) goes unphysical . I set w(a) = 0.05 when w > 0.05 . Now CosmoMC can run smoothly given arbitrary w0 and w1 without any divergence. However I am not sure whether my method is correct . Will the condition I set bias the MCMC sampling results ? Thanks again.
Cheers,
Gongbo Zhao
On the contour plot of Dark Energy parameters , say , the w0 , w1 of the Linderlike parametrization , I just want to enlarge the parameter space to unphysical area , ie , w(a) > 0.05 at early epoch , to make the plot nicer . To do this , I have to modify the equations.f90 in CAMB to allow the calculation when w(a) goes unphysical . I set w(a) = 0.05 when w > 0.05 . Now CosmoMC can run smoothly given arbitrary w0 and w1 without any divergence. However I am not sure whether my method is correct . Will the condition I set bias the MCMC sampling results ? Thanks again.
Cheers,
Gongbo Zhao

 Posts: 1618
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
Re: CosmoMC: On the minimun \\\\Chi^2 and priors etc.
If there is nonzero probability in the region you are changing, then yes it will bias the results. If on the other hand these models always have low likelihood (what I'd expect) and so are rejected it shouldn't make any difference.
I would think the contours cut off before 0.05. Maybe try increasing the number of bins when running GetDist to improve your resolution near w=0. c.f. the plots in astroph/0409574.
I would think the contours cut off before 0.05. Maybe try increasing the number of bins when running GetDist to improve your resolution near w=0. c.f. the plots in astroph/0409574.

 Posts: 41
 Joined: November 22 2004
 Affiliation: ITP Heidelberg
 Contact:
CosmoMC: On the minimun \\Chi^2 and priors etc.
Hi,
there's nothing "unphysical" about w=0, not
even about w=1/3, which are the tracker solutions of an exponential potential during radiation domination.
What you do have with this parametrization, however, is that most of the time, having w=0 will lead to too much dark energy at early times in this parametrization.
But I can easily manufacture a model with w=0 at early times in this parametrization
that is perfectly fine, at least if I'm allowed to cross the phantom line (I'll go backwards, i.e. to from z=0 to z=infinity):
Take w_0 = 4, say. Omega_dark_0 = 0.7, that will take you down swiftly when you go to higher redshifts. From that low Omega_dark level, you can then sustain it with w=0 up to high redshfits. From RD on upwards, dark energy will get unimportant, just like matter.
So voila, even in this parametrization, you can have w=0 at high redshifts.
there's nothing "unphysical" about w=0, not
even about w=1/3, which are the tracker solutions of an exponential potential during radiation domination.
What you do have with this parametrization, however, is that most of the time, having w=0 will lead to too much dark energy at early times in this parametrization.
But I can easily manufacture a model with w=0 at early times in this parametrization
that is perfectly fine, at least if I'm allowed to cross the phantom line (I'll go backwards, i.e. to from z=0 to z=infinity):
Take w_0 = 4, say. Omega_dark_0 = 0.7, that will take you down swiftly when you go to higher redshifts. From that low Omega_dark level, you can then sustain it with w=0 up to high redshfits. From RD on upwards, dark energy will get unimportant, just like matter.
So voila, even in this parametrization, you can have w=0 at high redshifts.

 Posts: 11
 Joined: January 07 2005
 Affiliation: Research Center for the Early Universe, University of Tokyo
Re: CosmoMC: On the minimun \\\\Chi^2 and priors etc.
Michael Doran wrote:Hi,
there's nothing "unphysical" about w=0, not
even about w=1/3, which are the tracker solutions of an exponential potential during radiation domination.
What you do have with this parametrization, however, is that most of the time, having w=0 will lead to too much dark energy at early times in this parametrization.
But I can easily manufacture a model with w=0 at early times in this parametrization
that is perfectly fine, at least if I'm allowed to cross the phantom line (I'll go backwards, i.e. to from z=0 to z=infinity):
Take w_0 = 4, say. Omega_dark_0 = 0.7, that will take you down swiftly when you go to higher redshifts. From that low Omega_dark level, you can then sustain it with w=0 up to high redshfits. From RD on upwards, dark energy will get unimportant, just like matter.
So voila, even in this parametrization, you can have w=0 at high redshifts.
Hi Michael,
I do agree with you that 'there's nothing "unphysical" about w=0', anyway it is somewhat tricky for DE parametrizations. Actually I believe you would agree that in the fittings for a first step we do not care for the "unphysical" aspects, in practice we just need to ensure that DE should not dominate the early universe or the program works well with no divergence..........But I have been informed that the code of dverk does not work somewhat BEFORE(asympotically to) the violation of the condition, this is what I cannot understant and I believe, also why the authors in astroph/0507170 and astroph/0411803 used some priors which would somewhat lead to the bias.......

 Posts: 11
 Joined: January 07 2005
 Affiliation: Research Center for the Early Universe, University of Tokyo
Re: CosmoMC: On the minimun \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
gongbo zhao wrote: 4)On the likelihood contour. There seems to be different claims on the likelihood contour of MCMC and I suppose two of them are very typical: one is astroph/0310723 where the way of using the contours out are exactly like(at least the 2\sigma region in this way) those of marginalization with the conventional grids. The other is astroph/0406608 in page 5 of the eprint. The authors seemed to have used something like the coverage probability to work the 1\sigma and 2\sigma regions out where the \delta\chi^2 is very strange and this often corresponds to some nonintegral degrees of freedom. So what a method has COSMOMC used? The latter?
Hi Antony,
2D solid contours plotted by GetDist enclose a given about of maginalized probability. 1D solid contours are marginalized probability. You can also plot 1D dotted contours: these are mean likelihoods (see the cosmomc paper appendices for discussion). The .margestats file produced contains the means, standard deviations, and marginalized confidence limits.
Anyway it seems to me that the way of paper astroph/0406608 is different from the conventional ways. For example in astroph/0310723 for the 2d contours, the author just quoted a region where there is an increase of \Delta\chi^2=6.18, this is the conventional way and similar to those defined in the case of grids, somewhat assuming a gaussian distribution. However in astroph/0406608 the increase of \chi^2 is much larger than 6.18 and hence, I suppose the authors have been counting the coverage probability, where 95 \% of the data are around the best fit...... I am not sure exactly and would like to see your comments on astroph/0406608, thanks!
The reason I care for this so much is that personally I think astroph/0406608 is more physical, however seems several papers are using the conventional way in MCMC, just like astroph/0301723............

 Posts: 41
 Joined: November 22 2004
 Affiliation: ITP Heidelberg
 Contact:
CosmoMC: On the minimun \\Chi^2 and priors etc.
Hi,
in the \Delta chi^2 vs. coverage question, I take the side of \Delta chi^2. The reason is the following: when you have a parameter, that is not too well constrained by the experiment, but still a bit. So as the distribution, picture a slightly tilted almost horizontal line. Then by counting the area under the curve, you'd conclude that the parameter range with the slightly better \chi^2 is the "preferred" one, excluding the lower (but almost as good) \chi^2 region of this parameter.
Using \Delta \chi^2, you would find that indeed the parameter is not well constrained and that all values are essentially equally likely.
As both methods agree well in almost Gaussian cases, I prefer the \Delta \chi^2.
On the "dverk" issue: I don't know what the situation with CAMB is, but in CMBFAST, some things are "hard"coded. I.e., there are some (sometimes nonobvious) assumptions about
cosmology. Recfast, for instance is to large degree
hardcoded. So even if dverk and cmbfast ran, it would still be useful to check whether the result is "sensibel" given the model.
in the \Delta chi^2 vs. coverage question, I take the side of \Delta chi^2. The reason is the following: when you have a parameter, that is not too well constrained by the experiment, but still a bit. So as the distribution, picture a slightly tilted almost horizontal line. Then by counting the area under the curve, you'd conclude that the parameter range with the slightly better \chi^2 is the "preferred" one, excluding the lower (but almost as good) \chi^2 region of this parameter.
Using \Delta \chi^2, you would find that indeed the parameter is not well constrained and that all values are essentially equally likely.
As both methods agree well in almost Gaussian cases, I prefer the \Delta \chi^2.
On the "dverk" issue: I don't know what the situation with CAMB is, but in CMBFAST, some things are "hard"coded. I.e., there are some (sometimes nonobvious) assumptions about
cosmology. Recfast, for instance is to large degree
hardcoded. So even if dverk and cmbfast ran, it would still be useful to check whether the result is "sensibel" given the model.

 Posts: 144
 Joined: September 24 2004
 Affiliation: University College London (UCL)
 Contact:
CosmoMC: On the minimun \\Chi^2 and priors etc.
I am not convinced that astroph/0406608 are really using anything different to the conventional true probability (cf coverage probability). I see the paragraph on page 5 you are referring to, in which they comment on the [tex]\Delta \chi^2 [/tex] on the 68 and 95 per cent contours:gongbo zhao wrote: in astroph/0406608 the increase of χ2 is much larger than 6.18 and hence, I suppose the authors have been counting the coverage probability, where 95 % of the data are around the best fit
However I assume that they calculate the 68 and 95 per cent confidence contours in the usual way (by finding the isoprobability contour that encloses 68 or 95 per cent of the probability). Then I think that the [tex]\Delta \chi^2[/tex] they quote is an additional piece of information they provide us with.astroph/0406608 page 5 wrote:Studying the distribution of the χ2 values in the MCMC chains for the Lambda CDM models, we find that Delta χ2 = 6.4 for the models at 1 sigma; (68.3% CL) and Delta χ2 = 11.8 at 2 sigma (95.4% CL). In the Gaussian case, this corresponds to about 5.5 independent degrees of freedom, slightly less than the number of cosmological parameters used (6).
What does this [tex]\Delta \chi^2[/tex] mean? A handy table in Numerical Recipes gives the [tex]\Delta \chi^2[/tex] on the isoprobability contour containing e.g. 95 per cent of the probability, for Gaussian probability distributions in a range of dimensions. This number is higher for higher dimensional Gaussians, e.g. for a 1d Gaussian [tex]\Delta \chi^2=4[/tex] whereas for a 6d Gaussian [tex]\Delta \chi^2=12.8[/tex].
If you had a 1d Gaussian in a 2d space (e.g. probability was a Gaussian in x but was indep of y) then the [tex]\Delta \chi^2[/tex] on the isoprobability contour containing 95 per cent of the probability would still be 4 (i.e. that for a 1d Gaussian).
Therefore (i) it is not surprising if [tex]\Delta \chi^2[/tex] is larger than 4 on the 95 per cent confidence contour if there is more than 1 free parameter (ii) the value of [tex]\Delta \chi^2[/tex] can be used to liken the true probability distribution to an nd Gaussian, and estimate n.

 Posts: 144
 Joined: September 24 2004
 Affiliation: University College London (UCL)
 Contact:
CosmoMC: On the minimun \\Chi^2 and priors etc.
Yes, I see your point. The minimiser in CosmoMC (action=2) tries to find local minima using a conjugate gradient minimiser. One possibility is that there is a local minimum at chisq/2=810. The other possibility is that the minimiser is not doing a good job at getting to the minimum, perhaps it stops too soon. What have you used for delta_loglike? This determines when it will stop refining its search (if the change in log likelihood is less than delta_loglike on one iteration), so if you make delta_loglike smaller then you should get closer to the minimum.gongbo zhao wrote:1) On the minimun χ2 , I have calculated it using action =0 and 2 in the params.ini of CosmoMC and set others as default . ... I got the minimun χ2 /2= 805 and 810 respectively .Given MCMC cannot propose accurate Χ2 for best fit sample , the fact that the minimun χ2 /2 (action =0) being smaller than minimun χ2 /2 (action =2) is still puzzling for me . (I had supposed the former should be larger whatever)
An alternative way to see how confident to be in your minimum chisq would be to start the minimiser from different starting points (as given in the CosmoMC params file). This would be useful independent of which of the above (local minimum or struggling minimiser) is the case.

 Posts: 11
 Joined: January 07 2005
 Affiliation: Research Center for the Early Universe, University of Tokyo
Re: CosmoMC: On the minimun \\\\\\\\Chi^2 and priors etc.
So I think you are agreeing with the somewhat conventional and physcial method of astroph/0406608, and this is different from what Dr. Michael Doran claims and also different from the last paragraph in the eprint of astroph/0310723....Sarah Bridle wrote:
I am not convinced that astroph/0406608 are really using anything
different to the conventional true probability (cf coverage probability). I see the paragraph on page 5 you are referring to, in which they comment on the [tex]\Delta \chi^2 [/tex] on the 68 and 95 per cent contours:
.........However I assume that they calculate the 68 and 95 per cent confidence contours in the usual way (by finding the isoprobability contour that encloses 68 or 95 per cent of the probability). Then I think that the [tex]\Delta \chi^2[/tex] they quote is an additional piece of information they provide us with.astroph/0406608 page 5 wrote:Studying the distribution of the χ2 values in the MCMC chains for the Lambda CDM models, we find that Delta χ2 = 6.4 for the models at 1 sigma; (68.3% CL) and Delta χ2 = 11.8 at 2 sigma (95.4% CL). In the Gaussian case, this corresponds to about 5.5 independent degrees of freedom, slightly less than the number of cosmological parameters used (6).
Therefore (i) it is not surprising if [tex]\Delta \chi^2[/tex] is larger than 4 on the 95 per cent confidence contour if there is more than 1 free parameter (ii) the value of [tex]\Delta \chi^2[/tex] can be used to liken the true probability distribution to an nd Gaussian, and estimate n.
My understanding seems to be like this: in paper astroph/0406608 the authors are just counting the coverage probability, disregard how many free parameters are really used(e.g. effective dof=5.5 for the 6parameter fitting and dof\sim 8 for 10parameter fitting ), in this way one can work out the 1d and 2d parameter space. However in paper astroph/0310723, this is a different way: one just get the minimum \chi^2, and then try to find the region where there is an increase of Δχ2=6.18 for the 2d constraint. I believe one will get different results when using above two different approachs, and the difference would be very large if the distribution deviates a lot from Gaussian ......
Regarding the statistical points I think there are two more papers very interesting. One is astroph/0411803. The authors claimed to have used
a \chi^2 minimization instead of the marginalization, (se page 5 of their eprint). The other is astroph/0507170 , where the authors claimed to have used the FREQUENTIST MULTIPROBE APPROACH, they expressed this method in page.34 of the eprint and claimed astroph/0411803 also
essentially used the FREQUENTIST MULTIPROBE APPROACH. I am a bit puzzled not only because the two papers gave somewhat incompatiable results for the same fittings, but I had supposed astroph/0411803 used the bayesian approach anyway, while astroph/0507170 not, as claimed in the introductory part of Section 2.2 ............. I am not sure and would like to see your comments, thanks in advance!

 Posts: 43
 Joined: November 11 2004
 Affiliation: LUTH, Observatoire de ParisMeudon
 Contact:
CosmoMC: On the minimun \\Chi^2 and priors etc.
Hi all,
I just noticed that there was a discussion on our paper, well first of all let me say we did not do anything special, as Sarah has pointed out it is just standard statistics textbook, we have calculate the [tex]1[/tex] and [tex]2\sigma[/tex] confidence contours by finding the "area" which encloses 68 or 95 per cent of the probability. We have quoted the inferred [tex]\Delta\chi^2[/tex] values and compared to the Gaussian case just to show to how many "independent" parameters we were actualy sensitive to. Besides the structure of the likelihood in some of the parameters ([tex]a_t[/tex], [tex]\Delta[/tex]..) is by no means close to a Gaussian. So we would have committed a heresy if we had determined the [tex]1[/tex] and [tex]2\sigma[/tex] countours by taking the region of the parameter space for which [tex]\Delta\chi^2[/tex] values are those corresponding to a Gaussian for the given number of degree of freedom.
This would be simply the wrong procedure. One should always compute
in the former way and then check whether the results agree or not to the Gaussian case, the discrepancy is a further piece of information.
Hope this clarify the point.
Regards,
PierStefano
I just noticed that there was a discussion on our paper, well first of all let me say we did not do anything special, as Sarah has pointed out it is just standard statistics textbook, we have calculate the [tex]1[/tex] and [tex]2\sigma[/tex] confidence contours by finding the "area" which encloses 68 or 95 per cent of the probability. We have quoted the inferred [tex]\Delta\chi^2[/tex] values and compared to the Gaussian case just to show to how many "independent" parameters we were actualy sensitive to. Besides the structure of the likelihood in some of the parameters ([tex]a_t[/tex], [tex]\Delta[/tex]..) is by no means close to a Gaussian. So we would have committed a heresy if we had determined the [tex]1[/tex] and [tex]2\sigma[/tex] countours by taking the region of the parameter space for which [tex]\Delta\chi^2[/tex] values are those corresponding to a Gaussian for the given number of degree of freedom.
This would be simply the wrong procedure. One should always compute
in the former way and then check whether the results agree or not to the Gaussian case, the discrepancy is a further piece of information.
Hope this clarify the point.
Regards,
PierStefano

 Posts: 41
 Joined: November 22 2004
 Affiliation: ITP Heidelberg
 Contact:
Re: CosmoMC: On the minimun \\\\Chi^2 and priors etc.
Hi PierStefano,Pier Stefano Corasaniti wrote:Hi all,
This would be simply the wrong procedure. One should always compute
in the former way and then check whether the results agree or not to the Gaussian case, the discrepancy is a further piece of information.
I think you have a point here. In this case,
however, one should not only quote the 1,2 etc.
\sigma errors, but also the "significance" of this.
Let me still use the almost horizontal, but slightly
tilted distribution. In that case, one would infer
limits on the parameter that are utterly
insignificant, because the parameter is not at all
constrained by the data. This, of course would
then come out from the \Delta \chi^2 that
corresponds to 1 \sigma etc.
On the other hand, using the gaussian \Delta\chi^2 as a *definition* of 1,2 etc. \sigma does
not seem wrong to me: the statement is just a
slightly different one.
But I would like to hear more opinions on this!
Cause I used to use the counting method, now I
use the \Delta \chi^2, mainly because it prevents
displaying "constraints" that are not there in
reality.
So looking forward to replies :]
Michael

 Posts: 43
 Joined: November 11 2004
 Affiliation: LUTH, Observatoire de ParisMeudon
 Contact:
CosmoMC: On the minimun \\Chi^2 and priors etc.
Hi Michael,
sorry for my late reply, I would have emailed to you,
but I think that the topic is of general interests and it is worth
to have an open discussion on it.
Inferring confidence intervals is far from being a trivial subject,
if you look at the literature you will find it extremely vast,
sometimes affected by confusion and misconceptions
and above all divided into Bayesians and Frequentists.
I am a Bayesian and hope that everyone who is not will
become Bayesian by 2006 ;0)
Jokes apart I found very appropriate the
heading of a review article on the topic by Prof. D'Agostini
(a leading expert in Bayesian reasoning and statistical data analysis),
hepex/0002033, where he quotes Galileo's words
by Bertolt Brecht:
"You see, a question has arisen,
about which we cannot come to an agreement,
probably because we have read too many books". ;0)
Nevertheless I think an agreement is possible
if we are both Bayesians.
If I understand, you propose to infer the [tex]1\sigma[/tex]
confidence limits as the boundaries of the parameter interval,
which correspond to the [tex]\Delta\chi^2[/tex] value of a
Gaussian distribution at [tex]1\sigma[/tex].
Of course everyone is free to use such a definition
if he likes it, but the point is:
what does it imply? what does it tell me about the errors?
I found difficult to make sense of such a "confidence interval"
simply because there is not a probabilistic content in it...
..well there is but the [tex]1\sigma[/tex]
would not be the 68% of the total accessible parameter volume,
it will be a different probability depending on the structure of
the posterior. It will be 68% only in the Gaussian case.
Just to make clear the point, in the "conventional" method,
for a given set of data [tex]D[/tex], once I have the likelihood
of the data [tex]\mathcal{L}(D\theta,M)[/tex] given the parameter
[tex]\theta[/tex] in the model [tex]M[/tex] and its prior distribution
[tex]\Pi(\theta,M)[/tex], after normalization I obtain the posterior
[tex]p(\thetaD,M)=\frac{\mathcal{L}(D\theta,M)\Pi(\theta,M)}{\int \mathcal{L}(D\theta,M)\Pi(\theta,M)d\theta},[/tex]
(even Frequentists would agree that the posterior is a probability
function since one can give a direct frequency interpretation of it
...but if people it is confused by this statement just forget it)
Having the posterior we can now compute the [tex]1\sigma[/tex]
"confidence interval" as the range where there is 68% probability
of finding the true value:
[tex]P(\theta_1<\theta<\theta_2)=\frac{\int_{\theta_1}^{\theta_2}\mathcal{L}(D\theta,M)\Pi(\theta,M)d\theta}{\int \mathcal{L}(D\theta,M)\Pi(\theta,M)d\theta}=0.68[/tex] (1)
and I think this is crystall clear.
Now you may complain that you have to be sure that when
you compute the evidence (integral at the denominator)
you have to be sure that you have sampled the likelihood pretty
accurately, particularly where the likelihood is small while
the prior is not.
Of course I totally agree, however this is a computational
challenge and not a logical loophole, Eq. (1) is a selfconsistent
recipe to asses "confidence intervals" in terms of probabilities.
Frequentist may not like it, since there is an explicit dependence
on the prior, however for very informative data the dependence of
the prior is very weak and as we keep updating the likelihood with
better and better data, the prior dependence will be less and less
important. On the other hand if you don't use (1) because you want
to evade the prior dependence even the smallest one,
well anything else that you may think of will not have
probabilistic content.
I have found a very nice proof of why
"frequentist confidence intervals" don't have probabilistic
meaning in Cramer statistic textbook
(Mathematical Methods of Statistics, pag. 509).
This is one reason why we should not call "confidence intervals"
those inferred from (1) in order not to confuse with the frequentist
confidence intervals. The confusion will disappear if people stop using
frequentist methods ;0)..yes I am rather radical and maximalist on the
subject ;0))
Having said this I do think that the point that you raised is indeed
important, and I understand your concern about nearly flat or
nonclosed likelihoods. If this is the case I agree that inferring intervals
may be tricky. This will occur particularly when data are poorly
informative on a given parameter, and dependence on the prior is
therefore strong (in this case there would be also a dependence on the
choice of metric, e.g. whether you use [tex]\theta[/tex] or [tex]\log{\theta}[/tex]).
Well in such a case I would simply not infer any limit,
as a flat likelihood simply means data are not good enough
to say anything about that parameter.
For instance this is what we found in astroph/0406608,
as you can see the likelihoods in [tex]a_t[/tex] and [tex]\Delta[/tex]
are not closed and highly noised (Fig.6).
We though more sensical and honest not to infer any constraints
on those parameters, neither assesed the significance of them
..they are simply unconstrained and that is it.
I would have deserved to be burnt as an heretic in Campo de' Fiori
if we had quoted any "noncovarage probability" limit just for
the sake of quoting a number, although I could have always
asked for clemency to the Pope. (Bayes was a priest after all)
The best,
PierStefano
sorry for my late reply, I would have emailed to you,
but I think that the topic is of general interests and it is worth
to have an open discussion on it.
Inferring confidence intervals is far from being a trivial subject,
if you look at the literature you will find it extremely vast,
sometimes affected by confusion and misconceptions
and above all divided into Bayesians and Frequentists.
I am a Bayesian and hope that everyone who is not will
become Bayesian by 2006 ;0)
Jokes apart I found very appropriate the
heading of a review article on the topic by Prof. D'Agostini
(a leading expert in Bayesian reasoning and statistical data analysis),
hepex/0002033, where he quotes Galileo's words
by Bertolt Brecht:
"You see, a question has arisen,
about which we cannot come to an agreement,
probably because we have read too many books". ;0)
Nevertheless I think an agreement is possible
if we are both Bayesians.
If I understand, you propose to infer the [tex]1\sigma[/tex]
confidence limits as the boundaries of the parameter interval,
which correspond to the [tex]\Delta\chi^2[/tex] value of a
Gaussian distribution at [tex]1\sigma[/tex].
Of course everyone is free to use such a definition
if he likes it, but the point is:
what does it imply? what does it tell me about the errors?
I found difficult to make sense of such a "confidence interval"
simply because there is not a probabilistic content in it...
..well there is but the [tex]1\sigma[/tex]
would not be the 68% of the total accessible parameter volume,
it will be a different probability depending on the structure of
the posterior. It will be 68% only in the Gaussian case.
Just to make clear the point, in the "conventional" method,
for a given set of data [tex]D[/tex], once I have the likelihood
of the data [tex]\mathcal{L}(D\theta,M)[/tex] given the parameter
[tex]\theta[/tex] in the model [tex]M[/tex] and its prior distribution
[tex]\Pi(\theta,M)[/tex], after normalization I obtain the posterior
[tex]p(\thetaD,M)=\frac{\mathcal{L}(D\theta,M)\Pi(\theta,M)}{\int \mathcal{L}(D\theta,M)\Pi(\theta,M)d\theta},[/tex]
(even Frequentists would agree that the posterior is a probability
function since one can give a direct frequency interpretation of it
...but if people it is confused by this statement just forget it)
Having the posterior we can now compute the [tex]1\sigma[/tex]
"confidence interval" as the range where there is 68% probability
of finding the true value:
[tex]P(\theta_1<\theta<\theta_2)=\frac{\int_{\theta_1}^{\theta_2}\mathcal{L}(D\theta,M)\Pi(\theta,M)d\theta}{\int \mathcal{L}(D\theta,M)\Pi(\theta,M)d\theta}=0.68[/tex] (1)
and I think this is crystall clear.
Now you may complain that you have to be sure that when
you compute the evidence (integral at the denominator)
you have to be sure that you have sampled the likelihood pretty
accurately, particularly where the likelihood is small while
the prior is not.
Of course I totally agree, however this is a computational
challenge and not a logical loophole, Eq. (1) is a selfconsistent
recipe to asses "confidence intervals" in terms of probabilities.
Frequentist may not like it, since there is an explicit dependence
on the prior, however for very informative data the dependence of
the prior is very weak and as we keep updating the likelihood with
better and better data, the prior dependence will be less and less
important. On the other hand if you don't use (1) because you want
to evade the prior dependence even the smallest one,
well anything else that you may think of will not have
probabilistic content.
I have found a very nice proof of why
"frequentist confidence intervals" don't have probabilistic
meaning in Cramer statistic textbook
(Mathematical Methods of Statistics, pag. 509).
This is one reason why we should not call "confidence intervals"
those inferred from (1) in order not to confuse with the frequentist
confidence intervals. The confusion will disappear if people stop using
frequentist methods ;0)..yes I am rather radical and maximalist on the
subject ;0))
Having said this I do think that the point that you raised is indeed
important, and I understand your concern about nearly flat or
nonclosed likelihoods. If this is the case I agree that inferring intervals
may be tricky. This will occur particularly when data are poorly
informative on a given parameter, and dependence on the prior is
therefore strong (in this case there would be also a dependence on the
choice of metric, e.g. whether you use [tex]\theta[/tex] or [tex]\log{\theta}[/tex]).
Well in such a case I would simply not infer any limit,
as a flat likelihood simply means data are not good enough
to say anything about that parameter.
For instance this is what we found in astroph/0406608,
as you can see the likelihoods in [tex]a_t[/tex] and [tex]\Delta[/tex]
are not closed and highly noised (Fig.6).
We though more sensical and honest not to infer any constraints
on those parameters, neither assesed the significance of them
..they are simply unconstrained and that is it.
I would have deserved to be burnt as an heretic in Campo de' Fiori
if we had quoted any "noncovarage probability" limit just for
the sake of quoting a number, although I could have always
asked for clemency to the Pope. (Bayes was a priest after all)
The best,
PierStefano