
CosmoCoffee

Authors:  Eric V. Linder, Ramon Miquel 
Abstract:  Interpretation of cosmological data to determine the number and values of
parameters describing the universe must not rely solely on statistics but
involve physical insight. Statistical techniques such as "model selection" or
"integrated survey optimization" blindly apply Occam's Razor; this can lead to
painful results. We emphasize that the sensitivity to prior probabilities and
to the number of models compared can lead to "prior selection" rather than
robust model selection. A concrete example demonstrates that Information
Criteria can in fact misinform over a large region of parameter space. 


Andrew Liddle
Joined: 28 Sep 2004 Posts: 21 Affiliation: University of Edinburgh

Posted: March 08 2007 


Dear Thomas,
Remember that we are updating model priors as well as parameter priors; if you advocate carrying forward the parameter priors from previous experiments, you should carry forward the model likelihoods too. If you progressively constrained Omega_tot better and better around 1, the model likelihood might change only a little at each step, but eventually these will all add up and give a decisive verdict. After all, it would be a suspect method if the ultimate result were different if we applied the same data bit by bit rather than all at once.
Data may well motivate new models, and then one should be careful not to also calculate the evidence of the new model from the same data, as then the data would be double-counted. If data motivate a new model, that is good, but new data are then needed to compare that model against others. [E.g., after WMAP1, someone could try saying that they had a model predicting precisely the value Omega_tot=1.02 that that dataset gave. But what they shouldn't then do is compute the evidence from the same data (which would indeed support that model); instead you wait for more data to come along, e.g. WMAP3, which no longer supports 1.02.]
Your ready dismissal of two-sigma results highlights a point about the frequentist method. According to this method, a two-sigma result should be correct about 95% of the time, and hence surely ought to be taken very seriously. Yet we all know that two-sigma results are correct much less often than that. Lindley's `paradox' may be part of the reason; Bayesian methods set a significantly higher `bar' that must be crossed for a result to be taken seriously.
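To see Lindley's `paradox' in action, here is a toy numerical sketch (all numbers illustrative, not tied to any real analysis). A single measurement sitting two sigma from zero has a two-sided p-value of about 0.05, yet the Bayes factor between H0: mu = 0 and H1 with a flat prior on mu swings from favouring H1 to favouring H0 as the prior range is widened:

```python
import math

def bayes_factor_01(x, sigma, prior_width):
    """B01 = p(x|H0) / p(x|H1) for H0: mu = 0 versus
    H1: mu ~ Uniform(-prior_width, +prior_width), Gaussian likelihood."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    # likelihood of the data under the point-null hypothesis
    l0 = norm * math.exp(-0.5 * (x / sigma) ** 2)
    # evidence under H1: likelihood averaged over the flat prior on mu
    n = 2000
    avg = sum(
        norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
        for mu in (-prior_width + (2.0 * prior_width) * (i + 0.5) / n
                   for i in range(n))
    ) / n
    return l0 / avg

# a "two-sigma" measurement: x = 2 with sigma = 1
for width in (2.0, 10.0, 100.0):
    print(width, bayes_factor_01(2.0, 1.0, width))
```

With a narrow prior the two-sigma data favour H1, but with a prior width of 100 the same data favour the simpler model by roughly an order of magnitude: the `bar' rises with the prior volume.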
best,
Andrew 



Fergus Simpson
Joined: 25 Sep 2004 Posts: 27 Affiliation: University of Barcelona

Posted: March 08 2007 


Andrew Liddle wrote:  e.g. WMAP3, which no longer supports 1.02. 
I'm not so sure about that... Fig 12 (p. 50) of astro-ph/0603449 still looks centred on 1.02. 



Andrew Liddle

Posted: March 09 2007 


Fergus Simpson wrote:  Andrew Liddle wrote:  e.g. WMAP3, which no longer supports 1.02. 
I'm not so sure about that... Fig 12 (p. 50) of astro-ph/0603449 still looks centred on 1.02. 
Fair point; I was looking at the w=−1 version of the constraints (which is the model where the 1.02 arose in WMAP1) but there is indeed significant model dependence.
best,
Andrew 



Kate Land
Joined: 27 Sep 2004 Posts: 29 Affiliation: Oxford University

Posted: March 09 2007 


Hia,
I have a couple of basic questions that came up when reading this paper, and the posts herein. I figured this was the best place to ask them!
Firstly, in many cases I just don't understand why the Bayesian evidence, or rather the mean likelihood averaged over the priors, gives a fair ranking. I get the maths behind the Bayesian evidence, and the Bayes factor, but if a model fits well at one point of its parameter space then why does it matter if it fits badly at most other points? In fact, for a correct theory I would only expect it to fit well for some small range of parameter values. Thus I find that the maximum likelihood value intuitively makes more sense as a measure to use.
For example, if we wanted to account for the large-angle CMB alignments with some model, then this model would undoubtedly include the position on the sky, (θ,φ), as two of its parameters. To compute the Bayesian evidence one would then have to average over all positions on the sky, BUT the model won't fit well if it is pointing in the wrong direction! So you would find a low Bayesian evidence even with the correct model.
I imagine the answer is going to be along the lines of 'you have a prior on the direction from previous knowledge'. BUT with only one CMB it seems there is no way to update the prior, or test the model, as we will not have a second observation.
Secondly, I understand the notion of updating priors, etc. But this leads to the question of what is the first-ever prior that you use, i.e. when you have absolutely no information whatsoever, and no theoretical ideas either. There is no suitable prior in this case!
Kate 



Andrew Liddle

Posted: March 10 2007 


Dear Kate,
Kate Land wrote:  Hia,
I have a couple of basic questions that came up when reading this paper, and the posts herein. I figured this was the best place to ask them!
Firstly, in many cases I just don't understand why the Bayesian evidence, or rather the mean likelihood averaged over the priors, gives a fair ranking. I get the maths behind the Bayesian evidence, and the Bayes factor, but if a model fits well at one point of its parameter space then why does it matter if it fits badly at most other points? In fact, for a correct theory I would only expect it to fit well for some small range of parameter values. Thus I find that the maximum likelihood value intuitively makes more sense as a measure to use.
For example, if we wanted to account for the large-angle CMB alignments with some model, then this model would undoubtedly include the position on the sky, (θ,φ), as two of its parameters. To compute the Bayesian evidence one would then have to average over all positions on the sky, BUT the model won't fit well if it is pointing in the wrong direction! So you would find a low Bayesian evidence even with the correct model.
I imagine the answer is going to be along the lines of 'you have a prior on the direction from previous knowledge'. BUT with only one CMB it seems there is no way to update the prior, or test the model, as we will not have a second observation.

Maximum likelihood alone gives you no control over adding arbitrary extra parameters of no physical relevance, since the likelihood never decreases. You have to do something to control that. The Bayesian model selection framework is one proposal for that `something'. The information criteria (except the Bayesian information criterion, which is not actually an information criterion) are an alternative that do focus only on the best-fitting model.
In the specific case you mention, the point should be that including the alignments significantly improves the fit to the data, to an extent that overcomes the extra unwanted prior volume from the averaging over angles (assuming that the prior indeed doesn't contain information on an expected direction of alignment). As the likelihood depends exponentially on the quality of fit to data, it doesn't take much improvement to overcome the increased prior volume factor.
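That trade-off can be written down in one line of bookkeeping. In a rough Laplace-style estimate, ln(Bayes factor) ~ Δχ²/2 minus ln(prior volume / posterior volume); the sketch below (with invented numbers for the solid angles and fit improvements) shows that the Occam penalty for averaging over the whole sky grows only logarithmically and is quickly overcome by the exponential-in-fit likelihood gain:

```python
import math

def log_occam_penalty(prior_volume, posterior_volume):
    # cost (in ln-evidence units) of averaging the likelihood over
    # prior regions where the model fits badly
    return math.log(prior_volume / posterior_volume)

def delta_log_evidence(delta_chi2, penalty):
    # rough estimate: ln B ~ (chi^2 improvement)/2 - Occam penalty
    return 0.5 * delta_chi2 - penalty

# extra parameters: a sky direction with prior solid angle 4*pi sr,
# pinned down by the data to ~0.01 sr (an invented figure)
penalty = log_occam_penalty(4.0 * math.pi, 0.01)
print(penalty)                            # about 7.1
print(delta_log_evidence(10.0, penalty))  # chi^2 better by 10: net negative
print(delta_log_evidence(20.0, penalty))  # chi^2 better by 20: net positive
```

So a Δχ² improvement of ~14 already pays for averaging the direction over the entire sky in this toy accounting.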
The problem that there is only one largescale CMB anisotropy plagues all methods and there is no ready resolution.
Kate Land wrote: 
Secondly, I understand the notion of updating priors, etc. But this leads to the question of what is the first-ever prior that you use, i.e. when you have absolutely no information whatsoever, and no theoretical ideas either. There is no suitable prior in this case!
Kate 
Typically I don't think it is reasonable to expect there to be a single well-motivated and unique prior. Instead one should investigate the effects of several to understand how robust the conclusions are. Eventually, if the data are good enough, the conclusions will become prior-independent.
best,
Andrew 



Thomas Dent
Joined: 14 Nov 2006 Posts: 28 Affiliation: ITP Heidelberg

Posted: March 10 2007 


However unsatisfactory this might sound, it is 'physicist's intuition' that tells me that a 2 sigma deviation of a measured parameter away from the standard model is not exciting.
First, there are quite a lot of parameters we could measure, and over a few years, the probability that at least one will be 2 sigma out for a non-negligible period of time is rather large.
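The first point is simple arithmetic, under the idealised assumption of independent, Gaussian-distributed parameter estimates:

```python
# Chance that at least one of N independent parameters deviates by
# 2 sigma (two-sided) purely by chance; 0.0455 is the Gaussian
# two-sided 2-sigma tail probability.
P_2SIGMA = 0.0455

def prob_at_least_one(n_params, p=P_2SIGMA):
    return 1.0 - (1.0 - p) ** n_params

for n in (1, 10, 30):
    print(n, prob_at_least_one(n))
```

With ~30 measurable parameters the chance of some 2-sigma `anomaly' somewhere is around 75%, before even counting the time evolution of the datasets.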
Second, there is a large but unquantifiable probability that the parameter estimation and error estimate are affected by as-yet-undiscovered systematics or flaws in the analysis, which when discovered almost always erase most of the discrepancy. Small screw-ups are much more likely than significant new physics. I don't know of any good way of incorporating this fact of life into a systematic analysis... cosmologists just need to learn that 2 sigma deviations are as common as mud.
Physicists are well used to dealing with 'n sigma' or likelihood results and know from experience what they probably really mean, whereas most don't have much tested experience with Bayesian inference. Bayes should take care of the first point (the many possible deviations from a standard model) but, at least at first sight, looks less transparent to the effects of screw-ups. Sure, you can do a new calculation to consider what would happen if this or that systematic changed the data in such and such a way, but the effect isn't immediately visible. (Of course in complicated parameter spaces nothing is immediately visible...)
After thinking a bit about my 'bootstrap' suggestion I find that it doesn't work, in that one still needs a zeroth prior (as Kate says) before any data at all are applied, and if that zeroth prior is nonsense then so is the result.
What that says to me is that Bayesian inference about a 'model' which has no physically justifiable prior is meaningless. Sounds obvious, but you can't get physics out without putting physics in.
One obvious example: there is no inflationary physics that gives an HZ (Harrison-Zel'dovich) spectrum. So there is no physics justification for using a 'model' that fixes n=1 as a point of comparison. If people were to use it just on the basis of looking simple they would be fooling themselves to claim any physical significance for the result.
Perhaps one could reformulate it as comparing an inflationary model or class of models which predicts n−1 to be extremely small, with another (class of) model(s) in which it's distributed over a few percent... a rather more complicated question.
Or in astro-ph/0701338, the authors choose top-hat priors on their non-ΛCDM models, with quite arbitrary boundaries. 'We let w vary, assuming that it is small enough to lead to acceleration.' Model II has a flat prior between −1/3 and −1; Model III has a flat prior between −1/3 and −2. Well, why stop at −2? Why impose acceleration in the first place, which sounds suspiciously like dressing up data in the guise of a prior? The whole exercise has no useful relation to physics models of dark energy (e.g. axions) that produce sensible, non-top-hat distributions for w.
If you have physics models (e.g. fitting stellar spectra, supernovae...) you can compare them. If not, you shouldn't fabricate physics-free prior distributions for the purpose of making a comparison. Without meaningful models, I would argue that the best one can do is to measure numbers.
Thomas 



Moncy Vilavinal John
Joined: 21 Mar 2006 Posts: 3 Affiliation: St. Thomas College, Kozhencherry, Kerala, India

Posted: March 11 2007 


Dear Kate,
Kate Land wrote:
Quote: 
... if a model fits well at one point of its parameter space then why does it matter if it fits badly at most other points? In fact, for a correct theory I would only expect it to fit well for some small range of parameter values.

You said it right. In this context, it would be interesting to note that all the fundamental constants we know today might have started off as just parameters. For instance, take G (of course, the non-varying G!). After innumerable rounds of experiments, observations etc., its value has reached the present sharp range, with an almost delta-function type of posterior, which may be used as a prior in any future measurement. We now expect it to fit well only in this particular, very small range and give a bad fit for all other ranges.
But remember that this happens only after a large number of trials, and for the right theory and right sort of parameters.
But when someone invents a new parameter, it would be better to have at least a moderately good fit over a wide range of its values. A very good fit over a narrow range in one experiment and a similar fit over another (distant) narrow range in some other (future) experiment is not a good sign for a realistic parameter and theory. Proper use of Bayes' theorem will penalize such models.
Kate Land wrote:
Quote: 
Secondly, I understand the notion of updating priors, etc. But this leads to the question of what is the first ever prior that you use? ie. When you have absolutely no information whatsoever, and no theoretical ideas either. There is no suitable prior in this case!

Don’t you think that G would have reached the present delta-function-like posterior, irrespective of whatever first prior is used?
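That intuition is easy to check with conjugate Gaussian updating (a sketch; the 'measurements' of G and their noise level are invented for illustration). Two wildly different first priors give essentially the same posterior once enough data have accumulated:

```python
import math
import random

def posterior(prior_mean, prior_sigma, data, noise_sigma):
    """Gaussian prior x Gaussian likelihood: precisions add, and the
    posterior mean is the precision-weighted average."""
    precision = 1.0 / prior_sigma ** 2
    weighted_mean = prior_mean * precision
    for x in data:
        precision += 1.0 / noise_sigma ** 2
        weighted_mean += x / noise_sigma ** 2
    return weighted_mean / precision, math.sqrt(1.0 / precision)

random.seed(1)
TRUE_G = 6.674  # in units of 10^-11 m^3 kg^-1 s^-2
data = [random.gauss(TRUE_G, 0.5) for _ in range(1000)]

m1, s1 = posterior(0.0, 100.0, data, 0.5)  # vague first prior
m2, s2 = posterior(50.0, 5.0, data, 0.5)   # badly centred first prior
print(m1, m2)  # nearly identical, close to TRUE_G
```

The one case the data cannot fix is a dogmatic first prior that assigns exactly zero weight to the true region.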
Hope this helps.
Regards,
Moncy 



Douglas Clowe
Joined: 05 Nov 2005 Posts: 12 Affiliation: Ohio University

Posted: March 12 2007 


Andrew Liddle wrote: 
Maximum likelihood alone gives you no control over adding arbitrary extra parameters of no physical relevance, since the likelihood never decreases. You have to do something to control that. 
You can always test to see if the extra parameter improves the quality of the fit enough to justify its inclusion. The simplest case is a chi-squared fit with Gaussian error bars on the data, where you can just use the F-test for an additional term to see whether adding the extra free parameter improves the fit more than adding any random extra degree of freedom would. Similar statistics can be derived for other types of fitting and for other types of error bars (provided of course you know the shape of the error distribution). 
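For concreteness, a minimal sketch of that F-test for one additional term in a chi-squared fit (the chi-squared values and data count are invented, and the 5% critical value is taken from standard F tables):

```python
def f_statistic(chi2_simple, chi2_extra, n_data, n_params_extra):
    """F-test for one extra parameter in nested chi-squared fits:
    F = (drop in chi^2) / (reduced chi^2 of the larger model)."""
    dof_extra = n_data - n_params_extra
    return (chi2_simple - chi2_extra) / (chi2_extra / dof_extra)

# invented example: 12 data points; chi^2 falls from 18.0 to 9.0
# when a third parameter is added
F = f_statistic(18.0, 9.0, 12, 3)
F_CRIT_95 = 5.12  # 95% point of F(1, 9), from standard tables
print(F, F > F_CRIT_95)  # here the extra parameter is justified at 95%
```

Note this only tells you whether the extra parameter is justified at a chosen significance level; it does not produce the model-level probabilities the Bayesian framework gives.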



Andrew Liddle

Posted: March 15 2007 


Douglas Clowe wrote:  Andrew Liddle wrote: 
Maximum likelihood alone gives you no control over adding arbitrary extra parameters of no physical relevance, since the likelihood never decreases. You have to do something to control that. 
You can always test to see if the extra parameter improves the quality of the fit enough to justify its inclusion. The simplest case is a chi-squared fit with Gaussian error bars on the data, where you can just use the F-test for an additional term to see whether adding the extra free parameter improves the fit more than adding any random extra degree of freedom would. Similar statistics can be derived for other types of fitting and for other types of error bars (provided of course you know the shape of the error distribution). 
Sure, there are lots of tools for trying to address this question, ranging from frequentist F-tests and likelihood-ratio tests that can address nested models, through to the more sophisticated information-theoretic and Bayesian model-level inference approaches, which have broader applicability. The point is that they all disagree with each other, even for simple test cases like a 1D Gaussian likelihood. So which should be used?
My personal preference is the Bayesian model selection approach. One reason is that it is more conservative than the others, in the sense that it is much less likely to indicate that a simple model is disfavoured by data in the marginal `detection' regime. According to Figure 3 of Roberto's astro-ph/0504022, a two-sigma frequentist `detection' will not be supported by Bayesian model selection. [That's not to say such detections are never right, but their chance of being correct is Bayesian-computable and is much less than 95%, and, in most cases, even less than 50%.] By contrast, a four-sigma result normally is supported by Bayesian model selection. As it happens, these numbers seem to be in not-at-all bad agreement with the way Thomas (post dated 9 Mar 07) describes particle physicists' translation of number-of-sigmas into an experience-based interpretation. So while particle physicists have learnt to distrust and reinterpret frequentist confidence limits, if they used the Bayesian model-level framework they might find its quantitative results in much better accord with their experience. Nevertheless, all methods are susceptible to unidentified systematics.
One message you can take from that is that if you want to be conservative and ensure all standard methods support a detection, actually you might as well just use Bayesian model selection as it sets the highest bar.
The other reason I prefer the Bayesian approach is that it is a complete and self-consistent inference framework, with model-level inference a natural and unique extension of parameter estimation. By contrast the frequentist approach is a set of tools with no underlying framework. This may be a matter of taste however.
best,
Andrew 



Andrew Liddle

Posted: March 15 2007 


Dear Thomas,
Thomas Dent wrote:  One obvious example: there is no inflationary physics that gives an HZ spectrum. So there is no physics justification for using a 'model' that fixes n=1 as a point of comparison. If people were to use it just on the basis of looking simple they would be fooling themselves to claim any physical significance for the result.
Thomas 
To me, the HZ case is interesting enough to merit study, because it was proposed over 35 years ago, a decade before inflation, and has not been definitively ruled out by subsequent data in all that time. I do agree with the argument that it lacks an underlying physical model, and so in my head I carry a Bayesian model prior probability which rates this model below the inflationary one. This is indeed perhaps strong enough that, combined with WMAP3 data, the HZ case can be considered to be ruled out.
But that would be ruling it out based on a combination of theoretical prejudice and observational data. Theoretical prejudice is a good thing to have available in interpreting marginal observational results, and the Bayesian model priors a good way of implementing it. But at the same time, given the long history of the HZ model, it would be nice if we could convincingly exclude it based on data alone, wouldn't it? And, if it is wrong, Planck should be able to do that.
best,
Andrew 



Syksy Rasanen
Joined: 02 Mar 2005 Posts: 128 Affiliation: University of Helsinki

Posted: March 15 2007 


I notice that there is a new version of the Linder and Miquel paper (the smartest person argument has been removed).
There's also a reply by Liddle, Corasaniti, Kunz, Mukherjee, Parkinson and Trotta, astro-ph/0703285, which addresses a number of issues discussed in this thread. 



Thomas Dent

Posted: March 15 2007 


Moncy Vilavinal John wrote: 
(...)
Don’t you think that G would have reached the present delta-function-like posterior, irrespective of whatever first prior is used?

OK, can anyone propose a non-pathological and non-arbitrary 'first prior' then?
Perhaps this is a point where the anthropic principle might come into play, in that any region of parameter space where no observations by anybody (taking 'anybody' in the broadest possible sense) can be made can a priori be excluded, which usually gets rid of possibly troublesome aspects of an a priori infinite parameter space.
In practice it's not completely simple to calculate such conservative upper and lower limits on an observable G. (Since dimensionful quantities cannot be measured absolutely I find it easier to think of measuring the 'gravitational fine structure constant' which can be written as the proton mass squared over the Planck mass squared...)
T 



Roberto Trotta
Joined: 27 Sep 2004 Posts: 18 Affiliation: Imperial College London

Posted: March 16 2007 


Thomas Dent wrote: 
OK, can anyone propose a nonpathological and nonarbitrary 'first prior' then?

E.T. Jaynes would say that there is no such thing as complete ignorance. The challenge is then to find a prior that reflects our real state of knowledge about the problem at hand.
There are several different ways of setting such priors: e.g. the maximum entropy principle (see e.g. astro-ph/0702695), or fundamental principles relating to the symmetry properties of the problem (there is one nice worked-out example of this in Jaynes' book; I can dig out the precise reference if you are interested), or an analysis based on the expected signal-to-noise of the data that you will gather (I had a go at this in astro-ph/0504022, in the section about isocurvature modes). I'm sure the list is utterly incomplete!
Thomas Dent wrote: 
Perhaps this is a point where the anthropic principle might come into play, in that any region of parameter space where no observations by anybody (taking 'anybody' in the broadest possible sense) can be made, can a priori be excluded  which usually gets rid of possibly troublesome aspects of an a priori infinite parameter space.

While I agree in principle (the observed Universe is of course a piece of information we can and should condition upon), I don't think this is viable in practice, for two reasons. First, redefining the concept of 'anybody' will give you wildly different anthropic selection functions (see astro-ph/0607227); second, if you vary other fundamental parameters apart from, say, Λ (and if you vary one of them, there is no reason why you shouldn't vary all of them) you can find regions of parameter space that survive pretty aggressive anthropic cuts: see astro-ph/0106143, where Aguirre shows that in a 5-parameter model of a cold big bang cosmology the cosmological constant can be 10^{17} times (!) its value in our Universe and yet observers can happily exist (thus violating Weinberg's upper bound on Λ, derived for the case where the cosmological constant alone is allowed to vary). But this really belongs in a separate thread. 



Thomas Dent

Posted: March 16 2007 


For various reasons I don't think that the Starkman/Trotta paper "Why Anthropic Reasoning Cannot..." says very much about possibly using anthropic priors in this context; let's agree that it is off topic.
For the gravitational 'fine structure constant' I think some basic sort of anthropizing might work, in that there is a maximum value consistent with the existence of any form of structured matter. Above that value attractive gravity becomes stronger than any other force and things just fall together into a big lump or black hole  and one cannot measure the 'strength of gravity' in any meaningful sense.
Apart from this limiting value, you could argue that G is the spurion in the conformal symmetry of the GR action, somewhat as QCD has conformal invariance up to a running which is parameterised by the scale Lambda. So if one were to pick a measure just looking at a theory with protons and gravity, it should somehow scale under the conformal symmetry; that is, some sort of power law in (m_{p}/m_Pl). Since it has to be normalizable over (0, a), where a is the 'anthropic' maximum value, we could go to any positive power, or even a constant ~ 1/a ... then hopefully none of these should affect the final posterior!
T 



Alan Heavens
Joined: 28 Sep 2004 Posts: 4 Affiliation: Imperial College London

Posted: March 26 2007 


This paper has generated a lot of interesting discussion. I would argue with much of the paper. In model selection, as with parameter estimation, if the prior really matters then the data are not really good enough. With any 'good' experiment the influence of the data via the likelihood will dominate any uncertainty in the prior ranges. It is a similar story in parameter estimation: the numerical results of the frequentist approach and the Bayesian approach will usually agree closely if the experiment is a good one, even if the former method is answering the wrong question.
Finally, the model selection questions are often of more fundamental importance than parameter estimation, as the answers may be indicative of new physics (e.g. dark energy, modified gravity). In my view, having to state up front one's prior prejudice is a small price to pay (and it is a price which one should in any case have to pay) to have a tool to answer the right question. 






