## [astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Authors: Eric V. Linder, Ramon Miquel

Abstract: Interpretation of cosmological data to determine the number and values of parameters describing the universe must not rely solely on statistics but involve physical insight. Statistical techniques such as "model selection" or "integrated survey optimization" blindly apply Occam's Razor - this can lead to painful results. We emphasize that the sensitivity to prior probabilities and to the number of models compared can lead to "prior selection" rather than robust model selection. A concrete example demonstrates that Information Criteria can in fact misinform over a large region of parameter space.

Andrew Liddle
Posts: 21
Joined: September 28 2004
Affiliation: University of Edinburgh

### [astro-ph/0702542] Tainted Evidence: Cosmological Model Sele

Dear Thomas,

Remember that we are updating model priors as well as parameter priors; if you advocate carrying forward the parameter priors from previous experiments, you should carry forward the model likelihoods too. If you progressively constrained Omega_tot better and better around 1, the model likelihood might change only a little at each step, but eventually these will all add up and give a decisive verdict. After all, it would be a suspect method if the ultimate result were different if we applied the same data bit-by-bit rather than all at once.
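The consistency claim above (same data, same verdict, whether applied bit-by-bit or all at once) can be checked numerically in a toy setting. The sketch below is only an illustration under stated assumptions: a single parameter (think of Omega_tot - 1) with a zero-mean Gaussian prior of width `tau`, independent Gaussian measurements with known error `sigma`, and made-up data values. None of these numbers come from the thread.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log of a 1D Gaussian density."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_evidence_batch(data, sigma, tau):
    """Log-evidence using all data at once (analytic Gaussian integral)."""
    n = len(data)
    s1 = sum(data)
    s2 = sum(x * x for x in data)
    quad = s2 / sigma**2 - tau**2 * s1**2 / (sigma**2 * (sigma**2 + n * tau**2))
    return (-0.5 * n * math.log(2.0 * math.pi * sigma**2)
            - 0.5 * math.log(1.0 + n * tau**2 / sigma**2)
            - 0.5 * quad)

def log_evidence_sequential(data, sigma, tau):
    """Log-evidence built up one datum at a time, carrying the posterior forward."""
    mean, var = 0.0, tau**2          # prior before any data
    total = 0.0
    for x in data:
        # Predictive probability of the next datum given everything so far
        total += log_normal_pdf(x, mean, var + sigma**2)
        # Posterior after this datum becomes the prior for the next one
        new_var = 1.0 / (1.0 / var + 1.0 / sigma**2)
        mean = new_var * (mean / var + x / sigma**2)
        var = new_var
    return total

# Hypothetical measurements of (Omega_tot - 1); illustrative values only
data = [0.03, -0.01, 0.02, 0.00, 0.01]
SIGMA, TAU = 0.02, 0.5

batch = log_evidence_batch(data, SIGMA, TAU)
step = log_evidence_sequential(data, SIGMA, TAU)
print(batch, step)
```

The two numbers agree to rounding error, because the product of the step-by-step predictive probabilities is the joint evidence.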

Data may well motivate new models, and then one should be careful not to also calculate the evidence of the new model from the same data, as the data would then be double-counted. If data motivate a new model, that is good, but new data are then needed to compare that model against others. [Eg, after WMAP1, someone could try saying that they had a model predicting precisely the value Omega_tot=1.02 that that dataset gave. But what they shouldn't then do is compute the evidence from the same data (which would indeed support that model); instead you wait for more data to come along, eg WMAP3 which no longer supports 1.02.]

Your ready dismissal of two-sigma results highlights a point about the frequentist method. According to this method, a two-sigma result should be correct about 95% of the time, and hence surely ought to be taken very seriously. Yet we all know that two-sigma results are correct much less often than that. 'Lindley's paradox' may be part of the reason; Bayesian methods set a significantly 'higher bar' that must be crossed for a result to be taken seriously.

best,

Andrew

Fergus Simpson
Posts: 27
Joined: September 25 2004
Affiliation: University of Barcelona

### Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model

Andrew Liddle wrote:eg WMAP3 which no longer supports 1.02.
I'm not so sure about that... Fig 12 (pg 50) from astro-ph/0603449 still looks centred on 1.02.

Andrew Liddle
Posts: 21
Joined: September 28 2004
Affiliation: University of Edinburgh

### Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model

Fergus Simpson wrote:
Andrew Liddle wrote:eg WMAP3 which no longer supports 1.02.
I'm not so sure about that... Fig 12 (pg 50) from astro-ph/0603449 still looks centred on 1.02.
Fair point; I was looking at the w=-1 version of the constraints (which is the model where the 1.02 arose in WMAP1) but there is indeed significant model dependence.

best,

Andrew

Kate Land
Posts: 29
Joined: September 27 2004
Affiliation: Oxford University

### [astro-ph/0702542] Tainted Evidence: Cosmological Model Sele

Hia,
I have a couple of basic questions that came up when reading this paper, and the posts herein. I figured this was the best place to ask them!

Firstly, in many cases I just don't understand why the Bayesian evidence, or rather the mean likelihood averaged over the priors, gives a fair ranking. I get the maths behind the Bayesian evidence, and the Bayes factor - but if a model fits well at one point of its parameter space then why does it matter if it fits badly at most other points? In fact, for a correct theory I would only expect it to fit well for some small range of parameter values. Thus I find that the maximum likelihood value intuitively makes more sense as a measure to use.

For example, if we wanted to account for the large-angle CMB alignments with some model, then this model would undoubtedly include the position on sky, $(\theta, \phi)$, as two of its parameters. To compute the Bayesian evidence one would then have to average over all positions on the sky - BUT the model won't fit well if it is pointing in the wrong direction! So you would find a low Bayesian evidence even with the correct model.

I imagine the answer is going to be along the lines of 'you have a prior on the direction from previous knowledge'. BUT with only one CMB it seems there is no way to update the prior, or test the model, as we will not have a second observation.

Secondly, I understand the notion of updating priors, etc. But this leads to the question of what is the first ever prior that you use? ie. When you have absolutely no information whatsoever, and no theoretical ideas either. There is no suitable prior in this case!

Kate

Andrew Liddle
Posts: 21
Joined: September 28 2004
Affiliation: University of Edinburgh

### Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model

Dear Kate,
Kate Land wrote:Hia,
I have a couple of basic questions that came up when reading this paper, and the posts herein. I figured this was the best place to ask them!

Firstly, in many cases I just don't understand why the Bayesian evidence, or rather the mean likelihood averaged over the priors, gives a fair ranking. I get the maths behind the Bayesian evidence, and the Bayes factor - but if a model fits well at one point of its parameter space then why does it matter if it fits badly at most other points? In fact, for a correct theory I would only expect it to fit well for some small range of parameter values. Thus I find that the maximum likelihood value intuitively makes more sense as a measure to use.

For example, if we wanted to account for the large-angle CMB alignments with some model, then this model would undoubtedly include the position on sky, $(\theta, \phi)$, as two of its parameters. To compute the Bayesian evidence one would then have to average over all positions on the sky - BUT the model won't fit well if it is pointing in the wrong direction! So you would find a low Bayesian evidence even with the correct model.

I imagine the answer is going to be along the lines of 'you have a prior on the direction from previous knowledge'. BUT with only one CMB it seems there is no way to update the prior, or test the model, as we will not have a second observation.
Maximum likelihood alone gives you no control over adding arbitrary extra parameters of no physical relevance, since the likelihood never decreases. You have to do something to control that. The Bayesian model selection framework is one proposal for that 'something'. The information criteria (except the Bayesian Information Criterion, which is not actually an information criterion) are an alternative that does focus only on the best-fitting model.

In the specific case you mention, the point should be that including the alignments significantly improves the fit to the data, to an extent that overcomes the extra unwanted prior volume from the averaging over angles (assuming that the prior indeed doesn't contain information on an expected direction of alignment). As the likelihood depends exponentially on the quality of fit to data, it doesn't take much improvement to overcome the increased prior volume factor.
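To put a rough number on the trade-off just described, here is a minimal sketch. It assumes the common approximation that the evidence is roughly the maximum likelihood times the ratio of posterior to prior volume, a flat prior over the whole sky, and a posterior that pins the direction down to a circular cap. The cap radii are hypothetical, not taken from any alignment analysis.

```python
import math

def occam_factor_for_direction(cap_radius_deg):
    """Fraction of the full sky (4*pi sr) inside a cap of the given angular radius."""
    cap = 2.0 * math.pi * (1.0 - math.cos(math.radians(cap_radius_deg)))
    return cap / (4.0 * math.pi)

def required_delta_chi2(cap_radius_deg):
    """Improvement in chi^2 needed just to cancel the prior-volume penalty.

    Uses likelihood ~ exp(-chi^2 / 2), so a volume factor f costs -2 ln f in chi^2.
    """
    return -2.0 * math.log(occam_factor_for_direction(cap_radius_deg))

for radius in (30.0, 10.0, 3.0):
    print(radius, occam_factor_for_direction(radius), required_delta_chi2(radius))
```

For a 10 degree cap the penalty is a factor of order a hundred in evidence, which a chi-squared improvement of about 10 already cancels. That is the sense in which the exponential dependence of the likelihood on fit quality quickly overcomes the extra prior volume.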

The problem that there is only one large-scale CMB anisotropy plagues all methods and there is no ready resolution.
Kate Land wrote: Secondly, I understand the notion of updating priors, etc. But this leads to the question of what is the first ever prior that you use? ie. When you have absolutely no information whatsoever, and no theoretical ideas either. There is no suitable prior in this case!
Kate
Typically I don't think it is reasonable to expect there to be a single well-motivated and unique prior. Instead one should investigate the effects of several to understand how robust the conclusions are. Eventually, if the data are good enough, the conclusions will become prior independent.

best,

Andrew

Thomas Dent
Posts: 26
Joined: November 14 2006
Affiliation: ITP Heidelberg

### [astro-ph/0702542] Tainted Evidence: Cosmological Model Sele

However unsatisfactory this might sound, it is 'physicist's intuition' that tells me that a 2 sigma deviation of a measured parameter away from the standard model is not exciting.

First, there are quite a lot of parameters we could measure, and over a few years, the probability that at least one will be 2 sigma out for a non-negligible period of time is rather large.
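A quick back-of-envelope version of this first point, assuming Gaussian errors and statistically independent parameters (both idealisations, and the parameter counts are arbitrary):

```python
import math

def two_sided_tail(n_sigma):
    """Probability that a Gaussian variable lies more than n_sigma from its mean."""
    return math.erfc(n_sigma / math.sqrt(2.0))

def prob_at_least_one(n_params, n_sigma=2.0):
    """Chance that at least one of n_params independent measurements is n_sigma out."""
    p = two_sided_tail(n_sigma)
    return 1.0 - (1.0 - p) ** n_params

for n in (1, 5, 10, 20, 50):
    print(n, round(prob_at_least_one(n), 3))
```

With 20 independent parameters the chance of at least one 2 sigma outlier is already around 60%, before any systematics are considered.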

Second, there is a large but unquantifiable probability that the parameter estimation and error estimate are affected by as yet undiscovered systematics or flaws in analysis, which when discovered almost always erase most of the discrepancy. Small screw-ups are much more likely than significant new physics. I don't know of any good way of incorporating this fact of life into a systematic analysis... cosmologists just need to learn that 2 sigma deviations are as common as mud.

Physicists are well used to dealing with 'n sigma' or likelihood results and know from experience what they probably really mean. Whereas most don't have much tested experience with Bayesian inference. Bayes should take care of the first point (the many possible deviations from a standard model) but, at least at first sight, looks less transparent to the effects of screw-ups. Sure, you can do a new calculation to consider what would happen if this or that systematic changed the data such and such a way, but the effect isn't immediately visible. (Of course in complicated parameter spaces nothing is immediately visible...)

After thinking a bit about my 'bootstrap' suggestion I find that it doesn't work, in that one still needs a zeroth prior (as Kate says) before any data at all is applied - and if that zeroth prior is nonsense then so is the result.

What that says to me is that Bayesian inference about a 'model' which has no physically justifiable prior is meaningless. Sounds obvious, but you can't get physics out without putting physics in.

One obvious example, there is no inflationary physics that gives a HZ spectrum. So there is no physics justification for using a 'model' that fixes n=1 as a point of comparison. If people were to use it just on the basis of looking simple they would be fooling themselves to claim any physical significance for the result.

Perhaps one could reformulate it as comparing an inflationary model or class of models which predicts n-1 to be extremely small, with another (class of) model(s) in which it's distributed over a few percent... a rather more complicated question.

Or in astro-ph/0701338, the authors choose top-hat priors on their non-LCDM models, with quite arbitrary boundaries. 'We let w vary, assuming that it is small enough to lead to acceleration.' Model II has a flat prior between -1/3 and -1; Model III has a flat prior between -1/3 and -2. Well, why stop at -2? Why impose acceleration in the first place, which sounds suspiciously like dressing up data in the guise of a prior? The whole exercise has no useful relation to physics models of dark energy (e.g. axions) that produce sensible, non-top-hat distributions for w.
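How much the arbitrary boundary matters can be seen in a toy calculation. The sketch below assumes a Gaussian likelihood in w with a hypothetical best fit at w = -1 and width 0.1 (these numbers are illustrative and not from astro-ph/0701338), and computes the evidence for a top-hat prior up to a common constant.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def relative_evidence(w_lo, w_hi, w_best=-1.0, sigma_w=0.1):
    """Evidence for a flat prior on [w_lo, w_hi], up to a constant factor.

    The likelihood is taken to be a unit-normalised Gaussian in w, so the result
    is (fraction of likelihood inside the prior range) / (prior width).
    """
    inside = normal_cdf((w_hi - w_best) / sigma_w) - normal_cdf((w_lo - w_best) / sigma_w)
    return inside / (w_hi - w_lo)

model_ii = relative_evidence(-1.0, -1.0 / 3.0)    # flat prior between -1 and -1/3
model_iii = relative_evidence(-2.0, -1.0 / 3.0)   # flat prior between -2 and -1/3
wider = relative_evidence(-5.0, -1.0 / 3.0)       # same model, lower edge moved to -5

print(model_ii / model_iii, wider / model_iii)
```

Moving the lower edge from -2 to -5 changes nothing about the fit, yet it lowers the evidence by the ratio of prior widths (5/14 here), which is the sensitivity being objected to.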

If you have physics models (eg fitting stellar spectra, supernovae...) you can compare them. If not, you shouldn't fabricate physics-free prior distributions for the purpose of making a comparison. Without meaningful models, I would argue that the best one can do is to measure numbers.

Thomas

Moncy Vilavinal John
Posts: 3
Joined: March 21 2006
Affiliation: St. Thomas College, Kozhencherry, Kerala, India

### [astro-ph/0702542] Tainted Evidence: Cosmological Model Sele

Dear Kate,

Kate Land wrote:

... if a model fits well at one point of its parameter space then why does it matter if it fits badly at most other points? In fact, for a correct theory I would only expect it to fit well for some small range of parameter values.
You said it right. In this context, it would be interesting to note that all the fundamental constants we know today might have started off as just parameters. For instance, take G (of course, the non-varying G!). After innumerable rounds of experiments, observations etc, its value has reached the present sharp range, with an almost delta-function type of posterior, which may be used as prior in any future measurement. We now expect it to be fit only for this particular, very small range and give a bad fit for all other ranges.

But remember that this happens only after a large number of trials and for the right theory and right sort of parameters.

But when someone invents a new parameter, it would be better to have at least a moderately good fit over a wide range of its values. A very good fit over a narrow range in one experiment and a similar fit over another (distant) narrow range in some other (future) experiment is no good sign for a realistic parameter and theory. Proper use of Bayes theory will penalize such models.

Kate Land wrote:

Secondly, I understand the notion of updating priors, etc. But this leads to the question of what is the first ever prior that you use? ie. When you have absolutely no information whatsoever, and no theoretical ideas either. There is no suitable prior in this case!

Don't you think that G would have reached the present delta-function like posterior, irrespective of whatever first prior is used?
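A toy version of this convergence, as a sketch only: a single quantity measured repeatedly with Gaussian errors, analysed with two deliberately incompatible Gaussian first priors. The 'true value', the measurement error and both priors are invented for illustration and are in arbitrary units; they are not real measurements of G.

```python
import random

def posterior_after(data, sigma, prior_mean, prior_sd):
    """Conjugate Gaussian update: returns posterior mean and standard deviation."""
    precision = 1.0 / prior_sd**2 + len(data) / sigma**2
    mean = (prior_mean / prior_sd**2 + sum(data) / sigma**2) / precision
    return mean, precision ** -0.5

rng = random.Random(42)                 # fixed seed so the run is repeatable
TRUE_VALUE, SIGMA = 1.0, 0.1            # hypothetical, arbitrary units
measurements = [rng.gauss(TRUE_VALUE, SIGMA) for _ in range(1000)]

for n in (1, 10, 100, 1000):
    mean_a, sd_a = posterior_after(measurements[:n], SIGMA, 0.0, 0.1)  # first prior A
    mean_b, sd_b = posterior_after(measurements[:n], SIGMA, 5.0, 0.1)  # first prior B
    print(n, round(mean_a, 4), round(mean_b, 4), round(abs(mean_a - mean_b), 4))
```

The gap between the two posterior means shrinks roughly as 1/n, so after enough measurements the choice of first prior is irrelevant here. This only works because both priors give non-zero weight to the true value; a prior that excludes it outright can never be corrected by data.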

Hope this helps.

Regards,

Moncy

Douglas Clowe
Posts: 11
Joined: November 05 2005
Affiliation: Ohio University

### Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model

Andrew Liddle wrote:
Maximum likelihood alone gives you no control over adding arbitrary extra parameters of no physical relevance, since the likelihood never decreases. You have to do something to control that.
You can always test to see if the extra parameter improves the quality of the fit enough to justify its inclusion. The simplest method is for a chi-squared fit with Gaussian error bars in the data, where you can just use the F-test for an additional term to see if adding in the extra free parameter improves the fit more than adding in any random extra degree of freedom would normally improve the fit. Similar statistics can be derived for other types of fitting and for other types of error bars (provided of course you know the shape of the error distribution).
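For concreteness, a minimal standard-library sketch of the F-test for one additional term. The statistic is the drop in chi-squared divided by the reduced chi-squared of the larger model; since an F distribution with (1, nu) degrees of freedom is the square of a Student t with nu degrees of freedom, the tail probability is obtained here by integrating the t density numerically. The function names and the chi-squared values in the example are hypothetical.

```python
import math

def t_pdf(t, nu):
    """Student t probability density with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + 1) * math.log1p(t * t / nu))

def f_test_extra_term(chi2_simple, chi2_extended, n_data, n_params_extended, steps=2000):
    """F statistic and p-value for adding ONE parameter to a chi-squared fit.

    steps must be even (Simpson's rule).
    """
    nu = n_data - n_params_extended                 # d.o.f. of the extended fit
    f_stat = (chi2_simple - chi2_extended) / (chi2_extended / nu)
    upper = math.sqrt(max(f_stat, 0.0))             # F(1, nu) = t(nu)^2
    h = upper / steps
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    central = 2.0 * total * h / 3.0                 # P(|t| < sqrt(F)) by symmetry
    return f_stat, 1.0 - central

# Hypothetical fit: 52 data points, extended model has 2 parameters
f_stat, p_value = f_test_extra_term(60.0, 50.0, n_data=52, n_params_extended=2)
print(f_stat, p_value)
```

A small p-value says the improvement is larger than a random extra degree of freedom would usually give. Note that this is a frequentist statement about the null model, not a probability that the extra parameter is real.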

Andrew Liddle
Posts: 21
Joined: September 28 2004
Affiliation: University of Edinburgh

### Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model

Douglas Clowe wrote:
Andrew Liddle wrote:
Maximum likelihood alone gives you no control over adding arbitrary extra parameters of no physical relevance, since the likelihood never decreases. You have to do something to control that.
You can always test to see if the extra parameter improves the quality of the fit enough to justify its inclusion. The simplest method is for a chi-squared fit with Gaussian error bars in the data, where you can just use the F-test for an additional term to see if adding in the extra free parameter improves the fit more than adding in any random extra degree of freedom would normally improve the fit. Similar statistics can be derived for other types of fitting and for other types of error bars (provided of course you know the shape of the error distribution).
Sure, there are lots of tools for trying to address this question, ranging from frequentist F-tests and likelihood-ratio tests that can address nested models, through to the more sophisticated information theory and Bayesian model-level inference approaches which have broader applicability. The point is that they all disagree with each other, even for simple test cases like a 1D gaussian likelihood. So which should be used?

My personal preference is the Bayesian model selection approach. One reason is that it is more conservative than the others, in the sense that it is much less likely to indicate that a simple model is disfavoured by data in the 'marginal detection' regime. According to Figure 3 of Roberto's astro-ph/0504022, a two-sigma frequentist 'detection' will not be supported by Bayesian model selection. [That's not to say such detections are never right, but their chance of being correct is Bayesian-computable and is much less than 95%, and, in most cases, even 50%.] By contrast, a four-sigma result normally is supported by Bayesian model selection. As it happens, these numbers seem to be in not-at-all bad agreement with the way Thomas (post dated 9-Mar-07) describes particle physicists' translation of number-of-sigmas into an experience-based interpretation. So while particle physicists have learnt to distrust and reinterpret frequentist confidence limits, if they used the Bayesian model-level framework they might find its quantitative results in much better accord with their experience. Nevertheless, all methods are susceptible to unidentified systematics.

One message you can take from that is that if you want to be conservative and ensure all standard methods support a detection, actually you might as well just use Bayesian model selection as it sets the highest bar.
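The two-sigma versus four-sigma contrast above can be illustrated with the simplest nested comparison. This is a sketch under explicit assumptions, not a reproduction of the figure in astro-ph/0504022: one extra parameter, fixed to zero in the simple model, given a zero-mean Gaussian prior of width `tau` in the extended model, and measured with Gaussian error `sigma`. Only the ratio `tau/sigma` and the number of sigmas `z` of the measured value enter; the ratios used below are arbitrary choices.

```python
import math

def ln_bayes_factor_01(z, prior_to_error_ratio):
    """ln B01 for simple (theta = 0) versus extended (theta ~ N(0, tau^2)) model.

    z is the measured value in units of its error bar; prior_to_error_ratio = tau/sigma.
    Positive values favour the simple model, negative values the extended one.
    """
    r2 = prior_to_error_ratio ** 2
    return 0.5 * math.log(1.0 + r2) - 0.5 * z * z * r2 / (1.0 + r2)

for ratio in (1.0, 10.0, 100.0):
    for z in (2.0, 3.0, 4.0):
        print(ratio, z, round(ln_bayes_factor_01(z, ratio), 2))
```

With a prior ten times wider than the error bar, a two-sigma result gives ln B01 of about +0.3, mildly favouring the simple model, while a four-sigma result gives about -5.6, strongly favouring the extended one. With a prior no wider than the error bar the two-sigma verdict flips sign, which is exactly the prior dependence under discussion in this thread.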

The other reason I prefer the Bayesian approach is that it is a complete and self-consistent inference framework, with model-level inference a natural and unique extension of parameter estimation. By contrast the frequentist approach is a set of tools with no underlying framework. This may be a matter of taste however.

best,

Andrew

Andrew Liddle
Posts: 21
Joined: September 28 2004
Affiliation: University of Edinburgh

### Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model

Dear Thomas,
Thomas Dent wrote:One obvious example, there is no inflationary physics that gives a HZ spectrum. So there is no physics justification for using a 'model' that fixes n=1 as a point of comparison. If people were to use it just on the basis of looking simple they would be fooling themselves to claim any physical significance for the result.

Thomas
To me, the HZ case is interesting enough to merit study, because it was proposed over 35 years ago, a decade before inflation, and has not been definitively ruled out by subsequent data in all that time. I do agree with the argument that it lacks an underlying physical model, and so in my head I carry a Bayesian model prior probability which rates this model below the inflationary one. This is indeed perhaps strong enough that, combined with WMAP3 data, the HZ case can be considered to be ruled out.

But that would be ruling it out based on a combination of theoretical prejudice and observational data. Theoretical prejudice is a good thing to have available in interpreting marginal observational results, and the Bayesian model priors a good way of implementing it. But at the same time, given the long history of the HZ model, it would be nice if we could convincingly exclude it based on data alone, wouldn't it? And, if it is wrong, Planck should be able to do that.

best,

Andrew

Syksy Rasanen
Posts: 119
Joined: March 02 2005
Affiliation: University of Helsinki

### [astro-ph/0702542] Tainted Evidence: Cosmological Model Sele

I notice that there is a new version of the Linder and Miquel paper (the smartest person argument has been removed).

There's also a reply by Liddle, Corasaniti, Kunz, Mukherjee, Parkinson and Trotta, astro-ph/0703285, which addresses a number of issues discussed in this thread.

Thomas Dent
Posts: 26
Joined: November 14 2006
Affiliation: ITP Heidelberg

### Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model

Moncy Vilavinal John wrote:
(...)
Don’t you think that G would have reached the present delta-function like posterior, irrespective of whatever first prior is used?
OK, can anyone propose a non-pathological and non-arbitrary 'first prior' then?

Perhaps this is a point where the anthropic principle might come into play, in that any region of parameter space where no observations by anybody (taking 'anybody' in the broadest possible sense) can be made, can a priori be excluded - which usually gets rid of possibly troublesome aspects of an a priori infinite parameter space.

In practice it's not completely simple to calculate such conservative upper and lower limits on an observable G. (Since dimensionful quantities cannot be measured absolutely I find it easier to think of measuring the 'gravitational fine structure constant' which can be written as the proton mass squared over the Planck mass squared...)

T

Roberto Trotta
Posts: 18
Joined: September 27 2004
Affiliation: Imperial College London

### Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model

Thomas Dent wrote: OK, can anyone propose a non-pathological and non-arbitrary 'first prior' then?
E.T. Jaynes would say that there is no such thing as complete ignorance. The challenge is then to find a prior that reflects our real state of knowledge about the problem at hand.

There are several different ways of setting such priors: eg the maximum entropy principle (see eg astro-ph/0702695), or fundamental principles relating to the symmetry properties of the problem (there is one nice worked-out example of this in Jaynes' book, I can dig out the precise reference if you are interested) or an analysis based on the expected signal-to-noise of the data that you will gather (I had a go at this in astro-ph/0504022, section about isocurvature modes). I'm sure the list is utterly incomplete!
Thomas Dent wrote:
Perhaps this is a point where the anthropic principle might come into play, in that any region of parameter space where no observations by anybody (taking 'anybody' in the broadest possible sense) can be made, can a priori be excluded - which usually gets rid of possibly troublesome aspects of an a priori infinite parameter space.
While I agree in principle (the observed Universe is of course a piece of information we can and should condition upon), I don't think this is viable in practice, for two reasons. First, redefining the concept of 'anybody' will give you wildly different anthropic selection functions (see astro-ph/0607227); second, if you vary other fundamental parameters apart from say \Lambda (and if you vary one of them, there is no reason why you shouldn't vary all of them) you can find regions of parameter space that survive pretty aggressive anthropic cuts, see astro-ph/0106143, where Aguirre shows that in a five-parameter model of a cold big bang cosmology the cosmological constant can be 10^{17} times (!) its value in our Universe and yet observers can happily exist (thus violating Weinberg's upper bound on \Lambda derived for the case where the cosmological constant alone is allowed to vary). But this really belongs to a separate thread.

Thomas Dent
Posts: 26
Joined: November 14 2006
Affiliation: ITP Heidelberg

### [astro-ph/0702542] Tainted Evidence: Cosmological Model Sele

For various reasons I don't think that the Starkman/Trotta paper "Why Anthropic Reasoning Cannot..." says very much about possibly using anthropic priors in this context - let's agree that it is off topic.

For the gravitational 'fine structure constant' I think some basic sort of anthropizing might work, in that there is a maximum value consistent with the existence of any form of structured matter. Above that value attractive gravity becomes stronger than any other force and things just fall together into a big lump or black hole - and one cannot measure the 'strength of gravity' in any meaningful sense.

Apart from this limiting value you could argue that G is the spurion in the conformal symmetry of the GR action, somewhat as QCD has conformal invariance up to a running which is parameterised by the scale Lambda. So if one were to pick a measure just looking at a theory with protons and gravity it should somehow scale under the conformal symmetry - that is, some sort of power law in (m_p/m_Pl). Since it has to be normalizable over (0,a) where a is the 'anthropic' maximum value, we could go to any positive power, or even constant ~ 1/a ... then hopefully none of these should affect the final posterior!
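Written out, and purely as an illustration of the family being described: take $x$ to stand for the gravitational 'fine structure constant' and $n$ for the (assumed) power-law index. The normalised power-law prior on the interval from $0$ to the anthropic maximum $a$ is then

$$
p(x) = \frac{n+1}{a^{\,n+1}}\, x^{n}, \qquad 0 < x \le a, \qquad \int_0^a p(x)\,dx = 1 ,
$$

which is normalisable for any $n > -1$. The case $n = 0$ is the constant prior $p(x) = 1/a$, and every positive power is allowed.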

T

Alan Heavens
Posts: 4
Joined: September 28 2004
Affiliation: Imperial College London

### [astro-ph/0702542] Tainted Evidence: Cosmological Model Sele

This paper has generated a lot of interesting discussion. I would argue with much of the paper. In model selection, as with parameter estimation, if the prior really matters then the data are not really good enough. With any 'good' experiment the influence of the data via the likelihood will dominate any uncertainty in the prior ranges. It is a similar story in parameter estimation: the numerical results of the frequentist approach and the Bayesian approach will usually agree closely if the experiment is a good one, even if the former method is answering the wrong question.

Finally, the model selection questions are often of more fundamental importance than parameter estimation, as the answers may be indicative of new physics (e.g. dark energy, modified gravity). In my view, having to state up front one's prior prejudice is a small price to pay (and it is a price which one should in any case have to pay) to have a tool to answer the right question.