[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Authors:  Eric V. Linder, Ramon Miquel
Abstract:  Interpretation of cosmological data to determine the number and values of parameters describing the universe must not rely solely on statistics but involve physical insight. Statistical techniques such as "model selection" or "integrated survey optimization" blindly apply Occam's Razor - this can lead to painful results. We emphasize that the sensitivity to prior probabilities and to the number of models compared can lead to "prior selection" rather than robust model selection. A concrete example demonstrates that Information Criteria can in fact misinform over a large region of parameter space.

Syksy Rasanen
Posts: 119
Joined: March 02 2005
Affiliation: University of Helsinki

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Syksy Rasanen » February 23 2007

This paper is a strong criticism of model selection, and in favor of parameter fitting.

The paper makes (among other arguments) the point that model selection is difficult to interpret correctly because the result depends on the priors. The authors use an analogy which I didn't quite grasp:

Rather than discussing mathematical niceties, consider searching for, say, the smartest person in the world: applying a uniform prior across the population would lead a model selector to conclude that the evidence favors that person living in England rather than the US, because the US represents a larger parameter volume.

The authors also write, as an example of the adequacy of parameter fitting, that

A standard cosmological fit analysis of actual SNAP + SNF + Planck data will produce a central value in the ([tex]w_0, w_a[/tex]) plane (recall that [tex]w_a = -2w'(z=1)[/tex]) and a contour around it encompassing some chosen confidence level (CL), typically 68, 90, or 95%. The point corresponding to a cosmological constant (-1,0) may or may not lie inside the contour. If it does not, we may say that we exclude [tex]\Lambda[/tex]CDM at that CL.

More precisely, in a frequentist analysis in which one aims to prove or disprove [tex]\Lambda[/tex]CDM, one would simulate the expected SNAP + SNF + Planck data sample assuming a [tex]\Lambda[/tex]CDM universe, and, analyzing this synthetic data sample as if it were the real data, one would draw a, say, 90% CL contour around the (-1,0) point, much like the one depicted as the inner contour in Fig. 1. That contour tells us that, if [tex]\Lambda[/tex]CDM is true, we expect to get a central value inside that contour in 90% of the observations like SNAP + SNF + Planck that we may perform. Therefore, if our one real SNAP + SNF + Planck observation delivers a central value outside that contour, irrespectively of its associated error ellipse, we will be able to say that we have excluded [tex]\Lambda[/tex]CDM at greater than 90% CL.
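To make the quoted procedure concrete, here is a toy numerical sketch of it. Everything in it is made up for illustration - the covariance, the measured central value, and the shortcut of drawing the (w0, wa) estimator directly from a Gaussian rather than simulating full SNAP + SNF + Planck data sets:

[code]
import numpy as np

# Toy frequentist test of LambdaCDM, in the spirit of the passage above.
rng = np.random.default_rng(1)
truth = np.array([-1.0, 0.0])                  # LambdaCDM: (w0, wa) = (-1, 0)
cov = np.array([[0.01, -0.02],
                [-0.02, 0.16]])                # hypothetical error ellipse

# 1. Sampling distribution of the estimator, assuming LambdaCDM is true.
sims = rng.multivariate_normal(truth, cov, size=100_000)

# 2. The 90% CL contour: the chi^2 threshold enclosing 90% of the simulations.
icov = np.linalg.inv(cov)
d = sims - truth
chi2 = np.einsum('ij,jk,ik->i', d, icov, d)
thresh = np.quantile(chi2, 0.90)               # ~4.61 for 2 degrees of freedom

# 3. LambdaCDM is excluded at >90% CL if the real central value falls outside
#    that contour, irrespective of its own error ellipse.
central = np.array([-0.85, 0.3])               # hypothetical measured values
d0 = central - truth
print(d0 @ icov @ d0 > thresh)                 # True -> exclude at >90% CL
[/code]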


It would be interesting to hear comments from people who advocate model selection.

Benjamin Wandelt
Posts: 12
Joined: September 24 2004
Affiliation: IAP, UPMC, Sorbonne Universites, and Illinois
Contact:

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Benjamin Wandelt » February 23 2007

Without getting into philosophical arguments, there is a strong reason to favour Bayesian model comparison: it is actually computationally feasible for problems relevant to cosmology.

Example: a full frequentist analysis producing a 90% confidence ball in an 8-dimensional model space is extremely computationally demanding. It requires simulating a large number of data sets at each node of a grid of models in the 8-dimensional space, and computing an estimator of the parameter vector for each one of these data sets. This defines the sampling distribution of the estimator.

Then the estimator of the parameter vector is computed using the actual data set. Next, a 7-dimensional hypersurface has to be found in the 8-dimensional parameter space that contains 90% of the data sets which gave the same estimator as the data. That's the 90% confidence ball (disregarding the obvious ambiguity of how to pick one of the infinitely many volumes that contain 90%).

By comparison, thermodynamic integration or nested sampling suddenly don't sound so scary any more...
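For anyone who has not seen it, here is a minimal toy implementation of nested sampling, on a problem whose evidence is known analytically. This is a sketch of Skilling's algorithm, not production code - real samplers replace the crude random-walk replacement step with something far more efficient:

[code]
import numpy as np

# Toy problem: 8-D unit Gaussian likelihood, uniform prior on [-5, 5]^8.
# The prior density is 10^-8, so the true answer is ln Z ~= -8 ln 10 ~= -18.4.
rng = np.random.default_rng(0)
D, nlive, niter = 8, 400, 8000

def loglike(t):
    return -0.5 * np.dot(t, t) - 0.5 * D * np.log(2 * np.pi)

live = rng.uniform(-5, 5, size=(nlive, D))
logL = np.array([loglike(t) for t in live])
logZ = -np.inf
for i in range(niter):
    worst = np.argmin(logL)
    Lstar = logL[worst]
    # prior volume shrinks on average by a factor exp(-1/nlive) per iteration
    logw = np.log(np.exp(-i / nlive) - np.exp(-(i + 1) / nlive))
    logZ = np.logaddexp(logZ, Lstar + logw)
    # replace the worst live point: short random walk from a surviving point,
    # constrained to stay inside the prior box and above the threshold Lstar
    t = live[rng.integers(nlive)].copy()
    for _ in range(30):
        prop = t + 0.3 * rng.standard_normal(D)
        if np.all(np.abs(prop) < 5) and loglike(prop) > Lstar:
            t = prop
    live[worst], logL[worst] = t, loglike(t)

# add the contribution of the remaining live points
logZ = np.logaddexp(logZ, logL.max()
                    + np.log(np.mean(np.exp(logL - logL.max())))
                    - niter / nlive)
print(logZ, -D * np.log(10))   # should agree to roughly ~0.2
[/code]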

The lesson: just drawing chi-square contours only works when the model depends linearly on the parameters with Gaussian likelihoods, or in a regime where asymptotic methods apply. In those cases Bayesian and frequentist methods are both easy and one can argue philosophy. But I said I wouldn't get into the philosophical arguments... so I won't. :)

Andrew Liddle
Posts: 21
Joined: September 28 2004
Affiliation: University of Lisbon

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Andrew Liddle » February 26 2007

Dear Syksy,

As the author of two-thirds of the papers mentioned in their paragraph one,
you won't be surprised that I disagree strongly with much (most
actually) of what they say. As it happens I am visiting Berkeley this
coming week and Eric and I will be trying to settle some of our
differences. No doubt after that I'll have more to say, either here or
on astro-ph.

Sticking for just now to your specific points, I too can't make any
sense of their `smartest person' analogy. There don't appear to be
two models (I'm not sure there's even one), so it can't be a model
comparison problem. There also doesn't seem to be any data to make a
comparison with. Perhaps the main message of this paragraph is
supposed to be subliminal rather than logical.

As to their frequentist parameter-fitting method, it wouldn't be my
choice but they can argue for it if they wish. However, I will point
out that what they call the BIC is *not* the BIC. If you want to use
model selection to compute the extent to which a given experiment will
rule out LambdaCDM assuming some dark energy model (or parameter
values) is the true model, you have to consider data simulated for
that `true' model. This is easy to do and in fact we did it for a very
similar experimental configuration in astro-ph/0512484, where Figs 1
and 2 show the comparison they intend to make (except computing the
full evidence rather than the BIC). In that paper we give the Bayesian
interpretation. The point I want to make here is just that if they
want to plot a figure aimed at discrediting the BIC, it should at
least show the BIC.

They may advocate a frequentist approach, but they cannot say that the
Bayesian analysis `misguides' or `spuriously rules out' just because
its results happen to disagree with their frequentist analysis. I
could just as well say, for instance, that their frequentist method `can
spuriously rule out LambdaCDM even when it is true, at much higher
than the stated confidence level', on the grounds that their
frequentist approach doesn't give the same result as model-level
Bayesian inference.

A more detailed critique of their paper will be along in due course.

best regards,

Andrew

Roberto Trotta
Posts: 18
Joined: September 27 2004
Affiliation: Imperial College London
Contact:

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Roberto Trotta » February 27 2007

As author of the remaining third of the papers, I'm not very impressed by their counter-example purporting to show that frequentist fitting is superior to Bayesian model selection. Apart from Andrew's criticism re the way the comparison was carried out (which I share), the disagreement between the two methods is just another example of Lindley's paradox (see astro-ph/0504022) - in my opinion, this is a good reason to prefer model selection, rather than to reject it.

Also, I don't think that the arguments given in section III are convincing - in fact, they come across to me as little more than anecdotal. I'd be interested to hear what people think of this section of the paper. I don't understand the example of the 'smartest person' either (perhaps this should count as evidence that such a person does not live in the UK, after all, if I consider myself an average sample of the population?).

Another query I have is re the role and meaning of priors as presented in the paper. I disagree that model selection boils down to prior selection. While it is true that the prior volume remains an irreducible feature of proper Bayesian model selection (ie using the Bayes factor among models rather than information criteria), there is a perfectly well defined rule as to how to choose the prior: it should encapsulate our state of knowledge about the plausible parameter values and ranges of the model before we see the data. This assignment ought to be based on the whole of our previous scientific experience, possibly encompassing previous observations or theoretical prejudices one might wish to impose. It is true that it is often non-trivial to actually write down an explicit form of prior that fairly encapsulates all of this - and that people in different states of knowledge might end up using different priors. The easy cure for this is to present model selection results as a function of the prior one chooses to adopt, thus showing explicitly how a change in our prior beliefs impacts on the model selection outcome (as in astro-ph/0608116 and astro-ph/0607496).

I think that section V on experiment design misses the point of the Bayesian model selection forecast - the main thrust of which is, in my view, to go beyond the usual procedure of picking a fiducial model (which might not be the correct one) and making predictions of future errors around that particular point in parameter space. Instead, the Bayesian model selection forecast takes fully on board present-day parameter and model uncertainty, and is therefore a conservative procedure when estimating the power of a future observation.

Roberto

Eric Linder
Posts: 53
Joined: August 02 2006
Affiliation: UC Berkeley

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Eric Linder » March 01 2007

As is clear from the lead sentence of the conclusions, this paper is not a jeremiad against model selection but an examination of the need for balance in conclusions drawn statistically.

One example in the paper concerns the "corner fraction" - the extreme ranges of the priors - which one includes in the Bayesian evidence. In a 10-D space this amounts to 99.8% of the volume, yet may have highly nontrivial physics. This causes difficulty with the idea that model selection is a higher level of parameter fitting - that one chooses acceptable models, then finds the fit parameters - because one may well have already discarded valid models before one seeks the specific, physically reasonable parameter fits.
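For readers wondering where the 99.8% comes from: if "corners" is read as the part of a hypercube prior lying outside the inscribed hypersphere (my reading, purely for illustration), the number follows directly:

[code]
from math import pi, gamma

# Fraction of a unit D-cube lying outside the inscribed D-ball of radius 1/2.
D = 10
ball = pi**(D / 2) / gamma(D / 2 + 1) * 0.5**D
print(1 - ball)   # ~0.9975: in 10-D, ~99.8% of the prior volume is "corners"
[/code]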

Regarding the contours of Fig. 1, the Bayesian contour from calculating the uncertainty at each point in ([tex]w_0, w_a[/tex]) will be quantitatively very similar to the BIC contour shown (as has been seen in previous papers) and the key result of a "misleading" region will not change.

As to Lindley's paradox, or the disparity between Bayesian and frequentist approaches, as physicists we routinely "bet on" expanded models with 2-4[tex]\sigma[/tex] significance that Bayesian evidence, based on simplicity, would tell us to discard. Consider extending the CDM model in the case of galaxy-galaxy correlation function data or cross-correlation data between galaxies and the CMB: physicists do embrace models with baryon acoustic oscillations and with the ISW effect because of the physics and despite any statistical paradox about their insignificance. As we say in the article, "model efficiency takes a back seat to physical fidelity".

Syksy Rasanen
Posts: 119
Joined: March 02 2005
Affiliation: University of Helsinki

Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Syksy Rasanen » March 01 2007

Eric Linder wrote:As to Lindley's paradox, or the disparity between Bayesian and frequentist approaches, as physicists we routinely "bet on" expanded models with 2-4[tex]\sigma[/tex] significance that Bayesian evidence, based on simplicity, would tell us to discard. Consider extending the CDM model in the case of galaxy-galaxy correlation function data or cross-correlation data between galaxies and the CMB: physicists do embrace models with baryon acoustic oscillations and with the ISW effect because of the physics and despite any statistical paradox about their insignificance. As we say in the article, "model efficiency takes a back seat to physical fidelity".
I don't see why this is an argument against model selection. If one feels that something is motivated from a physical point of view, that confidence in the physics can be encoded in the priors, right? Then the data will update one's priors, and quantify how well-justified the physical intuition was with regard to the observations. (More realistically, one can just interpret the statistical significance given by model selection according to one's own prejudice!)

Regarding the example discussed in the paper, I think that dark energy is a rather poorly chosen case for arguing from physics intuition instead of looking at the data with fewer preconceptions. If dark energy exists and is not vacuum energy, we have remarkably little physical idea about its nature and expected behaviour. (Witness the discussions on treating dark energy perturbations.) Scalar field models were advocated in the paper, but I don't see any theoretical argument in their favour (unlike for the Standard Model of electroweak interactions or other examples mentioned in section III). Assuming e.g. that the equation of state evolves slowly with redshift, or does not fall below -1, can seriously skew the analysis, but at present there is little reason to exclude such possibilities.

(BTW, I noticed there's a nice new paper by Zunckel and Trotta on evaluating the dark energy equation of state with minimal prejudice, astro-ph/0702695.)

Can you explain the smartest person analogy?

John Peacock
Posts: 3
Joined: March 02 2007
Affiliation: University of Edinburgh
Contact:

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by John Peacock » March 02 2007

I was quite pleased to see the Linder/Miquel paper, since I have always been uneasy about model selection. This is not to say that I dissent from the overall formalism of Bayesian Evidence - but Linder/Miquel correctly emphasise that the results you get using it can tell you more about the prior you picked than anything else. I can imagine circumstances in which model selection would work perfectly: e.g. contrast n_s=1 with some complete Landscape model in which you can calculate the (frequentist!) probability distribution of n_s. Model selection could then allow you to say whether there was evidence for or against the existence of the ensemble.

But we don't have such a complete theory, and an unjustified "assume a uniform prior on n_s between 0 and 2" is not an acceptable substitute. Therefore, either we can't use the Evidence apparatus, or we have to find a way of being explicit about the fact that the prior on n_s is not known. I'm quite drawn to the hyper-prior approach: e.g. the prior on n_s is a Gaussian of some unknown width, so we need a prior on that width. But this draws you into an infinite tower of priors on priors. If you could somehow turn this into a convergent series, and sum it to yield a definite answer, that would be satisfying. But I can't see how to do this.
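To make the hyper-prior idea concrete, one rung of the tower can at least be written down and integrated numerically. The numbers below are toy values, not a real CMB likelihood:

[code]
import numpy as np

# Evidence for "n_s free" when the n_s prior is a Gaussian of unknown width s
# centred on n_s = 1, with s itself marginalised over a (flat) hyper-prior.
ns_hat, ns_err = 0.96, 0.015     # hypothetical Gaussian measurement of n_s

def evidence_given_width(s):
    # Gaussian prior N(1, s^2) times Gaussian likelihood: marginal in closed form
    var = ns_err**2 + s**2
    return np.exp(-0.5 * (ns_hat - 1.0)**2 / var) / np.sqrt(2 * np.pi * var)

s = np.linspace(0.01, 1.0, 2000)
hyper = np.ones_like(s) / (s[-1] - s[0])       # flat hyper-prior on the width
Z = np.sum(evidence_given_width(s) * hyper) * (s[1] - s[0])
print(Z)   # the answer still depends on the hyper-prior's own range:
           # the prior-dependence has just moved up one level of the tower
[/code]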

Thomas Dent
Posts: 26
Joined: November 14 2006
Affiliation: ITP Heidelberg
Contact:

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Thomas Dent » March 02 2007

If the authors didn't want to issue a jeremiad, I wonder why they used the word 'tainted' in their title - pretty tendentious.

On the other side, I'm not sure I understand what is going on in astro-ph/0504022 with the 'Lindley paradox' (fig. 1), nor why the Bayesian result is necessarily better or more correct than the frequentist.

The only thing that can be making a difference to the Bayesian result in fig.1 is the width of the measured distribution relative to the prior. Since there is no scale on the [tex]\omega[/tex] axis the only thing we have to compare the distribution widths to is the width of the prior.

One could turn the procedure round and keep the same width of distribution, i.e. the same data, but vary the prior width. That would show that for broad priors the Bayesian approach favours [tex]\omega=\omega_0[/tex], whereas for narrower ones it favours a model with free [tex]\omega[/tex]. This seems a more realistic way to argue, since in practice one obtains data with a given statistical distribution, and then one has to decide what to do about the priors.

This is not, though, a fatal point against Bayesian evidence, since there is no reason why we should not be able to arrive at a sensible estimate of the correct width and shape of priors (... now how much care and attention should be spent on that part of the procedure?) - which would ideally happen in advance of looking at the data.

But it does seem slightly counterintuitive that your judgment between two models can depend crucially on what you believed you knew before the data were taken.

Thomas

Andrew Liddle
Posts: 21
Joined: September 28 2004
Affiliation: University of Lisbon

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Andrew Liddle » March 03 2007

Dear All,

I wanted to pick up on a couple of points from the different mails in this interesting thread.

I don't see any reason to be afraid of the prior dependence of Bayesian results. Thomas says he finds the dependence on prior information counterintuitive, but remember that the Bayesian methodology is founded on continual updating of probabilities in light of data. How, then, could it not depend on prior assumptions? In fact, Bayes' theorem usefully provides a decomposition of the posterior probabilities into the likelihoods (the bit from the data) and the priors (the subjective bit). Hence we can easily see the extent to which the current conclusions are driven by data and/or priors. Obviously we want to get to the situation where conclusions are data-dominated, but there is no harm in being able to properly understand what is going on in the regime where we still have significant prior information contributing to the posterior (certainly true for dark energy).

I see the ability to vary the priors, and hence explore the robustness of conclusions, as a significant strength of the Bayesian approach.

Eric's point about belief in ISW and baryon oscillations again misses the point. These phenomena are predicted by our standard simple model. They do not require extra parameters, and hence are unrelated to model selection issues. What would have been shocking would have been their absence, not their presence. [In frequentist terms, the null hypothesis is that the effects are present, and the data are consistent with that.]

I don't think anyone is claiming that the Bayesian approach is `more correct' than the frequentist one. The Bayesian approach is a consistent framework of logical inference, built around Bayes' theorem and the manipulation of probabilities. The frequentist approach is a set of rules which lack the coherence to be called a framework, but which nevertheless are mathematically consistent. So it is not the case that one is right and the other wrong; they are just different. Hence my objection above to the claim that Bayesian model selection `misguides' or `spuriously rules out' in some circumstances; that could only be true if the frequentist approach could be said to be correct (and hence that all Bayesian statisticians in the world should be immediately sacked).

But what we can argue about is which approach is more useful, which is indeed a topic worthy of debate.

best regards,

Andrew

Moncy Vilavinal John
Posts: 3
Joined: March 21 2006
Affiliation: St. Thomas College, Kozhencherry, Kerala, India

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Moncy Vilavinal John » March 04 2007

John Peacock wrote:
Linder/Miquel correctly emphasise that the results you get using it can tell you more about the prior you picked than anything else

Such apprehensions regarding priors mostly arise from overlooking the fact that the evaluation of Bayesian probability is a continuous process. As admitted by Andrew, Bayesian probability is not objective. Since priors are subjective, so are posteriors. If a coin gave 8 heads in 10 trials, the posterior probability you get at the end of the 10th trial (which in turn is your prior for the 11th trial) will certainly not be 0.5 (even if you started the experiment with this value as prior). More precisely, the posterior probability we compute at any stage is not objectively verifiable. You can only update your plausibility assignment at the end of each trial. In this sense, Bayesian probability is not falsifiable – just as in the case of cosmology! [See also astro-ph/0506284]. However, in spite of all this, Bayesian theory is the most useful thing for making a decision on your own under such conditions.
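The coin example can be written out explicitly: with a Beta prior, each posterior is literally the next trial's prior. A standard conjugate-updating sketch:

[code]
# Sequential Bayesian updating for a coin with unknown head-probability p.
# With a Beta(a, b) prior, the posterior after each toss is again a Beta
# distribution - and it is the prior for the next toss.
a, b = 1.0, 1.0                             # flat prior on p
outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]   # 8 heads in 10 trials
for x in outcomes:
    a, b = a + x, b + (1 - x)               # posterior -> prior for next trial
print(a, b, a / (a + b))                    # Beta(9, 3); posterior mean = 0.75
[/code]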

As I understand, the authors and also Peacock are worried about whether a competing model in a model comparison can get undue advantage by picking a suitable prior for some new parameter. But this anxiety is unfounded and can be dispelled once we recognize that Bayesian model comparison is not a one-time exercise. The posterior for that parameter, obtained in that analysis, must be used as prior in the future observation, and if the original prior is manipulated, there is every chance that this will turn out to be detrimental to that model.

This points to the need of discouraging the present practice of picking fresh priors in every new model comparison exercise. In other words, there should be rules for the game and they should be strictly obeyed!

Thomas Dent
Posts: 26
Joined: November 14 2006
Affiliation: ITP Heidelberg
Contact:

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Thomas Dent » March 05 2007

OK, I think it's clearer now.

If the priors encode all relevant information apart from the current data under consideration, it is no surprise when different priors plus the same data can give different conclusions - the sum total of information is different.

In a frequentist approach one just combines old data with new data; of course, with different sets of old data the result is different. The Lindley comparison doesn't look fair in that respect, because one is comparing a frequentist who looks just at new data with a Bayesian who looks at new data plus something else which allows her to find a meaningful prior.

There is no real question of choosing the priors: you just have to decide which information you put into the prior and which you count as part of the data. For obvious reasons, if you exclude all experimental information from the prior there start to be problems...

In the case of the coin toss one's prior would be derived from observational data: that is, observation of what the coin looks like, and what similar coins have done in the past, and what kinds of trickery people did or didn't get up to in coin-tossing situations.

What I would find worrying is if priors are pulled out of hats because someone happens to find them reasonable guesses. But I don't think the situation is as bad as that. For example one might use pre-WMAP data for the prior over [tex]n_s[/tex], that would give you a perfectly well-behaved distribution, which also encodes the fact that no-one had a theoretical clue about its value apart from that it should fit older data.
Moncy Vilavinal John wrote:
Bayesian model comparison is not a one-time exercise. The posterior for that parameter, obtained in that analysis, must be used as prior in the future observation, (...)

This points to the need of discouraging the present practice of picking fresh priors in every new model comparison exercise. In other words, there should be rules for the game and they should be strictly obeyed!
In any approach there are potential difficulties with theoretical prejudices that one might want to give the status of information to. But if one admits old observations as part of the priors, a lot of the dependence on 'theory' might go away. Surely many things we might have as prior prejudices are just old observations in disguise.

Jason Dick
Posts: 11
Joined: November 08 2005
Affiliation: SISSA

Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Jason Dick » March 06 2007

Thomas Dent wrote:In any approach there are potential difficulties with theoretical prejudices that one might want to give the status of information to. But if one admits old observations as part of the priors, a lot of the dependence on 'theory' might go away. Surely many things we might have as prior prejudices are just old observations in disguise.
Well, if I might interject, I see two potential problems with this:
1. How do we know that the results will converge?
2. How, in this picture, do we check for consistency between different experiments?

My personal objection to making use of Bayesian evidence is that it has the problem that you explicitly need finite priors. With Bayesian parameter estimation, however, the question is often not so much what my specific priors are, but rather in what set of parameters I am going to assume uniform priors. In each case the priors are arbitrary, but it gives us a way to present the results of one experiment as independently as possible from other experiments, without requiring a limitation to only those theoretical models where we have strong knowledge of the priors (depending upon from where one obtains the priors).
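A quick toy illustration of that "which parameters are uniform" point (my own numbers; a hypothetical Gaussian measurement of w): the evidence integral changes by an O(1) factor depending on whether the flat prior is in w or in log(-w), even though the parameter-estimation posterior barely cares:

[code]
import numpy as np

sigma, w_hat = 0.1, -0.9                   # hypothetical measurement of w
def like(w):
    return np.exp(-0.5 * ((w - w_hat) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

w = np.linspace(-2.0, -0.1, 200001)
dw = w[1] - w[0]
p_flat_w = np.full_like(w, 1.0 / (w[-1] - w[0]))   # uniform in w
p_flat_logw = 1.0 / (-w) / np.log(2.0 / 0.1)       # uniform in ln(-w)

Z_w = np.sum(like(w) * p_flat_w) * dw
Z_logw = np.sum(like(w) * p_flat_logw) * dw
print(Z_w, Z_logw, Z_logw / Z_w)   # evidences differ by ~30% from the
                                   # reparametrisation alone
[/code]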

But no matter what we do, it seems, there is some degree of arbitrariness that is utterly unavoidable, and as a result I think the real lesson we should take away from this is that if we have an experimental result that claims that model X is ruled out with 90% confidence, we should in our minds expand that contour significantly so as to very qualitatively wrap in this arbitrariness both in modeling and in priors. I say qualitatively because even though there are methods of placing numbers on these things, I rather doubt that we can hope to do so in a non-arbitrary way.

This discussion has really made me all the more respectful of Andy Albrecht's "I won't get out of bed for less than four sigma" approach to ruling out models.

Roberto Trotta
Posts: 18
Joined: September 27 2004
Affiliation: Imperial College London
Contact:

Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Roberto Trotta » March 06 2007

Thomas Dent wrote:
On the other side, I'm not sure I understand what is going on in astro-ph/0504022 with the 'Lindley paradox' (fig. 1), nor why the Bayesian result is necessarily better or more correct than the frequentist.

The only thing that can be making a difference to the Bayesian result in fig.1 is the width of the measured distribution relative to the prior. Since there is no scale on the [tex]\omega[/tex] axis the only thing we have to compare the distribution widths to is the width of the prior.

Thomas
I'm looking at the new version which came out today - so now Figure 1 is Figure A1 (notice that the direction of the x-axis has been swapped; I thought it was clearer this way).

You are quite right, the difference in the Bayesian approach comes from the Occam's razor effect brought about by the 'wasted volume' of parameter space in going from the prior to the posterior. This can be understood in terms of information gain, ie how much the data changed your knowledge as encapsulated by the prior (see Eq. (A6) in astro-ph/0504022). And yes, it is the prior that sets the only relevant scale in the problem: after all, there is no absolute notion of 'well measured parameter'. You have to specify 'well measured wrt what' (this is spelled out in some detail in astro-ph/0602378, section IIB). In other words, there is no inference without assumptions (no matter what Frequentists say).

Now in the example I've fixed the prior width, because I'm arguing that this is a quantity that encapsulates your expectations about the plausible values of the extra parameter under the more complicated theory before you see the data. I'm then comparing the model selection results for different values of the likelihood width (the three coloured Gaussians in the top panel), which however are all constructed by hand to be 1.96[tex]\sigma[/tex] away from the value predicted by the simpler model. This means that under a frequentist rejection test all three curves lead us to exactly the same conclusion - namely that [tex]\omega_0[/tex] as predicted by the simpler model is ruled out at the 2[tex]\sigma[/tex] CL.

The whole point of Lindley's paradox is to illustrate that clearly the widest curve (red Gaussian) is not as informative as the strongly peaked one (cyan curve), if we understand 'information gain' as the increase in our knowledge in going from prior to posterior. Hence it is only natural that our conclusions re the viability of the two models ought to be different for different information content.

This can be intuitively understood: if you measure a parameter to lie 2[tex]\sigma[/tex] away from the predicted value under the simple model, but with a spread of the order of your prior for the more complicated model, then your relative belief in the two models will not be strongly affected (which is what you see for I<0 in the bottom panel of Figure A1, where the curves converge to odds of 1:1). But if you measure it with very high precision, say with a width [tex]10^{-10}[/tex] times the prior scale (that would be I=10 on a log-10 scale), and you are still 2[tex]\sigma[/tex] away from [tex]\omega_0[/tex], then your confidence in the extended model should (correctly) be shattered, as it is a surprising fact that this very strongly spiked measurement pops up in the vicinity of the value predicted by the simpler model.
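For anyone who wants to play with the numbers: the two-Gaussian version of this has a closed-form Bayes factor via the Savage-Dickey ratio. Here is a toy reconstruction (mine, not the exact code behind Figure A1), holding the likelihood mean 1.96[tex]\sigma[/tex] from [tex]\omega_0[/tex] while the ratio of prior to likelihood width varies:

[code]
import numpy as np

# Toy Lindley's paradox: likelihood N(omega_hat, sigma^2) with
# omega_hat = omega_0 + 1.96*sigma; prior under the extended model is
# N(omega_0, Sigma^2). The Bayes factor of simple vs extended model is then
#   B01 = sqrt(1 + r^2) * exp(-0.5 * lam^2 * r^2 / (1 + r^2)),  r = Sigma/sigma.
lam = 1.96
I = np.arange(-2, 11)              # information gain, I = log10(Sigma/sigma)
r = 10.0**I
B01 = np.sqrt(1 + r**2) * np.exp(-0.5 * lam**2 * r**2 / (1 + r**2))

for Ii, Bi in zip(I, B01):
    print(f"I = {Ii:3d}   B01 = {Bi:10.3g}")
# I < 0: B01 -> 1, odds 1:1. Large I: the very same "2 sigma" measurement
# gives overwhelming odds in favour of the simpler model.
[/code]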

The alternative possibility you mention, namely varying the prior width to assess the change in the model selection outcome following a change in one's prior beliefs, can also be done, and it is an instructive exercise. But this is a different issue from Lindley's paradox. It is a way of assessing which change in your prior (ie, model predictivity) would be needed to change the model selection result considerably. This has been carried out eg in Figure 2 of astro-ph/0703063, showing that the outcome remains essentially the same unless you are ready to entertain quite unreasonable prior beliefs about n_S. Another example (leading to different conclusions) is Figure 1 of astro-ph/0607496 for dark energy models. In general, a more restrictive class of models (ie, with a narrower prior) which is compatible with the data will not be ruled out by model selection, but you will only get a non-committal result of equal posterior odds.

Jason Dick
Posts: 11
Joined: November 08 2005
Affiliation: SISSA

Re: [astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Jason Dick » March 07 2007

Roberto Trotta wrote:You are quite right, the difference in the Bayesian approach comes from the Occam's razor effect brought about by the 'wasted volume' of parameter space in going from the prior to the posterior. This can be understood in terms of information gain, ie how much the data changed your knowledge as encapsulated by the prior (see Eq. (A6) in astro-ph/0504022). And yes, it is the prior that sets the only relevant scale in the problem: after all, there is no absolute notion of 'well measured parameter'. You have to specify 'well measured wrt what' (this is spelled out in some detail in astro-ph/0602378, section IIB). In other words, there is no inference without assumptions (no matter what Frequentists say).
Well, I think everything you've said is correct. But my only objection is that in a frequentist analysis, you aren't even worrying about the added volume. The only issue of interest as far as the priors are concerned is the shape of the priors within the region where there is significant probability. The shape of the priors outside that region has no effect whatsoever.

So while both analyses are sensitive to rather arbitrary prior choices, the model selection approach has an added sensitivity to the total volume of the space given by the prior choices. I guess I'm just a bit pessimistic that there really is a non-arbitrary, intelligent solution to the problem of how to factor in this sort of "Occam's Razor factor" into the problem of how to rule out models.

Thomas Dent
Posts: 26
Joined: November 14 2006
Affiliation: ITP Heidelberg
Contact:

[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting

Post by Thomas Dent » March 07 2007

Thanks to Roberto for clarifying.

What I think I am trying to get at is the following. Ideas for models with extra parameters do not spring up in a vacuum; they are developed in the light of older, less precise data which allowed an 'interesting' space for deviation away from simpler models.

This fact, surely, solves the problem of 'models' with theoretically undetermined or very poorly determined parameters. Because there is older data which motivated the model, the prior state of knowledge, or belief, or prejudice or whatever, included the belief that the value of the parameter should be consistent with the old data.

Therefore the older data should be used as part of the prior. If theoretically there is no clue what the distribution over the parameter should be, the (posterior resulting from) older data is all one has. This would be appropriate in the case when the 'model' is simply something with no particular theoretical motivation, like 'allow [tex]n_s \neq 1[/tex]'. The question would then be 'given what was known about the parameter in the more complicated model because of the old data, how does it compare with the simpler model looking at the new data'.

If the new data are only a slight improvement over the old, the prior would then be pretty narrow and we are at the case where frequentist and Bayesian point the same way. (Though 2 sigma is absolutely nothing to be excited about, and if one is honest, nothing even to draw attention to.) This means that if (say) the universe is measured to be flat with gradually better and better precision, we will never definitely be able to rule out the 'model' in which it deviates from flatness, if that amount of deviation becomes smaller and smaller...

But if we have no reason to believe that any size of deviation from flatness or scale invariance is more likely than any other, I don't see what else can be done. After all, very small deviations may be very physically significant. The real problem here is 'models' which amount to tweaking a parameter without clear physical motivation.

But if one does have a physically motivated model which produces on its own a meaningful probability distribution (as physical models should!), it does make sense to use that as the prior and test it with respect to all the data, to get a clean inference.

Thomas
