
CosmoCoffee

Authors:  Eric V. Linder, Ramon Miquel 
Abstract:  Interpretation of cosmological data to determine the number and values of
parameters describing the universe must not rely solely on statistics but
involve physical insight. Statistical techniques such as "model selection" or
"integrated survey optimization" blindly apply Occam's Razor; this can lead to
painful results. We emphasize that the sensitivity to prior probabilities and
to the number of models compared can lead to "prior selection" rather than
robust model selection. A concrete example demonstrates that Information
Criteria can in fact misinform over a large region of parameter space. 


Syksy Rasanen
Joined: 02 Mar 2005 Posts: 128 Affiliation: University of Helsinki

Posted: February 23 2007 


This paper is a strong criticism of model selection, and an argument in favor of parameter fitting.
The paper makes (among other arguments) the point that model selection is difficult to interpret correctly because the result depends on the priors. The authors use an analogy which I didn't quite grasp:
Rather than discussing mathematical niceties, consider searching for, say, the smartest person in the world: applying a uniform prior across the population would lead a model selector to conclude that the evidence favors that person living in England rather than the US, because the US represents a larger parameter volume.
The authors also write, as an example of the adequacy of parameter fitting, that
A standard cosmological fit analysis of actual SNAP + SNF + Planck data will produce a central value in the (w_{0},w_{a}) plane (recall that w_{a} = −2w′(z = 1)) and a contour around it encompassing some chosen confidence level (CL), typically 68, 90, or 95%. The point corresponding to a cosmological constant (−1,0) may or may not lie inside the contour. If it does not, we may say that we exclude ΛCDM at that CL.
More precisely, in a frequentist analysis in which one aims to prove or disprove ΛCDM, one would simulate the expected SNAP + SNF + Planck data sample assuming a ΛCDM universe, and, analyzing this synthetic data sample as if it were the real data, one would draw a, say, 90% CL contour around the (−1,0) point, much like the one depicted as the inner contour in Fig. 1. That contour tells us that, if ΛCDM is true, we expect to get a central value inside that contour in 90% of the observations like SNAP + SNF + Planck that we may perform. Therefore, if our one real SNAP + SNF + Planck observation delivers a central value outside that contour, irrespectively of its associated error ellipse, we will be able to say that we have excluded ΛCDM at greater than 90% CL.
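The procedure in the quoted passage can be sketched as a short Monte Carlo. This is a minimal illustration under loudly stated assumptions: the estimator of (w_{0},w_{a}) is taken to be unbiased and Gaussian with an invented covariance (not a SNAP + SNF + Planck forecast), and the contour shape is fixed by the Mahalanobis distance from (−1,0), which is only one of many possible choices.

```python
import numpy as np

# Illustrative (invented) covariance of the estimator (w0_hat, wa_hat).
SIGMA_W0, SIGMA_WA, RHO = 0.05, 0.2, -0.8
COV = np.array([[SIGMA_W0 ** 2, RHO * SIGMA_W0 * SIGMA_WA],
                [RHO * SIGMA_W0 * SIGMA_WA, SIGMA_WA ** 2]])
LCDM = np.array([-1.0, 0.0])  # (w0, wa) for a cosmological constant
COV_INV = np.linalg.inv(COV)


def distance2(points):
    """Squared Mahalanobis distance of (w0, wa) points from the LCDM point."""
    d = np.atleast_2d(points) - LCDM
    return np.einsum("ij,jk,ik->i", d, COV_INV, d)


def lcdm_excluded(observed, cl=0.90, n_sims=100_000, seed=0):
    """Is the observed central value outside the CL contour drawn around LCDM
    from simulated LCDM data sets?"""
    rng = np.random.default_rng(seed)
    # Toy stand-in for "simulate the data and fit each synthetic sample":
    # central values are drawn directly from a Gaussian around LCDM.
    sims = rng.multivariate_normal(LCDM, COV, size=n_sims)
    threshold = np.quantile(distance2(sims), cl)  # contour enclosing cl of sims
    return bool(distance2(observed)[0] > threshold)


print(lcdm_excluded(np.array([-0.8, -0.5])))  # True for this toy covariance
```

In a real analysis each simulated data set would be re-fitted rather than sampled from an assumed Gaussian, which is where the cost lies.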
It would be interesting to hear comments from people who advocate model selection. 



Benjamin Wandelt
Joined: 24 Sep 2004 Posts: 12 Affiliation: IAP, UPMC, Sorbonne Universites, and Illinois

Posted: February 23 2007 


Without getting into philosophical arguments, there is a strong reason to favour the Bayesian model comparison: it can actually be done feasibly for problems relevant to cosmology.
Example: A full frequentist analysis producing a 90% confidence ball in an 8-dimensional model space is actually extremely computationally demanding. It requires simulating a large number of data sets at each node of a grid of models in the 8-dimensional space, and an estimator of the parameter vector for each one of these data sets. This defines the sampling distribution of the estimator.
Then the estimator of the parameter vector is computed using the actual data set. Next, a 7-dimensional hypersurface has to be found in the 8-dimensional parameter space that contains 90% of the data sets which gave the same estimator as the data. That's the 90% confidence ball (disregarding the obvious ambiguity of how to pick one of the infinity of volumes that contain 90%).
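To make the cost concrete, here is a one-dimensional toy version of the grid-based (Neyman) construction, which is our reading of the procedure described above. The model is hypothetical: the data are n_obs Poisson counts with mean θ, and the estimator is their sample mean. Taking the central 90% of each sampling distribution is itself one arbitrary pick among the many possible 90% regions.

```python
import numpy as np


def neyman_interval(theta_hat_obs, theta_grid, n_obs=10, n_sims=5000,
                    cl=0.90, seed=1):
    """Toy 1-D Neyman construction (hypothetical model, for illustration).

    Returns the smallest and largest grid values of theta whose acceptance
    region contains the observed estimate theta_hat_obs.
    """
    rng = np.random.default_rng(seed)
    alpha = 1.0 - cl
    accepted = []
    for theta in theta_grid:
        # Sampling distribution of the estimator at this grid node.
        estimates = rng.poisson(theta, size=(n_sims, n_obs)).mean(axis=1)
        # Central acceptance region: one arbitrary choice among many.
        lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
        if lo <= theta_hat_obs <= hi:
            accepted.append(theta)
    if not accepted:
        raise ValueError("no grid node accepts the observed estimate")
    return min(accepted), max(accepted)


lo, hi = neyman_interval(3.0, np.linspace(0.1, 8.0, 80))
print(f"90% confidence interval (on the grid): [{lo:.1f}, {hi:.1f}]")
```

Even this single-parameter toy needs 80 × 5000 simulated data sets; with the same 80 nodes per axis, an 8-dimensional grid would have 80^8 ≈ 1.7 × 10^15 nodes.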
By comparison, thermodynamic integration or nested sampling suddenly don't sound so scary any more...
The lesson: just drawing chi-square contours only works if the parameters depend linearly on the data for Gaussian likelihoods, or in a regime where asymptotic methods work. In those cases Bayesian and frequentist methods are both easy and one can argue philosophy. But I said I wouldn't get into the philosophical arguments... so I won't. :) 
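For reference, the familiar Δχ² levels for two jointly fitted parameters follow from the χ² distribution with two degrees of freedom, whose CDF is 1 − e^{−x/2}. The sketch assumes exactly the linear-Gaussian regime mentioned above; outside it these thresholds need not have their nominal coverage.

```python
import math


def delta_chi2_two_params(cl):
    """Delta chi^2 contour level enclosing probability cl for 2 fitted parameters.

    Inverts the chi^2 CDF for 2 degrees of freedom, P(<x) = 1 - exp(-x/2).
    Only valid when the likelihood is (close to) Gaussian in the parameters.
    """
    return -2.0 * math.log(1.0 - cl)


for cl in (0.68, 0.90, 0.95):
    print(f"{cl:.2f}: {delta_chi2_two_params(cl):.2f}")
# 0.68: 2.28
# 0.90: 4.61
# 0.95: 5.99
```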



Andrew Liddle
Joined: 28 Sep 2004 Posts: 21 Affiliation: University of Edinburgh

Posted: February 26 2007 


Dear Syksy,
As author of two-thirds of the papers mentioned in their paragraph one,
you won't be surprised that I disagree strongly with much (most
actually) of what they say. As it happens I am visiting Berkeley this
coming week and Eric and I will be trying to settle some of our
differences. No doubt after that I'll have more to say, either here or
on astro-ph.
Sticking for just now to your specific points, I too can't make any
sense of their `smartest person' analogy. There don't appear to be
two models (I'm not sure there's even one), so it can't be a model
comparison problem. There also doesn't seem to be any data to make a
comparison with. Perhaps the main message of this paragraph is
supposed to be subliminal rather than logical.
As to their frequentist parameter-fitting method, it wouldn't be my
choice but they can argue for it if they wish. However, I will point
out that what they call the BIC is *not* the BIC. If you want to use
model selection to compute the extent to which a given experiment will
rule out LambdaCDM assuming some dark energy model (or parameter
values) is the true model, you have to consider data simulated for
that `true' model. This is easy to do and in fact we did it for a very
similar experimental configuration in astro-ph/0512484, where Figs 1
and 2 show the comparison they intend to make (except computing the
full evidence rather than the BIC). In that paper we give the Bayesian
interpretation. The point I want to make here is just that if they
want to plot a figure aimed at discrediting the BIC, it should at
least show the BIC.
They may advocate a frequentist approach, but they cannot say that the
Bayesian analysis `misguides' or `spuriously rules out' just because
its results happen to disagree with their frequentist analysis. I
could just as well say, for instance, that their frequentist method `can
spuriously rule out LambdaCDM even when it is true, at much higher
than the stated confidence level', on the grounds that their
frequentist approach doesn't give the same result as model-level
Bayesian inference.
A more detailed critique of their paper will be along in due course.
best regards,
Andrew 



Roberto Trotta
Joined: 27 Sep 2004 Posts: 18 Affiliation: Imperial College London

Posted: February 27 2007 


As author of the remaining third of the papers, I'm not very impressed by their counter-example purporting to show that frequentist fitting is superior to Bayesian model selection. Apart from Andrew's criticism of the way the comparison was carried out (which I share), the disagreement between the two methods is just another example of Lindley's paradox (see astro-ph/0504022); in my opinion, this is a good reason to prefer model selection, rather than to reject it.
Also, I don't think that the arguments given in section III are convincing; in fact, they come across to me as little more than anecdotal. I'd be interested to hear what people think of this section of the paper. I don't understand the example of the 'smartest person' either (perhaps this should count as evidence that such a person does not live in the UK after all, if I consider myself an average sample of the population?).
Another query I have concerns the role and meaning of priors as presented in the paper. I disagree that model selection boils down to prior selection. While it is true that the prior volume remains an irreducible feature of proper Bayesian model selection (i.e. using the Bayes factor among models rather than information criteria), there is a perfectly well-defined rule as to how to choose the prior: it should encapsulate our state of knowledge about the plausible parameter values and ranges of the model before we see the data. This assignment ought to be based on the whole of our previous scientific experience, possibly encompassing previous observations or theoretical prejudices one might wish to impose. It is true that it is often non-trivial to actually write down an explicit form of prior that fairly encapsulates all of this, and that people in different states of knowledge might end up using different priors. The easy cure for this is to present model selection results as a function of the prior one chooses to adopt, thus showing explicitly how a change in our prior beliefs impacts on the model selection outcome (as in astro-ph/0608116 and astro-ph/0607496).
I think that section V on experiment design misses the point of the Bayesian model selection forecast, the main thrust of which is in my view to go beyond the usual procedure of picking a fiducial model (which might not be the correct one) and making predictions of future errors around that particular point in parameter space. Instead, the Bayesian model selection forecast takes fully on board present-day parameter and model uncertainty, and is therefore a conservative procedure when estimating the power of a future observation.
Roberto 



Eric Linder
Joined: 02 Aug 2006 Posts: 48 Affiliation: UC Berkeley

Posted: March 01 2007 


As is clear from the lead sentence of the conclusions, this paper is not a jeremiad versus model selection but an examination of the need for balance in conclusions drawn statistically.
One example in the paper concerns the "corner fraction", i.e. the extreme ranges of the priors, which one includes in the Bayesian evidence. In a 10D space this amounts to 99.8% of the volume, yet may contain highly non-trivial physics. This causes difficulty with the idea that model selection is a higher level of parameter fitting (one chooses acceptable models, then finds the fit parameters), because one may well have already discarded valid models before one seeks the specific, physically reasonable parameter fits.
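Our reading of the "corner fraction" is the part of a uniform box prior lying outside the hypersphere inscribed in it; that interpretation is an assumption, but it reproduces the ≈99.8% quoted for 10 dimensions.

```python
import math


def corner_fraction(n_dim):
    """Fraction of the box [-1, 1]^n_dim lying outside its inscribed unit ball.

    Assumes a uniform (box) prior; the "corners" are everything outside the
    hypersphere that just touches the faces of the box.
    """
    ball = math.pi ** (n_dim / 2) / math.gamma(n_dim / 2 + 1)
    box = 2.0 ** n_dim
    return 1.0 - ball / box


for n in (2, 3, 10):
    print(n, round(corner_fraction(n), 4))
# 2 0.2146
# 3 0.4764
# 10 0.9975
```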
Regarding the contours of Fig. 1, the Bayesian contour from calculating the uncertainty at each point in (w_{0},w_{a}) will be quantitatively very similar to the BIC contour shown (as has been seen in previous papers) and the key result of a "misleading" region will not change.
As to Lindley's paradox, or the disparity between Bayesian and frequentist approaches, as physicists we routinely "bet on" expanded models with 2−4σ significance that Bayesian evidence, based on simplicity, would tell us to discard. Consider extending the CDM model in the case of galaxy-galaxy correlation function data or cross-correlation data between galaxies and the CMB: physicists do embrace models with baryon acoustic oscillations and with the ISW effect because of the physics and despite any statistical paradox about their insignificance. As we say in the article, "model efficiency takes a back seat to physical fidelity". 



Syksy Rasanen
Joined: 02 Mar 2005 Posts: 128 Affiliation: University of Helsinki

Posted: March 01 2007 


Eric Linder wrote:  As to Lindley's paradox, or the disparity between Bayesian and frequentist approaches, as physicists we routinely "bet on" expanded models with 2−4σ significance that Bayesian evidence, based on simplicity, would tell us to discard. Consider extending the CDM model in the case of galaxy-galaxy correlation function data or cross-correlation data between galaxies and the CMB: physicists do embrace models with baryon acoustic oscillations and with the ISW effect because of the physics and despite any statistical paradox about their insignificance. As we say in the article, "model efficiency takes a back seat to physical fidelity". 
I don't see why this is an argument against model selection. If one feels that something is motivated from a physical point of view, that confidence in the physics can be encoded in the priors, right? Then the data will update one's priors, and quantify how well-justified the physical intuition was with regard to the observations. (More realistically, one can just interpret the statistical significance given by model selection according to one's own prejudice!)
Regarding the example discussed in the paper, I think that dark energy is a rather poorly chosen case for arguing using physics intuition instead of looking at the data with less preconceptions. If dark energy exists and is not vacuum energy, we have remarkably little physical idea about its nature and expected behaviour. (Witness the discussions on treating dark energy perturbations.) Scalar field models were advocated in the paper, but I don't see any theoretical argument in their favour (unlike for the Standard Model of electroweak interactions or other examples mentioned in section III). Assuming e.g. that the equation of state evolves slowly with redshift, or does not fall below −1, can seriously skew the analysis, but at present there is little reason to exclude such possibilities.
(BTW, I noticed there's a nice new paper by Zunckel and Trotta on evaluating the dark energy equation of state with minimal prejudice, astro-ph/0702695.)
Can you explain the smartest person analogy? 



John Peacock
Joined: 02 Mar 2007 Posts: 3 Affiliation: University of Edinburgh

Posted: March 02 2007 


I was quite pleased to see the Linder/Miquel paper, since I have always been uneasy about model selection. This is not to say that I dissent from the overall formalism of Bayesian Evidence, but Linder/Miquel correctly emphasise that the results you get using it can tell you more about the prior you picked than anything else. I can imagine circumstances in which model selection would work perfectly: e.g. contrast n_{s}=1 with some complete Landscape model in which you can calculate the (frequentist!) probability distribution of n_{s}. Model selection could then allow you to say whether there was evidence for or against the existence of the ensemble.
But we don't have such a complete theory, and an unjustified "assume a uniform prior on n_{s} between 0 and 2" is not an acceptable substitute. Therefore, either we can't use the Evidence apparatus, or we have to find a way of being explicit about the fact that the prior on n_{s} is not known. I'm quite drawn to the hyperprior approach: e.g. the prior on n_{s} is a Gaussian of some unknown width, so we need a prior on that width. But this draws you into an infinite tower of priors on priors. If you could somehow turn this into a convergent series, and sum it to yield a definite answer, that would be satisfying. But I can't see how to do this. 
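The hyperprior idea can be sketched numerically for one parameter. Everything here is an assumption for illustration: a Gaussian measurement n̂ ± σ, a prior n_{s} ~ N(1, W) under the "n_{s} free" model, and a log-uniform hyperprior on the width W between two bounds. The bounds are themselves a new prior choice, which is exactly the next level of the tower of priors on priors.

```python
import numpy as np


def gauss(x, mu, sd):
    """Normal density N(x; mu, sd); works elementwise on arrays."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)


def evidence_hyperprior(n_hat, sigma, width_min, width_max, n_grid=2001):
    """Evidence for 'n_s free' with prior n_s ~ N(1, W) and log-uniform W.

    Hypothetical set-up: Gaussian measurement n_hat +/- sigma; W has a
    log-uniform hyperprior on [width_min, width_max].
    """
    w = np.exp(np.linspace(np.log(width_min), np.log(width_max), n_grid))
    # At fixed W, prior and likelihood convolve to N(n_hat; 1, sqrt(sigma^2 + W^2)).
    conditional = gauss(n_hat, 1.0, np.sqrt(sigma ** 2 + w ** 2))
    # Average over a uniform grid in log W (simple approximation to the integral).
    return float(conditional.mean())


n_hat, sigma = 0.96, 0.015  # illustrative numbers only
e_fixed = gauss(n_hat, 1.0, sigma)  # evidence for n_s = 1
for w_max in (0.1, 1.0):
    print(w_max, e_fixed / evidence_hyperprior(n_hat, sigma, 1e-3, w_max))
```

With these illustrative numbers, raising the upper bound on W from 0.1 to 1 lowers the evidence for the extended model, so the answer still depends on the top level of the tower.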



Thomas Dent
Joined: 14 Nov 2006 Posts: 28 Affiliation: ITP Heidelberg

Posted: March 02 2007 


If the authors didn't want to issue a jeremiad, I wonder why they used the word 'tainted' in their title, which is pretty tendentious.
On the other side, I'm not sure I understand what is going on in astro-ph/0504022 with the 'Lindley paradox' (Fig. 1), nor why the Bayesian result is necessarily better or more correct than the frequentist one.
The only thing that can be making a difference to the Bayesian result in Fig. 1 is the width of the measured distribution relative to the prior. Since there is no scale on the ω axis, the only thing we have to compare the distribution widths to is the width of the prior.
One could turn the procedure round and keep the same width of distribution, i.e. the same data, but vary the prior width. That would show that for broad priors the Bayesian approach favours ω = ω_{0}, whereas for narrower ones it favours a model with free ω. This seems a more realistic way to argue, since in practice one obtains data with a given statistical distribution, and then one has to decide what to do about the priors.
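This prior-width experiment can be worked out in closed form under simple assumptions (ours, not necessarily those of the figure referred to): a Gaussian likelihood of width σ around the measured value ω̂, a Gaussian prior of width Σ centred on ω_{0} under the model with free ω, and ω fixed at ω_{0} in the simpler model. Writing λ = |ω̂ − ω_{0}|/σ:

```latex
\begin{align*}
p(d \mid M_0) &= \mathcal{N}(\hat\omega;\, \omega_0,\, \sigma), \qquad
p(d \mid M_1) = \mathcal{N}\!\left(\hat\omega;\, \omega_0,\, \sqrt{\sigma^2 + \Sigma^2}\right) \\
B_{01} \equiv \frac{p(d \mid M_0)}{p(d \mid M_1)}
  &= \sqrt{1 + \frac{\Sigma^2}{\sigma^2}}\;
     \exp\!\left[-\frac{\lambda^2}{2}\,\frac{\Sigma^2}{\Sigma^2 + \sigma^2}\right] \\
\Sigma \ll \sigma: \quad B_{01} &\approx 1 + \frac{\Sigma^2}{2\sigma^2}\,(1 - \lambda^2)
  \qquad (< 1 \text{ for } \lambda > 1) \\
\Sigma \gg \sigma: \quad B_{01} &\approx \frac{\Sigma}{\sigma}\, e^{-\lambda^2/2}
\end{align*}
```

At fixed data, broad priors push B_{01} above 1 and so favour ω = ω_{0}, while narrow priors give B_{01} slightly below 1 when λ > 1, mildly favouring free ω. For λ = 1.96 the crossover is at roughly Σ ≈ 6.5σ.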
This is not a fatal point against Bayesian evidence, though, since there is no reason why we should not be able to arrive at a sensible estimate of the correct width and shape of priors (... now how much care and attention should be spent on that part of the procedure?), which would ideally happen in advance of looking at the data.
But it does seem slightly counterintuitive that your judgment between two models can depend crucially on what you believed you knew before the data were taken.
Thomas 



Andrew Liddle
Joined: 28 Sep 2004 Posts: 21 Affiliation: University of Edinburgh

Posted: March 03 2007 


Dear All,
I wanted to pick up on a couple of points from the different mails in this interesting thread.
I don't see any reason to be afraid of the prior dependence of Bayesian results. Thomas says he finds the dependence on prior information counterintuitive, but remember that the Bayesian methodology is founded on continual updating of probabilities in light of data. How, then, could it not depend on prior assumptions? In fact, Bayes' theorem usefully provides a decomposition of the posterior probabilities into the likelihoods (the bit from the data) and the priors (the subjective bit). Hence we can easily see the extent to which the current conclusions are driven by data and/or priors. Obviously we want to get to the situation where conclusions are data-dominated, but there is no harm in being able to properly understand what is going on in the regime where we still have significant prior information contributing to the posterior (certainly true for dark energy).
I see the ability to vary the priors, and hence explore the robustness of conclusions, as a significant strength of the Bayesian approach.
Eric's point about belief in ISW and baryon oscillations again misses the point. These phenomena are predicted by our standard simple model. They do not require extra parameters, and hence are unrelated to model selection issues. What would have been shocking would have been their absence, not their presence. [In frequentist terms, the null hypothesis is that the effects are present, and the data are consistent with that.]
I don't think anyone is claiming that the Bayesian approach is `more correct' than the frequentist one. The Bayesian approach is a consistent framework of logical inference, built around Bayes' theorem and the manipulation of probabilities. The frequentist approach is a set of rules which lack the coherence to be called a framework, but which nevertheless are mathematically consistent. So it is not a case that one is right and the other wrong; they are just different. Hence my objection above to the claim that Bayesian model selection `misguides' or `spuriously rules out' in some circumstances; that could only be true if the frequentist approach could be said to be correct (and hence that all Bayesian statisticians in the world should be immediately sacked).
But what we can argue about is which approach is more useful, which is indeed a topic worthy of debate.
best regards,
Andrew 



Moncy Vilavinal John
Joined: 21 Mar 2006 Posts: 3 Affiliation: St. Thomas College, Kozhencherry, Kerala, India

Posted: March 04 2007 


John Peacock wrote:
Quote:  Linder/Miquel correctly emphasise that the results you get using it can tell you more about the prior you picked than anything else 
Such apprehensions regarding priors mostly arise from overlooking the fact that the evaluation of Bayesian probability is a continuous process. As admitted by Andrew, Bayesian probability is not objective. Since priors are subjective, so are posteriors. If a coin gave 8 heads in 10 trials, the posterior probability you get at the end of the 10th trial (which in turn is your prior in the 11th trial) will certainly not be 0.5 (even if you started the experiment with this value as prior). More precisely, the posterior probability we compute at any stage is not objectively verifiable. You can only update your plausibility assignment at the end of each trial. In this sense, Bayesian probability is not falsifiable, just as in the case of cosmology! [See also astro-ph/0506284]. However, in spite of all this, Bayesian theory is the most useful tool for making a decision on your own under such conditions.
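The coin example in a few lines, assuming (our choice) a uniform Beta(1, 1) prior, whose mean is 0.5, and a hypothetical order for the 8 heads and 2 tails. With a Beta prior the update is conjugate, so updating toss by toss, with each posterior serving as the next prior, gives the same Beta(9, 3) posterior as a single batch update.

```python
from fractions import Fraction


def beta_update(a, b, heads, tails):
    """Conjugate update of a Beta(a, b) prior on the probability of heads."""
    return a + heads, b + tails


def beta_mean(a, b):
    return Fraction(a, a + b)


# Assumed prior: uniform Beta(1, 1), whose mean is 1/2.
batch = beta_update(1, 1, heads=8, tails=2)

# Toss by toss: each posterior becomes the prior for the next toss.
# The order of outcomes is hypothetical; only the counts matter.
sequential = (1, 1)
for outcome in "HHTHHHHTHH":
    sequential = beta_update(*sequential, heads=int(outcome == "H"),
                             tails=int(outcome == "T"))

print(batch, sequential, beta_mean(*batch))  # (9, 3) (9, 3) 3/4
```

The posterior mean of 3/4 is the prior for the 11th toss.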
As I understand, the authors and also Peacock are worried about whether a competing model in a model comparison can get undue advantage by picking a suitable prior for some new parameter. But this anxiety is unfounded and can be dispelled once we recognize that Bayesian model comparison is not a one-time exercise. The posterior for that parameter, obtained in that analysis, must be used as prior in the future observation, and if the original prior is manipulated, there is every chance that this will turn out to be detrimental for that model.
This points to the need to discourage the present practice of picking fresh priors in every new model comparison exercise. In other words, there should be rules for the game and they should be strictly obeyed! 



Thomas Dent
Joined: 14 Nov 2006 Posts: 28 Affiliation: ITP Heidelberg

Posted: March 05 2007 


OK, I think it's clearer now.
If the priors encode all relevant information apart from the current data under consideration, it is no surprise when different priors plus the same data can give different conclusions: the sum total of information is different.
In a frequentist approach one just combines old data with new data; of course, with different sets of old data the result is different. The Lindley comparison doesn't look fair in that respect, because one is comparing a frequentist who looks just at new data with a Bayesian who looks at new data plus something else which allows her to find a meaningful prior.
There is no real question of choosing the priors, you just have to decide which information you put into the prior and which you count as part of the data. For obvious reasons, if you exclude all experimental information from the prior there start to be problems...
In the case of the coin toss one's prior would be derived from observational data: that is, observation of what the coin looks like, what similar coins have done in the past, and what kinds of trickery people did or didn't get up to in coin-tossing situations.
What I would find worrying is if priors are pulled out of hats because someone happens to find them reasonable guesses. But I don't think the situation is as bad as that. For example, one might use pre-WMAP data for the prior over n_{s}; that would give you a perfectly well-behaved distribution, which also encodes the fact that no one had a theoretical clue about its value apart from that it should fit older data.
Quote:  Bayesian model comparison is not a one-time exercise. The posterior for that parameter, obtained in that analysis, must be used as prior in the future observation, (...)
This points to the need of discouraging the present practice of picking fresh priors in every new model comparison exercise. In other words, there should be rules for the game and they should be strictly obeyed! 
In any approach there are potential difficulties with theoretical prejudices that one might want to give the status of information to. But if one admits old observations as part of the priors, a lot of the dependence on 'theory' might go away. Surely many things we might have as prior prejudices are just old observations in disguise. 



Jason Dick
Joined: 08 Nov 2005 Posts: 11 Affiliation: SISSA

Posted: March 06 2007 


Thomas Dent wrote:  In any approach there are potential difficulties with theoretical prejudices that one might want to give the status of information to. But if one admits old observations as part of the priors, a lot of the dependence on 'theory' might go away. Surely many things we might have as prior prejudices are just old observations in disguise. 
Well, if I might interject, I see two potential problems with this:
1. How do we know that the results will converge?
2. How, in this picture, do we check for consistency between different experiments?
My personal objection to making use of Bayesian evidence is that it has the problem where you explicitly need to have finite priors. With Bayesian parameter estimation, however, the question is often not so much what are my specific priors, but rather in what set of parameters am I going to assume uniform priors? In each case the priors are arbitrary, but it gives us a way to present the results of one experiment as independently as possible from other experiments, or not requiring a limitation to only those theoretical models where we have strong knowledge of the priors (depending upon from where one obtains the priors).
But no matter what we do, it seems, there is some degree of arbitrariness that is utterly unavoidable, and as a result I think the real lesson we should take away from this is that if we have an experimental result that claims that model X is ruled out with 90% confidence, we should in our minds expand that contour significantly so as to very qualitatively wrap in this arbitrariness both in modeling and in priors. I say qualitatively because even though there are methods of placing numbers on these things, I rather doubt that we can hope to do so in a non-arbitrary way.
This discussion has really made me all the more respectful of Andy Albrecht's "I won't get out of bed for less than four sigma" approach to ruling out models. 



Roberto Trotta
Joined: 27 Sep 2004 Posts: 18 Affiliation: Imperial College London

Posted: March 06 2007 


Thomas Dent wrote: 
On the other side, I'm not sure I understand what is going on in astro-ph/0504022 with the 'Lindley paradox' (fig. 1), nor why the Bayesian result is necessarily better or more correct than the frequentist.
The only thing that can be making a difference to the Bayesian result in fig.1 is the width of the measured distribution relative to the prior. Since there is no scale on the ω axis the only thing we have to compare the distribution widths to is the width of the prior.
Thomas 
I'm looking at the new version which came out today, so now Figure 1 is Figure A1 (notice that the direction of the x-axis has been swapped; I thought it was clearer this way).
You are quite right, the difference in the Bayesian approach comes from the Occam's razor effect brought about by the 'wasted volume' of parameter space in going from the prior to the posterior. This can be understood in terms of information gain, i.e. how much the data changed your knowledge as encapsulated by the prior (see Eq. (A6) in astro-ph/0504022). And yes, it is the prior that sets the only relevant scale in the problem: after all, there is no absolute notion of 'well measured parameter'. You have to specify 'well measured w.r.t. what' (this is spelled out in some detail in astro-ph/0602378, section II B). In other words, there is no inference without assumptions (no matter what frequentists say).
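One common way to quantify the information gain from prior to posterior is the Kullback-Leibler divergence, which has a closed form for one-dimensional Gaussians. This is offered as an illustration, not as the exact expression of Eq. (A6) in astro-ph/0504022. The check in the usage lines shows that only ratios of widths enter, which is the sense in which the prior sets the scale.

```python
import math


def kl_gaussian(mu_post, sd_post, mu_prior, sd_prior):
    """D_KL(posterior || prior) in nats for 1-D Gaussian posterior and prior."""
    return (math.log(sd_prior / sd_post)
            + (sd_post ** 2 + (mu_post - mu_prior) ** 2) / (2 * sd_prior ** 2)
            - 0.5)


# Same posterior width, very different information gain depending on the prior.
print(kl_gaussian(0.0, 0.1, 0.0, 1.0))    # prior 10x wider than posterior
print(kl_gaussian(0.0, 0.1, 0.0, 100.0))  # prior 1000x wider than posterior
```

Rescaling both widths by the same factor leaves the result unchanged, so "well measured" only has meaning relative to the prior width.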
Now in the example I've fixed the prior width, because I'm arguing that this is a quantity that encapsulates your expectations about the plausible values of the extra parameter under the more complicated theory before you see the data. I'm then comparing the model selection results for different values of the likelihood width (the three coloured Gaussians in the top panel), which however are all constructed by hand to be 1.96σ away from the value predicted by the simpler model. This means that under a frequentist rejection test all three curves lead us to exactly the same conclusion, namely that ω_{0} as predicted by the simpler model is ruled out at the 2σ CL.
The whole point of Lindley's paradox is to illustrate that clearly the widest curve (red Gaussian) is not as informative as the strongly peaked one (cyan curve), if we understand 'information gain' as the increase in our knowledge in going from prior to posterior. Hence it is only natural that our conclusions re the viability of the two models ought to be different for different information content.
This can be intuitively understood: if you measure a parameter to lie 2σ away from the predicted value under the simple model but with a spread of the order of your prior for the more complicated model, then your relative belief in the two models will not be strongly affected (which is what you see for I<0 in the bottom panel of Figure A1: the curves converge to odds of 1:1). But if you measure it with very high precision, say with a width 10^{10} times smaller than the prior scale (that would be I=10 on a log_{10} scale), and you are still 2σ away from ω_{0}, then your confidence in the extended model should (correctly) be shattered, as it is a surprising fact that this very strongly spiked measurement pops up in the vicinity of the predicted value under the smaller model.
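The qualitative behaviour described here can be checked with a short sketch; it does not reproduce Figure A1. It assumes (our assumption; the figure's prior may differ) a Gaussian prior of width Σ centred on ω_{0} under the extended model, a Gaussian likelihood of width σ, the measurement held at 1.96 likelihood widths from ω_{0}, and I = log_{10}(Σ/σ).

```python
import math


def bayes_factor_01(info, lam=1.96):
    """B01 = p(d | omega = omega_0) / p(d | omega free).

    Assumptions (illustrative): Gaussian likelihood of width sigma, measured
    value lam * sigma away from omega_0, and a Gaussian prior of width Sigma
    centred on omega_0 under the extended model. info = log10(Sigma / sigma).
    """
    r2 = 10.0 ** (2 * info)  # (Sigma / sigma) ** 2
    return math.sqrt(1.0 + r2) * math.exp(-0.5 * lam ** 2 * r2 / (1.0 + r2))


# Same 1.96-sigma offset every time; only the likelihood width changes.
for info in (-2, 0, 2, 5, 10):
    print(f"I = {info:>3}: B01 = {bayes_factor_01(info):.3g}")
```

For I ≪ 0 the odds tend to 1:1, and for I ≫ 0 they grow roughly like 0.15 × 10^I in favour of the simpler model, although a frequentist test gives the same 2σ verdict in every case.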
The alternative possibility you mention, namely varying the prior width to assess the change in the model selection outcome following a change in one's prior beliefs, can also be done and it is an instructive exercise. But this is a different issue from Lindley's paradox. It is a way of assessing which change in your prior (i.e., model predictivity) would be needed to change the model selection result considerably. This has been carried out e.g. in Figure 2 of astro-ph/0703063, showing that the outcome remains essentially the same unless you are ready to entertain quite unreasonable prior beliefs about n_{S}. Another example (leading to different conclusions) is Figure 1 of astro-ph/0607496 for dark energy models. In general, a more restrictive class of models (i.e., with a narrower prior) which is compatible with the data will not be ruled out by model selection, but you will only get a non-committal result of equal posterior odds. 



Jason Dick
Joined: 08 Nov 2005 Posts: 11 Affiliation: SISSA

Posted: March 07 2007 


Roberto Trotta wrote:  You are quite right, the difference in the Bayesian approach comes from the Occam's razor effect brought about by the 'wasted volume' of parameter space in going from the prior to the posterior. This can be understood in terms of information gain, i.e. how much the data changed your knowledge as encapsulated by the prior (see Eq. (A6) in astro-ph/0504022). And yes, it is the prior that sets the only relevant scale in the problem: after all, there is no absolute notion of 'well measured parameter'. You have to specify 'well measured w.r.t. what' (this is spelled out in some detail in astro-ph/0602378, section II B). In other words, there is no inference without assumptions (no matter what frequentists say). 
Well, I think everything you've said is correct. But my only objection is that in a frequentist analysis, you aren't even worrying about the added volume. The only issue of interest as far as the priors are concerned is the shape of the priors within the region where there is significant probability. The shape of the priors outside that region has no effect whatsoever.
So while both analyses are sensitive to rather arbitrary prior choices, the model selection approach has an added sensitivity to the total volume of the space given by the prior choices. I guess I'm just a bit pessimistic that there really is a non-arbitrary, intelligent solution to the problem of how to factor this sort of "Occam's Razor factor" into the problem of how to rule out models. 
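This contrast can be illustrated with a flat prior of half-width Δ centred on zero (an assumption made for illustration) and a Gaussian likelihood. Once the prior comfortably covers the likelihood, widening it leaves the posterior mean unchanged but divides the evidence by the prior width; since a model with the parameter fixed carries no such factor, the Bayes factor in its favour grows in proportion to Δ.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def flat_prior_fit(omega_hat, sigma, half_width):
    """Evidence and posterior mean for a Gaussian likelihood N(omega_hat; omega, sigma)
    and a flat prior on [-half_width, half_width] (illustrative assumptions)."""
    alpha = (-half_width - omega_hat) / sigma
    beta = (half_width - omega_hat) / sigma
    mass = norm_cdf(beta) - norm_cdf(alpha)  # likelihood mass inside the prior
    evidence = mass / (2.0 * half_width)     # prior density is 1 / (2 * half_width)
    # Mean of the Gaussian truncated to the prior range.
    post_mean = omega_hat + sigma * (norm_pdf(alpha) - norm_pdf(beta)) / mass
    return evidence, post_mean


for half_width in (1.0, 10.0, 100.0):
    print(half_width, flat_prior_fit(0.2, 0.1, half_width))
```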



Thomas Dent
Joined: 14 Nov 2006 Posts: 28 Affiliation: ITP Heidelberg

Posted: March 07 2007 


Thanks to Roberto for clarifying.
What I think I am trying to get at is the following. Ideas for models with extra parameters do not spring up in a vacuum; they are developed in the light of older, less precise data which allowed an 'interesting' space for deviation away from simpler models.
This fact, surely, solves the problem of 'models' with theoretically undetermined or very poorly determined parameters. Because there is older data which motivated the model, the prior state of knowledge, or belief, or prejudice or whatever, included the belief that the value of the parameter should be consistent with the old data.
Therefore the older data should be used as part of the prior. If theoretically there is no clue what the distribution over the parameter should be, the (posterior resulting from) older data is all one has. This would be appropriate in the case when the 'model' is simply something with no particular theoretical motivation, like 'allow n_{s} \neq 1'. The question would then be 'given what was known about the parameter in the more complicated model because of the old data, how does it compare with the simpler model looking at the new data'.
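This proposal can be written down for a single parameter. The sketch assumes Gaussian likelihoods throughout and takes the extended model's prior to be the Gaussian posterior N(μ_old, σ_old) from older data, while the simpler model fixes the parameter at p0; all names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (math.sqrt(2 * math.pi) * sd)


def bayes_factor_old_data_prior(x_new, sigma_new, mu_old, sigma_old, p0):
    """B01 for the simple model (parameter fixed at p0) versus the extended model,
    using only the new measurement x_new +/- sigma_new.

    Assumption: the extended model's prior is the Gaussian posterior from older
    data, N(mu_old, sigma_old), and the new data are independent of the old.
    """
    evidence_simple = normal_pdf(x_new, p0, sigma_new)
    # Gaussian prior convolved with Gaussian likelihood: widths add in quadrature.
    evidence_extended = normal_pdf(x_new, mu_old, math.hypot(sigma_new, sigma_old))
    return evidence_simple / evidence_extended


# Illustrative numbers only: old data 0.97 +/- 0.04, new data 0.96 +/- 0.02, p0 = 1.
print(bayes_factor_old_data_prior(0.96, 0.02, 0.97, 0.04, 1.0))
```

With these numbers B01 ≈ 0.31, a mild preference for the extended model, pointing the same way as the roughly 2σ frequentist deviation. As the old-data prior shrinks onto p0, B01 tends to 1 and the data can no longer distinguish the two models.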
If the new data are only a slight improvement over the old, the prior would then be pretty narrow and we are at the case where frequentist and Bayesian point the same way. (Though 2 sigma is absolutely nothing to be excited about, and if one is honest, nothing even to draw attention to.) This means that if (say) the universe is measured to be flat with gradually better and better precision, we will never definitely be able to rule out the 'model' in which it deviates from flatness, if that amount of deviation becomes smaller and smaller...
But if we have no reason to believe that any size of deviation from flatness or scale invariance is more likely than any other, I don't see what else can be done. After all, very small deviations may be very physically significant. The real problem here is 'models' which amount to tweaking a parameter without clear physical motivation.
But if one does have a physically motivated model which produces on its own a meaningful probability distribution (as physical models should!), it does make sense to use that as the prior and test it with respect to all the data, to get a clean inference.
Thomas 






