[astro-ph/0702542] Tainted Evidence: Cosmological Model Selection vs. Fitting
Posted: March 08 2007
Dear Thomas,
Remember that we are updating model priors as well as parameter priors; if you advocate carrying forward the parameter priors from previous experiments, you should carry forward the model likelihoods too. If you progressively constrained Omega_tot better and better around 1, the model likelihood might change only a little at each step, but eventually those small shifts all add up and give a decisive verdict. After all, it would be a suspect method if the ultimate result were different when the same data are applied bit by bit rather than all at once.
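To make that concrete, here is a toy numerical check (Python; the Gaussian likelihoods, the error bar and the prior width on Omega_tot are made-up illustrative choices, not any real dataset). Provided the parameter posterior is carried forward as the new prior and the model evidences are accumulated at each step, the bit-by-bit and all-at-once Bayes factors come out the same:

import numpy as np

rng = np.random.default_rng(0)
sigma = 0.02                                    # per-measurement error on Omega_tot (toy value)
data = 1.0 + sigma * rng.standard_normal(20)    # simulated measurements scattered around flatness

# Model 0: Omega_tot fixed at 1.  Model 1: Omega_tot free, Gaussian prior of width tau about 1.
tau = 0.1
grid = np.linspace(0.5, 1.5, 4001)
dw = grid[1] - grid[0]

def like(d, omega):
    return np.exp(-0.5 * ((d - omega) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

prior1 = np.exp(-0.5 * ((grid - 1.0) / tau) ** 2) / (tau * np.sqrt(2.0 * np.pi))

# All the data at once: marginalise over Omega_tot for Model 1, plug in Omega_tot = 1 for Model 0.
Z1_once = np.sum(prior1 * np.prod([like(d, grid) for d in data], axis=0)) * dw
Z0_once = np.prod([like(d, 1.0) for d in data])

# The same data bit by bit: after each point, accumulate the model evidences and
# carry the updated parameter posterior forward as the prior for the next point.
post, Z1_seq, Z0_seq = prior1.copy(), 1.0, 1.0
for d in data:
    pred = np.sum(post * like(d, grid)) * dw    # predictive probability of this point under Model 1
    Z1_seq *= pred
    Z0_seq *= like(d, 1.0)
    post = post * like(d, grid) / pred

print("B01 all at once :", Z0_once / Z1_once)
print("B01 sequential  :", Z0_seq / Z1_seq)     # same answer, up to numerical error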
Data may well motivate new models, but then one should be careful not to calculate the evidence of the new model from the same data, as the data would be double-counted. If data motivate a new model, that is fine; but new data are then needed to compare that model against others. [E.g., after WMAP1, someone could claim to have a model predicting precisely the value Omega_tot=1.02 that that dataset gave. What they should not then do is compute the evidence from the same data (which would indeed support that model); instead one waits for more data to come along, e.g. WMAP3, which no longer supports 1.02.]
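As a toy illustration of the double counting (Python again; the values 1.02 +/- 0.02 and 1.00 +/- 0.01 are illustrative stand-ins, not the actual WMAP constraints):

import numpy as np

sigma1, d1 = 0.02, 1.02     # first measurement: Omega_tot = 1.02 +/- 0.02 (illustrative)
sigma2, d2 = 0.01, 1.00     # later, independent measurement: 1.00 +/- 0.01 (illustrative)

def log_like(d, sigma, omega):
    return -0.5 * ((d - omega) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

omega_designed = d1         # "new model" built to predict exactly the value the first dataset gave

# Double counting: judging the designed model with the very data used to build it.
lnB_same = log_like(d1, sigma1, omega_designed) - log_like(d1, sigma1, 1.0)
# Fair comparison: judge it on the later, independent measurement instead.
lnB_new  = log_like(d2, sigma2, omega_designed) - log_like(d2, sigma2, 1.0)
print(lnB_same, lnB_new)    # +0.5 in favour of the tuned model, then -2.0 against it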
Your ready dismissal of two-sigma results highlights a point about the frequentist method. On a naive reading of that method, a two-sigma result should be spurious only about 5% of the time, and hence surely ought to be taken very seriously. Yet we all know that two-sigma results go away far more often than that. Lindley's 'paradox' may be part of the reason: Bayesian methods set a significantly higher bar that must be crossed before a result is taken seriously.
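Lindley's point is easy to see in a toy calculation (Python; the error bar and prior width on Omega_tot are illustrative assumptions). A measurement sitting exactly two sigma from flatness need not even give a Bayes factor above one for the extended model:

import numpy as np

sigma = 0.02               # measurement error on Omega_tot (assumed for illustration)
tau   = 0.2                # prior width on Omega_tot in the extended model (assumed)
d     = 1.0 + 2 * sigma    # a "two-sigma" deviation from flatness

# Single Gaussian measurement:
#   M0: Omega_tot = 1 exactly    ->  p(d|M0) = N(d; 1, sigma^2)
#   M1: Omega_tot ~ N(1, tau^2)  ->  p(d|M1) = N(d; 1, sigma^2 + tau^2)
def norm_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

B01 = norm_pdf(d, 1.0, sigma**2) / norm_pdf(d, 1.0, sigma**2 + tau**2)
print("Bayes factor B01 at a 2-sigma 'detection':", B01)   # about 1.4: simpler model still favoured

The same data that cross the two-sigma bar do not cross the Bayesian one.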
best,
Andrew