[astro-ph/0307080] Precision Primordial $^4$He Measurement w

Posted: November 04 2004
by Alessandro Melchiorri
Hi all, I just saw this paper on the web today. It is definitely an interesting paper, but I have a question concerning the constraints on the helium abundance from CMB data alone.
The authors of this paper claim Y=0.25\pm0.02 (see Fig. 2), while Trotta and Hansen in
astro-ph/0306588 claim 0.16<Y<0.5, so error bars more than 10 times larger!!!
Is this due to different priors?

Posted: November 05 2004
by Greg Huey
We know about the Hansen & Trotta limits, and that they are much larger than our CMB-only limits. We had a protracted discussion with them about the various possible reasons. It's been quite a while (well over a year), so my memory is a bit rusty.

As I recall, there are a couple of reasons for the differences. One is that they have much sparser high-l experimental coverage than we do. I think they used only one other experiment besides WMAP (but I might be mistaken), while we used the high-l bands of most of the recent CMB experiments. High-l coverage is important, as I recall, in reducing degeneracies with and between other parameters.

We did check some of their degenerate models (using their set of experiments) to see if we could reproduce their large error range for He4. We were able to reproduce the goodness-of-fits for
some of these. I can't recall now if we checked their full 0.16<Y<0.5 range, or a smaller range.

Another issue I recall is that they restricted their Markov chains to models that were flat, with no tensors (?) and maybe some other constraints that we relaxed. It may seem counter-intuitive that a larger parameter space can yield tighter bounds on parameters. However, this is Gaussian intuition, and we found
the likelihood function between our best-fit model and their range of parameter space to be significantly non-Gaussian. Our best fit was a much better fit than their best fit, and the likelihood function was much more sharply peaked around the former than the latter.

We did check convergence of our Markov chains with a multivariate version of the test used by WMAP. We also tested the relative volume of slices of parameter space several different ways. All of these results supported our conclusion that our Markov chains had converged.
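
For readers who want to try this kind of check on their own chains: below is a minimal sketch of the univariate Gelman & Rubin statistic (the criterion named later in this thread). It is not the multivariate version described above, and the chains here are made-up Gaussian draws with hypothetical Yp-like values, not output from either analysis.

```python
import random
import statistics

def gelman_rubin(chains):
    """Univariate Gelman-Rubin R-hat for a list of equal-length chains."""
    n = len(chains[0])                                   # samples per chain
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B/n: sample variance of the chain means
    b_over_n = statistics.variance(chain_means)
    # Pooled variance estimate, then the ratio to the within-chain variance
    var_hat = (n - 1) / n * w + b_over_n
    return (var_hat / w) ** 0.5

rng = random.Random(0)
# Four toy chains drawn from the same distribution (hypothetical numbers)
mixed = [[rng.gauss(0.25, 0.02) for _ in range(2000)] for _ in range(4)]
# Four toy chains stuck around different central values
stuck = [[rng.gauss(0.20 + 0.05 * k, 0.02) for _ in range(2000)] for k in range(4)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(f"R-hat, chains agree:    {r_mixed:.3f}")
print(f"R-hat, chains disagree: {r_stuck:.3f}")
```

A value close to 1 only says that the chains agree with each other. If every chain has missed the same region of parameter space, the statistic will not flag it.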

You can play with these Markov chain pointsets and run convergence tests yourself online at:

BTW, I am guessing that you only now noticed this paper because we recently updated the online version to match the journal version - we had forgotten to do that for about a year.

Greg Huey

Posted: November 05 2004
by Roberto Trotta
While it is true that we had somewhat less data in the high-ell region (WMAP+CBI+ACBAR), we have compelling reasons to believe that the errors given by Huey et al cannot be correct:

The effect of 4He is limited to a change in the efficiency of the tight coupling regime, which translates into a slightly different Silk damping. A change of Y_p by 10% affects the T spectrum at the level of 2% for ell=1800 (see our figure 3). There is also a change in the reionization bump, but this is degenerate with tau or with changes in the reionization history. So the physical impact is minimal: we don't see which experimental coverage (short of Planck) could possibly detect a 0.4% change above ell=1000 in an 8-dimensional parameter space.

In the Huey et al paper there is no discussion of how their very precise constraints physically come about. And indeed, it is hard to see how marginalizing over a non-Gaussian, higher dimensional parameter space could yield smaller constraints...

BTW, the Gelman & Rubin criterion is a test for mixing, not for convergence. We are sorry that, despite an intensive email exchange, we could not establish the reason for this discrepancy. We still believe it is to be found in a severe undersampling of the tails in the MC method by Huey et al. In fact, in the 2 figures below

we compare two spectra which, according to Huey et al, should be 5 sigmas apart (one has Y_p = 0.24, the other Y_p = 0.30, the remaining parameters slightly tuned). They are indistinguishable.

Posted: November 05 2004
by Greg Huey
> ... we have compelling reasons
> to believe that the errors given
> by Huey et al cannot be correct:

No, I don't think you do. Have you done your own analysis in the same parameter space as ours, with the same experiments? Briefly, if you want to claim we are wrong, do the same analysis, find any errors, correct them, and report the correct results.

> we compare two spectra which,
> according to Huey et al, should
> be 5 sigmas apart ...
> They are indistinguishable.

I think this pretty much sums up your logic in this whole exchange. You previously sent us these flat models and said the same thing. I explained to you then that the models you sent were not distinguishable from each other at 5 sigma. However, they ARE BOTH distinguishable from our best fit model at around 5 sigma (I don't recall the exact numbers off the top of my head).

Ok, so instead of reproducing our previous discussions again, let's resume where we previously left off:

Do you recall the three tests I did of our MC results and sent to you and Hansen? These independent tests all supported our results. I don't recall ever getting a reply. Please
in your next post explain why you think those tests are correct or not. If you can't find it I can dig that message out of my email buffer and post it here.

To summarize, we have checked your results and think they are procedurally correct. If you disagree with our results, then find our procedure error - demonstrate what we did wrong by redoing what we did (you have had over a year to do so). If you think the parameter space was undersampled by our Markov chains, then run your own, and show that is the case. Also explain what is wrong with all of the non-Markov chain tests we have conducted and explained to you.

Posted: November 05 2004
by Richard Cyburt
Hi All,

Thought I'd put my two cents in. I think the two lines of discussion have raised 2 important points, which between them raise issues with each of the works.

1. Have the Huey, Cyburt and Wandelt chains sampled a large enough volume of parameter space?

We went to great lengths to understand the differences between the Trotta and Hansen chains and our own. If the chi^2 were as broad in the Yp direction as they claim, then one would expect the random walk in Yp space to extend quite far into the space of increasing chi^2. The fact that we do not see this, and in fact see the random walk turn around close to our central value, suggests the distribution is not as broad as Trotta and Hansen claim. Perhaps their proposal density is still influencing their results.

Of course, I cannot rule out completely that we have undersampled parameter space, but the evidence suggests otherwise. That is definitely a fair point, addressed by my colleague Greg Huey, and I hope a point of further discussion.

2. Trotta and Hansen have excluded important data that influences strongly the constraint on Yp.

Trotta and Hansen use WMAP data up to ell~800 and CBI and ACBAR beyond that. In the paper they say that the constraint on Yp comes from multipoles with ell>~400. To be fair, WMAP is not precise between 350<ell<800. So the region Yp is sensitive to is not adequately probed by the Trotta and Hansen chains. In the Huey, Cyburt and Wandelt work, we include non-WMAP data at ell >~350, including the data from DASI, BOOMERANG and others. This data now probes the power spectrum range Yp is sensitive to, more so than the data set adopted by Trotta and Hansen.

I also believe "chi-by-eye" arguments should be taken with caution and one should rely on something more quantitative.

I feel that these two points are the most important to this discussion and I hope to hear more about each.

Posted: November 05 2004
by Steen H. Hansen
Ok, I feel the need to bring the discussion to a simpler physical level.

Huey et al did agree that the results found by Trotta & Hansen are correct
in the parameter space that Trotta & Hansen considered.

Huey et al suggest that *enlarging* the parameter space, by including tensors
in particular, will significantly *reduce* the allowed parameter space for helium.
This is so counter-intuitive that it certainly would have deserved an explanation
in the Huey et al paper - I am surprised that no referee asked for this.
My general feeling is, that when a result is this weird, then one should maybe not
trust the results blindly, but instead take a step back and ask if this makes any
sense from a physical point of view.

There were Fisher matrix forecasts done previously, in particular
Kaplinghat, Knox, Song, astro-ph/0306052
Eisenstein, Hu, Tegmark, astro-ph/9807130
and they agree very well with the results of Trotta & Hansen, and they therefore
disagree wildly with the results of Huey et al. This would also need an explanation
in the Huey et al paper.
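
The Gaussian intuition behind this objection can be made concrete with a Fisher matrix. The sketch below uses a made-up 2x2 Fisher matrix for two parameters labelled (Yp, r); the numbers are hypothetical and are not taken from any of the cited forecasts. For a Gaussian likelihood, the error on one parameter after marginalizing over the other is never smaller than the error with the other held fixed.

```python
def marginalized_sigma(fisher, i):
    """1-sigma error on parameter i of a 2x2 Fisher matrix, marginalizing over the other."""
    (a, b), (c, d) = fisher
    det = a * d - b * c
    cov_diag = (d / det, a / det)   # diagonal of the inverse (covariance) matrix
    return cov_diag[i] ** 0.5

def conditional_sigma(fisher, i):
    """1-sigma error on parameter i with the other parameter held fixed."""
    return fisher[i][i] ** -0.5

# Hypothetical Fisher matrix for (Yp, r); the off-diagonal term encodes a degeneracy
fisher = [[2500.0, 300.0],
          [300.0, 100.0]]

sigma_fixed = conditional_sigma(fisher, 0)    # r fixed (smaller parameter space)
sigma_marg = marginalized_sigma(fisher, 0)    # r free and marginalized over
print(f"sigma(Yp), r fixed:        {sigma_fixed:.4f}")
print(f"sigma(Yp), r marginalized: {sigma_marg:.4f}")
```

This inequality holds only under the Gaussian approximation, which is exactly the assumption disputed in this thread.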

Posted: November 06 2004
by Greg Huey
As far as we could tell by our limited attempts to reproduce some of your results - using your experimental data selection and your parameter space - your analysis does not have any apparent procedural errors. This certainly does not make the difference in our results shocking, particularly given the different experimental data we used.

We are certainly not blindly accepting computational results - Rich, Ben and I have checked our results with several different approaches, and explained them in detail to you in previous email exchanges, and more briefly here. The biggest cause of the difference in our results, as Rich explained well, is probably that you had much thinner experimental coverage at high-l - fewer experiments, and throwing out non-WMAP bins in the range 350<l<800. A significant, but subdominant, secondary effect is that your parameter space is limited, and does not include the region where the likelihood is sharply peaked. We allowed curvature as well as tensors in our parameter space. The resulting effect is perhaps surprising, but not revolutionary. It is well known that accurate, reliable parameter estimation requires careful, reliable statistical methods (i.e. Markov chains instead of Fisher matrices), and careful attention to the effect of priors & assumptions. Is the "error reduction through more parameters" effect worthy of publication? Perhaps somewhat.

Greg Huey

Posted: November 06 2004
by Patrick McDonald
In the context of this discussion...
I think it would be nice if every Markov chain analysis was backed
up by an old-fashioned chi^2 minimization analysis.
(If what I'm about to say is wrong, I'm sure lots of
people would be enlightened by an explanation of why.)

After running the MC until your convergence test is satisfied, run a
minimizer (maybe starting from the best MC point) to find the
absolute minimum chi^2. Then, fix your parameter of interest to,
say, the 3 sigma upper limit from your MC, and run a minimization
with respect to the rest of the parameters (maybe starting from
the MC point closest to matching your fixed value for the parameter
of interest). This would give you Delta chi^2 between the 3 sigma
point and the best fit. I realize that if this isn't ~9, it may
be because your MC is telling you something useful and correct;
however, if it is ~1 or ~100, it seems that at the very least
an explanation is required of why it is correct to rule out a
model that fits almost as well as the best model, or not rule
out grossly bad fitting models - more likely, finding
Delta chi^2=1 or 100 would mean something is wrong with your MC
(or your minimizer of course).

My assumption here is that, once you think you've basically mapped
out the parameter space with the MC, a derivative-based minimization
will take substantially less time than the original MC, so this
test could be applied to all the most interesting 1 parameter limits
without much cost.
Of course, if the minimization is not fast this is all irrelevant.

This would also provide a natural starting point for diagnosing a
disagreement, e.g., in the case at hand, instead of "your MC hasn't
converged", "yes it has", Huey et al. could show how the best fit
for a Y_p value outside their constraint is poor (including what
data makes it poor - I know, they could miss the minimum,
but you have to start somewhere).
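
The check proposed in this post can be sketched on a toy problem. Everything below is an assumption made for illustration: the chi^2 function is invented, `nuis` stands in for "the rest of the parameters", the 3 sigma limit of 0.40 is chosen to match the toy rather than any MC output, and a hand-written golden-section search replaces whatever minimizer one would really use.

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal 1-D function on [lo, hi]; returns (x_min, f_min)."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) < f(d):
            b = d      # minimum lies in [a, d]
        else:
            a = c      # minimum lies in [c, b]
    x = 0.5 * (a + b)
    return x, f(x)

def chi2(yp, nuis):
    """Hypothetical chi^2 with a nuisance parameter that is degenerate with Yp."""
    return ((yp - 0.25) / 0.05) ** 2 + ((nuis - 1.0 - 10.0 * (yp - 0.25)) / 0.1) ** 2

def profile_chi2(yp):
    """Minimum chi^2 over the nuisance parameter at fixed Yp."""
    return golden_section_minimize(lambda nuis: chi2(yp, nuis), -10.0, 10.0)[1]

# Step 1: global best fit (here by minimizing the profile over Yp)
yp_best, chi2_best = golden_section_minimize(profile_chi2, 0.0, 1.0)

# Step 2: fix Yp at the claimed 3 sigma upper limit and re-minimize over the rest
yp_limit = 0.40
delta_chi2 = profile_chi2(yp_limit) - chi2_best
print(f"best-fit Yp = {yp_best:.4f}, Delta chi^2 at Yp = {yp_limit} is {delta_chi2:.2f}")
```

In this Gaussian toy the answer comes out at about 9 by construction. For a real likelihood, a value far from 9 is the signal this post suggests should be explained.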

Posted: November 06 2004
by Greg Huey
One can minimize - or run a Markov chain - on fixed Yp slices. These are two of the approaches we tried, to see if they revealed any sort of inconsistency with our results - which would suggest the original Markov chain set had not converged. We also tried fitting the likelihood to a Gaussian at different distances (well, different delta chi^2) from the best-fit point. After a stage of the fit one runs more points along the eigenvector directions and fits again, until the fit stabilizes. These tests all suggested what I have said previously.


Posted: November 06 2004
by Patrick McDonald
So what is chi^2 for the best overall fit, and what is it for the best fit with Y_p=0.5?

Posted: November 06 2004
by Patrick McDonald
Or maybe I should have picked something less extreme like 0.35 (still ~10 sigma).

Posted: November 10 2004
by Greg Huey
Hello Pat,

I went back and found a summary of our tests of the convergence of the Markov chains. However, there is terminology or interpretation of the results that I don't understand - I will confer with the collaborators on this and post a detailed summary here.

Sorry about the confusion,
Greg Huey

Posted: November 25 2004
by Roberto Trotta
It seems that there is a general consensus up to now that the results in Trotta & Hansen are correct, given the data used and the (minimal) parameter space we chose. Regarding the results of Huey et al, I think it might be useful to distinguish between two different questions which need clarification. The first relates to the physics, the second to the MC.
  • 1) The physics: if indeed the data used in Huey et al have more discriminatory power than the compilation used in Trotta & Hansen, then the difference must come from the range 350 < ell < 800 (as Cyburt pointed out). Going back to our Figure 3 (which I reproduce here for your convenience. Bottom panel shows the % change when changing Y_P by +/- 10%, i.e. by 0.024), it seems that the impact of He in this range (TT) is quite small (compare with Figure 1 of Huey et al, where their data are plotted). That is, unless there is some other physical mechanism involving He which we (and all previous studies which have produced Fisher Matrix forecasts in agreement with our results) have missed - but which is not discussed in Huey et al, either- which would make the effect of He in that multipole range much larger. This is related to the question raised by McDonald.
  • 2) The MC: I am sure that Huey et al went to great pains to check their chains and MC algorithm. But on a simpler level, let me try and clarify one point. From Greg's reply, it seems that their best fit point lies in a region with non-zero tensors and/or curvature (which in their reasoning seems to justify why they get smaller errors - essentially because of more data, point 1 above, and different parameter space, discussed here). Let us assume that this is indeed the case, and for definiteness let's say that their best-fit point has r = C_2^T / C_2^S > 0.

    Then, if the shape of the posterior is flat for r -> 0, it follows that for r=0 or as small as to be negligible they should be integrating over our subspace of models when they marginalize out all other parameters but Y_p. Hence, their errors in this case should be at least as large as ours (perhaps slightly smaller given the more data they use).

    If the posterior drops sharply along r (which would exclude r = 0 from their marginalization), then they must have a many sigma detection of tensors, which by itself would be quite an interesting result. Indeed, from a previous email exchange with Greg, he told us that with their data, and restricting themselves to flat models only without tensors, they get a best fit chi^2 of 1491.4. When they add curvature and tensors, their best fit becomes 1483.4, which seems to suggest that they are indeed detecting non-zero tensors or curvature (?). This however cannot be the case with present-day data.

    Another possibility is that the posterior is bimodal, and that their MC has missed the second peak (around r = 0). The same reasoning applies to curvature. Furthermore, if we consider the direction along Y_p, in order to get any effect at all at the level of \Delta Y_p = 0.01 there must be some extra physical mechanism at work, as addressed in point 1) above.
In conclusion it would be very informative if we could see the marginalized posterior and mean likelihood along all directions in the parameter space of Huey et al. A point which might explain some of the issues is that (to the best of my knowledge, as I deduced from their paper) they start all of their chains from the same point in parameter space, and that they impose a rather high acceptance rate, which in my opinion could lead to severe undersampling of the tails (or, taken together, to the fact that their routine misses altogether the second peak of the posterior for a bimodal distribution).
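
The failure mode described here (all chains started from one point, small steps, a second peak never visited) is easy to reproduce on a toy problem. The sketch below is a generic Metropolis sampler on an invented one-dimensional bimodal posterior. It makes no claim about what either group's chains actually did, and the peak positions, step size and chain length are all arbitrary choices.

```python
import math
import random

def log_post(x):
    """Toy bimodal log-posterior: unit-width Gaussian peaks at x = 0 and x = 20."""
    a = -0.5 * x * x
    b = -0.5 * (x - 20.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))   # stable log-sum-exp

def metropolis(log_p, x0, step, n, rng):
    """Random-walk Metropolis with a Gaussian proposal of width `step`."""
    x, lp = x0, log_p(x0)
    chain = []
    for _ in range(n):
        y = x + rng.gauss(0.0, step)
        lq = log_p(y)
        if lq >= lp or rng.random() < math.exp(lq - lp):
            x, lp = y, lq          # accept the proposed point
        chain.append(x)
    return chain

rng = random.Random(42)
# Four chains all started at the same point, with a small step
same_start = [metropolis(log_post, 0.0, 0.5, 5000, rng) for _ in range(4)]
# Four chains started from dispersed points
dispersed = [metropolis(log_post, x0, 0.5, 5000, rng) for x0 in (0.0, 20.0, 0.0, 20.0)]

def frac_in_second_peak(chains):
    total = sum(len(c) for c in chains)
    return sum(1 for c in chains for x in c if x > 10.0) / total

frac_same = frac_in_second_peak(same_start)
frac_dispersed = frac_in_second_peak(dispersed)
print(f"fraction of samples in second peak, same start: {frac_same}")
print(f"fraction of samples in second peak, dispersed:  {frac_dispersed}")
```

The chains started together agree with each other perfectly and still miss half of the posterior mass, which is why dispersed starting points matter for any between-chain diagnostic.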

Posted: February 10 2005
by Greg Huey
Sorry for the lateness of this reply - I only recently noticed a new post (yours) here.
We are currently developing Importance-Sampling/Kernel Density Estimation code for
use for cosmological parameter estimation. One test case we will do is a repeat of
He4 MCMC analysis, but with IS/KDE instead of a MCMC approach. We expect the results
to be identical - and if they are, this would be strong support for our previous MCMC
results. While the IS/KDE approach is still under development, a false positive agreement
with the MCMC results would be highly unlikely (due to the approaches being very different).
So, I think this check with the IS/KDE code will settle the issue of whether our MC chains
under-sample the tails.
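
As a rough illustration of why importance sampling gives an independent cross-check, here is a minimal self-normalized importance-sampling sketch. It covers only the importance-sampling half of the idea (no kernel density estimation), and the target, the proposal and all numbers are hypothetical, not the code under development described above.

```python
import math
import random

def target_unnorm(y):
    """Hypothetical unnormalized posterior for a Yp-like parameter."""
    return math.exp(-0.5 * ((y - 0.25) / 0.02) ** 2)

def proposal_pdf(y, mu=0.30, sigma=0.10):
    """Broad Gaussian proposal density, deliberately offset from the target."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

rng = random.Random(1)
draws = [rng.gauss(0.30, 0.10) for _ in range(20000)]       # samples from the proposal
weights = [target_unnorm(y) / proposal_pdf(y) for y in draws]

wsum = sum(weights)
is_mean = sum(w * y for w, y in zip(weights, draws)) / wsum
is_var = sum(w * (y - is_mean) ** 2 for w, y in zip(weights, draws)) / wsum
is_sigma = math.sqrt(is_var)
# Effective sample size: how many equally weighted samples the weighted set is worth
ess = wsum ** 2 / sum(w * w for w in weights)

print(f"mean = {is_mean:.4f}, sigma = {is_sigma:.4f}, effective samples = {ess:.0f}")
```

Because the proposal is broad and independent of any chain history, the tails are populated by construction, at the cost of a reduced effective sample size.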

I don't believe the likelihood distribution is bimodal, as you suggest, with a second peak
at r=0 that is as good a fit as the peak at our best-fit point (r!=0, nonflat, etc).
Our best-fit point was a much better fit than your flat, r=0 best-fit point. As I recall -
and it has been a while, we did a detailed analysis of how the distribution tails varied
in the Yp direction from our 1-sigma bounds, out to your best-fit Yp (flat, r=0). It
appeared the variation between them was smooth and monotonic - I don't recall any suggestion
of the likelihood rising again as one approached your best-fit model.

Your reasoning about detection of tensors is, I think, assuming Gaussianity. To get the
error on Yp alone we marginalized over r, n_t, etc. This gives a likelihood function of
Yp. Now, each value of Yp has a maximal likelihood point in its Yp=constant slice of
parameter space. It may be that one has to go out very far on the tail to find a Yp
value that has its Yp=constant maximal likelihood peak at r=0 - say, perhaps 99.9%
excluded. That is NOT the same thing as r=0 being excluded at 99.9%. To get a bound on
r alone, you must marginalize over Yp and not marginalize over r, and then use that
likelihood function of r to find the best-fit and exclusion bounds on r. The bottom line
is that simple reasoning concerning peaks in the likelihood distribution works when the
distribution is Gaussian - and may fail when the distribution is non-Gaussian. We did
see strong evidence of non-Gaussianity in our MCMC point distribution. When the above
IS/KDE analysis is done, it might be worth it for us to actually do this analysis for r -
to determine exactly at what confidence we exclude r=0. I am not aware of any previous
parameter estimation that has been done allowing non-flat, tensors, variable Yp and
not imposing the inflationary relation between r and n_t.
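
The distinction drawn above, between where the peak sits in each slice and what the marginalized distribution says, can be seen in a small toy example. The likelihood below is invented for illustration (one peak, but with a width in `y` that grows with `r`) and is not a model of either analysis; `y` plays the role of the parameter being marginalized over.

```python
import math

def likelihood(y, r):
    """Hypothetical unimodal, non-Gaussian likelihood: the width in y grows with r."""
    width = 0.01 + 0.10 * r
    return math.exp(-0.5 * ((y - 0.25) / width) ** 2)

dy = 0.001
y_grid = [-0.5 + dy * i for i in range(1501)]     # y from -0.5 to 1.0
r_values = [0.0, 0.5, 1.0]

# Slice maximum: best likelihood reachable at each fixed r
profile = {r: max(likelihood(y, r) for y in y_grid) for r in r_values}
# Marginal: likelihood integrated over y at each fixed r (simple Riemann sum)
marginal = {r: sum(likelihood(y, r) for y in y_grid) * dy for r in r_values}

ratio = marginal[1.0] / marginal[0.0]
for r in r_values:
    print(f"r = {r}: slice maximum = {profile[r]:.3f}, marginal = {marginal[r]:.4f}")
print(f"marginal(r=1) / marginal(r=0) = {ratio:.1f}")
```

Here every slice fits equally well at its peak, yet the marginal strongly prefers large `r` purely through parameter-space volume. For a Gaussian likelihood the two descriptions would agree.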

While we work on the IS/KDE approach, why don't you work on redoing our MCMC analysis?
I don't believe any special code modification was required. It was just a matter of crunching
CPU. Then we could compare the same thing, instead of two different things.

Greg Huey

Posted: January 10 2006
by Roberto Trotta
A new, independent analysis set out to clarify this issue. Ichikawa & Takahashi find (astro-ph/0601099) that 0.17 < Yp < 0.52 (1 sigma), in excellent agreement with our own results.