[astro-ph/0409768] Constraining Neutrino Masses by CMB Experiments Alone
Authors:  Kazuhide Ichikawa, Masataka Fukugita, Masahiro Kawasaki 
Abstract:  It is shown that a sub-electronvolt upper limit can be derived on the neutrino mass from the CMB data alone in the Lambda CDM model with the power-law adiabatic perturbations, without the aid of any other cosmological data. Assuming the flatness of the universe, the constraint we can derive from the current WMAP observations is \sum m_nu < 2.2 eV at the 95% confidence level for the sum over three species of neutrinos (m_nu < 0.75 eV for the degenerate neutrinos). This constraint may be loosened if we abandon the flatness assumption but only by up to 4% for Omega_tot=1.02, the WMAP limit on the spatial curvature. We argue that it would be difficult to improve the limit much beyond \sum m_nu \lesssim 1.5 eV using only the CMB data, even if their statistics are substantially improved. However, a significant improvement of the limit is possible if an external input is introduced that constrains the Hubble constant from below. The parameter correlation and the mechanism of CMB perturbations that give rise to the limit on the neutrino mass are also elucidated. 

 Posts: 183
 Joined: September 24 2004
 Affiliation: Brookhaven National Laboratory
 Contact:
[astro-ph/0409768] Constraining Neutrino Masses by CMB Exper
The article claims to get a decent constraint on the neutrino mass from CMB data alone. I always thought you cannot constrain the neutrino mass from the CMB alone, and that this is the main reason why WMAP didn't put a constraint on the neutrino mass from its data alone but combined it with LSS (basically, in my view, you get the other parameters from the CMB, and LSS then constrains omega_nu).
The authors of this article use a very complicated data reduction procedure in which they find the global minimum in chi^2 by comparing lots of CMBFAST models with the data, and then they constrain the peak positions and the ratios of the peak heights, which they in turn use to constrain omega_nu. What is the advantage of doing this instead of a straight MCMC, and how can you reliably estimate errors in this manner? Is there something fundamentally new that I overlooked?

 Posts: 128
 Joined: September 24 2004
 Affiliation: University of Rome
 Contact:

 Posts: 144
 Joined: September 24 2004
 Affiliation: University College London (UCL)
 Contact:
delta chisq and confidence limits
Hi, yes this result is quite surprising.
I think looking at the best fit point for fixed values of [tex]\omega_{\nu}[/tex] is quite interesting in itself, even though the interpretation in terms of probability as a function of [tex]\omega_{\nu}[/tex] is not so simple.
It is surprising to me that you get such a big [tex]\Delta \chi^2[/tex] for such a small change in the neutrino mass.
It also seems like a nice idea to me to look at how the features of the CMB power spectrum depend on [tex]\omega_{\nu}[/tex].
Not sure how significant this point is, but re interpretation of [tex]\Delta \chi^2[/tex] in terms of [tex]{\rm Pr}(\omega_{\nu})[/tex]:
The authors' main result comes from the observation that [tex]\Delta \chi^2=4[/tex] for [tex]\omega_{\nu}=0.024[/tex], which can be converted into [tex]\sum m_{\nu} = 2.25[/tex] eV, as quoted in the abstract.
They interpret this as a 95 per cent detection, since [tex]\Delta \chi^2=4[/tex].
However, this conversion is only valid for a 1d Gaussian.
I think that if they are going to extrapolate the results from a Gaussian, then a 7d Gaussian would be more appropriate, since they are looking at a minimum in 7d space.
The [tex]\Delta \chi^2[/tex] for a 95 per cent detection is much higher for higher dimensions.
The table in Numerical Recipes only goes up to 6d, but for 95 per cent confidence in 6d you need [tex]\Delta \chi^2=12.8[/tex]. For a 6d Gaussian [tex]\Delta \chi^2=7[/tex] corresponds to 68 per cent.
So maybe for a 7d Gaussian [tex]\Delta \chi^2=4[/tex] is about a half sigma detection?
Also, if you have a 1d Gaussian peaked at [tex]x=0[/tex] and you require x>0, then you would need [tex]\Delta \chi^2[/tex] greater than 4 for a 95 per cent confidence detection.
Since [tex]\omega_{\nu}[/tex]>0, this would also need to be taken into account, which would further weaken the detection.
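For reference, the joint-region thresholds quoted above can be reproduced with a short sketch. It assumes nothing beyond the standard chi-square distribution for an n-dimensional Gaussian; the function names chi2_cdf and delta_chi2_for are made up for this illustration, and nothing here is taken from the paper itself.

```python
import math

def chi2_cdf(x, k):
    """P(chi^2 <= x) for k degrees of freedom, computed from the power
    series of the regularised lower incomplete gamma function P(k/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = 0.5 * k, 0.5 * x
    term = 1.0
    total = 1.0
    n = 0
    # Series: sum_n z^n / ((a+1)(a+2)...(a+n)); terms eventually shrink.
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))

def delta_chi2_for(cl, k, hi=100.0):
    """Delta chi^2 enclosing a fraction `cl` of a k-dimensional Gaussian,
    found by bisection on chi2_cdf."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for k in (1, 2, 6, 7):
        print(k,
              round(delta_chi2_for(0.683, k), 2),
              round(delta_chi2_for(0.954, k), 2),
              round(chi2_cdf(4.0, k), 3))
```

With this sketch, chi2_cdf(4.0, 1) is about 0.954 while chi2_cdf(4.0, 7) is about 0.22, and delta_chi2_for(0.954, 6) lands close to the 12.8 in the Numerical Recipes table. These are thresholds for a joint region in all n parameters at once, not for a limit on a single parameter.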

 Posts: 183
 Joined: September 24 2004
 Affiliation: Brookhaven National Laboratory
 Contact:
I see what you mean, but my main objection is still to their method: basically, by reducing the entire power spectrum to 6 numbers representing the position of the peak, they are throwing away a lot of data... The correct thing would be to run MCMC and then simply look at the marginalised probability distribution for [tex]\omega_\nu[/tex]; then they would probably get a half sigma detection or something, as you say.
While analyzing the latest VSA data batch there was some uproar re neutrino mass, but I think it turned out to be just a sampling artifact... Have a look at the [tex]f_\nu[/tex] plot on Tegmark's site (http://www.hep.upenn.edu/~max/): even for [tex]f_\nu =1[/tex] the changes are fairly small, while their detection is for [tex]f_\nu[/tex] <<1.

 Posts: 144
 Joined: September 24 2004
 Affiliation: University College London (UCL)
 Contact:
checking delta chisq
I got the impression they only used the additional fitting parameters later in the paper, and that the main result does not use them.
I guess it would be very easy to check the [tex]\chi^2[/tex] values for the two sets of parameters in rows 1 and 5 of Table 1.
If these are right (and assuming they really do find the minimum in 6d, which seems likely given that they have found a lower minimum than those in Spergel et al and Tegmark et al) then it is surely just a question of how one interprets a [tex]\Delta \chi^2[/tex] of 4 ?

 Posts: 183
 Joined: September 24 2004
 Affiliation: Brookhaven National Laboratory
 Contact:
Re: checking delta chisq
Sarah Bridle wrote: If these are right (and assuming they really do find the minimum in 6d, which seems likely given that they have found a lower minimum than those in Spergel et al and Tegmark et al) then it is surely just a question of how one interprets a [tex]\Delta \chi^2[/tex] of 4 ?
Ahh, sorry, I misread the paper, now I see what they are doing... Now it sounds somewhat more sensible, although I am not sure either how one should interpret this [tex]\Delta \chi^2[/tex] of 4... Need to think about it...

 Posts: 128
 Joined: September 24 2004
 Affiliation: University of Rome
 Contact:
Uhm, no I don't believe this result. 7-10 eV would be a more reasonable limit.
BTW a more general point.
I always feel a bit confused in interpreting a [tex]\Delta \chi^2=4[/tex] as 95% c.l. when the best-fit [tex]\chi^2[/tex] is 1428 and there are 1341 degrees of freedom. You should also consider that many of the quoted degrees of freedom come from data points at very small scales (or from TE) with big error bars, so their contribution to the overall [tex]\chi^2[/tex] is small, and counting them is IMHO a bit like cheating. What do you think? :)
conservative Al.

 Posts: 183
 Joined: September 24 2004
 Affiliation: Brookhaven National Laboratory
 Contact:
Alessandro Melchiorri wrote: Uhm, no I don't believe this result. 7-10 eV would be a more reasonable limit.
BTW a more general point.
I always feel a bit confused in interpreting a [tex]\Delta \chi^2=4[/tex] as 95% c.l. when the best-fit [tex]\chi^2[/tex] is 1428 and there are 1341 degrees of freedom. You should also consider that many of the quoted degrees of freedom come from data points at very small scales (or from TE) with big error bars, so their contribution to the overall [tex]\chi^2[/tex] is small, and counting them is IMHO a bit like cheating. What do you think? :)
Yeah, I tend to agree. There is a well prescribed procedure to calculate the marginalised probability distribution for [tex]\omega_\nu[/tex] and this method gives a few eV as you say. Since they are not putting in extra physics, they cannot obtain stronger constraints unless cheating. :)

 Posts: 144
 Joined: September 24 2004
 Affiliation: University College London (UCL)
 Contact:
Delta chisq and dof
I agree with Alessandro's criticism that looking at the [tex]\chi^2[/tex] per dof is misleading when you have lots of noisy data. But I think this is a separate issue from the issue of interpreting [tex]\Delta \chi^2[/tex].
If the probability distribution in 7d were a 7d Gaussian then it would be possible to look at the [tex]\Delta \chi^2[/tex] and convert into confidence levels.
If I had a 2d Gaussian in x and y I could do something analogous to what they did with the neutrino mass:
* fix x
* find the best fitting y
* note down the chisq at this best fit y and fixed x
* repeat for all values of fixed x
* plot [tex]\chi^2[/tex] at the best fit y as a function of fixed x
* find the minimum wrt x
* find the x value where the [tex]\chi^2[/tex] is increased by 4 wrt this minimum
The question is, how does this x value relate to the 95 per cent marginalised limit on x? I am saying that it does not have any relation for an nd Gaussian where [tex]n \ne 1[/tex].
I am saying that when they say 95 per cent confidence they should have put about 10 or 20 per cent confidence (and that's assuming it's all Gaussian..)!
Please could someone check I'm not going mad about this point? Thanks very much indeed.

 Posts: 183
 Joined: September 24 2004
 Affiliation: Brookhaven National Laboratory
 Contact:
Re: Delta chisq and dof
Sarah Bridle wrote: I am saying that when they say 95 per cent confidence they should have put about 10 or 20 per cent confidence (and that's assuming its all Gaussian..)!
Please could someone check I'm not going mad about this point? Thanks very much indeed.
Ok, have a look at
http://www.fiz.unilj.si/cosmo/aslosar/dstat.mws
I did exactly what you advocated. For 2D Gaussians it seems that the advocated procedure is exactly the same as full marginalisation, i.e. the probability of a function at its maximum given fixed x is proportional to its marginalised value... However, I don't think it works in general...
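The same check can be sketched in a few lines of Python rather than a Maple worksheet. This is only an illustration under stated assumptions: a correlated 2D Gaussian with made-up widths SX, SY and correlation RHO, a brute-force grid in y, and hypothetical function names. It is not a transcription of the worksheet linked above.

```python
import math

# Hypothetical toy values: widths in x and y and their correlation.
SX, SY, RHO = 1.0, 2.0, 0.8

def chi2(x, y):
    """chi^2 = -2 ln L for a zero-mean correlated 2D Gaussian."""
    u, v = x / SX, y / SY
    return (u * u - 2.0 * RHO * u * v + v * v) / (1.0 - RHO * RHO)

DY = 0.01
YS = [-20.0 + DY * i for i in range(4001)]  # y grid covering +/- 10 sigma

def profile_delta_chi2(x):
    """Fix x, minimise chi^2 over y, and compare with the same at x = 0."""
    best_at_x = min(chi2(x, y) for y in YS)
    best_at_0 = min(chi2(0.0, y) for y in YS)
    return best_at_x - best_at_0

def marginal_delta_chi2(x):
    """-2 ln of the y-integrated likelihood, relative to its value at x = 0."""
    def marg(x0):
        return sum(math.exp(-0.5 * chi2(x0, y)) for y in YS) * DY
    return -2.0 * math.log(marg(x) / marg(0.0))

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 3.0):
        print(x, round(profile_delta_chi2(x), 4), round(marginal_delta_chi2(x), 4))
```

For this Gaussian toy both columns agree with (x/SX)^2, so the point where the maximised chi^2 rises by 4 coincides with the 2 sigma point of the marginalised distribution in x.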

 Posts: 45
 Joined: September 24 2004
 Affiliation: Ludwig-Maximilians-University Munich
 Contact:
Don't go mad
Hi Sarah
I agree entirely with you. The [tex]\Delta\chi^2 = 4 \equiv 95\%[/tex] only applies for a 1d likelihood. But they have a 7d likelihood. It would be like having a prior with NO error bars on all the other parameters, fixing them at the best-fit value, which I would say is wrong.

 Posts: 144
 Joined: September 24 2004
 Affiliation: University College London (UCL)
 Contact:
my comments were barking up the wrong tree
DOH! Thanks very much for your post Anze, this brings me back into the real world at least..
Somehow I had got sidetracked with an irrelevant issue: where the 95 per cent contour would be in 7d space.. so please ignore my postings above!
So I agree with what you say:
* what they do is the old "maximising" over other parameters trick, which the majority of people used to do before MCMC to save time;
* this is the same as marginalising properly if the pdf is a Gaussian;
* one possible explanation for their result cf others is that the pdf is not very Gaussian and so this maximisation gives a misleading answer
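To illustrate the last bullet, here is a minimal sketch of a deliberately non-Gaussian toy in which maximising and marginalising disagree. Everything in it is an assumption made for illustration: the likelihood shape, the function width (the allowed range in y grows with |x|) and the grid have nothing to do with the actual CMB likelihood.

```python
import math

def width(x):
    """Hypothetical: the nuisance parameter y is less constrained at large |x|."""
    return math.sqrt(1.0 + x * x)

def log_like(x, y):
    """Unnormalised toy log-likelihood; not Gaussian in (x, y) jointly."""
    return -0.5 * x * x - 0.5 * (y / width(x)) ** 2

DY = 0.01
YS = [-40.0 + DY * i for i in range(8001)]

def profile_delta_chi2(x):
    """Maximise over y at fixed x, relative to the same at x = 0."""
    return -2.0 * (max(log_like(x, y) for y in YS)
                   - max(log_like(0.0, y) for y in YS))

def marginal_delta_chi2(x):
    """Integrate over y at fixed x, relative to the same at x = 0."""
    def marg(x0):
        return sum(math.exp(log_like(x0, y)) for y in YS) * DY
    return -2.0 * math.log(marg(x) / marg(0.0))

if __name__ == "__main__":
    for x in (1.0, 2.0, 3.0):
        print(x, round(profile_delta_chi2(x), 3), round(marginal_delta_chi2(x), 3))
```

In this toy the maximised curve gives x^2 while the marginalised one gives x^2 - ln(1 + x^2), because the larger volume in y at large |x| adds weight there. A limit read off at a maximised Delta chi^2 of 4 would then be tighter than the marginalised one.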

 Posts: 144
 Joined: September 24 2004
 Affiliation: University College London (UCL)
 Contact:
Seems like they may be right...?
We concluded that their result might not hold when you do the full MCMC. So I ran a few chains and it looks like they might be right.. I put the chains at www.star.ucl.ac.uk/~sarah/chains
I know the chains are extremely ropey, but I'm posting this anyway because I realised I am not finding time to improve it and it seemed better to post this than not say anything.
I started all the chains from fnu=0.5 and I did a couple of CosmoMC runs to improve the covariance matrix. Cutting a measly 500 samples for burn-in (after which I see no trend in fnu or likelihood changing) gives www.star.ucl.ac.uk/~sarah/chains/fnu.png
You can see just how dodgy the chains are by looking at how each chain contributes to this result www.star.ucl.ac.uk/~sarah/chains/fnu_eachchain.png
(The vertical lines are two-tailed limits whereas we want an upper limit on fnu).
Seems to agree with their result pretty well.
This has made me think that they may not be wrong. One thing to note is that the dashed lines (mean likelihood) are v similar to the solid lines (marginalised likelihood) which happens when the pdf is quite Gaussian. Their method would give the right answer if the pdf were Gaussian. So that hangs together.
Anyway, obviously these chains are awful but I thought better to share it with you than to keep it on my disk. Does anyone have anything more reliable?
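For anyone wanting to turn chains like these into a one-tailed limit, a minimal sketch follows. It assumes each sample carries a weight and that the burn-in is cut by sample count; the function name upper_limit and the toy numbers in the demo are hypothetical and are not read from the chains linked above.

```python
def upper_limit(samples, weights, cl=0.95, burn_in=0):
    """One-tailed upper limit from a weighted chain: the smallest sample
    value such that at least a fraction `cl` of the post-burn-in weight
    lies at or below it."""
    kept = sorted(zip(samples[burn_in:], weights[burn_in:]))
    total = sum(w for _, w in kept)
    if total <= 0:
        raise ValueError("no weight left after burn-in")
    running = 0.0
    for value, w in kept:
        running += w
        if running >= cl * total:
            return value
    return kept[-1][0]

if __name__ == "__main__":
    # Toy chain: ten burn-in samples stuck at the starting point 0.5,
    # followed by 100 equally weighted samples spread over [0, 1).
    chain = [0.5] * 10 + [i / 100 for i in range(100)]
    weights = [1.0] * len(chain)
    print(upper_limit(chain, weights, cl=0.95, burn_in=10))
```

Summing weights in sorted order, rather than taking a percentile of the raw sample list, matters whenever the samples are not equally weighted.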

 Posts: 3
 Joined: March 25 2006
 Affiliation: Institute for Cosmic Ray Research, University of Tokyo
[astro-ph/0409768] Constraining Neutrino Masses by CMB Exper
Recent analyses (using WMAP 1st yr only) are consistent with ours:
astro-ph/0507503, MacTavish et al., Fig. 8,
hep-ph/0602058, Hannestad, sec. 4.3.2,
astro-ph/0603494, Lesgourgues & Pastor, sec. 5.2.
Also, the WMAP 3rd yr result is consistent:
astro-ph/0603449, Spergel et al., sec. 7.2.1.