[astroph/0608007] Estimation of Polarized Power Spectra by Gibbs sampling
Authors:  D. L. Larson, H. K. Eriksen, B. D. Wandelt, K. M. Gorski, Greg Huey, J. B. Jewell, I. J. O'Dwyer 
Abstract:  Earlier papers introduced a method of accurately estimating the angular cosmic microwave background (CMB) temperature power spectrum based on Gibbs sampling. Here we extend this framework to polarized data. All advantages of the Gibbs sampler still apply, and exact analysis of megapixel polarized data sets is thus feasible. These advantages may be even more important for polarization measurements than for temperature measurements. While approximate methods can alias power from the larger Emode spectrum into the weaker Bmode spectrum, the Gibbs sampler (or equivalently, exact likelihood evaluations) allows for a clean separation of these modes. To demonstrate the method, we analyze two simulated data sets: 1) a hypothetical future CMBPol mission, with the focus on Bmode estimation; and 2) a Plancklike mission, to highlight the computational feasibility of the method. 
[PDF] [PS] [BibTex] [Bookmark] 

 Posts: 1379
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
[astroph/0608007] Estimation of Polarized Power Spectra by
A nice paper implementing the Gibbs sampling technique to CMB polarization.
I think some of the comments about E/B separation are potentially confusing. There are two issues here: 1. the fundamental property than on an incomplete sky there exist ambiguous E/B modes; E/B cannot be separated exactly (e.g. astroph/0305545 and refs); 2. the problems that arise mixing E/B power when estimating the power spectra (for which naive PseudoC_l is not very good). Gibbs certainly helps with the latter, but they do not appear to have implemented E/B separation for the former in this paper.
Consider a noise free sky. By my understanding of the Gibbs method, with a sky cut the full sky samples should include nonzero B modes even if the true sky has B=0. This is just because there are ambiguous modes, so the method will sample all possible combinations of E and B that give the same observed ambiguous modes on the observed section. To this extent the method exhibits E/B mixing: B modes are present in the posterior samples even when there were none originally. I think one could if desired do E/Bseparation by studying the covariance of the posterior full sky E_lm and B_lm samples.
I think some of the comments about E/B separation are potentially confusing. There are two issues here: 1. the fundamental property than on an incomplete sky there exist ambiguous E/B modes; E/B cannot be separated exactly (e.g. astroph/0305545 and refs); 2. the problems that arise mixing E/B power when estimating the power spectra (for which naive PseudoC_l is not very good). Gibbs certainly helps with the latter, but they do not appear to have implemented E/B separation for the former in this paper.
Consider a noise free sky. By my understanding of the Gibbs method, with a sky cut the full sky samples should include nonzero B modes even if the true sky has B=0. This is just because there are ambiguous modes, so the method will sample all possible combinations of E and B that give the same observed ambiguous modes on the observed section. To this extent the method exhibits E/B mixing: B modes are present in the posterior samples even when there were none originally. I think one could if desired do E/Bseparation by studying the covariance of the posterior full sky E_lm and B_lm samples.

 Posts: 3
 Joined: August 03 2006
 Affiliation: University of Chicago
[astroph/0608007] Estimation of Polarized Power Spectra by
Hi Antony,
My intuition says the opposite: that Gibbs sampling should do an optimal job of separating EB because it samples the exact likelihood, which should give optimal separation (e.g. maximum likelihood estimators saturate CramerRao). The likelihood should be the same in the end whether ambiguous modes are separated out explicitly or the separation is hidden in the linear algebra. It would be great to see this demonstrated true or false, say for an azimuthally symmetric survey where the CramerRao bound can be computed feasibly because all covariance matrices are diagonal in m (I tried this comparison for pseudoCl in appendix F of astroph/0511629).
My intuition says the opposite: that Gibbs sampling should do an optimal job of separating EB because it samples the exact likelihood, which should give optimal separation (e.g. maximum likelihood estimators saturate CramerRao). The likelihood should be the same in the end whether ambiguous modes are separated out explicitly or the separation is hidden in the linear algebra. It would be great to see this demonstrated true or false, say for an azimuthally symmetric survey where the CramerRao bound can be computed feasibly because all covariance matrices are diagonal in m (I tried this comparison for pseudoCl in appendix F of astroph/0511629).

 Posts: 11
 Joined: September 25 2004
 Affiliation: Johns Hopkins University
[astroph/0608007] Estimation of Polarized Power Spectra by
Just trying to make sure I understand the question...
In some sense, the Gibbs sampling does not have to deal with cut sky issues, since we sample full skies. No code is needed to deal with cut sky issues, since we do our harmonic transforms on the full sky.
However, for the present paper, we ignore all the data inside the mask (since we consider it to be unusably contaminated by foregrounds). Because there exist B modes that are nonzero only inside the mask, then the data do not constrain these B modes, even with zero noise outside the mask. Thus the Gibbs sampler would sample those B modes, and would not set the
[tex]C^{BB}_{\ell}[/tex] spectrum to zero. I believe this should be the desired behavior.
Our goal is to do as much E/B separation as the data will allow, and no more. :)
In some sense, the Gibbs sampling does not have to deal with cut sky issues, since we sample full skies. No code is needed to deal with cut sky issues, since we do our harmonic transforms on the full sky.
However, for the present paper, we ignore all the data inside the mask (since we consider it to be unusably contaminated by foregrounds). Because there exist B modes that are nonzero only inside the mask, then the data do not constrain these B modes, even with zero noise outside the mask. Thus the Gibbs sampler would sample those B modes, and would not set the
[tex]C^{BB}_{\ell}[/tex] spectrum to zero. I believe this should be the desired behavior.
Our goal is to do as much E/B separation as the data will allow, and no more. :)

 Posts: 1379
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
Re: [astroph/0608007] Estimation of Polarized Power Spectra
I agree likelihoods should be optimal, but I don't think the algorithm in itself is doing any E/B mode separation (all full sky noise free maps will have some nonzero B mode everywhere, including in the observed region). To separate the modes you'd need to project the full sky sampled E_lm and B_lm into the linear combinations of the full sky E and B modes that are pure B and pure E over the observed area?
I'm not saying Gibbs is not a good method  I just don't see that it is doing E/B mode separation (which is redundant for getting exact likelihoods if everything is as expected).
I'm not saying Gibbs is not a good method  I just don't see that it is doing E/B mode separation (which is redundant for getting exact likelihoods if everything is as expected).

 Posts: 1379
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
Re: [astroph/0608007] Estimation of Polarized Power Spectra
I would also like to know what happens when C^B_l = 0 (or pure lensing)  this is the case of most worry for E/B mixing/convergence problems. In particular, how does the method arrange to give finite probability at C^B_l =0 given that C^{1} is singular there?
I would be quite surprised if the method does not have some convergence problems for low C^B_l, as this is rather like the troublesome low signal to noise regime. I'd therefore be interested to see if the method works if r=0 is assumed in the simulation (so C^E >> C^B at low l), or whether it takes a very long convergence time for low C^B to be sampled correctly. If you imaging measuring the polarization at a single point, there is an exact degeneracy between possible E and B fullsky samples, and hence potentially just the kind of correlated distribution Gibbs sampling becomes very bad at exploring.
It is also worth noting that the C^B_l due to lensing is quite nonGaussian and correlated, hence use of the Gaussian uncorrelated results is not strictly justified (though does not of course prevent you from constructing nonGaussian sky samples if the observed sky is). For a really convincing demonstration one could generate a nonGaussian lensed sky (e.g. using LensPix) and see if the method still works or can easily be adapted to give correct results.
I would be quite surprised if the method does not have some convergence problems for low C^B_l, as this is rather like the troublesome low signal to noise regime. I'd therefore be interested to see if the method works if r=0 is assumed in the simulation (so C^E >> C^B at low l), or whether it takes a very long convergence time for low C^B to be sampled correctly. If you imaging measuring the polarization at a single point, there is an exact degeneracy between possible E and B fullsky samples, and hence potentially just the kind of correlated distribution Gibbs sampling becomes very bad at exploring.
It is also worth noting that the C^B_l due to lensing is quite nonGaussian and correlated, hence use of the Gaussian uncorrelated results is not strictly justified (though does not of course prevent you from constructing nonGaussian sky samples if the observed sky is). For a really convincing demonstration one could generate a nonGaussian lensed sky (e.g. using LensPix) and see if the method still works or can easily be adapted to give correct results.

 Posts: 58
 Joined: September 25 2004
 Affiliation: ITA, University of Oslo
 Contact:
Re: [astroph/0608007] Estimation of Polarized Power Spectra
It's actually never singular, because C^B_l will always be strictly positive, due to noise. Remember that we are not trying to find the single "true" value, but rather a full distribution of possible values. So for a realization that truly has B=0, we will get a distribution that peaks at 0, and then drops off quickly at high values.Antony Lewis wrote:I would also like to know what happens when C^B_l = 0 (or pure lensing)  this is the case of most worry for E/B mixing/convergence problems. In particular, how does the method arrange to give finite probability at C^B_l =0 given that C^{1} is singular there?
This, on the other hand, is true. However, this is only a problem with the current implementation of the Gibbs sampler, where the step size is given by cosmic variance (which of course is nonzero even for B=0, because of the noise contribution discussed above) and the noise is big. We are currently investigating other sampling schemes, and these will hopefully resolve this. It's important to distinguish between the framework as such and the current implementation, I think  I have a hard time identifying any other methods that can do what the "Gibbs sampling framework" as such can do, but I have no problem identifying issues with the current implementation.I would be quite surprised if the method does not have some convergence problems for low C^B_l, as this is rather like the troublesome low signal to noise regime. I'd therefore be interested to see if the method works if r=0 is assumed in the simulation (so C^E >> C^B at low l), or whether it takes a very long convergence time for low C^B to be sampled correctly. If you imaging measuring the polarization at a single point, there is an exact degeneracy between possible E and B fullsky samples, and hence potentially just the kind of correlated distribution Gibbs sampling becomes very bad at exploring.
Sure. Yet another thing on our todo list.. :)It is also worth noting that the C^B_l due to lensing is quite nonGaussian and correlated, hence use of the Gaussian uncorrelated results is not strictly justified (though does not of course prevent you from constructing nonGaussian sky samples if the observed sky is). For a really convincing demonstration one could generate a nonGaussian lensed sky (e.g. using LensPix) and see if the method still works or can easily be adapted to give correct results.

 Posts: 11
 Joined: September 25 2004
 Affiliation: Johns Hopkins University
[astroph/0608007] Estimation of Polarized Power Spectra by
I don't see why we would need to project the full sky sampled Elm and Blm into pure B and pure E modes outside the mask. Working with pure E and B modes outside the mask is a useful technique if you don't already have the Elm and Blm coefficients for the full sky. However, we already do have those coefficients, so we can go directly from there to estimates of the power spectra: [tex]C_\ell^{EE}[/tex] and [tex]C_\ell^{BB}[/tex] (and [tex]C_\ell^{EB}[/tex], if you like).I agree likelihoods should be optimal, but I don't think the algorithm in itself is doing any E/B mode separation (all full sky noise free maps will have some nonzero B mode everywhere, including in the observed region). To separate the modes you'd need to project the full sky sampled Elm and Blm into the linear combinations of the full sky E and B modes that are pure B and pure E
The simplest way to do that would be to use formulas like
[tex]\hat{C}_\ell^{EE} = \frac{1}{2\ell+1}\sum_m a^{E}_{\ell m}^2[/tex]
I guess I have been assuming we are primarily interested in the power spectrum. If we want to know what the actual B modes are, perhaps to map out the weak lensing potential, then Gibbs sampling still samples realizations of the B modes. (We would, however, have to modify our assumption of isotropic Gaussianity for the B modes, since weak lensing doesn't satisfy that.)
It is true that our EB separation happens after the fact. All the Gibbs sampling code does is sample fullsky polarized realizations according to their likelihood. However, our point is that when you have a fullsky realizations of the polarization, then breaking that up into the E and B parts is trivial (now that the HEALPix code has been written, anyway).
I don't understand this statement. It's possible to determine the "curl" of a polarization field with a differential operator on the sphere (act with edthbar twice and take the imaginary part). This curl operator determines how much B mode there is at a given location. If the Gibbs sampler were run on a zeronoise masked sky that contained no B modes at all, then I believe the resulting samples would only have "curl" or Bness inside the mask, not everywhere.(all full sky noise free maps will have some nonzero B mode everywhere, including in the observed region)
Let me know if I've misunderstood what you said.

 Posts: 12
 Joined: September 24 2004
 Affiliation: IAP, UPMC, Sorbonne Universites, and Illinois
 Contact:
[astroph/0608007] Estimation of Polarized Power Spectra by
One short comment: it is easy to compute the posterior mean of the E and B maps using a Gibbs code. These should be the least square optimal maps of E and B modes separately given the data. So in that sense the Gibbs codes do perform E and B mode separation. This will certainly be small in the masked region, but use the power spectrum information obtained on the rest of the sky for the reconstruction, so it will use smoothness of the large scale modes to reconstruct them partially into the mask.
We decided to show the separation of E and B using a single allsky reconstruction, but given this discussion I think it would be very useful to add a figure that shows the posterior mean E and B maps.
Ben
We decided to show the separation of E and B using a single allsky reconstruction, but given this discussion I think it would be very useful to add a figure that shows the posterior mean E and B maps.
Ben

 Posts: 3
 Joined: August 03 2006
 Affiliation: University of Chicago
[astroph/0608007] Estimation of Polarized Power Spectra by
Hi Antony,
Now I see what you were saying in your original post (sorry, I think I responded to it without reading it carefully!) You're just saying (correct me if I'm wrong...) that language like "clean separation" of E and B is confusing because these are not actually decoupled in the Gibbs sampler; the E_{lm}'s do affect the posterior distribution of the B_{lm}'s on a cut sky. Maybe something like "optimal separation" (in the same sense as maximum likelihood estimation) would be clearer?
Now I see what you were saying in your original post (sorry, I think I responded to it without reading it carefully!) You're just saying (correct me if I'm wrong...) that language like "clean separation" of E and B is confusing because these are not actually decoupled in the Gibbs sampler; the E_{lm}'s do affect the posterior distribution of the B_{lm}'s on a cut sky. Maybe something like "optimal separation" (in the same sense as maximum likelihood estimation) would be clearer?

 Posts: 1379
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
Re: [astroph/0608007] Estimation of Polarized Power Spectra
Absolutely, you don't need to. I'm not saying you should, I'm just querying your claim to be doing E/B mode separation. To my mind E/B mode separation requires generating modes which are pure B and pure E in the observed patch, whereas you are generating ensembles of full sky samples each of which has different amounts of E and B in the patch.I don't see why we would need to project the full sky sampled Elm and Blm into pure B and pure E modes outside the mask.
The problem is the boundary integrals when you have an incomplete sky; you cannot take derivatives of the polarization field near the boundary unless it is band limited. The presence of small scale Emode power (or noise) means you effectively cannot do it (see proofs and refs in astroph/0305545). This is why there are ambiguous modes for nonbandlimited partial skies.I don't understand this statement. It's possible to determine the "curl" of a polarization field with a differential operator on the sphere (act with edthbar twice and take the imaginary part). This curl operator determines how much B mode there is at a given location. If the Gibbs sampler were run on a zeronoise masked sky that contained no B modes at all, then I believe the resulting samples would only have "curl" or Bness inside the mask, not everywhere.
But what happens to the BlackwellRao estimator? This involves C^{1}, and I might want to know the likelihood of a B=0 model.It's actually never singular, because CBl will always be strictly positive, due to noise. Remember that we are not trying to find the single "true" value, but rather a full distribution of possible values. So for a realization that truly has B=0, we will get a distribution that peaks at 0, and then drops off quickly at high values.
One short comment: it is easy to compute the posterior mean of the E and B maps using a Gibbs code. These should be the least square optimal maps of E and B modes separately given the data.
Doing this sounds like a good idea, giving a Wiender filtered E/B separation given the model. Assuming everything works well I think this should be 'better', but less general, than doing direct mode separation. (less general because there might be some unexpected nonGaussian B mode that doesn't fit the assumed Gaussian model at all well  direct mode separation makes no assumptions about the distribution of the modes; better because it includes modes with only very small E contamination  though this does mean it still isn't clean separation)Maybe something like "optimal separation" (in the same sense as maximum likelihood estimation) would be clearer?
I'd be very interested to know what nonGibbs sampling methods you think might work. (Hobson also has some ideas, see his Sussex talk)We are currently investigating other sampling schemes, and these will hopefully resolve this

 Posts: 58
 Joined: September 25 2004
 Affiliation: ITA, University of Oslo
 Contact:
Re: [astroph/0608007] Estimation of Polarized Power Spectra
Right. If you want to study B=0 models, I think the best solution is to start out the analysis with that constraint. It's very straightforward to enforce C_l^B = 0 everywhere, and sample just T and E. (The signal sampling step goes through just as before, while the C_l sampling step is reduced to a 2x2 matrix problem.) You don't get a full posterior that way, but rather a conditional one, P(C_ld, C_l^B = 0). In fact, this has already been implemented in our codes.But what happens to the BlackwellRao estimator? This involves C^{1}, and I might want to know the likelihood of a B=0 model.

 Posts: 1379
 Joined: September 23 2004
 Affiliation: University of Sussex
 Contact:
Re: [astroph/0608007] Estimation of Polarized Power Spectra
I think the following is a much better way to get sensible results for low C_B, and also a good way to resolve convergence issues at low signal to noise and get more robust BlackwellRao estimators (at least in toy problems like the one considered in this paper where the noise is close to uniform apart from a sky cut).
Basically: just add in the isotropic component of the noise with the signal. ie. define X_i = X + n_i, where X is the pure CMB sky and n_i is Gaussian isotropic noise with covariance N_i. The total noise covariance is N_i + N_a where N_a is not isotropic. Then sample from
[tex]
2\log P(S,X_id) = (dX_i)^\dagger N_a^{1}(dX_i) + X_i^\dagger(S+N_i)^{1}X_i + \logS+N_i
[/tex]
using the standard method, where d is the data and S is the signal covariance. The distribution of S +N_i given X_i is now nice and broad, so Gibbs iterations converge quickly. Furthermore the BlackwellRao likelihoods are well behaved for small S. The only cost of doing this is that N_a is now more anisotropic, so conjugate gradient solutions are somewhat slower. Using someting like N_i = 0.9 min(N) should usually be a significant net help.
For previous discussion relating to parameter esimation see
http://cosmocoffee.info/viewtopic.php?t=137#324
Basically: just add in the isotropic component of the noise with the signal. ie. define X_i = X + n_i, where X is the pure CMB sky and n_i is Gaussian isotropic noise with covariance N_i. The total noise covariance is N_i + N_a where N_a is not isotropic. Then sample from
[tex]
2\log P(S,X_id) = (dX_i)^\dagger N_a^{1}(dX_i) + X_i^\dagger(S+N_i)^{1}X_i + \logS+N_i
[/tex]
using the standard method, where d is the data and S is the signal covariance. The distribution of S +N_i given X_i is now nice and broad, so Gibbs iterations converge quickly. Furthermore the BlackwellRao likelihoods are well behaved for small S. The only cost of doing this is that N_a is now more anisotropic, so conjugate gradient solutions are somewhat slower. Using someting like N_i = 0.9 min(N) should usually be a significant net help.
For previous discussion relating to parameter esimation see
http://cosmocoffee.info/viewtopic.php?t=137#324