cosmomc: problems "learning" proposal covariance matrix

Hiranya Peiris
Posts: 54
Joined: September 28 2004
Affiliation: University College London

cosmomc: problems \"learning\" proposal covariance

Post by Hiranya Peiris » October 27 2005

Dear All,

I have been trying to find an appropriate proposal distribution for a problem that has 4 new "fast" parameters (I do not use the standard fast parameters). I have set 8 chains running under MPI on a Beowulf cluster, with the following setup:

A trial covmat that keeps the correlations between the standard slow parameters, with zeros for the fast parameters (a sketch of what I mean is below, after the parameter list).

some relevant cosmomc parameters:

estimate_propose_matrix = F
propose_scale = 2.4
sampling_method = 1
use_fast_slow = T
oversample_fast = 1
MPI_Converge_Stop = 0.03
MPI_StartSliceSampling = T
MPI_Check_Limit_Converge = F
MPI_LearnPropose = T
MPI_R_StopProposeUpdate = 0.4
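
For concreteness, here is a rough sketch of the kind of trial covmat I mean (a toy example with 2 slow plus 2 fast parameters; all the numbers are purely illustrative, and I am assuming the file is read as a plain whitespace-separated symmetric matrix over the varied parameters -- the exact format expected should be checked against the CosmoMC version in use). The slow block keeps its correlations, while the rows and columns for the new fast parameters are left as zeros:

Code: Select all

  1.2e-07  -4.0e-07   0.0e+00   0.0e+00
 -4.0e-07   3.0e-05   0.0e+00   0.0e+00
  0.0e+00   0.0e+00   0.0e+00   0.0e+00
  0.0e+00   0.0e+00   0.0e+00   0.0e+00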

There are a couple of recalcitrant parameters in the fast parameter space whose distributions are poorly known a priori (so I probably have bad initial guesses for the step sizes, etc.).

After the requested 200000 samples, the chains stopped without the "stop propose update" criterion being satisfied.

When I plot the R-1 value of the worst eigenvalue, which is printed in the output, it starts very high (around 20) and then steadily decreases to around 2 after about 100000 samples. After that, it suddenly starts shooting UP again. When the code stopped after 200000 samples, the worst R-1 had climbed back to ~9.
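
(For reference, my rough understanding of the number being printed -- treat this as a paraphrase rather than the exact definition, which lives in the MPI convergence code: using the second halves of the chains, form the mean within-chain covariance C_W and the covariance of the chain means C_M; the quoted statistic is then the worst-eigenvalue version of the Gelman-Rubin ratio,

R - 1 \approx \max \mathrm{eig}\left( C_W^{-1/2} \, C_M \, C_W^{-1/2} \right),

which goes to zero once the chain means agree to well within the width of the distribution.)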

Is this normal behaviour? Is there anything I can do to make the LearnProposeUpdate process work faster? And are these chains useful for calculating a new covmat for a new run, given that they do not constitute a Markovian process?

Thanks a lot,
Hiranya

Antony Lewis
Posts: 1936
Joined: September 23 2004
Affiliation: University of Sussex

Re: cosmomc: problems \\\"learning\\\" proposal co

Post by Antony Lewis » October 27 2005

Sounds like it is getting stuck, or is only finding a better local minimum after a long time. Making the 1D plots may show you what's going on: compare all the chains together with the individual chains.

You can also try generating the chains at a higher temperature, which should make it easier to move around to start with.

You could also turn off slice sampling, or try one of the other sampling methods (directional gridding is quite good for fast parameters).
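
For example, something along these lines in the .ini file (I am leaving the sampling_method value as a placeholder, since the number assigned to directional gridding depends on the cosmomc version -- check the comments in the distributed params.ini):

temperature = 2
MPI_StartSliceSampling = F
sampling_method = <the value for directional gridding in your version>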

Hiranya Peiris
Posts: 54
Joined: September 28 2004
Affiliation: University College London

cosmomc: problems \"learning\" proposal covariance

Post by Hiranya Peiris » October 27 2005

When I plot the 1D chains in the new parameters, I see long-wavelength correlations, suggesting that the step sizes are much too small. Does the code keep track of the covmats it tried somewhere, so I can see what it was trying when the R-1 value started to increase again? I don't see anything in the output.

What temperature would you suggest to start with?

Also, should I use the covmat from the current chains in the next run, or just start with no covariance for the new parameters, as before?

Thanks!
Hiranya

Antony Lewis
Posts: 1936
Joined: September 23 2004
Affiliation: University of Sussex

Re: cosmomc: problems \"learning\" proposal covari

Post by Antony Lewis » October 27 2005

It doesn't keep the covmats from the MPI learning.

I would start with the covmat from the old run. You could try a temperature of 2 or so (increasing temperature is like increasing the noise on the observation).
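
(Roughly speaking, a chain run at temperature T samples the flattened distribution

P_T \propto \exp\left[ -\chi^2 / (2T) \right],

so T = 2 behaves much like doubling the noise variance; if you want to use T > 1 samples for anything beyond learning the covariance you can in principle importance-weight them back to T = 1.)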

Directional gridding is good because it is very insensitive to choosing the fast parameter step sizes too small.

You can of course also just increase the number of samples it does before giving up, and see if it finally converges. (This increasing-R behaviour also happens with WMAP1 if tau is unconstrained: it takes a while for the chains to discover the high-tau local minimum, then quite a long time to fully converge over both regions of parameter space.)

Hiranya Peiris
Posts: 54
Joined: September 28 2004
Affiliation: University College London

cosmomc: problems \"learning\" proposal covariance

Post by Hiranya Peiris » November 02 2005

Hi Antony,

I have now tried your suggestions (temperature = 2, directional gridding on, slicing off, and I gave it 2 million steps this time), with 8 chains.

The code seems to have aborted by itself with far fewer than 2 million steps taken, and with R-1 for the worst eigenvalue being ~9. I append the tail end of the output below; I can't figure out why it stopped without doing what it was supposed to.

Can you shed any light on this?

The last R-1-related line it printed was:
Current convergence R-1 = 8.465186 chain steps = 297966
...
Done slow grid, direction 3
grid steps moved: 0 acc rate = 0.000000
1 Directional gridding, Like: 364.3219
Calling CAMB
CAMB done
-1 Likelihood: 407.6772 Current Like: 364.3219
-1 Likelihood: 603.0909 Current Like: 364.3219
-1 Likelihood: 363.9507 Current Like: 364.3219
0 Likelihood: 378.1926 Current Like: 363.9507
Calling CAMB
CAMB done
-2 Likelihood: 421.2237 Current Like: 363.9507
-2 Likelihood: 515.0442 Current Like: 363.9507
0 Likelihood: 881.1145 Current Like: 363.9507
-2 Likelihood: 365.3286 Current Like: 363.9507
0 Likelihood: 366.8148 Current Like: 363.9507
0 Likelihood: 398.0000 Current Like: 363.9507
-2 Likelihood: 688.0377 Current Like: 363.9507
0 Likelihood: 381.4606 Current Like: 363.9507
-2 Likelihood: 372.7528 Current Like: 363.9507
-2 Likelihood: 409.5064 Current Like: 363.9507
-2 Likelihood: 377.9580 Current Like: 363.9507
-2 Likelihood: 370.9645 Current Like: 363.9507
-2 Likelihood: 1316.273 Current Like: 363.9507
-2 Likelihood: 392.0626 Current Like: 363.9507
-2 Likelihood: 429.3852 Current Like: 363.9507
-2 Likelihood: 408.3560 Current Like: 363.9507
Done slow grid, direction 4
grid steps moved: 1 acc rate = 5.0000001E-02
1 Directional gridding, Like: 363.9507
Calling CAMB
CAMB done
1 Likelihood: 410.7977 Current Like: 363.9507
1 Likelihood: 1.0000000E+30 Current Like: 363.9507
1 Likelihood: 387.9620 Current Like: 363.9507
MPI Id 0 is using GM port 2, board 0 (MAC 0060dd493d2a).
MPI Id 4 is using GM port 4, board 0 (MAC 0060dd493d2a).
MPI Id 5 is using GM port 4, board 0 (MAC 0060dd493d25).
MPI Id 1 is using GM port 2, board 0 (MAC 0060dd493d25).
MPI Id 6 is using GM port 4, board 0 (MAC 0060dd493c71).
MPI Id 2 is using GM port 2, board 0 (MAC 0060dd493c71).
MPI Id 7 is using GM port 4, board 0 (MAC 0060dd493c2e).
MPI Id 3 is using GM port 2, board 0 (MAC 0060dd493c2e).
Received data from all 8 MPI processes.
Sending mapping to MPI Id 0.
Sending mapping to MPI Id 1.
Sending mapping to MPI Id 2.
Sending mapping to MPI Id 3.
Sending mapping to MPI Id 4.
Sending mapping to MPI Id 5.
Sending mapping to MPI Id 6.
Sending mapping to MPI Id 7.
Data sent to all processes.
Received valid abort message !
Reap remote processes:
Abort in progress...

Cheers
Hiranya

Antony Lewis
Posts: 1936
Joined: September 23 2004
Affiliation: University of Sussex

Re: cosmomc: problems \"learning\" proposal covari

Post by Antony Lewis » November 02 2005

Can't think of anything at the moment, sorry. It could be a problem with the way cosmomc counts samples for directional gridding, but I can't see one offhand.

Hiranya Peiris
Posts: 54
Joined: September 28 2004
Affiliation: University College London

cosmomc: problems \\\"learning\\\" proposal covari

Post by Hiranya Peiris » November 05 2005

Hi Antony,

We have now tried several variations on MPI+CosmoMC (directional gridding/Metropolis, temperature = 1 or 2, etc.). They all suffer from the same problem (we tried it on a completely unmodified cosmomc with an enlarged parameter space, so that it would need to run for a long time before finding an optimal proposal distribution).

The problem can be summarized as follows: at the stage where the first chain (flagged by *_1.txt) reaches roughly 300000 steps (not exact, but it is always around then), that chain simply dies. After that, R-1 values are no longer calculated. The rest of the chains continue to write to file (past 300000 steps), but the proposal matrix does not get updated. And we can see the thread die in the cluster usage log -- the load per node drops from 2 to 1.

Our guess is that something goes wrong with the proposal density update -- and there seem to be a lot of models with a likelihood of 10^30 in the params.ini.log file -- which could easily produce a matrix that is numerically singular, or at least very, very ill-conditioned. (We are going to check this by writing out the proposal matrix every time it is updated.)
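
(For the record, the kind of debugging helper we have in mind is just something like the routine below, called from wherever the learning code recomputes the proposal matrix. The routine name and the call site are ours, not part of CosmoMC, so treat this as a sketch:)

Code: Select all

    subroutine DumpPropose(fname, M, n)
     !Hypothetical debugging helper: append the current proposal covariance
     !to a text file each time the MPI learning code updates it
     character(LEN=*), intent(in) :: fname
     integer, intent(in) :: n
     real, intent(in) :: M(n,n)
     integer i, aunit

     aunit = 49  !any unit number not already used elsewhere in the code
     open(unit=aunit, file=fname, status='unknown', position='append')
     do i = 1, n
        write (aunit, '(100E16.7)') M(i, 1:n)  !one matrix row per line (assumes n <= 100)
     end do
     write (aunit, '(A)') ''  !blank line between successive matrices
     close(aunit)
    end subroutine DumpPropose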

We are investigating possible cluster-related problems, but the fact that this always happens at around 300000 steps is suggestive.

We would really appreciate any thoughts on these issues.


Thanks,
Hiranya

Antony Lewis
Posts: 1936
Joined: September 23 2004
Affiliation: University of Sussex

Re: cosmomc: problems "learning" proposal covariance

Post by Antony Lewis » November 05 2005

Wild guess: something to do with the stored chain thinning? e.g. the second thinning in paramdef.f90

Code: Select all

             if (S%Count > 100000) then
               !Try not to blow memory by storing too many samples
                call TList_RealArr_Thin(S, 2)
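                !Also double the global thinning factor so that samples added
                !to the stored list from now on keep the same (coarser) spacing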
                MPI_thin_fac = MPI_thin_fac*2 
             end if
??
You could easily test this by commenting it out or by making the 100000 much smaller, though I don't see a problem.

I've never seen a problem, but then most well-behaved chains stop after far fewer samples.

Hiranya Peiris
Posts: 54
Joined: September 28 2004
Affiliation: University College London

cosmomc: problems \"learning\" proposal covariance

Post by Hiranya Peiris » November 05 2005

Thanks, I will investigate that lead!

I'm just curious - what is the largest parameter space you've tested cosmomc on? In the unmodified case, where we were testing the enlarged parameter space to make it run longer, we varied (\omega_b, \omega_c, \tau, \theta, n, d n/d \ln k, r, n_t, A_s). Now some of the parameters must be very poorly constrained, but I wouldn't have thought it was bad enough to break the code in some fashion.

Antony Lewis
Posts: 1936
Joined: September 23 2004
Affiliation: University of Sussex

Re: cosmomc: problems \\\"learning\\\" proposal co

Post by Antony Lewis » November 06 2005

Hiranya Peiris wrote: Our guess is that something goes wrong with the proposal density update -- and there seem to be a lot of models with a likelihood of 10^30 in the params.ini.log file -- which could easily produce a matrix that is numerically singular, or at least very, very ill-conditioned. (We are going to check this by writing out the proposal matrix every time it is updated.)
10^30 is just the placeholder it uses for models that are excluded by the prior. There are always some of these. The covariance matrix is computed using the samples only - the values of the likelihood are irrelevant - so I don't think that is a problem.

Hiranya Peiris wrote: I'm just curious - what is the largest parameter space you've tested cosmomc on? In the unmodified case, where we were testing the enlarged parameter space to make it run longer, we varied (\omega_b, \omega_c, \tau, \theta, n, d n/d \ln k, r, n_t, A_s). Now some of the parameters must be very poorly constrained, but I wouldn't have thought it was bad enough to break the code in some fashion.
I've done things with lots more total parameters (e.g. astro-ph/0302306). I've not done anything with that particular set for a long time. If you like, you can send me the params.ini and I can try to run it and see what happens.
