CosmoMC: How long can chains converge?

Jun-Qing Xia
Posts: 22
Joined: January 02 2005
Affiliation: SISSA, Italy

Post by Jun-Qing Xia » July 22 2005

Dear All,

I now use two parameters, w0 and w1, to describe the dark energy equation of state instead of a single w. I let both w0 and w1 vary between -10 and 10, and I use the default settings for all other parameters.
Two weeks ago I started running this on a supercomputer with 128 CPUs (Intel Itanium, 1.3 GHz). The current convergence statistic R-1 is still about 3.2, whereas one generally needs R-1 < 0.1.
Does anyone know how long the chains should take to converge in this situation?
Thanks!

Jun-Qing Xia
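
For readers unfamiliar with the R-1 figure quoted above, here is a minimal sketch. It assumes the Gelman-Rubin-style ratio "variance of chain means / mean of chain variances", evaluated for a single parameter. The function name and the toy numbers are illustrative only, and CosmoMC's own statistic is computed across all parameters and may differ in detail.

```python
from statistics import mean, variance


def r_minus_one(chains):
    """Variance of chain means divided by mean of chain variances.

    `chains` is a list of chains, each a list of samples of ONE parameter.
    Hypothetical helper for illustration, not CosmoMC code.
    """
    chain_means = [mean(c) for c in chains]
    chain_vars = [variance(c) for c in chains]
    return variance(chain_means) / mean(chain_vars)


# Toy data: two chains that agree, and two that sit in different places.
agree = [[0.9, 1.0, 1.1, 1.0], [1.0, 1.1, 0.9, 1.0]]
disagree = [[0.9, 1.0, 1.1, 1.0], [4.9, 5.0, 5.1, 5.0]]

print(r_minus_one(agree))     # small: both chains sample the same region
print(r_minus_one(disagree))  # large: the chains have not mixed
```

A value of about 3.2 therefore means the chain means are spread over a range wider than the width of any individual chain, which points to chains exploring different regions and not merely to noisy estimates.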

Anze Slosar
Posts: 183
Joined: September 24 2004
Affiliation: Brookhaven National Laboratory

Post by Anze Slosar » July 22 2005

Huh!

Antony is probably the right person to answer this, but on a 128 CPU monster it should converge in a few hours at most! So I guess you must be using a wrong covariance matrix... I find that MPI_StartSliceSampling = T helps a lot for a good covariance matrix, plus MPI_LearnPropose = T, of course, and maybe just forget about the default matrix and set "propose_matrix =". You might also try "estimate_propose_matrix =" but it doesn't always work... Anyway, given you have so many CPUs I would go with large openmp, say num_threads = 8, which still gives you 16 chains which is plenty!

anze

Post by Jun-Qing Xia » July 22 2005

Anze Slosar wrote:Huh!

Antony is probably the right person to answer this, but on a 128 CPU monster it should converge in a few hours at most! So I guess you must be using a wrong covariance matrix... I find that MPI_StartSliceSampling = T helps a lot for a good covariance matrix, plus MPI_LearnPropose = T, of course, and maybe just forget about the default matrix and set "propose_matrix =". You might also try "estimate_propose_matrix =" but it doesn't always work... Anyway, given you have so many CPUs I would go with large openmp, say num_threads = 8, which still gives you 16 chains which is plenty!

anze
Hi!

You mean that I should set the parameters in params.ini like this:
MPI_StartSliceSampling = T
MPI_LearnPropose = T
propose_matrix =
estimate_propose_matrix = T

Because I added a parameter, I don't have a good covariance matrix. What I did was pad the default 13×13 matrix with zeros to make a new 14×14 matrix, and I set estimate_propose_matrix = F.
The parameters MPI_StartSliceSampling and MPI_LearnPropose are always set to T.
So the params.ini that I use contains:
MPI_StartSliceSampling = T
MPI_LearnPropose = T
propose_matrix = params_CMB.covmat #(this is a new matrix 14×14)
estimate_propose_matrix = F

Am I wrong?

Jun-Qing
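
A minimal sketch of the padding step described above, in plain Python. The thread does not say how CosmoMC treats a row and column that are entirely zero, so this variant puts a guessed variance on the new diagonal entry instead of a zero. The helper name, the identity placeholder matrix, and the guess sigma = 0.5 are all assumptions for illustration. It also assumes the new parameter comes last, whereas in practice the row order must match the parameter order in params.ini.

```python
def pad_covmat(cov, new_variance):
    """Return an (n+1)x(n+1) matrix: `cov` plus one new, uncorrelated parameter.

    Off-diagonal entries for the new parameter are zero (no assumed
    correlation), but the diagonal gets `new_variance` (a guessed sigma**2)
    rather than zero.
    """
    n = len(cov)
    if any(len(row) != n for row in cov):
        raise ValueError("covariance matrix must be square")
    if new_variance <= 0:
        raise ValueError("new_variance must be positive")
    padded = [list(row) + [0.0] for row in cov]
    padded.append([0.0] * n + [new_variance])
    return padded


# Stand-in for the default 13x13 matrix (identity used purely as a placeholder).
old = [[1.0 if i == j else 0.0 for j in range(13)] for i in range(13)]
new = pad_covmat(old, new_variance=0.5 ** 2)  # assumed guess: sigma ~ 0.5

print(len(new), len(new[0]))  # 14 14
print(new[13][13])            # 0.25
```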

Post by Anze Slosar » July 22 2005

Jun-Qing Xia wrote: So the params.ini that I use contains:
MPI_StartSliceSampling = T
MPI_LearnPropose = T
propose_matrix = params_CMB.covmat #(this is a new matrix 14×14)
estimate_propose_matrix = F

Am I wrong?

Jun-Qing
Well, I would expect that to work... Even if your covmat is very wrong, the initial dose of slice sampling should help MPI_LearnPropose to converge to the right distribution...

I guess the best thing to do would be to compare individual chains: just plot their distributions, see which parameter they disagree on most, and then investigate...
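
The chain comparison suggested here can be sketched as follows, under some assumptions. Each chain is held as a list of samples, and each sample is a list of parameter values. The score used (spread of chain means in units of the typical within-chain standard deviation) is just one simple choice, not something CosmoMC or getdist reports. Sample weights are ignored, and the parameter names are hypothetical.

```python
from statistics import mean, pstdev


def rank_disagreement(chains, names):
    """Rank parameters by how far apart the chain means are.

    Score = (largest chain mean - smallest chain mean) divided by the
    average within-chain standard deviation. Returns (name, score) pairs,
    worst first. Raises ZeroDivisionError if a parameter never moves.
    """
    scores = []
    for p, name in enumerate(names):
        columns = [[sample[p] for sample in chain] for chain in chains]
        means = [mean(col) for col in columns]
        spread = mean(pstdev(col) for col in columns)
        scores.append((name, (max(means) - min(means)) / spread))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy chains: the first parameter agrees between chains, the second does not.
names = ["omegabh2", "w0"]  # hypothetical labels
chains = [
    [[0.02, -1.0], [0.03, -0.8], [0.02, -1.2]],
    [[0.03, 2.0], [0.02, 2.2], [0.03, 1.8]],
]

for name, score in rank_disagreement(chains, names):
    print(name, round(score, 2))
```

The parameter at the top of the list is the first one worth plotting chain by chain.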

Post by Jun-Qing Xia » July 22 2005

Before doing this I posted a topic, "COSMOMC: How to constrain w?". Sarah Bridle advised only modifying the matrix, and she thought this was possibly better than using estimate_propose_matrix = T.

I am now running getdist on the 128 chains. Maybe I can use the covariance matrix it produces for future runs, and then the chains will possibly converge faster.

Michael Doran
Posts: 41
Joined: November 22 2004
Affiliation: ITP Heidelberg

Post by Michael Doran » July 22 2005

Hi,

What do you mean by w_0 and w_1? Which parametrization?

I ask because in cases where w goes above 0, or even worse above 1/3, at earlier times, the numerics (plus some assumptions) might be unreliable.

In addition, for w crossing w=-1, you might have to switch fluctuations off (unphysical as this might be).

You might want to consider only models for which w <= 1/3 always, and even in those cases you might have trouble, I guess...

Michael

Antony Lewis
Posts: 1944
Joined: September 23 2004
Affiliation: University of Sussex

Post by Antony Lewis » July 22 2005

I'd run far fewer than 128 chains - maybe 6. It's certainly not tested for that many, though I'm not aware of definite problems. But it's certainly a waste of CPU time unless for some reason you need to massively oversample, and it's not likely to help MPI convergence if you have problems.

Looking at the output is usually the best way to diagnose problems.

Post by Jun-Qing Xia » July 22 2005

Maybe I made a mistake. I currently use 128 CPUs to run 128 chains, whereas I should run M chains with each chain running on N processors. But I don't know where to set N in params.ini.
In the CosmoMC readme I found the following under FAQ 4:

eg. if you want to run with MxN processors (ie. M chains each running on
N processors) under the aux queue:
Set number of processors to {N} in params.ini
......
But the important thing is to leave the line
OMP_NUM_THREADS={N}
eg.
OMP_NUM_THREADS=4

I can't find the parameter OMP_NUM_THREADS in any file.
I think this is the key problem.
Could anyone help me?
Thanks!

Jun-Qing

Post by Jun-Qing Xia » July 22 2005

Michael Doran wrote:Hi,

what do you mean by w_0 and w_1 ? Which parametrization ?

I ask, cause in cases where w goes to something >0 or even worse w > 1/3 at earlier times, the numerics (plus some assumptions) might be unreliable.

Michael
Hi Michael,

I use the parametrization w = w_0 + w_1 (1 - a). I believe what you point out is correct and very important. I neglected this last time, and I suppose it is one of the reasons why the chains converge so slowly. Thanks!

Jun-Qing
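
With the parametrization w = w_0 + w_1 (1 - a), Michael's conditions can be checked directly, because w is linear in a and its extremes over 0 <= a <= 1 sit at the endpoints. The sketch below assumes that range of the scale factor. The function names are illustrative and nothing here is part of CosmoMC.

```python
def w_of_a(w0, w1, a):
    """Equation of state w(a) = w0 + w1 * (1 - a), as given in the thread."""
    return w0 + w1 * (1.0 - a)


def w_range(w0, w1):
    """(min, max) of w(a) for scale factor a in [0, 1].

    w(a) is linear in a, so the extremes sit at the endpoints:
    w(a=1) = w0 today and w(a=0) = w0 + w1 at early times.
    """
    ends = (w_of_a(w0, w1, 1.0), w_of_a(w0, w1, 0.0))
    return min(ends), max(ends)


def stays_below_one_third(w0, w1):
    """True if w <= 1/3 at every a in [0, 1]."""
    return w_range(w0, w1)[1] <= 1.0 / 3.0


def crosses_minus_one(w0, w1):
    """True if w(a) passes through -1 somewhere in [0, 1]."""
    lo, hi = w_range(w0, w1)
    return lo < -1.0 < hi


print(stays_below_one_third(-1.0, 0.5))  # True:  w runs from -1.0 to -0.5
print(stays_below_one_third(-1.0, 5.0))  # False: w reaches 4.0 at early times
print(crosses_minus_one(-0.9, -0.5))     # True:  w runs from -0.9 to -1.4
```

With both w_0 and w_1 allowed anywhere in [-10, 10], much of the prior box has w_0 + w_1 > 1/3, so a check of this kind would reject a large share of proposed points.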

Post by Jun-Qing Xia » July 22 2005

Antony Lewis wrote:I'd run far fewer than 128 chains - maybe 6. It's certainly not tested for that many, though I'm not aware of definite problems. But it's certainly a waste of CPU time unless for some reason you need to massively oversample, and it's not likely to help MPI convergence if you have problems.

Looking at the output is usually the best way to diagnose problems.
If I use one CPU (Intel Itanium 1.3GHz) to run one chain after adding two slow parameters (as in the case of w_0 and w_1), how long would the chain typically take to converge?
I have not yet figured out how to run one chain on N CPUs. Any suggestions? P.S. Would the convergence time then typically be shortened by a factor of about N?
Thanks!

Jun-Qing

Post by Antony Lewis » July 22 2005

You can set num_threads in the cosmomc .ini file to automatically force OMP_NUM_THREADS for open-mp parallelization. By default the number of threads is usually set from an environment variable (on small computers, the same as the number of CPUs). On many systems you need to specify the number of threads when submitting the job.

Using N CPUs per chain will generally scale well with N up to N>8 or so (depending on the system).

With one CPU convergence will probably take ages. Using more than one with MPI lets it learn the covariance matrix, and is recommended unless you have a good covariance matrix to start with.

Post by Jun-Qing Xia » July 22 2005

Antony Lewis wrote:You can set num_threads in the cosmomc .ini file to automatically set OMP_NUM_THREADS for open-mp parallelization. (however on some systems you also need to specify this when submitting the job).
In CosmoMC's Readme:

The num_threads parameter will determine the number of openMP threads (in MPI runs, usually set to the number of CPUs on each node).

The supercomputer I use has 4 CPUs on each node. So the maximum number of CPUs to run one chain is 4?!
Antony Lewis wrote:Using N CPUs per chain will generally scale well with N up to N>8 or so (depending on the system).
I probably did not catch what you mean. So if I use 128 CPUs for one chain (4 CPUs on each node) the time can only be shortened to 1/8? ......

Post by Antony Lewis » July 23 2005

Jun-Qing Xia wrote: The supercomputer I use has 4 CPUs on each node. So the maximum number of CPUs to run one chain is 4?!
Yes.
Jun-Qing Xia wrote: I probably did not catch what you mean. So if I use 128 CPUs for one chain (4 CPUs on each node) the time can only be shortened to 1/8? ......
Using 4 CPUs per chain will speed it up by a factor of about 4. Running more than a handful of chains probably won't speed convergence at all (at least initially).

I would run 6 chains on 6 nodes, 4 CPUs per chain on each node.
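
The arithmetic behind this recommendation can be sketched as below. It assumes one MPI process per chain, one chain per node, and one OpenMP thread per CPU on that node. The helper and its dictionary keys are hypothetical, and only num_threads corresponds to a setting named in this thread.

```python
def plan_run(n_chains, cpus_per_node):
    """One chain per node, one OpenMP thread per CPU on that node."""
    if n_chains < 1 or cpus_per_node < 1:
        raise ValueError("n_chains and cpus_per_node must be positive")
    return {
        "mpi_processes": n_chains,     # assumed: one MPI process per chain
        "num_threads": cpus_per_node,  # value for num_threads in the .ini
        "total_cpus": n_chains * cpus_per_node,
    }


plan = plan_run(n_chains=6, cpus_per_node=4)
print(plan["num_threads"])  # 4
print(plan["total_cpus"])   # 24
```

So the suggested setup occupies 24 CPUs, not 128, with num_threads = 4 in the .ini file.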

Post by Jun-Qing Xia » July 23 2005

I am puzzled.
If I use two nodes (each with 4 CPUs) to run two chains, how many "file_root_NN.log" output files will be produced? Two or eight?
If I set num_threads = 0 in the CosmoMC params.ini, does the program automatically use 4 CPUs to run each chain (if each node has 4 CPUs)?
Thanks!

Post by Antony Lewis » July 23 2005

Jun-Qing Xia wrote:I am puzzled.
If I use two nodes (each with 4 CPUs) to run two chains, how many "file_root_NN.log" output files will be produced? Two or eight?
Two, you get one file per chain.
Jun-Qing Xia wrote: If I set parameter num_threads = 0 in the cosmomc params.ini , does the program automatically use 4 CPUs to run each chain (if each node has 4 CPUs)?
Thanks!
If compiled with -openmp, then yes, probably (it may depend on your computer's settings).
