CosmoMC: How long can chains converge?
-
- Posts: 22
- Joined: January 02 2005
- Affiliation: SISSA, Italy
CosmoMC: How long can chains converge?
Dear All,
I now use two parameters, w0 and w1, to describe the equation of state of dark energy instead of a constant w, and I let both w0 and w1 range between -10 and 10. All other parameters keep their default settings.
Two weeks ago I started running this on 128 CPUs (Intel Itanium 1.3GHz) of a supercomputer. The current convergence R-1 is still about 3.2, while one generally needs R-1 < 0.1.
Does anyone know how long the chains should take to converge in this situation?
Thanks!
Jun-Qing Xia
-
- Posts: 183
- Joined: September 24 2004
- Affiliation: Brookhaven National Laboratory
CosmoMC: How long can chains converge?
Huh!
Antony is probably the right person to answer this, but on a 128 CPU monster it should converge in a few hours at most! So I guess you must be using a wrong covariance matrix... I find that MPI_StartSliceSampling = T helps a lot for a good covariance matrix, plus MPI_LearnPropose = T, of course, and maybe just forget about the default matrix and set "propose_matrix =". You might also try "estimate_propose_matrix =" but it doesn't always work... Anyway, given you have so many CPUs I would go with large openmp, say num_threads = 8, which still gives you 16 chains which is plenty!
anze
-
- Posts: 22
- Joined: January 02 2005
- Affiliation: SISSA, Italy
Re: CosmoMC: How long can chains converge?
Anze Slosar wrote:
Huh!
Antony is probably the right person to answer this, but on a 128 CPU monster it should converge in a few hours at most! So I guess you must be using a wrong covariance matrix... I find that MPI_StartSliceSampling = T helps a lot for a good covariance matrix, plus MPI_LearnPropose = T, of course, and maybe just forget about the default matrix and set "propose_matrix =". You might also try "estimate_propose_matrix =" but it doesn't always work... Anyway, given you have so many CPUs I would go with large openmp, say num_threads = 8, which still gives you 16 chains which is plenty!
anze
Hi!
You mean that I should set the parameters in params.ini like this:
MPI_StartSliceSampling = T
MPI_LearnPropose = T
propose_matrix =
estimate_propose_matrix = T
Because I added a parameter, I don't have a good covariance matrix. What I did was pad the default 13×13 matrix with zeros to make a new 14×14 matrix, and I set estimate_propose_matrix = F.
The parameters MPI_StartSliceSampling and MPI_LearnPropose are always set to T.
So the params.ini I use is:
MPI_StartSliceSampling = T
MPI_LearnPropose = T
propose_matrix = params_CMB.covmat #(this is a new matrix 14×14)
estimate_propose_matrix = F
Am I wrong?
Jun-Qing
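For anyone following along, padding the matrix by hand as described above could be sketched in Python roughly like this. It is only an illustration: `pad_covmat` and `new_variance` are hypothetical names, not part of CosmoMC, the identity matrix stands in for the real default matrix, and the 0.25 is an arbitrary placeholder rather than a recommended value.

```python
def pad_covmat(cov, new_variance=0.0):
    """Return a copy of the square matrix `cov` with one extra row and column.

    Off-diagonal entries for the new parameter are zero (no assumed
    correlation); the new diagonal entry is `new_variance`.
    """
    n = len(cov)
    if any(len(row) != n for row in cov):
        raise ValueError("cov must be square")
    padded = [list(row) + [0.0] for row in cov]
    padded.append([0.0] * n + [new_variance])
    return padded


# Stand-in for the default 13x13 matrix (identity, purely for illustration).
default_cov = [[1.0 if i == j else 0.0 for j in range(13)] for i in range(13)]
new_cov = pad_covmat(default_cov, new_variance=0.25)  # 0.25 is an arbitrary guess
print(len(new_cov), len(new_cov[0]))
```

With `new_variance=0.0` this reproduces the all-zeros padding described in the post; the original 13×13 block is copied unchanged either way.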
-
- Posts: 183
- Joined: September 24 2004
- Affiliation: Brookhaven National Laboratory
Re: CosmoMC: How long can chains converge?
Jun-Qing Xia wrote:
So the params.ini I use is:
MPI_StartSliceSampling = T
MPI_LearnPropose = T
propose_matrix = params_CMB.covmat #(this is a new matrix 14×14)
estimate_propose_matrix = F
Am I wrong?
Jun-Qing
Well, I would expect that to work... Even if your covmat is very wrong then the initial dose of slice sampling should help MPI_LearnPropose to converge to the right distribution...
I guess the best thing to do would be to compare individual chains and just plot their distribution and see which parameter they disagree in most and then investigate...
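A rough way to see which parameter the chains disagree on most is sketched below. It uses a simplified per-parameter version of the "variance of chain means over mean of chain variances" idea; the R-1 that CosmoMC itself prints may be computed differently. The function names and the toy samples are made up for illustration, and reading the real chain files is left out.

```python
from statistics import mean, variance


def r_minus_one(chains):
    """Variance of the chain means over the mean within-chain variance,
    for one parameter. `chains` is a list of sample lists, one per chain."""
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])
    return variance(chain_means) / within


def rank_parameters(samples):
    """`samples` maps parameter name -> list of per-chain sample lists.
    Returns (name, score) pairs, worst-converged parameter first."""
    scores = {name: r_minus_one(chains) for name, chains in samples.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: two chains agree on "omegabh2" but sit in different places for "w1".
toy = {
    "omegabh2": [[0.021, 0.022, 0.023], [0.021, 0.022, 0.023]],
    "w1": [[-2.0, -1.5, -1.0], [3.0, 3.5, 4.0]],
}
for name, score in rank_parameters(toy):
    print(name, round(score, 2))
```

The parameter at the top of the list is the one whose chains are furthest apart relative to their own scatter, so that is the one to plot and investigate first.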
-
- Posts: 22
- Joined: January 02 2005
- Affiliation: SISSA, Italy
CosmoMC: How long can chains converge?
Before doing this I posted the topic "COSMOMC: How to constrain w?". Sarah Bridle replied that she would only modify the matrix, and she thought this was possibly better than using estimate_propose_matrix = T.
Now I am running getdist on the 128 chains. Maybe I can use the covariance matrix it produces for future work, and then the chains will possibly converge faster.
-
- Posts: 41
- Joined: November 22 2004
- Affiliation: ITP Heidelberg
CosmoMC: How long can chains converge?
Hi,
what do you mean by w_0 and w_1 ? Which parametrization ?
I ask, cause in cases where w goes to something >0 or even worse w > 1/3 at earlier times, the numerics (plus some assumptions) might be unreliable.
In addition, for w crossing w=-1, you might have to switch fluctuations off (unphysical as this might be).
You might want to consider only models for which w <= 1/3 always and even in those cases you might have troubles, I guess...
Michael
-
- Posts: 1944
- Joined: September 23 2004
- Affiliation: University of Sussex
Re: CosmoMC: How long can chains converge?
I'd run much fewer than 128 chains - maybe 6. It's certainly not tested for that many, though I'm not aware of definite problems. But certainly a waste of CPU time unless for some reason you need to massively oversample, and it's not likely to help MPI convergence if you have problems.
Looking at the output is usually the best way to diagnose problems.
-
- Posts: 22
- Joined: January 02 2005
- Affiliation: SISSA, Italy
CosmoMC: How long can chains converge?
Maybe I made a mistake. At the moment I use 128 CPUs to run 128 chains. I should instead run M chains and have each chain run on N processors, but I don't know where to set the parameter N in params.ini.
In CosmoMC's Readme I found something at FAQ 4:
eg. if you want to run with MxN processors (ie. M chains each running on
N processors) under the aux queue:
Set number of processors to {N} in params.ini
......
But the important thing is to leave the line
OMP_NUM_THREADS={N}
eg.
OMP_NUM_THREADS=4
I can't find the parameter OMP_NUM_THREADS in any file.
I think this is the key problem.
Could anyone help me?
Thanks!
Jun-Qing
-
- Posts: 22
- Joined: January 02 2005
- Affiliation: SISSA, Italy
Re: CosmoMC: How long can chains converge?
Michael Doran wrote:
Hi,
what do you mean by w_0 and w_1 ? Which parametrization ?
I ask, cause in cases where w goes to something >0 or even worse w > 1/3 at earlier times, the numerics (plus some assumptions) might be unreliable.
Michael
Hi Michael,
I use the parametrization w = w_0 + w_1 (1 - a). I believe what you point out is correct and very important. I neglected this last time, and I suppose it is one of the reasons why the chains converge so slowly. Thanks!
Jun-Qing
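To make Michael's caveats concrete for this parametrization, here is a small Python sketch. `check_model` is a hypothetical helper, not anything in CosmoMC or CAMB, and the 1/3 threshold is simply the value mentioned in the thread. Because w(a) = w_0 + w_1 (1 - a) is linear in a, only the two endpoints a = 0 and a = 1 need checking.

```python
def w_of_a(a, w0, w1):
    """Equation of state w(a) = w0 + w1 * (1 - a), as in the post above."""
    return w0 + w1 * (1.0 - a)


def check_model(w0, w1, w_max=1.0 / 3.0):
    """w(a) is linear in a, so over 0 <= a <= 1 its extremes sit at the
    endpoints: w(1) = w0 today and w(0) = w0 + w1 at early times."""
    w_today = w_of_a(1.0, w0, w1)
    w_early = w_of_a(0.0, w0, w1)
    return {
        "exceeds_w_max": max(w_today, w_early) > w_max,
        "crosses_minus_one": (w_today + 1.0) * (w_early + 1.0) < 0.0,
    }


print(check_model(-1.0, 0.0))   # cosmological constant
print(check_model(-0.9, -0.5))  # w runs from -1.4 early on to -0.9 today
print(check_model(-1.0, 5.0))   # w(0) = 4, far above 1/3
```

With both w_0 and w_1 allowed anywhere between -10 and 10, a large part of that range gives w_0 + w_1 > 1/3 at early times, which is exactly the regime flagged as possibly unreliable.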
-
- Posts: 22
- Joined: January 02 2005
- Affiliation: SISSA, Italy
Re: CosmoMC: How long can chains converge?
Antony Lewis wrote:
I'd run much fewer than 128 chains - maybe 6. It's certainly not tested for that many, though I'm not aware of definite problems. But certainly a waste of CPU time unless for some reason you need to massively oversample, and it's not likely to help MPI convergence if you have problems.
Looking at the output is usually the best way to diagnose problems.
If I use one CPU (Intel Itanium 1.3GHz) to run one chain after adding two slow parameters (as in the case of w_0 and w_1), how long would the chain typically take to converge?
I have not yet figured out how to run one chain on N CPUs. Any suggestions? P.S. Would the convergence time then typically be shortened by a factor of about N?
Thanks!
Jun-Qing
-
- Posts: 1944
- Joined: September 23 2004
- Affiliation: University of Sussex
Re: CosmoMC: How long can chains converge?
You can set num_threads in the cosmomc .ini file to automatically set OMP_NUM_THREADS for OpenMP parallelization. By default the number of threads is usually taken from an environment variable (on small computers, it is the same as the number of CPUs). On many systems you need to specify the number of threads when submitting the job.
Using N CPUs per chain will generally scale well with N up to N>8 or so (depending on the system).
With one CPU, convergence will probably take ages. Running more than one chain with MPI lets it learn the covariance matrix, and is recommended unless you have a good covariance matrix to start with.
-
- Posts: 22
- Joined: January 02 2005
- Affiliation: SISSA, Italy
Re: CosmoMC: How long can chains converge?
Antony Lewis wrote:
You can set num_threads in the cosmomc .ini file to automatically set OMP_NUM_THREADS for open-mp parallelization. (however on some systems you also need to specify this when submitting the job).
In CosmoMC's Readme:
The num_threads parameter will determine the number of openMP threads (in MPI runs, usually set to the number of CPUs on each node).
The supercomputer I use has 4 CPUs on each node. So the maximum number of CPUs to run one chain is 4?!
Antony Lewis wrote:
Using N CPUs per chain will generally scale well with N up to N>8 or so (depending on the system).
I probably did not catch what you mean. So if I use 128 CPUs for one chain (4 CPUs on each node), the time can only be shortened to 1/8? ......
-
- Posts: 1944
- Joined: September 23 2004
- Affiliation: University of Sussex
Re: CosmoMC: How long can chains converge?
Jun-Qing Xia wrote:
The supercomputer I use has 4 CPUs on each node. So the maximum number of CPUs to run one chain is 4?!
Yes.
Jun-Qing Xia wrote:
I probably did not catch what you mean. So if I use 128 CPUs for one chain (4 CPUs on each node), the time can only be shortened to 1/8? ......
Using 4 CPUs per chain will speed it up by a factor of about 4. Running more than a handful of chains probably won't speed convergence at all (at least initially).
I would run 6 chains on 6 nodes, 4 CPUs per chain on each node.
-
- Posts: 22
- Joined: January 02 2005
- Affiliation: SISSA, Italy
CosmoMC: How long can chains converge?
I am puzzled.
If I use two nodes (each with 4 CPUs) to run two chains, how many "file_root_NN.log" output files are produced? Two or eight?
If I set the parameter num_threads = 0 in the cosmomc params.ini, does the program automatically use 4 CPUs to run each chain (if each node has 4 CPUs)?
Thanks!
-
- Posts: 1944
- Joined: September 23 2004
- Affiliation: University of Sussex
Re: CosmoMC: How long can chains converge?
Jun-Qing Xia wrote:
I am puzzled.
If I use two nodes (each with 4 CPUs) to run two chains, how many "file_root_NN.log" output files are produced? Two or eight?
Two, you get one file per chain.
Jun-Qing Xia wrote:
If I set the parameter num_threads = 0 in the cosmomc params.ini, does the program automatically use 4 CPUs to run each chain (if each node has 4 CPUs)?
Thanks!
If compiled with -openmp, then yes, probably (it may depend on your computer's settings).
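The bookkeeping discussed in this thread (chains, threads per chain, nodes, log files) can be summarised in a few lines of Python. This is just arithmetic under the assumptions stated above: one chain per MPI process, num_threads OpenMP threads per chain, a chain's threads staying on a single node, and one log file per chain. `layout` is a made-up helper and does not talk to CosmoMC or any queueing system.

```python
def layout(n_chains, threads_per_chain, cpus_per_node):
    """Summarise a run with one chain per MPI process and
    `threads_per_chain` OpenMP threads (num_threads) per chain."""
    if threads_per_chain > cpus_per_node:
        raise ValueError("a chain's threads cannot span more than one node")
    total_cpus = n_chains * threads_per_chain
    chains_per_node = cpus_per_node // threads_per_chain
    nodes = -(-n_chains // chains_per_node)  # ceiling division
    return {"total_cpus": total_cpus, "nodes": nodes, "log_files": n_chains}


# The suggested setup: 6 chains, 4 CPUs per chain, 4 CPUs per node.
print(layout(n_chains=6, threads_per_chain=4, cpus_per_node=4))
# The original run: 128 single-threaded chains on the same machine.
print(layout(n_chains=128, threads_per_chain=1, cpus_per_node=4))
```

The first case uses 24 CPUs on 6 nodes and writes 6 log files; the second uses all 128 CPUs on 32 nodes and writes 128.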