shadab alam wrote:
Hello Sheng Li,
I tried checking a few things from your last reply.
It sounds like some of your chains stopped before they reached 100 effective samples.
Also, 64 chains per proc? How many cores does this processor have?
I would suggest using at most 8 chains per proc.
Then see whether you have the same problem. However, this will not guarantee that chains do not stop, as you found with the 64-chain run.
I have run 16 chains on 16 processors, i.e. 1 chain per processor.
The run has finished a total of ~26000 steps. On average, each chain has finished 1600 steps, and all of them show a decrease in my data chi-square. Each chain has finished more than 100 steps. But I still don't see any *.chk file.
Do you mind giving me some other hint or possible reason for this issue?
Sorry for the late reply.
As I said before, you will ONLY have .chk files generated when each of your chain files (root = you_named_it_to_save_params_txtfile) has 100 records.
Otherwise there will be NO .chk file in your directory. To be clear, once any of your chain files has 100 records (100 lines of params), you will see the corresponding .chk file for that chain file.
Whatever the steps you have described in these posts mean, CosmoMC only counts the accepted samples in each chain, and that number in turn depends on whether your likelihood function is proper for your task.
Therefore, you can 'hack' things to get your .chk files generated. That is to say, you can modify the threshold of 100 in:
checkpoint_freq = 100
to 10 or some other number, depending on how many lines you find in your chain files.
For example, if the shortest chain file has only a few lines (let me assume 10 or fewer), then you can set checkpoint_freq to a value between 2 and 10, or lower, to check whether your program actually runs properly.
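To find that minimal line count without opening each file by hand, something like the following Python sketch may help. It assumes the chain files sit in one directory and are named <root>_<n>.txt with one record per line; the directory name "chains", the root "test" and both function names are only placeholders for illustration, not part of CosmoMC.

```python
import glob
import os


def count_chain_records(chain_dir, root):
    """Return {file name: number of non-empty lines} for <root>_<n>.txt files.

    The naming pattern is an assumption; adjust it to match your own run.
    """
    counts = {}
    pattern = os.path.join(chain_dir, root + "_*.txt")
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            # Each non-empty line is treated as one record in the chain.
            counts[os.path.basename(path)] = sum(1 for line in handle if line.strip())
    return counts


def chains_below_threshold(counts, checkpoint_freq=100):
    """List the chain files that have fewer records than checkpoint_freq."""
    return sorted(name for name, n in counts.items() if n < checkpoint_freq)


# Example usage (hypothetical directory and root name):
# counts = count_chain_records("chains", "test")
# print(counts)
# print(chains_below_threshold(counts, checkpoint_freq=100))
```

If every file is listed as below the threshold, the smallest count in the dictionary gives a rough upper bound for a trial value of checkpoint_freq.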
*As for this threshold, I think there is no theoretical reason to choose 100 rather than 1000 or 10; it is just a practical choice to save disk space and running time for computationally intensive programs, such as those using MPI, CUDA, etc.
If you still cannot see any .chk file, then you have to consider that your program or your modification may be wrong.
Also, Antony has already suspected that your chains were not moving properly. That seems likely in your case; that is to say, your chains never reached 100 accepted samples.
Besides, you may check the reply from Jason, just above.