CosmoMC generic mode MPI error

Pete Hague
Joined: November 22 2012
Affiliation: University of Leicester


Post by Pete Hague » November 22 2012

Hi,

I'm using CosmoMC in generic mode and currently run it on 8 cores, either all on one node or spread across several nodes. Both configurations work, albeit at different speeds.

I want to increase the number of cores so that I can do a much larger run in a reasonable time, but the same setup that works with 8 cores fails when I try to use 32.

All 32 chains start, but they then stop with errors that look like:

[node031:18348] [[34105,0],0]-[[34105,1],7] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)

Any ideas? I suspect it may be a memory issue. Does CosmoMC need a great deal more memory when running on more cores?
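
One way to check the memory hypothesis: with MPI, each chain runs as a separate process that holds its own copy of the likelihood data, so memory use on a node grows roughly linearly with the number of chains placed on it. The following is a minimal Python sketch assuming chains are spread as evenly as possible across nodes; the function names and the memory figures in the usage line are hypothetical placeholders, not measured CosmoMC values.

```python
import math


def chains_per_node(n_chains: int, n_nodes: int) -> int:
    """Worst-case number of chains on a single node when chains are
    spread as evenly as possible (assumption: even placement)."""
    return math.ceil(n_chains / n_nodes)


def fits_in_memory(n_chains: int, n_nodes: int,
                   mem_per_chain_gb: float, mem_per_node_gb: float):
    """Return (peak GB on the busiest node, whether it fits).

    Assumes each MPI chain needs roughly the same, independent amount
    of memory (hypothetical per-chain figure supplied by the user).
    """
    peak = chains_per_node(n_chains, n_nodes) * mem_per_chain_gb
    return peak, peak <= mem_per_node_gb


# Hypothetical numbers: 1.5 GB per chain, 16 GB per node.
print(fits_in_memory(8, 1, 1.5, 16.0))   # 8 chains on one node
print(fits_in_memory(32, 1, 1.5, 16.0))  # 32 chains on one node
print(fits_in_memory(32, 4, 1.5, 16.0))  # 32 chains over 4 nodes
```

Measuring the actual resident memory of one chain from the 8-core run and plugging it in would show whether 32 chains could plausibly exhaust a node.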

Antony Lewis
Joined: September 23 2004
Affiliation: University of Sussex


Post by Antony Lewis » November 24 2012

I'm not sure what the error is, but running more than 8 chains usually doesn't help very much; I usually run 4-8. So unless your generic likelihood is well parallelized with OpenMP, more cores may not help much.
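
Following that advice, a fixed core budget can be divided into a small number of MPI chains, each using several OpenMP threads. The Python sketch below picks a chain count between 4 and 8 that uses as many cores as possible, preferring more chains on ties. The helper names and `params.ini` are hypothetical; `OMP_NUM_THREADS` is the standard OpenMP variable and `-x` is Open MPI's `mpirun` option for exporting an environment variable to remote ranks, but process binding and scheduler flags will depend on your cluster.

```python
def split_cores(total_cores: int, min_chains: int = 4, max_chains: int = 8):
    """Return (chains, threads_per_chain) maximising cores used,
    with min_chains <= chains <= max_chains; ties favour more chains."""
    best = None
    for chains in range(max_chains, min_chains - 1, -1):
        threads = total_cores // chains
        if threads < 1:
            continue  # not enough cores for this many chains
        if best is None or chains * threads > best[0] * best[1]:
            best = (chains, threads)
    if best is None:
        raise ValueError("need at least min_chains cores")
    return best


def launch_line(chains: int, threads: int) -> str:
    """Build an illustrative Open MPI launch command (hypothetical ini file)."""
    return (f"export OMP_NUM_THREADS={threads}; "
            f"mpirun -x OMP_NUM_THREADS -np {chains} ./cosmomc params.ini")


chains, threads = split_cores(32)
print(chains, threads)
print(launch_line(chains, threads))
```

With 32 cores this gives 8 chains of 4 threads each, which only pays off if the likelihood itself scales with OpenMP threads.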
