I get an unexpected error when running Cobaya on supernova (SNe) data. The launch command, the `sne.yaml` input file, and the end of the log are shown below:

`$ nohup mpirun -np 4 cobaya-run sne.yaml &`

```
output: sne_chains/sne
theory:
  camb:
    extra_args:
      halofit_version: mead
      bbn_predictor: PArthENoPE_880.2_standard.dat
      lens_potential_accuracy: 1
      num_massive_neutrinos: 1
      nnu: 3.046
      theta_H0_range:
      - 20
      - 100
likelihood:
  sn.pantheon: null
params:
  logA:
    prior:
      min: 1.61
      max: 3.91
    ref:
      dist: norm
      loc: 3.05
      scale: 0.001
    proposal: 0.001
    latex: \log(10^{10} A_\mathrm{s})
    drop: true
  As:
    value: 'lambda logA: 1e-10*np.exp(logA)'
    latex: A_\mathrm{s}
  ns:
    prior:
      min: 0.8
      max: 1.2
    ref:
      dist: norm
      loc: 0.965
      scale: 0.004
    proposal: 0.002
    latex: n_\mathrm{s}
  theta_MC_100:
    prior:
      min: 0.5
      max: 10
    ref:
      dist: norm
      loc: 1.04109
      scale: 0.0004
    proposal: 0.0002
    latex: 100\theta_\mathrm{MC}
    drop: true
    renames: theta
  cosmomc_theta:
    value: 'lambda theta_MC_100: 1.e-2*theta_MC_100'
    derived: false
  H0:
    latex: H_0
    min: 20
    max: 100
  ombh2:
    prior:
      min: 0.005
      max: 0.1
    ref:
      dist: norm
      loc: 0.0224
      scale: 0.0001
    proposal: 0.0001
    latex: \Omega_\mathrm{b} h^2
  omch2:
    prior:
      min: 0.001
      max: 0.99
    ref:
      dist: norm
      loc: 0.12
      scale: 0.001
    proposal: 0.0005
    latex: \Omega_\mathrm{c} h^2
  omegam:
    latex: \Omega_\mathrm{m}
  omegamh2:
    derived: 'lambda omegam, H0: omegam*(H0/100)**2'
    latex: \Omega_\mathrm{m} h^2
  mnu: 0.06
  omk:
    prior:
      min: -0.3
      max: 0.3
    ref:
      dist: norm
      loc: 0.
      scale: 0.006
    proposal: 0.003
    latex: \Omega_\mathrm{k}
  omega_de:
    latex: \Omega_\Lambda
  YHe:
    latex: Y_\mathrm{P}
  Y_p:
    latex: Y_P^\mathrm{BBN}
  DHBBN:
    derived: 'lambda DH: 10**5*DH'
    latex: 10^5 \mathrm{D}/\mathrm{H}
  tau:
    prior:
      min: 0.01
      max: 0.8
    ref:
      dist: norm
      loc: 0.055
      scale: 0.006
    proposal: 0.003
    latex: \tau_\mathrm{reio}
  zrei:
    latex: z_\mathrm{re}
  sigma8:
    latex: \sigma_8
  s8h5:
    derived: 'lambda sigma8, H0: sigma8*(H0*1e-2)**(-0.5)'
    latex: \sigma_8/h^{0.5}
  s8omegamp5:
    derived: 'lambda sigma8, omegam: sigma8*omegam**0.5'
    latex: \sigma_8 \Omega_\mathrm{m}^{0.5}
  s8omegamp25:
    derived: 'lambda sigma8, omegam: sigma8*omegam**0.25'
    latex: \sigma_8 \Omega_\mathrm{m}^{0.25}
  A:
    derived: 'lambda As: 1e9*As'
    latex: 10^9 A_\mathrm{s}
  clamp:
    derived: 'lambda As, tau: 1e9*As*np.exp(-2*tau)'
    latex: 10^9 A_\mathrm{s} e^{-2\tau}
  age:
    latex: '{\rm{Age}}/\mathrm{Gyr}'
  rdrag:
    latex: r_\mathrm{drag}
sampler:
  mcmc:
    drag: true
    oversample_power: 0.4
    proposal_scale: 1.9
    covmat: auto
    Rminus1_stop: 0.01
    Rminus1_cl_stop: 0.2
resume: true
```

```
[2 : mcmc] Learn + convergence test @ 8400 samples accepted.
[3 : mcmc] Learn + convergence test @ 8000 samples accepted.
[1 : mcmc] Progress @ 2022-05-18 03:12:41 : 11818 steps taken, and 9588 accepted.
[3 : mcmc] Progress @ 2022-05-18 03:12:42 : 11228 steps taken, and 8098 accepted.
[2 : mcmc] Progress @ 2022-05-18 03:12:42 : 11439 steps taken, and 8505 accepted.
[0 : mcmc] Progress @ 2022-05-18 03:12:43 : 10541 steps taken, and 5937 accepted.
[1 : mcmc] Learn + convergence test @ 9600 samples accepted.
[2 : mcmc] Learn + convergence test @ 8600 samples accepted.
[3 : mcmc] Learn + convergence test @ 8200 samples accepted.
[1 : mcmc] Progress @ 2022-05-18 03:13:41 : 11983 steps taken, and 9730 accepted.
[3 : mcmc] Progress @ 2022-05-18 03:13:42 : 11398 steps taken, and 8216 accepted.
[2 : mcmc] Progress @ 2022-05-18 03:13:42 : 11608 steps taken, and 8657 accepted.
[0 : mcmc] Progress @ 2022-05-18 03:13:43 : 10674 steps taken, and 5970 accepted.
[1 : mcmc] Learn + convergence test @ 9800 samples accepted.
[0 : mcmc] Learn + convergence test @ 6000 samples accepted.
[1 : mcmc] Progress @ 2022-05-18 03:14:41 : 12206 steps taken, and 9864 accepted.
[3 : mcmc] Progress @ 2022-05-18 03:14:42 : 11579 steps taken, and 8373 accepted.
[2 : mcmc] Progress @ 2022-05-18 03:14:43 : 11762 steps taken, and 8775 accepted.
[0 : mcmc] Progress @ 2022-05-18 03:14:43 : 10803 steps taken, and 6007 accepted.
[3 : mcmc] Learn + convergence test @ 8400 samples accepted.
[2 : mcmc] Learn + convergence test @ 8800 samples accepted.
[1 : mcmc] Progress @ 2022-05-18 03:15:41 : 12388 steps taken, and 9995 accepted.
[3 : mcmc] Progress @ 2022-05-18 03:15:42 : 11758 steps taken, and 8532 accepted.
[2 : mcmc] Progress @ 2022-05-18 03:15:43 : 11917 steps taken, and 8891 accepted.
[0 : mcmc] Progress @ 2022-05-18 03:15:43 : 10930 steps taken, and 6048 accepted.
[1 : mcmc] Learn + convergence test @ 10000 samples accepted.
[3 : mcmc] Learn + convergence test @ 8600 samples accepted.
[2 : mcmc] Learn + convergence test @ 9000 samples accepted.
[1 : mcmc] Progress @ 2022-05-18 03:16:42 : 12570 steps taken, and 10139 accepted.
[3 : mcmc] Progress @ 2022-05-18 03:16:42 : 11972 steps taken, and 8689 accepted.
[2 : mcmc] Progress @ 2022-05-18 03:16:43 : 12080 steps taken, and 9027 accepted.
[0 : mcmc] Progress @ 2022-05-18 03:16:43 : 11058 steps taken, and 6088 accepted.
[1 : mcmc] Learn + convergence test @ 10200 samples accepted.
[1 : mcmc] *ERROR* Waiting for too long for all chains to be ready. Maybe one of them is stuck or died unexpectedly?
[1 : mcmc] Aborting MPI due to error
Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
```
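To see whether one chain is lagging behind the others before the abort, the progress lines can be summarised per MPI rank. The following is only a minimal sketch: it assumes the log format shown above, and the helper name `latest_progress` and the `sample` string are made up for illustration (they are not part of Cobaya).

```python
import re

# Matches progress lines in the format printed in the log above, e.g.
# "[1 : mcmc] Progress @ 2022-05-18 03:16:42 : 12570 steps taken, and 10139 accepted."
PROGRESS_RE = re.compile(
    r"\[(\d+) : mcmc\] Progress @ .*? : (\d+) steps taken, and (\d+) accepted\."
)


def latest_progress(log_text):
    """Return {rank: (steps, accepted, acceptance_rate)} using each rank's last progress line.

    Hypothetical helper for illustration; it only parses text and does not call Cobaya.
    """
    latest = {}
    for line in log_text.splitlines():
        match = PROGRESS_RE.search(line)
        if match:
            rank, steps, accepted = (int(group) for group in match.groups())
            latest[rank] = (steps, accepted, accepted / steps)
    return latest


# Last four progress lines copied from the log above.
sample = """\
[1 : mcmc] Progress @ 2022-05-18 03:16:42 : 12570 steps taken, and 10139 accepted.
[3 : mcmc] Progress @ 2022-05-18 03:16:42 : 11972 steps taken, and 8689 accepted.
[2 : mcmc] Progress @ 2022-05-18 03:16:43 : 12080 steps taken, and 9027 accepted.
[0 : mcmc] Progress @ 2022-05-18 03:16:43 : 11058 steps taken, and 6088 accepted.
"""

if __name__ == "__main__":
    for rank, (steps, accepted, rate) in sorted(latest_progress(sample).items()):
        print(f"rank {rank}: {steps} steps, {accepted} accepted, rate {rate:.2f}")
```

On these four lines, rank 0 has both the fewest accepted samples and the lowest acceptance rate, which may be relevant to the "waiting for all chains" message, although the sketch itself does not establish the cause.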

After this error the whole computation stops, and I see no way to tell Cobaya to continue the run.

Does anyone have a clue or a lead on how to fix this behavior?

Regards