About a month ago I started running into a very strange error with cobaya when running in parallel, which I had not seen before. Namely, the serial run and the parallel run (the one that fails) are:
cobaya-run config/all_bao_heft.yml --force
mpirun -np 2 cobaya-run config/all_bao_heft.yml --force
I have tried both the latest pip release of cobaya (3.1.1) and the latest version on GitHub (commit 07e3933).
For mpi4py, I have tried 3.0.0, 3.0.1, and 3.1.1.
I have also tried this on a completely different cluster and got the same error.
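For anyone trying to reproduce this, it may help to record the exact versions seen by the Python interpreter that mpirun launches, including dill, which appears in the traceback further down but is not listed above. The following is only a minimal sketch: the helper name `report_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute.

```python
import importlib


def report_versions(names):
    """Map each module name to its __version__, or a note if unavailable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module cannot be imported by this interpreter.
            versions[name] = "not installed"
            continue
        # Fall back to "unknown" if the module has no __version__ attribute.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions(["cobaya", "mpi4py", "dill"]).items():
        print(f"{name}: {version}")
```

Running the script once directly and once under mpirun would show whether both launch modes pick up the same installation.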
My config file uses only the standard likelihoods and none of the external ones; for example, bao.sdss_dr12_consensus_bao: null
Error:
[boryanah@(NEW) glamdring:GrandConjuration]$ mpirun -np 2 cobaya-run config/planckBAO.yaml --force
[0 : output] Output to be read-from/written-into folder 'cobaya_out', with prefix 'planckBAO'
[0 : output] Found existing info files with the requested output prefix: 'cobaya_out/planckBAO'
[0 : output] Will delete previous products ('force' was requested).
[0 : run] --------------------
Traceback (most recent call last):
File "/usr/lib/python3.6/pickle.py", line 918, in save_global
obj2, parent = _getattribute(module, name)
File "/usr/lib/python3.6/pickle.py", line 266, in _getattribute
.format(name, obj))
AttributeError: Can't get local attribute 'logger_setup.<locals>.MyFormatter' on <function logger_setup at 0x7f2395289510>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/zfsusers/boryanah/repos/cobaya/cobaya/run.py", line 103, in run
force=info.get("force"), infix=infix) as out:
File "/mnt/zfsusers/boryanah/repos/cobaya/cobaya/output.py", line 574, in get_output
return Output(*args, **kwargs)
File "/mnt/zfsusers/boryanah/repos/cobaya/cobaya/mpi.py", line 279, in wrapper
share_mpi([result] + [getattr(self, var, None) for var in atts])
File "/mnt/zfsusers/boryanah/repos/cobaya/cobaya/mpi.py", line 136, in share_mpi
return get_mpi_comm().bcast(data, root=root)
File "mpi4py/MPI/Comm.pyx", line 1569, in mpi4py.MPI.Comm.bcast
File "mpi4py/MPI/msgpickle.pxi", line 721, in mpi4py.MPI.PyMPI_bcast
File "mpi4py/MPI/msgpickle.pxi", line 145, in mpi4py.MPI.pickle_dump
File "mpi4py/MPI/msgpickle.pxi", line 133, in mpi4py.MPI.cdumps
File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 304, in dumps
dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 276, in dump
Pickler(file, protocol, **_kwds).dump(obj)
File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 498, in dump
StockPickler.dump(self, obj)
File "/usr/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/usr/lib/python3.6/pickle.py", line 805, in _batch_appends
save(x)
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 990, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 990, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/usr/lib/python3.6/pickle.py", line 808, in _batch_appends
save(tmp[0])
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 990, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 605, in save_reduce
save(cls)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 1439, in save_type
StockPickler.save_global(pickler, obj, name=name)
File "/usr/lib/python3.6/pickle.py", line 922, in save_global
(obj, module_name, name))
_pickle.PicklingError: Can't pickle <class 'cobaya.log.logger_setup.<locals>.MyFormatter'>: it's not found as cobaya.log.logger_setup.<locals>.MyFormatter
-------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
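The last frames of the traceback end in the stock pickler's save_global, which complains that the class MyFormatter, defined inside the function logger_setup, cannot be found by name. That failure mode can be shown without cobaya, MPI, or dill. The sketch below is an illustration only: `logger_setup_sketch`, `try_pickle`, and `ModuleLevelFormatter` are hypothetical names and are not cobaya's code. It assumes the problem is the general pickle restriction on classes defined inside a function, which is what the error message suggests.

```python
import logging
import pickle


def logger_setup_sketch():
    """Hypothetical stand-in for a setup function that defines a class locally."""

    class MyFormatter(logging.Formatter):
        pass

    return MyFormatter()


class ModuleLevelFormatter(logging.Formatter):
    """Equivalent formatter defined at module level, where pickle can look it up."""


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the exception that was raised."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError) as exc:
        # Depending on the Python version, a local class triggers one or the other.
        return exc
    return None


local_error = try_pickle(logger_setup_sketch())
module_error = try_pickle(ModuleLevelFormatter())
print("local class:", type(local_error).__name__, local_error)
print("module-level class:", module_error)
```

If this is indeed the cause, any object that holds a reference to such a locally defined formatter would fail in the same way when it is broadcast with bcast, since mpi4py pickles Python objects before sending them.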