cobaya error when running in parallel

boryana hadzhiyska
Posts: 5
Joined: April 28 2021
Affiliation: Harvard

Post by boryana hadzhiyska » August 24 2021

Hi,

About a month ago I started hitting a strange error when running cobaya in parallel, which I had not seen before. Namely,

cobaya-run config/all_bao_heft.yml --force
works well, whereas

mpirun -np 2 cobaya-run config/all_bao_heft.yml --force
breaks. The error is about pickling and is shown below.
The versions of cobaya I have tried are the latest pip install version (3.1.1) as well as the latest cobaya version on github (#07e3933).
For mpi4py, I have tried 3.0.0, 3.1.1, 3.0.1.

I have also tried this on a completely different cluster and got the same error.

My config file has only the normal likelihoods and none of the external ones; for example, bao.sdss_dr12_consensus_bao: null

Error:

[boryanah@(NEW) glamdring:GrandConjuration]$ mpirun -np 2 cobaya-run config/planckBAO.yaml --force
[0 : output] Output to be read-from/written-into folder 'cobaya_out', with prefix 'planckBAO'
[0 : output] Found existing info files with the requested output prefix: 'cobaya_out/planckBAO'
[0 : output] Will delete previous products ('force' was requested).
[0 : run] --------------------

Traceback (most recent call last):
  File "/usr/lib/python3.6/pickle.py", line 918, in save_global
    obj2, parent = _getattribute(module, name)
  File "/usr/lib/python3.6/pickle.py", line 266, in _getattribute
    .format(name, obj))
AttributeError: Can't get local attribute 'logger_setup.<locals>.MyFormatter' on <function logger_setup at 0x7f2395289510>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/zfsusers/boryanah/repos/cobaya/cobaya/run.py", line 103, in run
    force=info.get("force"), infix=infix) as out:
  File "/mnt/zfsusers/boryanah/repos/cobaya/cobaya/output.py", line 574, in get_output
    return Output(*args, **kwargs)
  File "/mnt/zfsusers/boryanah/repos/cobaya/cobaya/mpi.py", line 279, in wrapper
    share_mpi([result] + [getattr(self, var, None) for var in atts])
  File "/mnt/zfsusers/boryanah/repos/cobaya/cobaya/mpi.py", line 136, in share_mpi
    return get_mpi_comm().bcast(data, root=root)
  File "mpi4py/MPI/Comm.pyx", line 1569, in mpi4py.MPI.Comm.bcast
  File "mpi4py/MPI/msgpickle.pxi", line 721, in mpi4py.MPI.PyMPI_bcast
  File "mpi4py/MPI/msgpickle.pxi", line 145, in mpi4py.MPI.pickle_dump
  File "mpi4py/MPI/msgpickle.pxi", line 133, in mpi4py.MPI.cdumps
  File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 304, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 276, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 498, in dump
    StockPickler.dump(self, obj)
  File "/usr/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 781, in save_list
    self._batch_appends(obj)
  File "/usr/lib/python3.6/pickle.py", line 805, in _batch_appends
    save(x)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 781, in save_list
    self._batch_appends(obj)
  File "/usr/lib/python3.6/pickle.py", line 808, in _batch_appends
    save(tmp[0])
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 605, in save_reduce
    save(cls)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/users/boryanah/.local/lib/python3.6/site-packages/dill/_dill.py", line 1439, in save_type
    StockPickler.save_global(pickler, obj, name=name)
  File "/usr/lib/python3.6/pickle.py", line 922, in save_global
    (obj, module_name, name))
_pickle.PicklingError: Can't pickle <class 'cobaya.log.logger_setup.<locals>.MyFormatter'>: it's not found as cobaya.log.logger_setup.<locals>.MyFormatter

-------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
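For context, the last exception in the traceback is the standard complaint pickle raises when asked to serialize a class that was defined inside a function: such a class cannot be looked up by its module-level name. The sketch below reproduces only that behaviour with the standard library. It does not use cobaya, mpi4py, or dill, and the names `logger_setup_sketch`, `ModuleLevelFormatter`, and `try_pickle` are hypothetical stand-ins, not cobaya code.

```python
import logging
import pickle


def logger_setup_sketch():
    """Hypothetical stand-in: defines a formatter class locally, inside a function."""

    class MyFormatter(logging.Formatter):
        pass

    return MyFormatter()


class ModuleLevelFormatter(logging.Formatter):
    """Same kind of class, but defined at module level so it can be found by name."""


def try_pickle(obj):
    """Return None if pickle.dumps succeeds, otherwise the exception it raised."""
    try:
        pickle.dumps(obj)
        return None
    except (pickle.PicklingError, AttributeError) as exc:
        # The exact exception type and message vary between Python versions.
        return exc


local_error = try_pickle(logger_setup_sketch())
module_error = try_pickle(ModuleLevelFormatter())

print("instance of locally defined class:", local_error)
print("instance of module-level class:   ", module_error)
```

When run as a script, the first line should report a pickling error that mentions `MyFormatter`, while the second should normally report `None`. Whether a broadcast over MPI hits this depends on which pickler is in use and on what the broadcast object references, so this is only an illustration of the failure mode, not a diagnosis of the cobaya run above.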

Antony Lewis
Posts: 1936
Joined: September 23 2004
Affiliation: University of Sussex

Re: cobaya error when running in parallel

Post by Antony Lewis » August 25 2021

Not sure, but it might be worth trying a more recent Python version, as several packages are dropping 3.6 support.

Gabriela Marques
Posts: 1
Joined: August 26 2021
Affiliation: Florida State University

Re: cobaya error when running in parallel

Post by Gabriela Marques » August 26 2021

Hi,
I was getting the exact same error as you with Python 3.6, but after upgrading to Python 3.8 I could run cobaya with MPI.
Specifically, I have the intel and intelmpi modules loaded. I'm not sure whether that matches your resources or whether it is relevant to making this work, but I mention it in case you want to test with this configuration.

I hope that helps!

boryana hadzhiyska
Posts: 5
Joined: April 28 2021
Affiliation: Harvard

Re: cobaya error when running in parallel

Post by boryana hadzhiyska » November 21 2021

Thank you! That was indeed the problem -- switching to a different python/intel version fixed it.
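Since the fix turned out to depend on the Python version, it can help to record which interpreter and package versions are actually picked up inside the job environment before and after switching. The following is a minimal sketch for that; the helper name `installed_version` is hypothetical, and the package list is simply taken from the names appearing in the traceback above.

```python
import sys


def installed_version(package_name):
    """Return the installed version of a package, or a short note if unavailable."""
    try:
        # importlib.metadata is only in the standard library from Python 3.8 on.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return "unknown (importlib.metadata needs Python 3.8+)"
    try:
        return version(package_name)
    except PackageNotFoundError:
        return "not installed"


# Collect the interpreter version and the versions of the packages in the traceback.
report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
for name in ("cobaya", "mpi4py", "dill"):
    report[name] = installed_version(name)

for name, value in report.items():
    print("{:8s} {}".format(name, value))
```

Running this with the same interpreter that `cobaya-run` uses (for example inside the batch script, after loading modules) shows whether the upgraded Python is really the one in use.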
