GetDist: Saving the plot sample instead of an image

Use of Cobaya. camb, CLASS, cosmomc, compilers, etc.
José Ferreira
Posts: 2
Joined: August 27 2021
Affiliation: Faculdade de Ciências da Universidade de Lisboa

Post by José Ferreira » September 04 2021

Dear all,

I would like to know whether it's possible, using GetDist (or matplotlib, since GetDist is built on top of it), to save the plot sample instead of an image. By plot sample I mean the raw data that is displayed when we show the plot. It should also be possible to later combine several samples into one plot.

The reasoning behind this is that MCMC chains can be quite large (I've been using the `emcee` Python package). Without compression, and using an HDF5 file to store the chain, I get ≈ 71 MiB for 10,000 steps with 2 parameters. This seems to scale linearly, at least with the number of steps, which means that an MCMC run that takes hundreds of thousands of steps, as mine often do, will take a few GiB. Not only is the size on disk a problem, but the IO also kills performance. So, to avoid storing the chain while still being able to show two or more different runs in the same plot later, I would like to know if there is a way to store only the plot data, as if it were an image file but with the positions indexed somehow, to allow for future manipulation.
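
To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It assumes the file holds uncompressed float64 positions plus one log-probability value per walker per step, and it ignores HDF5 overhead. The number of walkers is not given in the post, so the value of 300 below is purely hypothetical, as is the helper name `chain_size_bytes`.

```python
MIB = 1024 ** 2


def chain_size_bytes(nsteps, nwalkers, ndim, itemsize=8):
    """Rough size of a stored ensemble chain.

    Assumption: one float per parameter plus one log-probability
    per walker per step, with no compression and no file overhead.
    """
    return nsteps * nwalkers * (ndim + 1) * itemsize


# Hypothetical walker count; the post only gives steps and parameters.
nwalkers = 300
for nsteps in (10_000, 100_000, 1_000_000):
    size = chain_size_bytes(nsteps, nwalkers, ndim=2)
    print(f"{nsteps:>9} steps: {size / MIB:8.1f} MiB")
```

Under these assumptions the size grows linearly with the number of steps and with the number of walkers, so a run ten times longer needs ten times the space.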

Although this is more of a Python-related question, I decided to ask it here, as somebody might have a different solution for storing and analyzing their MCMC chains that I am not aware of, and perhaps I should be doing things differently instead of trying to save this data to disk.

Thanks in advance,
José Ferreira

Antony Lewis
Posts: 1720
Joined: September 23 2004
Affiliation: University of Sussex

Re: GetDist: Saving the plot sample instead of an image

Post by Antony Lewis » September 17 2021

May be slower, but still not very slow for large chains. But you can probably also thin the chains with very little loss of information, saving the thinned samples.
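
A minimal sketch of the thinning idea, assuming the chain is a NumPy array of shape `(nsteps, nwalkers, ndim)` like the one emcee produces. The helper names (`autocorr_1d`, `integrated_time`, `thin_chain`) are hypothetical, and the estimator is a simple FFT-based integrated autocorrelation time with an automatic window, not GetDist's own thinning. The factors of 2 for burn-in and 0.5 for thinning are a common rule of thumb rather than anything prescribed in this thread. With a real emcee sampler, `sampler.get_autocorr_time()` and `sampler.get_chain(discard=..., thin=..., flat=True)` cover the same ground.

```python
import numpy as np


def autocorr_1d(x):
    """Normalised autocorrelation function of a 1D series, via FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()  # zero-pad to avoid wrap-around
    f = np.fft.rfft(x - x.mean(), n=nfft)
    acf = np.fft.irfft(f * np.conjugate(f), n=nfft)[:n]
    return acf / acf[0]


def integrated_time(chain_1p, c=5.0):
    """Integrated autocorrelation time for one parameter.

    chain_1p has shape (nsteps, nwalkers). The sum over lags is cut
    at the first lag M where M >= c * tau(M).
    """
    nwalkers = chain_1p.shape[1]
    acf = np.mean([autocorr_1d(chain_1p[:, w]) for w in range(nwalkers)], axis=0)
    taus = 2.0 * np.cumsum(acf) - 1.0
    window = np.arange(len(taus)) < c * taus
    m = len(taus) - 1 if window.all() else int(np.argmin(window))
    return taus[m]


def thin_chain(chain, discard_factor=2.0, thin_factor=0.5):
    """Drop burn-in and thin by a fraction of the largest autocorrelation time."""
    ndim = chain.shape[2]
    tau = max(integrated_time(chain[:, :, p]) for p in range(ndim))
    discard = int(discard_factor * tau)
    thin = max(1, int(thin_factor * tau))
    flat = chain[discard::thin].reshape(-1, ndim)
    return flat, tau, thin


# Synthetic correlated chain (AR(1) process) standing in for a real run.
rng = np.random.default_rng(0)
nsteps, nwalkers, ndim = 5000, 8, 2
chain = np.zeros((nsteps, nwalkers, ndim))
for t in range(1, nsteps):
    chain[t] = 0.9 * chain[t - 1] + rng.normal(size=(nwalkers, ndim))

flat, tau, thin = thin_chain(chain)
print(f"tau ~ {tau:.1f}, thin = {thin}, kept {len(flat)} of {nsteps * nwalkers} samples")
```

Samples closer together than the autocorrelation time are strongly correlated, which is why dropping most of them loses little information.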

GetDist also does have functions to return Density2D objects, etc. if that's what you mean.

José Ferreira
Posts: 2
Joined: August 27 2021
Affiliation: Faculdade de Ciências da Universidade de Lisboa

Re: GetDist: Saving the plot sample instead of an image

Post by José Ferreira » September 19 2021

Antony Lewis wrote:
> May be slower, but still not very slow for large chains. But you can probably also thin the chains with very little loss of information, saving the thinned samples.

I've thought about it, but how to choose a thinning factor that removes only redundant information, or at least only minimal information, isn't obvious to me.

Antony Lewis wrote:
> GetDist also does have functions to return Density2D objects, etc. if that's what you mean.

This was more what I meant, yes: my idea was to somehow save that Density2D object to a file instead of saving the entire chain.
It wouldn't be as flexible as having the entire chain available, but for most cases I've met so far, which involve running a few MCMC chains and comparing them, it is more than enough.
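
A minimal sketch of the "save the density, not the chain" idea. It uses a plain `np.histogram2d` grid as a stand-in for GetDist's smoothed density, so it is not GetDist's method. The helper names and the file name are hypothetical. GetDist's own `Density2D` objects (for example from `MCSamples.get2DDensity`) carry grid arrays that could presumably be stored the same way, but that is an assumption and the sketch does not exercise it.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_density(samples, path, bins=64, weights=None):
    """Bin 2D samples on a uniform grid and store only the grid."""
    P, xedges, yedges = np.histogram2d(
        samples[:, 0], samples[:, 1], bins=bins, weights=weights, density=True
    )
    x = 0.5 * (xedges[:-1] + xedges[1:])  # bin centres
    y = 0.5 * (yedges[:-1] + yedges[1:])
    # Transpose so that P[j, i] belongs to (x[i], y[j]), as contour plots expect.
    np.savez_compressed(path, x=x, y=y, P=P.T)


def load_density(path):
    """Read back the grid axes and density."""
    with np.load(path) as data:
        return data["x"], data["y"], data["P"]


def contour_levels(P, fractions=(0.68, 0.95)):
    """Density thresholds enclosing the given fractions of the total mass.

    Assumes equal-area bins, so mass is proportional to P.
    """
    flat = np.sort(P.ravel())[::-1]
    cdf = np.cumsum(flat) / flat.sum()
    return [flat[np.searchsorted(cdf, f)] for f in fractions]


# Synthetic samples standing in for a flattened, thinned chain.
rng = np.random.default_rng(0)
samples = rng.normal(size=(20_000, 2))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "run_a_density.npz"  # hypothetical file name
    save_density(samples, path)
    x, y, P = load_density(path)
    levels = contour_levels(P)
    print(f"grid file: {path.stat().st_size} bytes, samples: {samples.nbytes} bytes")
```

Several saved grids can later be overlaid in one figure, for instance with matplotlib's `contour(x, y, P, levels=sorted(levels))` called once per run. The trade-off is the one mentioned above: the parameter pair and the binning are fixed when the file is written.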

However, I realized that tar actually does a pretty good job of compressing the chains.
So currently my Python script runs emcee on a target model with target data, writes the chain to /tmp, which is mounted as tmpfs, and then tars it to disk once the MCMC is complete.
This is working out quite nicely so far. It isn't very creative, but it reduces the chain size by ≈ 1/5, and execution time decreases whenever evaluating the model and the data is faster than IO, which has been quite often for me.
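
The workflow above could look roughly like this, using only the standard library. It is a sketch, not the actual script: the function names and the `chain.h5` file name are hypothetical, and gzip compression (`"w:gz"`) is an assumption, since tar by itself only archives and the post does not say which compressor was used.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_chain(chain_file, dest_dir):
    """Pack one chain file into a gzip-compressed tar archive in dest_dir."""
    chain_file = Path(chain_file)
    dest = Path(dest_dir) / (chain_file.name + ".tar.gz")
    with tarfile.open(str(dest), "w:gz") as tar:
        tar.add(str(chain_file), arcname=chain_file.name)
    return dest


def run_and_archive(write_chain, dest_dir, scratch_root=None, name="chain.h5"):
    """Write the chain to scratch space, then archive it to permanent storage.

    write_chain is a callable that receives the scratch file path and
    fills it, e.g. by running the sampler with a file backend.
    scratch_root can point at a tmpfs mount such as /tmp; None uses the
    system default temporary directory.
    """
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        chain_file = Path(scratch) / name
        write_chain(chain_file)
        return archive_chain(chain_file, dest_dir)
```

Because the scratch directory is removed when the `with` block exits, only the compressed archive remains on disk.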

As I said before, I'm still curious how others deal with these computational issues, which I am sure most physicists have run into, since countless papers in cosmology use MCMC methods. Naturally, these small details aren't explained in the papers, and most of the time the source code doesn't seem to be available; there's only a reference to the software used.

Thank you for your time.
