GetDist: Saving the plot sample instead of an image
Posted: September 04 2021
Dear all,
I would like to know if it's possible using GetDist, or matplotlib since it's built on top of it, to save the plot sample instead of an image. By plot sample I mean the raw data that is displayed when we show the plot. It should also be possible to later combine several samples into one plot.
The reasoning behind this is that with MCMC methods (I've been using the `emcee` Python package) the chains can be quite large. Without compression, and using an HDF5 file to store the chain, I get ≈ 71 MiB for 10,000 steps with 2 parameters. This seems to scale at least linearly with the number of steps, so an MCMC run that takes hundreds of thousands of steps, which mine often do, will take a few GiB. It's not only the size on disk: the IO also kills the performance. So, to avoid storing the chain while still being able to show two or more different runs in the same plot later, I would like to know if there is a way to store only the plot data, as if it were an image file but with the positions indexed somehow, to allow for future manipulation.
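To make "plot data" concrete, here is a minimal sketch of the kind of thing I have in mind, using only NumPy rather than GetDist itself. It assumes every run is binned on the same fixed grid, so that only the 2D counts need to be kept and different runs can be compared later. The function name `bin_samples`, the grid limits, and the Gaussian stand-in samples are all made up for illustration, and I don't know whether GetDist offers an equivalent out of the box.

```python
import io
import numpy as np

def bin_samples(samples, edges_x, edges_y):
    """Reduce an (n_steps, 2) array of samples to 2D histogram counts."""
    counts, _, _ = np.histogram2d(
        samples[:, 0], samples[:, 1], bins=[edges_x, edges_y]
    )
    return counts

# Hypothetical shared grid: 64 x 64 bins over an assumed parameter range.
edges_x = np.linspace(-5.0, 5.0, 65)
edges_y = np.linspace(-5.0, 5.0, 65)

# Stand-ins for two flattened chains (not real emcee output).
rng = np.random.default_rng(0)
run_a = rng.normal(0.0, 1.0, size=(10_000, 2))
run_b = rng.normal(1.0, 0.5, size=(10_000, 2))

counts_a = bin_samples(run_a, edges_x, edges_y)
counts_b = bin_samples(run_b, edges_x, edges_y)

# Store only the grid and the counts. An in-memory buffer is used here;
# a filename such as "runs.npz" would work the same way.
buf = io.BytesIO()
np.savez_compressed(
    buf, edges_x=edges_x, edges_y=edges_y,
    counts_a=counts_a, counts_b=counts_b,
)

# Later: reload the stored densities without touching the chains.
buf.seek(0)
stored = np.load(buf)
centres_x = 0.5 * (stored["edges_x"][:-1] + stored["edges_x"][1:])
centres_y = 0.5 * (stored["edges_y"][:-1] + stored["edges_y"][1:])
```

The reloaded counts, together with the bin centres, could then be drawn for several runs on the same axes, for example with matplotlib's `contour`. The obvious cost is that the binning is fixed at save time, so any smoothing or re-binning that needs the original samples is no longer possible.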
Although this is more of a Python-related question, I decided to ask it here because somebody might have a different approach to storing and analyzing their MCMC chains that I am not aware of, and perhaps I should be doing things differently instead of trying to save this data to disk.
Thanks in advance,
José Ferreira