### [1003.0879] The Data Big Bang and the Expanding Digital Universe

Posted: **March 16, 2010**

This paper argues that astronomers need to draw on techniques from computer science, applied mathematics, statistics, and artificial intelligence to manage increasingly massive and complex data sets. Its aim is to introduce astronomers to some of the techniques that already exist or are under development, and to encourage interdisciplinary collaboration on these problems.

One key idea is dimensionality reduction: finding ways to represent high-dimensional data in fewer dimensions while retaining as much information as possible. For example, the authors discuss techniques for combining images taken at different wavelengths into a single, information-rich image, and suggest that this could be a useful pre-processing step before running existing astronomical image-analysis tools such as SExtractor.
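
The summary above doesn't spell out which fusion methods the paper uses, so as a stand-in here is a minimal sketch of one standard approach (my choice, not necessarily the authors'): treat each pixel's fluxes across k co-registered bands as a k-dimensional vector and keep only its projection onto the first principal component. The function names `combine_bands` and `leading_eigenvector` are hypothetical, and the sketch assumes the band images are already aligned on the same pixel grid. It is written in plain Python so it runs without NumPy.

```python
import math
from typing import List

Image = List[List[float]]  # a 2-D image stored as rows of pixel values


def leading_eigenvector(cov: List[List[float]], iters: int = 200) -> List[float]:
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    n = len(cov)
    v = [1.0 / math.sqrt(n)] * n  # uniform unit-length starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: every band is constant
        v = [x / norm for x in w]
    return v


def combine_bands(bands: List[Image]) -> Image:
    """Collapse k co-registered band images into one image by projecting
    each pixel's k-vector of fluxes onto the first principal component."""
    k = len(bands)
    rows, cols = len(bands[0]), len(bands[0][0])
    # Each pixel becomes one sample with k features (one per wavelength).
    samples = [[bands[b][r][c] for b in range(k)]
               for r in range(rows) for c in range(cols)]
    n = len(samples)
    means = [sum(s[b] for s in samples) / n for b in range(k)]
    centered = [[s[b] - means[b] for b in range(k)] for s in samples]
    # k x k covariance between bands, estimated over all pixels.
    cov = [[sum(x[i] * x[j] for x in centered) / n for j in range(k)]
           for i in range(k)]
    pc1 = leading_eigenvector(cov)
    # Score of each pixel along the first principal component.
    scores = [sum(x[b] * pc1[b] for b in range(k)) for x in centered]
    # Reshape the per-pixel scores back into a rows x cols image.
    return [scores[r * cols:(r + 1) * cols] for r in range(rows)]
```

For real survey images you would use a proper linear-algebra library and probably normalize each band first, since PCA is sensitive to the relative scaling of the inputs.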

Principal Component Analysis (PCA) is discussed as one example of dimensionality reduction, but the authors point out its major drawback: it is a linear technique, so it cannot accurately describe data whose points lie on a nonlinear manifold in the high-dimensional parameter space. They go on to discuss a number of nonlinear techniques, including one the authors have developed, discussed in more detail here.
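
To make the linearity problem concrete, here is a toy example of my own (not from the paper). Points on a unit circle form a one-dimensional manifold, since a single angle describes each point, yet PCA's best one-dimensional linear projection captures only half the variance. A hand-picked nonlinear coordinate, the angle from `math.atan2`, reconstructs every point exactly. This is only an illustration; general-purpose nonlinear methods have to discover such a coordinate from the data rather than being told it.

```python
import math
from typing import List, Tuple


def pca_2d_eigenvalues(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Eigenvalues (largest first) of the 2x2 sample covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return half_trace + disc, half_trace - disc


# 360 evenly spaced points on a unit circle: intrinsically 1-D data in 2-D.
n = 360
circle = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
          for i in range(n)]

l1, l2 = pca_2d_eigenvalues(circle)
frac = l1 / (l1 + l2)
print(f"variance captured by first PC: {frac:.2f}")  # prints 0.50

# A nonlinear 1-D coordinate (the angle) reconstructs every point exactly.
angles = [math.atan2(y, x) for x, y in circle]
max_err = max(math.hypot(math.cos(t) - x, math.sin(t) - y)
              for t, (x, y) in zip(angles, circle))
```

Keeping one principal component here leaves a mean squared reconstruction error equal to the discarded eigenvalue `l2` (about 0.5), while `max_err` for the angle-based description is at the level of floating-point round-off.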

The paper reads more like a laundry list of ideas and techniques than a practical guide to applying them, but more practical information can hopefully be found in the references. I know the LSST folks have been thinking about these kinds of things, and I'm sure some of these techniques will be important for other next-generation cosmology projects as well.

Some potentially useful links mentioned in the paper:

Center for Astrostatistics at Penn State

Bayesian Inference for the Physical Sciences (BIPS)

International Computational Astrostatistics (InCA) Group at Carnegie Mellon University