This paper argues that astronomers need to draw on techniques from computer science, applied mathematics, statistics, and artificial intelligence in order to manage increasingly massive and complex astronomical data sets. Its aim is to introduce astronomers to some existing and emerging techniques and to encourage interdisciplinary collaboration to tackle these issues.
One key idea is dimensional reduction, i.e., finding ways to represent high-dimensional data in fewer dimensions while retaining as much information as possible. For example, the authors discuss techniques for combining images at different wavelengths into a single information-rich image. They suggest that such processing could be a useful pre-processing step before applying existing astronomical image analysis tools such as SExtractor.
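A minimal toy sketch of that idea (not from the paper, and using synthetic data rather than real multi-wavelength images): stack the bands into a pixels-by-bands matrix and project each pixel onto the first principal component, yielding a single combined image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: three 64x64 "images" of the same field at different
# wavelengths, sharing a common source pattern with band-dependent
# brightness plus independent noise.
h, w, n_bands = 64, 64, 3
source = np.zeros((h, w))
source[20:30, 20:30] = 1.0  # a toy "object"
bands = np.stack(
    [source * g + 0.1 * rng.standard_normal((h, w)) for g in (1.0, 0.7, 0.4)],
    axis=-1,
)

# Flatten to (n_pixels, n_bands), center each band, and project onto the
# first principal component across bands: one information-rich image.
X = bands.reshape(-1, n_bands)
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
combined = (Xc @ Vt[0]).reshape(h, w)
print(combined.shape)  # (64, 64)
```

The combined image concentrates the correlated signal shared across bands into one map, which is the kind of product one might then feed to a detection tool.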
Principal Component Analysis (PCA) is discussed as one example of dimensional reduction, but the authors point out a major drawback: it is a linear technique, so it will not provide an accurate description of the data if the data points lie on a nonlinear manifold in the high-dimensional parameter space. They survey a number of nonlinear techniques, including one developed by the authors themselves, discussed in more detail here.
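The linearity drawback is easy to demonstrate on synthetic data (my own toy illustration, not an example from the paper): points on an arc have only one intrinsic degree of freedom (the angle), yet no single linear direction captures them, so the first principal component explains well under 100% of the variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Points on a one-dimensional *nonlinear* manifold embedded in 2-D:
# an arc parameterized by a single angle.
theta = rng.uniform(0.0, np.pi, 500)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Xc = X - X.mean(axis=0)

# PCA via SVD: fraction of variance explained by each component.
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(explained[0])  # noticeably below 1: linear PCA cannot
                     # compress the arc to one dimension
```

A nonlinear method that parameterizes the manifold directly (e.g., a manifold-learning technique such as Isomap) would recover the single underlying degree of freedom here.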
The paper reads more like a laundry list of ideas and techniques than a practical guide for an astronomer to apply them, but hopefully more practical information can be found in the references. I know the LSST folks have been thinking about these kinds of things, and I'm sure some of these techniques will be important for other next-generation cosmology projects as well.
Some potentially useful links mentioned in the paper:
Center for Astrostatistics at Penn State
Bayesian Inference for the Physical Sciences (BIPS)
International Computational Astrostatistics (InCA) Group at Carnegie Mellon University
[1003.0879] The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Authors: Meyer Z. Pesenson (1), Isaac Z. Pesenson (2), Bruce McCollum (1) ((1) California Institute of Technology, (2) Temple University)
Abstract: Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines such as applied mathematics, statistics and machine learning and have been utilized by other sciences such as space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly to be found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex datasets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other.