Modeling and replicating statistical topology and evidence for CMB nonhomogeneity

  1. Pratyush Pranav (a)

(a) Andrew and Erna Viterbi Faculty of Electrical Engineering, Technion – Israel Institute of Technology, Haifa 32000, Israel

Edited by Larry Wasserman, Carnegie Mellon University, Pittsburgh, PA, and approved August 29, 2017 (received for review April 25, 2017)

Significance

Under the general heading of “topological data analysis” (TDA), the recent adoption of topological methods for the analysis of large, complex, and high-dimensional data sets has established that the abstract concepts of algebraic topology provide powerful tools for data analysis. However, despite the successes of TDA, most applications have lacked formal statistical veracity, primarily due to difficulties in deriving distributional information about topological descriptors. We present an approach, Replicating Statistical Topology (RST), which takes the most basic descriptor of TDA, the persistence diagram, and, using models based on Gibbs distributions and Markov chain Monte Carlo, provides replications of it. These allow for formal statistical hypothesis testing, without requiring costly, or perhaps intrinsically unavailable, replications of the original data set.
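To make the replication idea concrete, the following is a minimal sketch of drawing replicated diagrams from a Gibbs distribution with a Metropolis sampler. It is not the model used in the paper: the energy function (quadratic confinement plus a soft pairwise repulsion), the parameter names `center`, `scale`, `theta`, and `radius`, and the fixed parameter values are all assumptions chosen for illustration. In practice the parameters would be estimated from the observed diagram. Diagram points are stored as (birth, persistence) pairs, and proposals with non-positive persistence are rejected.

```python
import numpy as np

def energy(diagram, center, scale, theta, radius):
    """Hypothetical Gibbs energy: quadratic confinement plus soft pairwise repulsion."""
    confinement = np.sum((diagram - center) ** 2) / (2.0 * scale ** 2)
    diffs = diagram[:, None, :] - diagram[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    upper = np.triu_indices(len(diagram), k=1)  # count each pair once
    repulsion = np.sum(np.exp(-sq_dists[upper] / (2.0 * radius ** 2)))
    return confinement + theta * repulsion

def replicate(diagram, n_steps, step, rng, **params):
    """Run single-point Metropolis updates and return one replicated diagram."""
    current = diagram.copy()
    h_current = energy(current, **params)
    for _ in range(n_steps):
        i = rng.integers(len(current))
        proposal = current.copy()
        proposal[i] += rng.normal(scale=step, size=2)
        if proposal[i, 1] <= 0.0:  # persistence must stay positive
            continue
        h_proposal = energy(proposal, **params)
        # Accept with probability min(1, exp(-(H_new - H_old))).
        if rng.random() < np.exp(min(0.0, h_current - h_proposal)):
            current, h_current = proposal, h_proposal
    return current

# Synthetic stand-in for a single observed diagram (assumed data, not from the paper).
rng = np.random.default_rng(0)
observed = np.column_stack([rng.uniform(0.0, 1.0, 30), rng.uniform(0.05, 0.5, 30)])
params = dict(center=observed.mean(axis=0), scale=0.3, theta=1.0, radius=0.05)
replicates = [replicate(observed, 2000, 0.05, rng, **params) for _ in range(5)]
```

Each entry of `replicates` has the same number of points as `observed`, so any summary statistic computed on the observed diagram can also be computed on every replicate.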

Abstract

Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “topological data analysis” (TDA), one of the primary aims of which is to provide nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis (the typical case for big data applications), the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram-based TDA. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity.
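Once replicated diagrams are available, a conventional test can be run by comparing a summary statistic of the diagram under test with the same statistic computed on the replicates. The sketch below uses a standard one-sided Monte Carlo p-value. The choice of statistic (total persistence) and the function names are assumptions for illustration, not the statistics used in the paper. Diagrams are arrays of (birth, persistence) rows.

```python
import numpy as np

def total_persistence(diagram):
    """Sum of the persistence column of a (birth, persistence) diagram."""
    return float(np.sum(diagram[:, 1]))

def monte_carlo_p_value(observed_stat, replicated_stats):
    """One-sided Monte Carlo p-value: (1 + #{T_i >= T_obs}) / (1 + n)."""
    replicated_stats = np.asarray(replicated_stats, dtype=float)
    exceed = np.sum(replicated_stats >= observed_stat)
    return (1.0 + exceed) / (1.0 + len(replicated_stats))

# Four replicated statistics, one of which is at least as large as the observed 3.0.
p = monte_carlo_p_value(3.0, [1.0, 2.0, 4.0, 2.5])  # (1 + 1) / (1 + 4) = 0.4
```

Adding one to both numerator and denominator keeps the p-value strictly positive, which is the usual convention when the reference distribution is built from a finite number of simulated replicates.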

Footnotes

  • 1. To whom correspondence should be addressed. Email: radler{at}technion.ac.il.

Published under the PNAS license.
