Topology, Metrics and Data: Computational Methods and Applications.
Abstract
The field of topological data analysis (TDA) combines notions from computational geometry
and algebraic topology to analyze data. This thesis presents methods and efficient
algorithms that extend the TDA toolset.
After introducing the necessary background on Euler characteristic curves
and persistent homology, the former objects are extended to bi-dimensional filtrations.
The results are Euler characteristic surfaces, which capture information about data over a
pair of parameters. Algorithms to compute these objects are described for
both image and point data.
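
As an illustration of the image case, the following is a minimal sketch rather than the thesis's algorithm. It assumes each pixel is a closed unit square, so that the Euler characteristic of a binary image is V - E + F, counting the vertices, edges and faces of the pixels that are switched on. It also assumes the bi-dimensional filtration comes from thresholding two grayscale functions f and g defined on the same grid. The names euler_characteristic and euler_surface are hypothetical.

```python
import numpy as np

def euler_characteristic(mask):
    """Euler characteristic V - E + F of the union of closed unit squares in a 2D boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    p = np.pad(mask, 1, constant_values=False)
    faces = mask.sum()
    # An edge or vertex is present when at least one pixel touching it is on.
    h_edges = (p[:-1, 1:-1] | p[1:, 1:-1]).sum()
    v_edges = (p[1:-1, :-1] | p[1:-1, 1:]).sum()
    verts = (p[:-1, :-1] | p[:-1, 1:] | p[1:, :-1] | p[1:, 1:]).sum()
    return int(verts - h_edges - v_edges + faces)

def euler_surface(f, g, s_values, t_values):
    """Sample chi({f <= s} & {g <= t}) on a grid of threshold pairs (s, t)."""
    f = np.asarray(f)
    g = np.asarray(g)
    return np.array([[euler_characteristic((f <= s) & (g <= t)) for t in t_values]
                     for s in s_values])

# A 3x3 ring of pixels is an annulus, so its Euler characteristic is 0.
ring = np.ones((3, 3), dtype=bool)
ring[1, 1] = False
print(euler_characteristic(ring))
```

Fixing one of the two thresholds recovers an ordinary Euler characteristic curve along the other parameter.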
Persistent homology in the ℓ∞ metric is also studied. It is proven that in this setting Alpha
and Čech filtrations are not equivalent in general. On the other hand, two new filtrations,
Alpha flag and Minibox, are defined and proven equivalent to Čech filtrations
in homological dimensions zero and one. Algorithms for finding Minibox edges are
described, and computational experiments show empirically that Minibox filtrations
speed up the computation of Čech persistence diagrams.
Next, a new family of summary functions of persistence diagrams, related to
persistence landscapes, is defined. These are called cumulative landscapes and are used
to vectorize the information contained in persistence diagrams. In particular, discretizations
of these functions and their Fourier coefficients are used to obtain feature vectors
that can be applied in supervised classification problems. The effectiveness of these
feature vectors for the classification of data is compared against vectors obtained using
persistence landscapes on two open-source datasets.
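
The vectorization pipeline can be sketched as follows. The abstract does not define cumulative landscapes, so this sketch uses standard persistence landscapes instead: the k-th landscape at t is the k-th largest value of max(0, min(t - b, d - t)) over the diagram points (b, d). Taking magnitudes of the leading Fourier coefficients is an assumption made here for illustration, and the names landscape and feature_vector are hypothetical.

```python
import numpy as np

def landscape(diagram, k, grid):
    """k-th standard persistence landscape (k = 1 is the top layer) sampled on grid."""
    diagram = np.asarray(diagram, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if len(diagram) < k:
        return np.zeros_like(grid)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]
    # One "tent" function per diagram point, evaluated on the grid.
    tents = np.maximum(0.0, np.minimum(grid - births, deaths - grid))
    # Sort each grid column in descending order and take the k-th largest value.
    return -np.sort(-tents, axis=0)[k - 1]

def feature_vector(diagram, grid, n_layers=2, n_coeffs=4):
    """Concatenate magnitudes of the first Fourier coefficients of each landscape layer."""
    parts = []
    for k in range(1, n_layers + 1):
        values = landscape(diagram, k, grid)
        parts.append(np.abs(np.fft.rfft(values)[:n_coeffs]))
    return np.concatenate(parts)

diagram = [(0.0, 2.0), (0.5, 1.5)]
grid = np.linspace(0.0, 2.0, 9)
features = feature_vector(diagram, grid)
```

Feature vectors of this kind have a fixed length regardless of the number of points in the diagram, which is what allows them to be passed to a standard supervised classifier.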
Finally, a novel method is described for the analysis of high-dimensional genomics data.
Optimized metrics are defined on genomic vectors by means of a loss function. These
are used in combination with a distance-based classification method, showing good performance
compared to standard machine learning algorithms. Moreover, the structure
of the optimized metrics helps identify the coordinates of the genomic vectors that
are most important for the classification task under study.
Authors
Beltramo, Gabriele