Analysis of high-dimensional data in the context of intrinsic dimensionality
dc.contributor.advisor | Schubert, Erich | |
dc.contributor.author | Thordsen, Erik | |
dc.contributor.referee | Aumüller, Martin | |
dc.date.accepted | 2025-05-12 | |
dc.date.accessioned | 2025-07-29T12:20:06Z | |
dc.date.available | 2025-07-29T12:20:06Z | |
dc.date.issued | 2025 | |
dc.description.abstract | Many modern machine learning and data analysis applications make use of or produce large amounts of high-dimensional data. While it is commonly known that the performance of many algorithms degrades with increasing dimensionality, both in terms of speed and quality, the performance on real-world datasets is often much better than expected. In fact, real-world datasets tend to occupy only a lower-dimensional manifold in the available observed space, either due to the underlying generative process or the sparsity of the data. To describe this phenomenon, the concept of Intrinsic Dimensionality (ID) was introduced, which describes the minimum number of latent variables needed to produce the observed manifold. To estimate the ID of non-linear manifolds, small localities of the data are often considered, giving rise to the concept of Local Intrinsic Dimensionality (LID), which, aside from estimating a global ID, also allows for a spatially resolved analysis of the data. Since the LID eludes direct observation beyond two or three dimensions, it has to be estimated from the data. In this thesis, we explore multiple new approaches to estimate the LID of data in Euclidean spaces, investigate their analytical and empirical properties, and compare them to existing approaches both qualitatively and quantitatively. One of these approaches, the Angle-Based Intrinsic Dimensionality (ABID) estimator, has very useful theoretical properties. We therefore provide an exemplary derivation of ABID for vector fields, as a potential control mechanism in algorithms such as Gradient Descent, to showcase how LID estimation can be useful beyond point clouds. We also investigate whether and how LID estimation approaches can be applied to non-Euclidean spaces.
While the theory of LID estimation in non-Euclidean spaces remains largely unresolved, we offer a visionary outlook on the future of the field and provide anecdotal evidence for possible future applications of LID. | |
dc.identifier.uri | http://hdl.handle.net/2003/43821 | |
dc.identifier.uri | http://dx.doi.org/10.17877/DE290R-25595 | |
dc.language.iso | en | |
dc.subject | High-dimensional data | |
dc.subject | Intrinsic dimensionality | |
dc.subject | Geometry | |
dc.subject.ddc | 004 | |
dc.subject.rswk | Hochdimensionale Daten | de |
dc.subject.rswk | Geometrie | de |
dc.title | Analysis of high-dimensional data in the context of intrinsic dimensionality | en |
dc.type | Text | |
dc.type.publicationtype | PhDThesis | |
dcterms.accessRights | open access | |
eldorado.secondarypublication | false |