Analysis of high-dimensional data in the context of intrinsic dimensionality

dc.contributor.advisor: Schubert, Erich
dc.contributor.author: Thordsen, Erik
dc.contributor.referee: Aumüller, Martin
dc.date.accepted: 2025-05-12
dc.date.accessioned: 2025-07-29T12:20:06Z
dc.date.available: 2025-07-29T12:20:06Z
dc.date.issued: 2025
dc.description.abstract: Many modern machine learning and data analysis applications use or produce large amounts of high-dimensional data. While it is well known that the performance of many algorithms degrades with increasing dimensionality, both in terms of speed and quality, the performance on real-world datasets is often much better than expected. In fact, real-world datasets tend to occupy only a lower-dimensional manifold in the available observed space, either due to the underlying generative process or the sparsity of the data. To describe this phenomenon, the concept of Intrinsic Dimensionality (ID) was introduced, which describes the minimum number of latent variables needed to produce the observed manifold. To estimate the ID of non-linear manifolds, small localities of the data are often considered, giving rise to the concept of Local Intrinsic Dimensionality (LID), which, aside from estimating a global ID, also allows for a spatially resolved analysis of the data. Since the LID eludes direct observation beyond two or three dimensions, it has to be estimated from the data. In this thesis, we explore multiple new approaches to estimate the LID of data in Euclidean spaces, investigate their analytical and empirical properties, and compare them to existing approaches both qualitatively and quantitatively. One of these approaches, the Angle-Based Intrinsic Dimensionality (ABID) estimator, has very useful theoretical properties. We therefore provide an exemplary derivation of ABID for vector fields, as a potential control mechanism in algorithms such as Gradient Descent, to showcase how LID estimation can be useful beyond point clouds. We also investigate if and how LID estimation approaches can be applied to non-Euclidean spaces.
While the theory of LID estimation in non-Euclidean spaces remains largely unresolved, we offer a visionary outlook on the future of the field and provide anecdotal evidence for possible future applications of LID.
dc.identifier.uri: http://hdl.handle.net/2003/43821
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-25595
dc.language.iso: en
dc.subject: High-dimensional data
dc.subject: Intrinsic dimensionality
dc.subject: Geometry
dc.subject.ddc: 004
dc.subject.rswk: Hochdimensionale Daten (de)
dc.subject.rswk: Geometrie (de)
dc.title: Analysis of high-dimensional data in the context of intrinsic dimensionality (en)
dc.type: Text
dc.type.publicationtype: PhDThesis
dcterms.accessRights: open access
eldorado.secondarypublication: false

Files

Original bundle
Name: Dissertation_Thordsen.pdf
Size: 18.58 MB
Format: Adobe Portable Document Format
Description: DNB
License bundle
Name: license.txt
Size: 4.82 KB
Format: Item-specific license agreed upon to submission
Description: