Analysis of high-dimensional data in the context of intrinsic dimensionality

Abstract

Many modern machine learning and data analysis applications use or produce large amounts of high-dimensional data. While it is commonly known that the performance of many algorithms degrades with increasing dimensionality, in terms of both speed and quality, performance on real-world datasets is often much better than expected. In fact, real-world datasets tend to occupy only a lower-dimensional manifold in the observed space, due either to the underlying generative process or to the sparsity of the data. To describe this phenomenon, the concept of Intrinsic Dimensionality (ID) was introduced, which denotes the minimum number of latent variables required to produce the observed manifold. To estimate the ID of non-linear manifolds, small localities of the data are often considered, giving rise to the concept of Local Intrinsic Dimensionality (LID), which, aside from supporting an estimate of the global ID, also allows for a spatially resolved analysis of the data. Since the LID eludes direct observation beyond two or three dimensions, it has to be estimated from the data. In this thesis, we explore multiple new approaches to estimating the LID of data in Euclidean spaces, investigate their analytical and empirical properties, and compare them to existing approaches both qualitatively and quantitatively. One of these approaches, the Angle-Based Intrinsic Dimensionality (ABID) estimator, has very useful theoretical properties. We therefore provide an exemplary adaptation of ABID to vector fields as a potential control mechanism in algorithms such as Gradient Descent, showcasing how LID estimation can be useful beyond point clouds. We also investigate whether and how LID estimation approaches can be applied to non-Euclidean spaces. While the theory of LID estimation in non-Euclidean spaces remains largely unresolved, we offer an outlook on the future of the field and present anecdotal evidence for possible future applications of LID.
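
To make the idea of estimating LID from angles in a small locality more concrete, the following is a minimal sketch, not the thesis's ABID definition. It relies only on the standard fact that for directions spread uniformly over a d-dimensional subspace, the expected squared cosine between two of them is 1/d, so the inverse of the mean squared cosine among neighbour directions gives a rough local dimension. The function name `angle_based_lid`, the neighbourhood size `k`, and the synthetic test data are all illustrative assumptions.

```python
import numpy as np


def angle_based_lid(data, query_index, k=20):
    """Rough angle-based LID estimate at one point (illustrative sketch only).

    Assumes Euclidean data without duplicate points; this is not claimed to
    match the exact ABID estimator described in the thesis.
    """
    x = data[query_index]
    # Euclidean distances from the query point to every point.
    dists = np.linalg.norm(data - x, axis=1)
    # Indices sorted by distance, with the query itself excluded explicitly.
    order = np.argsort(dists)
    neighbours = order[order != query_index][:k]
    # Unit vectors pointing from the query to each neighbour.
    diffs = data[neighbours] - x
    units = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    # Pairwise cosines between neighbour directions.
    cos = units @ units.T
    # Keep only distinct pairs (drop the diagonal, where the cosine is 1).
    off_diag = ~np.eye(units.shape[0], dtype=bool)
    mean_sq_cos = np.mean(cos[off_diag] ** 2)
    # For directions uniform in d dimensions, E[cos^2] = 1/d.
    return 1.0 / mean_sq_cos


# Hypothetical test data: a 2-dimensional plane embedded linearly in 10-D space.
rng = np.random.default_rng(0)
latent = rng.uniform(-1.0, 1.0, size=(2000, 2))
embedding = rng.normal(size=(2, 10))
data = latent @ embedding

# Estimate the LID at the first 50 points and average the results.
estimates = [angle_based_lid(data, i, k=20) for i in range(50)]
print(np.mean(estimates))
```

Although the points are stored with 10 coordinates, the averaged estimate should land near 2, the number of latent variables used to generate them. Evaluating the estimate per point rather than once for the whole dataset is what allows the spatially resolved analysis mentioned above.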

Keywords

High-dimensional data, Intrinsic dimensionality, Geometry

Keywords according to RSWK

High-dimensional data, Geometry
