Authors: Munteanu, Alexander
Title: On large-scale probabilistic and statistical data analysis
Language (ISO): en
Abstract: In this manuscript we develop and apply modern algorithmic data reduction techniques to tackle scalability issues and to enable statistical analysis of massive data sets. Our algorithms follow a general scheme: a reduction technique is applied to the large-scale data to obtain a small summary of sublinear size, to which a classical algorithm is then applied. The techniques for obtaining these summaries depend on the problem to be solved. The size of a summary is usually governed by an approximation parameter that expresses the trade-off between efficiency and accuracy. In some cases the data can be reduced to a size that has no, or only a negligible, dependence on the initial number of data items. For other problems, however, sublinear summaries provably do not exist in the worst case. In such situations we exploit statistical or geometric relaxations to obtain useful sublinear summaries under mild assumptions. In particular, we present the data reduction methods known as coresets and subspace embeddings, together with several algorithmic techniques for constructing them via random projections and sampling.
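As a concrete illustration of the random-projection summaries mentioned in the abstract, the following is a minimal, self-contained sketch (not code from the dissertation) of a Gaussian random projection in the spirit of the Johnson-Lindenstrauss lemma: a high-dimensional vector is compressed to k dimensions while its Euclidean norm is approximately preserved. The dimensions d = 500 and k = 200 and all function names are illustrative assumptions.

```python
import math
import random

def gaussian_sketch(rows, cols, rng):
    # Sketch matrix with i.i.d. N(0, 1/rows) entries, so that
    # E[||S x||^2] = ||x||^2 for any fixed vector x.
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)]
            for _ in range(rows)]

def apply_sketch(S, x):
    # Matrix-vector product: maps x from len(x) dimensions down to len(S).
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

rng = random.Random(42)          # fixed seed for reproducibility
d, k = 500, 200                  # original and reduced dimension (illustrative)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]

S = gaussian_sketch(k, d, rng)
y = apply_sketch(S, x)           # the k-dimensional summary of x

# Relative distortion of the Euclidean norm; small with high probability.
distortion = abs(norm(y) / norm(x) - 1.0)
```

The same sketch matrix S, applied to every column of a data matrix, yields a subspace embedding of the kind discussed in the dissertation: norms (and hence least-squares objectives) are preserved up to a factor controlled by the reduced dimension k.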
Subject Headings: Data reduction
Random projections
Subject Headings (RSWK): Datenkompression (data compression)
Issue Date: 2018
Appears in Collections: LS 02 Komplexitätstheorie und Effiziente Algorithmen

Files in This Item:
File: Dissertation_Munteanu.pdf
Description: DNB
Size: 917.29 kB
Format: Adobe PDF

This item is protected by original copyright