Author(s): Munteanu, Alexander
Title: On large-scale probabilistic and statistical data analysis
Language (ISO): en
Abstract: In this manuscript we develop and apply modern algorithmic data reduction techniques to tackle scalability issues and enable statistical data analysis of massive data sets. Our algorithms follow a general scheme, where a reduction technique is applied to the large-scale data to obtain a small summary of sublinear size to which a classical algorithm is applied. The techniques for obtaining these summaries depend on the problem that we want to solve. The size of the summaries is usually parametrized by an approximation parameter, expressing the trade-off between efficiency and accuracy. In some cases the data can be reduced to a size that has no or only negligible dependency on the initial number of data items. However, for other problems it turns out that sublinear summaries do not exist in the worst case. In such situations, we exploit statistical or geometric relaxations to obtain useful sublinear summaries under certain mildness assumptions. We present, in particular, the data reduction methods called coresets and subspace embeddings, and several algorithmic techniques to construct these via random projections and sampling.
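The general scheme described in the abstract (reduce the data to a small summary, then run a classical algorithm on the summary) can be illustrated for ordinary least-squares regression. The following is a minimal sketch, assuming a dense Gaussian random projection as the subspace embedding; this is one textbook construction and not necessarily the one used in the dissertation. The function names `gaussian_sketch`, `sketch_and_solve` and `residual`, and the sketch size `k`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_sketch(n_rows, k, rng):
    """Hypothetical helper: k x n_rows random projection with i.i.d. N(0, 1/k) entries."""
    return rng.standard_normal((k, n_rows)) / np.sqrt(k)


def sketch_and_solve(A, b, k, seed=0):
    """Approximately solve min_x ||Ax - b||_2 using a k-row summary of (A, b)."""
    rng = np.random.default_rng(seed)
    # Apply the same projection S to A and b, i.e. sketch the augmented matrix [A | b],
    # so that residual norms ||Ax - b|| are approximately preserved for every x.
    S = gaussian_sketch(A.shape[0], k, rng)
    SA, Sb = S @ A, S @ b
    # Classical least-squares solver applied to the small summary (k rows instead of n).
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x


def residual(A, b, x):
    """Euclidean norm of the residual Ax - b on the full data."""
    return np.linalg.norm(A @ x - b)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, d = 5_000, 10
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + rng.standard_normal(n)
    x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)  # exact solution on all n rows
    x_sk = sketch_and_solve(A, b, k=400)            # solution from the 400-row summary
    ratio = residual(A, b, x_sk) / residual(A, b, x_opt)
    print(f"cost ratio sketched/optimal: {ratio:.4f}")
```

The sketch size `k` plays the role of the approximation parameter: for a Gaussian projection, on the order of d/ε² rows suffice (with good probability) for every residual norm to be preserved up to a factor 1 ± ε, independently of n. A dense Gaussian matrix is slow to apply at scale; sparse or structured random projections, or sampling-based summaries such as coresets, are the usual alternatives.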
Keywords: Data reduction
Regression
Random projections
Coresets
Subject headings (RSWK): Data compression
Regression analysis
Dimensionality reduction
URI: http://hdl.handle.net/2003/37116
http://dx.doi.org/10.17877/DE290R-19112
Publication date: 2018
Appears in collections: LS 02 Komplexitätstheorie und Effiziente Algorithmen (Complexity Theory and Efficient Algorithms)

Files in this item:
Dissertation_Munteanu.pdf (description: DNB; size: 917.29 kB; format: Adobe PDF)


This item is protected by copyright.
