Streaming statistical models via Merge & Reduce

dc.contributor.author: Geppert, Leo N.
dc.contributor.author: Ickstadt, Katja
dc.contributor.author: Munteanu, Alexander
dc.contributor.author: Sohler, Christian
dc.date.accessioned: 2021-03-22T15:05:32Z
dc.date.available: 2021-03-22T15:05:32Z
dc.date.issued: 2020-06-12
dc.description.abstract: Merge & Reduce is a general algorithmic scheme in the theory of data structures. Its main purpose is to transform static data structures (that support only queries) into dynamic data structures (that allow insertions of new elements) with as little overhead as possible. This can be used to turn classic offline algorithms for summarizing and analyzing data into streaming algorithms. We transfer these ideas to the setting of statistical data analysis in streaming environments. Our approach is conceptually different from previous settings where Merge & Reduce has been employed. Instead of summarizing the data, we combine the Merge & Reduce framework directly with statistical models. This enables performing computationally demanding data analysis tasks on massive data sets. The computations are divided into small tractable batches whose size is independent of the total number of observations n. The results are combined in a structured way at the cost of a bounded O(log n) factor in their memory requirements. It is only necessary, though nontrivial, to choose an appropriate statistical model and design merge and reduce operations on a casewise basis for the specific type of model. We illustrate our Merge & Reduce schemes on simulated and real-world data employing (Bayesian) linear regression models, Gaussian mixture models and generalized linear models. [en]
dc.identifier.uri: http://hdl.handle.net/2003/40091
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-21968
dc.language.iso: en [de]
dc.relation.ispartofseries: Int J Data Sci Anal;10
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Mergeable statistical models [en]
dc.subject: Large data [en]
dc.subject: Streaming [en]
dc.subject: Distributed [en]
dc.subject: Regression analysis [en]
dc.subject.ddc: 310
dc.title: Streaming statistical models via Merge & Reduce [en]
dc.type: Text [de]
dc.type.publicationtype: article [en]
dcterms.accessRights: open access
eldorado.secondarypublication: true [de]
eldorado.secondarypublication.primarycitation: Geppert, L.N., Ickstadt, K., Munteanu, A. et al. Streaming statistical models via Merge & Reduce. Int J Data Sci Anal 10, 331–347 (2020). [de]
eldorado.secondarypublication.primaryidentifier: https://doi.org/10.1007/s41060-020-00226-0 [de]
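
The batching scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the model (ordinary least squares per batch), the merge rule (coefficients averaged with weights proportional to batch size) and the names fit_batch, merge and merge_and_reduce are all assumptions chosen for illustration. Because the merged summary here is a fixed-length coefficient vector, no separate reduce step is needed in this sketch; richer models would need one to keep the summary size bounded.

```python
import numpy as np


def fit_batch(X, y):
    """Fit ordinary least squares on one batch; return (coefficients, batch size)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, len(y)


def merge(a, b):
    """Assumed merge rule: average coefficients weighted by observation counts."""
    (beta_a, n_a), (beta_b, n_b) = a, b
    n = n_a + n_b
    return (n_a * beta_a + n_b * beta_b) / n, n


def merge_and_reduce(batches):
    """Consume batches one at a time, keeping at most one model per level.

    levels[i] is either None or a model that summarizes 2**i batches, so the
    list behaves like a binary counter and holds O(log(number of batches)) models.
    """
    levels = []
    for X, y in batches:
        model, i = fit_batch(X, y), 0
        # Carry upwards while the current level is already occupied.
        while i < len(levels) and levels[i] is not None:
            model = merge(levels[i], model)
            levels[i] = None
            i += 1
        if i == len(levels):
            levels.append(None)
        levels[i] = model
    # Combine the models remaining on the occupied levels into one result.
    result = None
    for m in levels:
        if m is not None:
            result = m if result is None else merge(result, m)
    return result, len(levels)


def simulated_stream(n_batches, batch_size, beta, noise, seed=0):
    """Yield (X, y) batches from a simulated linear model (illustration only)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, len(beta)))
        yield X, X @ beta + noise * rng.normal(size=batch_size)


(beta_hat, n_seen), n_levels = merge_and_reduce(
    simulated_stream(10, 50, np.array([2.0, -1.0, 0.5]), noise=0.1)
)
print(beta_hat, n_seen, n_levels)
```

Each batch is fitted independently of the total stream length, and only one model per level is retained, which is where the logarithmic memory factor comes from in this sketch.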

Files

Original bundle
Name: Geppert2020_Article_StreamingStatisticalModelsViaM.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format