Bayesian and frequentist regression approaches for very large data sets

dc.contributor.advisor: Ickstadt, Katja
dc.contributor.author: Geppert, Leo Nikolaus
dc.contributor.referee: Groll, Andreas
dc.date.accepted: 2018-11-16
dc.date.accessioned: 2019-03-19T08:27:15Z
dc.date.available: 2019-03-19T08:27:15Z
dc.date.issued: 2018
dc.description.abstract: This thesis is concerned with the analysis of frequentist and Bayesian regression models for data sets with a very large number of observations. Such large data sets pose a challenge when conducting regression analysis, because of the memory required (mainly for frequentist regression models) and the running time of the analysis (mainly for Bayesian regression models). I present two different approaches that can be employed in this setting. The first approach is based on random projections and reduces the number of observations to a manageable level as a first step before the regression analysis. The reduced number of observations depends on the number of variables in the data set and the desired goodness of the approximation. It is, however, independent of the number of observations in the original data set, making it especially useful for very large data sets. Theoretical guarantees for Bayesian linear regression are presented, which extend known guarantees for the frequentist case. The fundamental theorem covers Bayesian linear regression with arbitrary normal distributions or non-informative uniform distributions as prior distributions. I evaluate how close the posterior distributions of the original model and the reduced data set are for this theoretically covered case as well as for extensions towards hierarchical models and models using q-generalised normal distributions as prior distributions. The second approach presents a transfer of the Merge & Reduce principle from data structures to regression models. In Computer Science, Merge & Reduce is employed in order to enable the use of static data structures in a streaming setting. Here, I present three possibilities of employing Merge & Reduce directly on regression models. This enables sequential or parallel analysis of subsets of the data set. The partial results are then combined in a way that recovers the regression model on the full data set well. This approach is suitable for a wide range of regression models. I evaluate the performance on simulated and real-world data sets using linear and Poisson regression models. Both approaches are able to recover regression models on the original data set well. They thus offer scalable versions of frequentist or Bayesian regression analysis for linear regression as well as extensions to generalised linear models, hierarchical models, and q-generalised normal distributions as prior distributions. Application on data streams or in distributed settings is also possible. Both approaches can be combined with multiple algorithms for frequentist or Bayesian regression analysis. [en]
dc.identifier.uri: http://hdl.handle.net/2003/37946
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-19931
dc.language.iso: en [de]
dc.subject: Regression analysis [en]
dc.subject: Very large data sets [en]
dc.subject: Random projections [en]
dc.subject: Merge & reduce [en]
dc.subject: Data reduction [en]
dc.subject.ddc: 310
dc.subject.rswk: Regressionsanalyse [de]
dc.subject.rswk: Massendaten [de]
dc.subject.rswk: Datenkompression [de]
dc.subject.rswk: Dimensionsreduktion [de]
dc.title: Bayesian and frequentist regression approaches for very large data sets [en]
dc.type: Text [en]
dc.type.publicationtype: doctoralThesis [de]
dcterms.accessRights: open access
eldorado.dnb.deposit: true [de]
eldorado.secondarypublication: false [de]
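
The first approach described in the abstract, reducing the number of observations with a random projection before fitting the regression, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a dense Gaussian projection matrix and ordinary least squares, and the function name `sketch_and_solve` and the parameter `k` (the reduced number of observations) are hypothetical names chosen for this example. The thesis may use other embeddings and also covers Bayesian models, which are not shown here.

```python
import numpy as np


def sketch_and_solve(X, y, k, seed=0):
    """Fit least squares on a randomly projected data set.

    Assumption: a dense Gaussian projection S of shape (k, n), scaled by
    1/sqrt(k). This is one common choice of random projection, used here
    only for illustration.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = rng.standard_normal((k, n)) / np.sqrt(k)
    X_red = S @ X  # reduced design matrix, k rows instead of n
    y_red = S @ y  # reduced response, projected with the same S
    beta_red, *_ = np.linalg.lstsq(X_red, y_red, rcond=None)
    return beta_red


# Simulated data: n observations, d variables (illustrative values).
rng = np.random.default_rng(1)
n, d = 5000, 5
X = rng.standard_normal((n, d))
beta_true = np.arange(1, d + 1, dtype=float)
y = X @ beta_true + rng.standard_normal(n)

# Reference fit on the full data set and fit on the reduced data set.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sketch = sketch_and_solve(X, y, k=400)

print("full data:   ", np.round(beta_full, 3))
print("reduced data:", np.round(beta_sketch, 3))
```

In this sketch, `k` is chosen from the number of variables and the desired approximation quality rather than from `n`, mirroring the property stated in the abstract. Materialising a dense `k` by `n` matrix is only practical for small examples; for very large or streaming data the projection would have to be applied in blocks or replaced by a sparse embedding.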

Files

Original bundle

Now showing 1 - 1 of 1
Name: Dissertation Leo Geppert Belegexemplar.pdf
Size: 1.86 MB
Format: Adobe Portable Document Format
Description: DNB

License bundle

Now showing 1 - 1 of 1
Name: license.txt
Size: 4.85 KB
Format: Item-specific license agreed upon to submission