Scalable Bayesian methods for large-scale data

dc.contributor.advisor: Ickstadt, Katja
dc.contributor.author: Ding, Zeyu
dc.contributor.referee: Munteanu, Alexander
dc.date.accepted: 2026-02-12
dc.date.accessioned: 2026-04-07T06:16:20Z
dc.date.issued: 2026
dc.description.abstract: In the era of big data, traditional Bayesian inference methods face significant challenges in computational efficiency and scalability. This thesis presents a comprehensive framework addressing these challenges through theoretical innovations and practical implementations. We introduce a novel $p$-probit model incorporating $p$-generalized normal distributions, which offers enhanced flexibility in modeling tail behavior through an adaptive parameter $p$. To address computational challenges with large-scale datasets, we develop an efficient \emph{coreset}-based data reduction technique for the $p$-probit model, with theoretical guarantees based on the Wasserstein distance. Furthermore, we extend scalable inference to semi-parametric Multivariate Conditional Transformation Models (MCTMs). We propose a novel hybrid \emph{coreset} strategy that combines leverage score sampling with a geometric convex hull approximation. This approach effectively resolves the numerical instabilities of logarithmic terms in the likelihood, enabling efficient learning of complex dependence structures with rigorous error guarantees. This exploration of distribution metrics leads to our investigation of scalable computation methods for probability distribution distances, where we propose novel approximation approaches using sliced-Wasserstein distances and random Fourier features in physics applications. These theoretical advances are implemented in two open-source software packages: an \texttt{R} package for the $p$-probit model and a \texttt{Julia} package for distribution metric computation. Our empirical results demonstrate significant improvements in both computational efficiency and statistical accuracy across various large-scale applications, contributing to both theoretical understanding and practical capabilities in modern Bayesian inference. (en)
dc.identifier.uri: http://hdl.handle.net/2003/44803
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-26567
dc.language.iso: en
dc.subject: Mathematische Statistik (de)
dc.subject: Coreset (en)
dc.subject: Distributional metrics (en)
dc.subject: Data reduction (en)
dc.subject.ddc: 310
dc.subject.rswk: Statistik (de)
dc.subject.rswk: Metrik <Mathematik> (de)
dc.subject.rswk: Data Science (de)
dc.subject.rswk: Bayes-Verfahren (de)
dc.title: Scalable Bayesian methods for large-scale data (en)
dc.title.alternative: Data reduction and efficient computation of distribution metrics (en)
dc.type: Text
dc.type.publicationtype: PhDThesis
dcterms.accessRights: open access
eldorado.dnb.deposit: true
eldorado.secondarypublication: false

Files

Original bundle

Name: Dissertation_Ding.pdf
Size: 26.54 MB
Format: Adobe Portable Document Format
Description: DNB
