Scalable Bayesian methods for large-scale data
| dc.contributor.advisor | Ickstadt, Katja | |
| dc.contributor.author | Ding, Zeyu | |
| dc.contributor.referee | Munteanu, Alexander | |
| dc.date.accepted | 2026-02-12 | |
| dc.date.accessioned | 2026-04-07T06:16:20Z | |
| dc.date.issued | 2026 | |
| dc.description.abstract | In the era of big data, traditional Bayesian inference methods face significant challenges in computational efficiency and scalability. This thesis presents a comprehensive framework addressing these challenges through theoretical innovations and practical implementations. We introduce a novel $p$-probit model incorporating $p$-generalized normal distributions, which offers enhanced flexibility in modeling tail behavior through an adaptive parameter $p$. To address computational challenges with large-scale datasets, we develop an efficient \emph{coreset}-based data reduction technique for the $p$-probit model, with theoretical guarantees based on the Wasserstein distance. Furthermore, we extend scalable inference to semi-parametric Multivariate Conditional Transformation Models (MCTMs). We propose a novel hybrid \emph{coreset} strategy that combines leverage score sampling with a geometric convex hull approximation. This approach effectively resolves the numerical instabilities of logarithmic terms in the likelihood, enabling efficient learning of complex dependence structures with rigorous error guarantees. This exploration of distribution metrics leads to our investigation of scalable computation methods for probability distribution distances, where we propose novel approximation approaches using sliced-Wasserstein distances and random Fourier features in physics applications. These theoretical advances are implemented in two open-source software packages: an \texttt{R} package for the $p$-probit model and a \texttt{Julia} package for distribution metric computation. Our empirical results demonstrate significant improvements in both computational efficiency and statistical accuracy across various large-scale applications, contributing to both theoretical understanding and practical capabilities in modern Bayesian inference. | en |
| dc.identifier.uri | http://hdl.handle.net/2003/44803 | |
| dc.identifier.uri | http://dx.doi.org/10.17877/DE290R-26567 | |
| dc.language.iso | en | |
| dc.subject | Mathematische Statistik | de |
| dc.subject | Coreset | en |
| dc.subject | Distributional metrics | en |
| dc.subject | Data reduction | en |
| dc.subject.ddc | 310 | |
| dc.subject.rswk | Statistik | de |
| dc.subject.rswk | Metrik <Mathematik> | de |
| dc.subject.rswk | Data Science | de |
| dc.subject.rswk | Bayes-Verfahren | de |
| dc.title | Scalable Bayesian methods for large-scale data | en |
| dc.title.alternative | Data reduction and efficient computation of distribution metrics | en |
| dc.type | Text | |
| dc.type.publicationtype | PhDThesis | |
| dcterms.accessRights | open access | |
| eldorado.dnb.deposit | true | |
| eldorado.secondarypublication | false |
Files
Original bundle
- Name:
- Dissertation_Ding.pdf
- Size:
- 26.54 MB
- Format:
- Adobe Portable Document Format
- Description:
- DNB
License bundle
- Name:
- license.txt
- Size:
- 4.82 KB
- Format:
- Item-specific license agreed upon to submission