Scalable Bayesian methods for large-scale data

Other titles

Data reduction and efficient computation of distribution metrics

Abstract

In the era of big data, traditional Bayesian inference methods face significant challenges in computational efficiency and scalability. This thesis presents a framework that addresses these challenges through theoretical innovations and practical implementations. We introduce a novel $p$-probit model based on $p$-generalized normal distributions, which models tail behavior more flexibly through an adaptive parameter $p$. To handle large-scale datasets, we develop an efficient coreset-based data reduction technique for the $p$-probit model, with theoretical guarantees based on the Wasserstein distance. We then extend scalable inference to semi-parametric Multivariate Conditional Transformation Models (MCTMs), for which we propose a novel hybrid coreset strategy that combines leverage score sampling with a geometric convex hull approximation. This approach resolves the numerical instabilities caused by the logarithmic terms in the likelihood and enables efficient learning of complex dependence structures with rigorous error guarantees. The work on distribution metrics leads to an investigation of scalable methods for computing distances between probability distributions, for which we propose novel approximations based on sliced-Wasserstein distances and random Fourier features, with applications in physics. These theoretical advances are implemented in two open-source software packages: an R package for the $p$-probit model and a Julia package for distribution metric computation. Our empirical results demonstrate significant improvements in both computational efficiency and statistical accuracy across various large-scale applications, contributing to both the theoretical understanding and the practical capabilities of modern Bayesian inference.
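
To make the idea of a sliced-Wasserstein distance concrete, the following is a minimal sketch of the standard Monte Carlo estimator, assuming two samples of equal size. It is not the thesis's Julia implementation; the function name `sliced_wasserstein` and the parameters `n_projections`, `p`, and `seed` are illustrative assumptions. Each sample is projected onto random unit directions, and the one-dimensional Wasserstein distance along each direction is obtained by sorting.

```python
import math
import random


def sliced_wasserstein(x, y, n_projections=100, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: lists of points (each a list of floats) with the same length
    and dimension. Illustrative sketch only, not the thesis's code.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes samples of equal size")
    rng = random.Random(seed)
    dim = len(x[0])
    total = 0.0
    for _ in range(n_projections):
        # A normalized Gaussian vector is uniform on the unit sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Project both samples onto the direction and sort: in one
        # dimension the optimal coupling matches order statistics.
        x_proj = sorted(sum(a * b for a, b in zip(pt, direction)) for pt in x)
        y_proj = sorted(sum(a * b for a, b in zip(pt, direction)) for pt in y)
        total += sum(abs(a - b) ** p for a, b in zip(x_proj, y_proj)) / len(x)
    # Average the p-th powers over directions, then take the p-th root.
    return (total / n_projections) ** (1.0 / p)
```

Each projection costs one sort, so the estimate scales as O(n log n) per direction instead of requiring a full optimal transport problem in the original dimension.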

Keywords

Mathematical statistics, Coreset, Distributional metrics, Data reduction

Subject headings (RSWK)

Statistics, Metric <mathematics>, Data science, Bayesian methods
