Bayesian mixtures for cluster analysis and flexible modeling of distributions

dc.contributor.advisor: Ickstadt, Katja
dc.contributor.author: Fritsch, Arno
dc.contributor.referee: Weihs, Claus
dc.date.accepted: 2010-06-11
dc.date.accessioned: 2010-07-02T13:36:58Z
dc.date.available: 2010-07-02T13:36:58Z
dc.date.issued: 2010-07-02
dc.description.abstract: Finite mixture models assume that a distribution is a combination of several parametric distributions. They offer a compromise between the interpretability of parametric models and the flexibility of nonparametric models. This thesis considers a Bayesian approach to these models, which has several advantages. For example, using only weak prior information, it can resolve the problems with unbounded likelihood functions that can occur in mixture models. The Bayesian approach also allows an elegant extension of finite to countably infinite mixture models. Depending on the application, the components of a mixture model can be viewed either merely as a means of modeling a distribution flexibly or as defining subgroups of a population with different parametric distributions. For the former case, consistency results for Bayesian mixtures are stated. An example concerning the flexible modeling of a random-effects distribution in a logistic regression is also given; the application considers the goalkeeper's effect in saving a penalty. In the latter case, mixture models can be used for clustering. Bayesian mixtures then allow the number of clusters to be estimated at the same time as the cluster-specific parameters. For cluster analysis, however, the standard approach for fitting Bayesian mixtures, Markov chain Monte Carlo (MCMC), leads to inferential difficulties: the labels associated with the clusters can change during the MCMC run, a phenomenon called label-switching. The problem becomes severe if the number of components is allowed to vary. Existing methods for dealing with label-switching and a varying number of components are reviewed, and new approaches are proposed for both situations. The first is a variant of the relabeling algorithm of Stephens (2000). The variant is more general, as it operates on the drawn clusterings rather than the drawn parameter values, and therefore does not depend on the specific form of the component distributions.
The second approach is based on pairwise posterior probabilities and improves on a commonly used loss function due to Binder (1978). Minimizing this loss is shown to be equivalent to maximizing the posterior expected Rand index with the true clustering. As the adjusted Rand index is preferable to the raw index, maximizing the posterior expected adjusted Rand index is proposed instead. The new approaches are compared to the existing methods on simulated and real data. The real data used for cluster analysis are two gene expression data sets and Fisher's iris data.
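The two quantities the abstract builds on can be illustrated concretely: pairwise posterior probabilities of co-clustering, estimated from MCMC draws of cluster labels, and the adjusted Rand index between two clusterings. The sketch below is a minimal NumPy illustration of these definitions, not the thesis's implementation (which is in the R package mcclust); the function names are my own, and the degenerate case where both clusterings are trivial is not handled.

```python
import numpy as np

def posterior_similarity(draws):
    """Pairwise posterior probabilities that items i and j share a cluster.

    draws: (M, n) array; row m holds the cluster labels of the n items in
    MCMC draw m.  Label-switching is harmless here, because only
    co-membership (labels[i] == labels[j]) enters the computation.
    """
    draws = np.asarray(draws)
    M, n = draws.shape
    psm = np.zeros((n, n))
    for labels in draws:
        # add 1 to entry (i, j) whenever items i and j share a label
        psm += labels[:, None] == labels[None, :]
    return psm / M

def adjusted_rand(a, b):
    """Adjusted Rand index between two clusterings (Hubert & Arabie, 1985)."""
    a, b = np.asarray(a), np.asarray(b)
    # contingency table of cluster co-occurrences
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    cont = np.zeros((ia.max() + 1, ib.max() + 1))
    np.add.at(cont, (ia, ib), 1)
    comb2 = lambda x: x * (x - 1) / 2.0  # "choose 2" on counts
    sum_ij = comb2(cont).sum()
    sum_a = comb2(cont.sum(axis=1)).sum()
    sum_b = comb2(cont.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(a.size)  # chance-level agreement
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_ij - expected) / (max_index - expected)
```

Correcting the Rand index for chance agreement is what makes the adjusted version preferable: identical clusterings score 1 regardless of how the labels are numbered, while unrelated clusterings score near 0 rather than some positive baseline.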
dc.identifier.uri: http://hdl.handle.net/2003/27292
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-14740
dc.identifier.urn: urn:nbn:de:hbz:290-2003/27292-3
dc.language.iso: en
dc.subject: Finite mixture
dc.subject: Dirichlet process
dc.subject: Bayesian statistics
dc.subject: Cluster analysis
dc.subject: MCMC
dc.subject: Adjusted Rand index
dc.subject: Goalkeeper's performance
dc.subject: Gene expression data
dc.subject.ddc: 310
dc.title: Bayesian mixtures for cluster analysis and flexible modeling of distributions
dc.type: Text
dc.type.publicationtype: doctoralThesis
dcterms.accessRights: open access

Files

Original bundle
Name: Diss_Arno_Fritsch.pdf
Size: 1 MB
Format: Adobe Portable Document Format
Description: DNB

Name: Abstract_Diss_Fritsch.pdf
Size: 50.94 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.85 KB
Format: Item-specific license agreed upon to submission