Bayesian mixtures for cluster analysis and flexible modeling of distributions

Fritsch, Arno

Full metadata record

DC Element	Value	Language
dc.contributor.advisor	Ickstadt, Katja	-
dc.contributor.author	Fritsch, Arno	-
dc.date.accessioned	2010-07-02T13:36:58Z	-
dc.date.available	2010-07-02T13:36:58Z	-
dc.date.issued	2010-07-02	-
dc.identifier.uri	http://hdl.handle.net/2003/27292	-
dc.identifier.uri	http://dx.doi.org/10.17877/DE290R-14740	-
dc.description.abstract	Finite mixture models assume that a distribution is a combination of several parametric distributions. They offer a compromise between the interpretability of parametric models and the flexibility of nonparametric models. This thesis considers a Bayesian approach to these models, which has several advantages. For example, using only weak prior information, it can solve the problems with unbounded likelihood functions that can occur in mixture models. The Bayesian approach also allows an elegant extension of finite to (countably) infinite mixture models. Depending on the application, the components of a mixture model can be viewed either as just a means of flexibly modeling a distribution or as defining subgroups of a population with different parametric distributions. For the former case, consistency results for Bayesian mixtures are stated, and an example concerning the flexible modeling of a random effects distribution in a logistic regression is given; the application considers the goalkeeper's effect in saving a penalty. In the latter case, mixture models can be used for clustering. Bayesian mixtures then allow the number of clusters to be estimated at the same time as the cluster-specific parameters. For cluster analysis, the standard approach to fitting Bayesian mixtures, Markov chain Monte Carlo (MCMC), unfortunately leads to inferential difficulties. The labels associated with the clusters can change during the MCMC run, a phenomenon called label switching. The problem becomes severe if the number of clusters is allowed to vary. Existing methods for dealing with label switching and a varying number of components are reviewed, and new approaches are proposed for both situations. The first is a variant of the relabeling algorithm of Stephens (2000). The variant is more general, as it applies to drawn clusterings rather than drawn parameter values, and therefore does not depend on the specific form of the component distributions.
The second approach is based on pairwise posterior probabilities and improves on a commonly used loss function due to Binder (1978). Minimizing this loss is shown to be equivalent to maximizing the posterior expected Rand index with the true clustering. As the adjusted Rand index is preferable to the raw index, maximization of the posterior expected adjusted Rand index is proposed. The new approaches are compared with the previous methods on simulated and real data. The real data used for cluster analysis are two gene expression data sets and Fisher's iris data.	en
dc.language.iso	en	en
dc.subject	Finite mixture	en
dc.subject	Dirichlet process	en
dc.subject	Bayesian statistics	en
dc.subject	Cluster analysis	en
dc.subject	MCMC	en
dc.subject	Adjusted Rand index	en
dc.subject	Goalkeepers' performance	en
dc.subject	Gene expression data	en
dc.subject.ddc	310	-
dc.title	Bayesian mixtures for cluster analysis and flexible modeling of distributions	en
dc.type	Text	de
dc.contributor.referee	Weihs, Claus	-
dc.date.accepted	2010-06-11	-
dc.type.publicationtype	doctoralThesis	de
dc.identifier.urn	urn:nbn:de:hbz:290-2003/27292-3	-
dcterms.accessRights	open access	-
Appears in collections:	Lehrstuhl Mathematische Statistik und biometrische Anwendungen
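
The abstract states that minimizing Binder's loss is equivalent to maximizing the posterior expected Rand index. The following is a minimal sketch of that relationship, not the thesis's own implementation. It assumes Binder's loss with equal misclassification costs, restricts the search to the drawn clusterings, and uses four made-up MCMC draws for four observations; all function names and the toy data are hypothetical. It covers only the raw Rand index, not the adjusted one.

```python
import numpy as np

def coclustering(labels):
    """Boolean n x n matrix: True where two observations share a cluster."""
    labels = np.asarray(labels)
    return labels[:, None] == labels[None, :]

def pairwise_probabilities(draws):
    """Estimate posterior co-clustering probabilities from sampled clusterings."""
    return np.mean([coclustering(d) for d in draws], axis=0)

def expected_binder_loss(candidate, pi):
    """Posterior expected Binder loss (equal costs, assumed) over pairs i < j."""
    iu = np.triu_indices(len(candidate), k=1)
    same = coclustering(candidate)[iu].astype(float)
    return np.sum(np.abs(same - pi[iu]))

def rand_index(a, b):
    """Fraction of pairs i < j on which two clusterings agree."""
    iu = np.triu_indices(len(a), k=1)
    return np.mean(coclustering(a)[iu] == coclustering(b)[iu])

def expected_rand_index(candidate, draws):
    """Monte Carlo estimate of the posterior expected Rand index."""
    return np.mean([rand_index(candidate, d) for d in draws])

# Hypothetical MCMC output: the first two draws are the same partition
# with switched labels, which pairwise quantities are unaffected by.
draws = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1], [0, 1, 1, 1]]
pi = pairwise_probabilities(draws)
n_pairs = len(draws[0]) * (len(draws[0]) - 1) // 2

# Pick the drawn clustering with the smallest expected Binder loss.
best = min(draws, key=lambda c: expected_binder_loss(c, pi))

for c in draws:
    # Expected Rand = 1 - expected Binder loss / number of pairs.
    print(c, expected_binder_loss(c, pi), expected_rand_index(c, draws))
```

Because both criteria depend on the draws only through co-clustering indicators, they are invariant to label switching, and the expected Rand index is a decreasing linear function of the expected Binder loss, so the same candidate optimizes both.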

Files in this item:

File	Description	Size	Format
Diss_Arno_Fritsch.pdf	DNB	1.03 MB	Adobe PDF	View/Open
Abstract_Diss_Fritsch.pdf		50.94 kB	Adobe PDF	View/Open

This item is protected by copyright.