Finite Bayesian mixture models with applications in spatial cluster analysis and bioinformatics

Schäfer, Martin

Authors:	Schäfer, Martin
Title:	Finite Bayesian mixture models with applications in spatial cluster analysis and bioinformatics
Language (ISO):	en
Abstract:	In many statistical applications, one encounters populations that form homogeneous subgroups with respect to one or several characteristics. Across the subgroups, however, heterogeneity may often be found. Mixture distributions are a natural means to model data from such applications. This PhD thesis is based on two projects that focus on such applications. In the first project, spatial nanoscale clusters formed by Ras proteins in the cell membrane are investigated. Such clusters play a crucial role in intracellular communication and are thus of interest in cancer research. In this case, the subgroups are clustered and non-clustered proteins. In the second project, epigenomic data obtained from sequencing experiments are integrated with another genomic or epigenomic input, aiming, e.g., to detect genes that contribute to the development of cancer. Here, the subgroups are defined by a) genes presenting congruent (epi)genomic aberrations in both considered variables, b) genes presenting incongruent aberrations, and c) genes lacking aberrations in at least one of the variables. Employing a Bayesian framework, objects are classified in both projects by fitting finite univariate mixture distributions with a small fixed number of components to values from a score summarizing relevant information about the research question. Such mixture distributions have favorable characteristics in terms of interpretation and show little sensitivity to label switching in Markov chain Monte Carlo analyses. Mixtures of gamma distributions are considered for Ras proteins, while mixtures of normal and exponential or gamma distributions are a focus for the bioinformatic analysis. In the latter, classification is the primary goal, while in the Ras protein application, estimating key parameters of the spatial clustering is of more interest. The results of both projects are presented in this thesis.
For both applications, the methods have been implemented in software, and their performance is compared with competing approaches on experimental as well as simulated data. To ensure an appropriate simulation of Ras protein patterns, a new cluster point process model called the double Matérn cluster process is developed and described in this thesis.
Subject Headings:	Bayesian statistics; Finite mixture model; Spatial cluster analysis; Matérn cluster process; Nearest neighbor distances; Gene transcription; ChIP-seq data; Integrative analysis
URI:	http://hdl.handle.net/2003/34392 http://dx.doi.org/10.17877/DE290R-16464
Issue Date:	2015
Appears in Collections:	Lehrstuhl Mathematische Statistik und biometrische Anwendungen

Files in This Item:

File	Description	Size	Format
Dissertation.pdf	DNB	12.07 MB	Adobe PDF

This item is protected by original copyright
