Statistical analysis of genotype and gene expression data

Schwender, Holger

Files

diss_schwender.pdf (1.89 MB)

Date

2007-02-26T14:15:15Z

Authors

Schwender, Holger

Abstract

A common and important goal in cancer research is the identification of genetic markers, such as genes or genetic variations, that make it possible to determine whether a person has a particular type of cancer, or that lead to a higher risk of developing cancer. In recent years, many biotechnologies for measuring these markers have been developed. The most prominent examples are microarrays, which can be used, for example, to measure the expression levels of tens of thousands of genes simultaneously. The most widely used type of microarray is the Affymetrix GeneChip, on which each gene is represented by eleven pairs of probes. The corresponding probe intensities have to be preprocessed, i.e. summarized to one expression value per gene, before variable selection and classification methods can be applied to the gene expression data. This thesis is based on two projects. The goals of the first project are to identify the preprocessing method for Affymetrix microarrays that leads to the most efficient data reduction, and to provide software that makes it possible to apply this procedure to data from studies comprising hundreds of Affymetrix GeneChips. The results of this project are presented in this thesis. The second project is concerned with SNPs (Single Nucleotide Polymorphisms), i.e. variations at a single base-pair position in the genome. While a vast number of papers on the analysis of gene expression data have been published, only a few variable selection and classification methods dealing with the specific needs of the analysis of SNP data have been proposed. One of the exceptions is logic regression. In this thesis, it is shown how approaches for the analysis of gene expression data can be adapted to SNP data, and a procedure based on a bagging version of logic regression is proposed that enables the detection of SNP interactions that explain a higher cancer risk. Furthermore, two measures for quantifying the importance of each of these interactions for prediction are presented and compared with existing measures.

Keywords

Microarray, Single nucleotide polymorphism, SNP, Variable selection, Classification, Preprocessing, Cancer risk

URI

http://hdl.handle.net/2003/23306
http://dx.doi.org/10.17877/DE290R-8430

Collections

Lehrstuhl Mathematische Statistik und biometrische Anwendungen
