Authors: Grinberg, Marianna
Title: Statistical analysis of concentration-dependent high-dimensional gene expression data
Language (ISO): en
Abstract: Understanding, on a fundamental level, how genes respond to external influences such as radiation or chemicals is one of the great challenges of modern biology. In particular, the investigation of chemically induced toxicity is of major importance because it is crucial for the identification of biomarkers and the development of drugs. One approach to this objective is toxicogenomics, which combines toxicology with the analysis of genome-wide gene expression data. This research field uses microarray technology, which allows the expression of tens of thousands of genes to be measured simultaneously. The thesis focuses on three topics that often arise in gene expression analysis: first, the identification and characterization of genes associated with certain modes of action; second, the detection of biomarker candidates in the in vitro system for the prediction of toxicity in vivo; and third, the identification of critical concentrations at which a pre-specified effect level is exceeded. To better understand the key principles of transcriptome changes, a genome-wide gene expression analysis is performed. For this, the Open TG-GATEs database is used, which contains data for more than 150 compounds applied to cells from rats (liver, kidney) and to human hepatocytes using different sets of concentrations and time points. Special attention is paid to statistical challenges arising from working with large data sets. Besides the curse of dimensionality (many more variables than observations) and the small number of replicates, the statistical analysis faces additional complexity, including batch effects and implausible concentration progressions. To address these issues in a general manner, a pipeline involving several curation steps and a systematic strategy for the identification of consensus genes is proposed.
Regarding the third topic of this thesis, a model-based approach is applied to gene expression data to detect concentrations with critical changes in gene expression. Typically, only measured concentrations are considered as potential candidates for alert concentrations. Based on the assumption that the dependency of the response on the dose can be described by a sigmoidal function, a four-parameter log-logistic (4pLL) model is fitted to the data. Two alert concentrations referring to critical compound concentrations are estimated from the fitted average trend and compared with those of the classical naïve approach, in which each measured concentration is tested separately for whether the critical effect level is exceeded. The results are evaluated in a simulation study and in a real dose-response study. The thesis serves to gain a better understanding of whether a model-based approach predicts critical concentrations more accurately than the classical one, which is often used for the analysis of large-scale toxicogenomics data sets.
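The contrast between the model-based and the naïve approach can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The parameterization of the 4pLL curve, the simulated data, the effect level of 1.0, and the names `four_pll`, `alert_concentration` and `naive_alert` are all assumptions made for illustration. The naïve variant shown here only compares mean responses against the lowest concentration and omits the per-concentration significance test that the abstract describes.

```python
import numpy as np
from scipy.optimize import curve_fit


def four_pll(x, lower, upper, ec50, hill):
    """4pLL curve: rises from `lower` to `upper` (for hill > 0), half-way at ec50."""
    return lower + (upper - lower) / (1.0 + (ec50 / x) ** hill)


def alert_concentration(lower, upper, ec50, hill, effect_level):
    """Concentration at which the curve lies `effect_level` above `lower`.

    Returns NaN if the fitted curve never reaches that level.
    """
    span = upper - lower
    if span <= effect_level:
        return float("nan")
    return ec50 / (span / effect_level - 1.0) ** (1.0 / hill)


# Simulated data (illustrative values only, not taken from the thesis):
# seven concentrations with three replicates each.
rng = np.random.default_rng(0)
conc = np.repeat([0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0], 3)
true_params = (0.0, 2.0, 5.0, 1.5)  # lower, upper, ec50, hill
response = four_pll(conc, *true_params) + rng.normal(0.0, 0.1, conc.size)

effect_level = 1.0  # assumed critical change in expression

# Model-based approach: fit the curve, then invert it at the effect level.
# Bounds keep ec50 and hill positive during the optimization.
start = (response.min(), response.max(), np.median(conc), 1.0)
bounds = ([-np.inf, -np.inf, 1e-6, 1e-3], [np.inf, np.inf, np.inf, 20.0])
params, _ = curve_fit(four_pll, conc, response, p0=start, bounds=bounds)
model_alert = alert_concentration(*params, effect_level=effect_level)

# Simplified naive approach: lowest measured concentration whose mean
# response exceeds the mean at the lowest concentration by the effect level.
levels = np.unique(conc)
means = np.array([response[conc == c].mean() for c in levels])
exceeds = means - means[0] >= effect_level
naive_alert = levels[exceeds][0] if exceeds.any() else float("nan")

print(f"model-based alert concentration: {model_alert:.2f}")
print(f"naive alert concentration:       {naive_alert:.2f}")
```

In this sketch the naïve estimate can only take one of the measured concentrations, whereas the model-based estimate may fall between them, which is the difference the comparison in the thesis is concerned with.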
Subject Headings: Gene expression analysis
Fitting sigmoidal dose-response curves
Identifying critical concentrations
Subject Headings (RSWK): Genexpression
Issue Date: 2017
Appears in Collections: Fachgebiet Statistische Methoden in der Genetik und Ökologie

Files in This Item:
File: Dissertation_Grinberg_pdfA.pdf
Description: DNB
Size: 20.53 MB
Format: Adobe PDF

This item is protected by original copyright
