Authors: Madjar, Katrin
Title: Survival models with selection of genomic covariates in heterogeneous cancer studies
Language (ISO): en
Abstract: Building a risk prediction model for a specific subgroup of patients based on high-dimensional molecular measurements such as gene expression data is an important current field of biostatistical research. Major objectives in modeling high-dimensional data are good prediction performance and finding a subset of covariates that are truly relevant to the outcome (here: a time-to-event endpoint). The latter requires variable selection to obtain a sparse, interpretable model solution. In this thesis, a further modeling objective is to account for heterogeneity in the data due to known subgroups of patients that may differ in the relationship between genomic covariates and survival outcome. We consider multiple cancer studies as subgroups; however, our approaches can be applied to any other subgroups, for example, those defined by clinical covariates. We aim to provide a separate prediction model for each subgroup that allows the identification of common as well as subgroup-specific effects and has improved prediction accuracy over standard approaches. Standard subgroup analysis includes only patients of the subgroup of interest and may lead to a loss of power when the sample size is small, whereas standard combined analysis simply pools patients of all subgroups and may suffer from biased results and averaging of subgroup-specific effects. To overcome these drawbacks, we propose two different statistical models that allow sharing information between subgroups to increase power when this is supported by the data. One approach is a classical frequentist Cox proportional hazards model with a lasso penalty for variable selection and a weighted version of the Cox partial likelihood that includes patients of all subgroups but assigns them individual weights based on their subgroup affiliation. Patients who fit the subgroup of interest well receive higher weights in the subgroup-specific model.
The other approach is a novel Bayesian Cox model that uses a stochastic search variable selection prior with latent indicators of variable inclusion. We assume a sparse graphical model that links genes within subgroups and the same genes across different subgroups. This graph structure is not known a priori and is inferred simultaneously with the important variables of each subgroup. Both approaches are evaluated through extensive simulations and applied to real lung cancer studies. Simulation results demonstrate that our proposed models can achieve improved prediction and variable selection accuracy over standard subgroup models when the sample size is small. As expected, the standard combined model only identifies common effects but fails to detect subgroup-specific effects.
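The weighted, lasso-penalized Cox objective described for the first approach can be illustrated with a minimal sketch. This is not the thesis implementation: the function name, the argument names, the Breslow-style treatment of risk sets (no tie correction), and the use of a single fixed weight per patient are all assumptions made for illustration. How the thesis estimates the weights is not shown here.

```python
import numpy as np

def weighted_cox_lasso_objective(beta, X, time, event, weights, lam):
    """Negative weighted Cox partial log-likelihood plus a lasso penalty.

    Hypothetical helper for illustration only.
    beta    : (p,) coefficient vector
    X       : (n, p) covariate matrix (e.g. gene expression)
    time    : (n,) observed times
    event   : (n,) 1 if the event was observed, 0 if censored
    weights : (n,) non-negative patient weights (assumed given)
    lam     : lasso penalty parameter
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    weights = np.asarray(weights, dtype=float)

    eta = X @ beta  # linear predictors
    # at_risk[i, j] is 1 if patient j is still at risk at patient i's time
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    # weighted sum of exp(eta) over each patient's risk set
    weighted_risk = at_risk @ (weights * np.exp(eta))

    # only observed events with positive weight contribute to the likelihood
    idx = (event == 1) & (weights > 0)
    loglik = np.sum(weights[idx] * (eta[idx] - np.log(weighted_risk[idx])))

    return -loglik + lam * np.sum(np.abs(beta))

# Toy data: 3 patients, 2 covariates, all events observed, no ties.
X = np.array([[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]])
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 1])

# All weights equal to 1 corresponds to pooling all patients (combined analysis).
combined = weighted_cox_lasso_objective(np.zeros(2), X, time, event, np.ones(3), lam=0.0)

# Weight 0 for the third patient drops that patient, as in an analysis
# restricted to a subgroup containing only the first two patients.
subgroup = weighted_cox_lasso_objective(
    np.zeros(2), X, time, event, np.array([1.0, 1.0, 0.0]), lam=0.0
)
```

In this sketch, weights of 0 and 1 recover the standard subgroup and combined analyses as special cases, and intermediate weights let patients from other subgroups contribute partially. The direct call to `np.exp` can overflow for large linear predictors, so a practical implementation would need a more numerically stable formulation and an optimizer suited to the non-smooth lasso term.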
Subject Headings: Cox proportional hazards model
Subgroup analysis
Weighted regression
Bayesian variable selection
Gaussian graphical model
Gene expression
High-dimensional data
Subject Headings (RSWK): Cox-Regressionsmodell
Hochdimensionale Daten
Issue Date: 2018
Appears in Collections: Statistische Methoden in der Genetik und Chemometrie

Files in This Item:
File                     Description  Size      Format
Dissertation_Madjar.pdf  DNB          21.24 MB  Adobe PDF

This item is protected by original copyright