Survival models with gene groups as covariates
Date: 2012-06-28
Abstract
An important application of high-dimensional gene expression measurements is risk
prediction, together with the interpretation of the variables in the resulting survival models.
A major problem in this context is that the number of genes is typically large compared to
the number of observations (individuals). Feature selection procedures can generate
predictive models that combine high prediction accuracy with low model
complexity. However, the interpretability of the resulting models is still limited because
little is known about many of the selected genes. We therefore summarize genes into
gene groups defined by the hierarchically structured Gene Ontology (GO) and include
these gene groups as covariates in the hazard regression models. Since expression
profiles within GO groups are often heterogeneous, we present a new method to obtain
subgroups with coherent patterns: we precluster the genes within each GO group
according to the correlation of their gene expression measurements.
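The abstract does not specify the clustering algorithm, distance, or cut-off used for preclustering. The following Python sketch assumes average-linkage agglomerative clustering with distance 1 minus the Pearson correlation and an arbitrary cut-off of 0.5. It summarizes each resulting subgroup by the mean of its genes' standardized expression profiles, which could then serve as one covariate. The function names `precluster` and `representative` and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles (assumes non-constant profiles)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def precluster(profiles, threshold=0.5):
    """Average-linkage agglomerative clustering of the genes of one GO group.

    profiles: dict mapping gene id -> expression values across the same samples.
    Distance between two genes is 1 - Pearson correlation; clusters are merged
    while the closest pair has average distance below `threshold` (hypothetical cut-off).
    """
    clusters = [[g] for g in profiles]

    def dist(c1, c2):
        return mean(1 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        if dist(clusters[i], clusters[j]) >= threshold:
            break  # no sufficiently correlated pair of clusters is left
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def representative(profiles, cluster):
    """Summarize a subgroup by the mean of its genes' standardized (z-scored) profiles."""
    z = []
    for g in cluster:
        x = profiles[g]
        m, s = mean(x), pstdev(x)
        z.append([(v - m) / s for v in x])
    return [mean(col) for col in zip(*z)]


# Toy GO group: g1 and g2 rise together across samples, g3 falls.
profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
subgroups = precluster(profiles)
print(subgroups)  # [['g1', 'g2'], ['g3']]
print([representative(profiles, c) for c in subgroups])
```

Here the anti-correlated gene g3 ends up in its own subgroup instead of being averaged with g1 and g2, which is the kind of heterogeneity within a GO group that preclustering is meant to resolve.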
We compare Cox models for the disease-free survival times of breast cancer
patients. Besides classical clinical covariates, we consider single genes, GO groups and
preclustered GO groups as additional genomic covariates. Survival models with
preclustered gene groups as covariates show improved prediction accuracy for long-term
survival compared with models built only with single genes or GO groups. We also
provide an analysis of frequently selected covariates and comparisons with models using
only clinical information.
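In a Cox model, clinical and genomic covariates enter jointly through one linear predictor per patient. The Python sketch below evaluates the Cox partial log-likelihood (Breslow handling of tied event times) that such a model maximizes. It assumes that each covariate row simply concatenates clinical variables with gene or gene-group covariates. The abstract does not say which fitting or selection procedure was used in high dimensions, so fitting is left out, and the function name is hypothetical.

```python
import math


def cox_partial_loglik(beta, X, times, events):
    """Cox partial log-likelihood with Breslow handling of tied event times.

    beta: coefficients; X: one covariate row per patient (e.g. clinical
    variables followed by gene or gene-group covariates); times: follow-up
    times; events: 1 = event observed, 0 = censored.
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored patients only contribute through risk sets
        risk_set = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk_set)  # log-sum-exp shift for numerical stability
        loglik += eta[i] - (m + math.log(sum(math.exp(e - m) for e in risk_set)))
    return loglik


X = [[1.0, 0.2], [0.0, -0.5], [1.0, 1.3]]  # toy rows: [clinical, gene group]
times, events = [1, 2, 3], [1, 1, 0]
# With beta = 0 each event contributes -log(size of its risk set): -log(3) - log(2).
print(cox_partial_loglik([0.0, 0.0], X, times, events))
```

A fitting routine would search for the `beta` that maximizes this function, typically with some penalty or stepwise selection when many genomic covariates compete with few patients.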
The preclustering information enables a more detailed analysis of the biological meaning
of the covariates selected in the final models. Compared with models built only with single
genes, the GO annotation contributes additional functional information, and
compared with models using GO groups as covariates, the preclustering yields coherent
representative gene expression profiles. To evaluate the fitted survival models, we
present prediction error curves showing that models with preclustered gene groups
have improved prediction performance compared with models built with single genes or
GO groups.
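The abstract does not define how the prediction error curves were computed. A common choice in survival analysis, assumed here, is the Brier score evaluated over a grid of time points with inverse probability of censoring weighting (IPCW), where the censoring distribution G is estimated by Kaplan-Meier. The Python sketch below follows that assumption. The function names are hypothetical, and the time grid should stay within follow-up so that G does not drop to zero.

```python
def km_censoring(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(s) = P(C > s).

    Censorings (event == 0) are treated as the 'events'; returns a function
    G(s, left=False), where left=True gives the left limit G(s-).
    """
    steps, g = [], 1.0
    for u in sorted({t for t, e in zip(times, events) if e == 0}):
        at_risk = sum(1 for t in times if t >= u)
        d = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1 - d / at_risk
        steps.append((u, g))

    def G(s, left=False):
        value = 1.0
        for u, g_u in steps:
            if u < s or (u == s and not left):
                value = g_u
            else:
                break
        return value

    return G


def ipcw_brier(t, surv_at_t, times, events, G):
    """IPCW Brier score at time t; surv_at_t[i] is the predicted P(T_i > t | x_i)."""
    total = 0.0
    for s, t_i, d_i in zip(surv_at_t, times, events):
        if t_i <= t and d_i == 1:  # event by time t: true status is 0
            total += s ** 2 / G(t_i, left=True)
        elif t_i > t:  # still event-free at t: true status is 1
            total += (1 - s) ** 2 / G(t)
        # patients censored before t contribute 0; the weights compensate
    return total / len(times)


def prediction_error_curve(grid, predict, times, events):
    """Brier score over a time grid; predict(t) returns predicted survival at t for every patient."""
    G = km_censoring(times, events)
    return [ipcw_brier(t, predict(t), times, events, G) for t in grid]


times, events = [1, 2, 3, 4], [1, 0, 1, 1]
# An uninformative model predicting 50 % survival for everyone at every time.
print(prediction_error_curve([2.5, 3.5], lambda t: [0.5] * 4, times, events))  # approximately [0.25, 0.25]
```

A constant prediction of 0.5 yields a Brier score of 0.25 at every time point, which is a natural reference level. Curves that stay below those of a competing model, for example one using only clinical covariates, indicate better prediction at those times.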