Estimating the functional form of the effect of a continuous covariate on survival time
Date
2002-04-03
Publisher
Universität Dortmund
Abstract
In the analysis of many medical studies, the effect of a continuous covariate on the outcome variable is assumed to be linear. However, this assumption is not appropriate in all situations. In this thesis, several methods for estimating the functional form of the effect of a single continuous covariate are investigated within the framework of the Cox proportional hazards model. I consider both data-independent and data-dependent methods. With data-independent methods (e.g. restricted cubic splines, or categorisation of the continuous covariate at fixed cutpoints), the general functional form is specified in advance and the data are used only to estimate the parameters of these functions. With data-dependent methods (e.g. categorisation at data-driven, outcome-oriented cutpoints, or modelling the effect with fractional polynomials), the functional form itself is also determined from the data. This model-building process can lead to severe overfitting, whereas a prespecified functional form may fail to describe the true effect correctly. To obtain more appropriate risk functions and to correct for the bias caused by model building, I extend all methods by adapting bootstrap aggregating (bagging), proposed by Breiman (1996), to this problem. In this approach, the risk function is estimated in each of a set of bootstrap samples using the same method as in the original data, and an aggregated risk function is obtained by averaging the estimated functions over all bootstrap samples. All methods are illustrated by modelling the effect of the continuous covariate age on recurrence-free survival in patients with breast carcinoma.

All methods are assessed in a simulation study based on typical risk functions. It is shown that bagging can improve the estimation of risk functions, in particular when the model-building process yields an unstable risk function in the original data. Further topics discussed include the use of different error measures to assess the results and the comparison of different approaches for making risk functions comparable.

Breiman L (1996): Bagging predictors. Machine Learning, 24:123-140.
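The bagging procedure summarised above can be sketched for one data-dependent method, categorisation at a data-driven cutpoint. This is a minimal, standard-library-only Python sketch under stated assumptions, not the thesis's implementation: the risk function is a single step at the cutpoint that maximises the Cox score statistic at beta = 0 (closely related to the log-rank statistic) over a hypothetical candidate set, beta is fitted by Newton-Raphson on the partial likelihood with Breslow's handling of ties, risk functions are made comparable by fixing the group with ages at or below the cutpoint at 0, and the simulated data (a true step of 0.7 in the log hazard at age 55) are invented for illustration. All function names are hypothetical.

```python
import math
import random

def event_terms(data, beta, cut):
    """(z, S0, S1) for each event, Breslow ties; data = [(age, time, event)], z = age > cut."""
    order = sorted(data, key=lambda r: r[1], reverse=True)  # latest times first
    s0 = s1 = 0.0
    terms, i = [], 0
    while i < len(order):
        j = i
        # Add everyone tied at this time to the risk set before scoring its events.
        while j < len(order) and order[j][1] == order[i][1]:
            z = 1.0 if order[j][0] > cut else 0.0
            w = math.exp(beta * z)
            s0, s1 = s0 + w, s1 + z * w
            j += 1
        for k in range(i, j):
            if order[k][2]:
                terms.append((1.0 if order[k][0] > cut else 0.0, s0, s1))
        i = j
    return terms

def score_and_info(data, beta, cut):
    """Partial-likelihood score U(beta) and information I(beta) for a binary z."""
    u = info = 0.0
    for z, s0, s1 in event_terms(data, beta, cut):
        p = s1 / s0  # weighted share of the risk set with z = 1
        u += z - p
        info += p * (1.0 - p)  # S2/S0 - (S1/S0)^2, because z*z == z
    return u, info

def best_cutpoint(data, candidates, min_events=5):
    """Data-driven cutpoint (assumption): maximise U(0)^2 / I(0) over the candidates."""
    best, best_stat = None, -1.0
    for c in candidates:
        hi = sum(1 for age, _, d in data if d and age > c)
        lo = sum(1 for age, _, d in data if d and age <= c)
        if min(hi, lo) < min_events:
            continue  # skip splits where the estimate of beta could diverge
        u, info = score_and_info(data, 0.0, c)
        stat = u * u / info if info > 0.0 else 0.0
        if stat > best_stat:
            best, best_stat = c, stat
    return best

def fit_beta(data, cut, max_iter=25, tol=1e-8):
    """Newton-Raphson for the log hazard ratio of age > cut versus age <= cut."""
    beta = 0.0
    for _ in range(max_iter):
        u, info = score_and_info(data, beta, cut)
        if info <= 0.0:
            break
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def cutpoint_risk(data, grid, candidates):
    """Log relative hazard on `grid`; ages <= selected cutpoint are the reference (0)."""
    c = best_cutpoint(data, candidates)
    if c is None:
        return [0.0] * len(grid)
    beta = fit_beta(data, c)
    return [beta if x > c else 0.0 for x in grid]

def bagged_risk(data, grid, candidates, n_boot=50, seed=1):
    """Bagging: rerun cutpoint search + fit in bootstrap samples, then average."""
    rng = random.Random(seed)
    total = [0.0] * len(grid)
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # n draws with replacement
        f = cutpoint_risk(sample, grid, candidates)
        total = [s + v for s, v in zip(total, f)]
    return [s / n_boot for s in total]

def simulate(n, seed=0):
    """Hypothetical data: exponential times, log hazard ratio 0.7 for age > 55."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        age = rng.uniform(30, 80)
        time = rng.expovariate(0.1 * math.exp(0.7 if age > 55 else 0.0))
        censor = rng.expovariate(0.05)
        data.append((age, min(time, censor), time <= censor))
    return data

if __name__ == "__main__":
    data = simulate(200)
    grid = [35, 45, 52, 58, 65, 75]
    candidates = [40, 45, 50, 55, 60, 65, 70]
    print("single:", [round(v, 2) for v in cutpoint_risk(data, grid, candidates)])
    print("bagged:", [round(v, 2) for v in bagged_risk(data, grid, candidates)])
```

Because each bootstrap replicate repeats the cutpoint search as well as the fit, the averaged function is a mixture of steps at different cutpoints and is smoother than the single-sample step; this is how bagging can stabilise a data-dependent estimator. Another method (e.g. fractional polynomials) would be plugged in by replacing `cutpoint_risk` with a function that returns that method's estimated risk function on the same grid.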
Keywords
proportional hazards model, risk function, model building, bootstrap aggregating