Analyzing consistency and statistical inference in Random Forest models

dc.contributor.advisor: Pauly, Markus
dc.contributor.author: Ramosaj, Burim
dc.contributor.referee: Rahnenführer, Jörg
dc.date.accepted: 2020-07-10
dc.date.accessioned: 2020-09-28T05:58:56Z
dc.date.available: 2020-09-28T05:58:56Z
dc.date.issued: 2020
dc.description.abstract: This thesis pays special attention to the Random Forest method, an ensemble learning technique based on bagging and feature sub-spacing, covering three main aspects: its behavior as a prediction tool in the presence of missing values, its role in uncertainty quantification, and its use in variable screening. In the first part, we focus on the performance of Random Forest models in prediction and missing-value imputation, comparing them to other learning methods such as boosting procedures. We aim to identify potential modifications of Breiman's original Random Forest that increase the imputation performance of Random Forest based models, using the normalized root mean squared error and the proportion of false classification as evaluation measures. Our results favor a mixed model combining stochastic gradient boosting with a Random Forest based on kernel sampling. Regarding inferential statistics after imputation, we investigate whether Random Forest methods deliver valid statistical inference procedures, especially in repeated measures ANOVA. Our results indicate a heavy inflation of type-I error rates when testing for no mean time effects. We furthermore show that the between-imputation variance in Rubin's multiple imputation rule vanishes almost surely when missForest is applied repeatedly as an imputation scheme. As a consequence, imputation uncertainty is understated, leading to scenarios in which imputations are not proper. Closely related to the issue of valid statistical inference is the general topic of uncertainty quantification. Here, we focus on consistency properties of several residual variance estimators in regression models and deliver theoretical guarantees that Random Forest based estimators are consistent. Beyond prediction, Random Forest is often used as a screening method for selecting informative features in potentially high-dimensional settings.
Focusing on regression problems, we deliver a formal proof that the Random Forest internal permutation importance measure is correct on average, i.e. (asymptotically) unbiased. Simulation studies and real-life data examples from different fields support the findings of this thesis.
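The interplay of Random-Forest-based imputation and Rubin's multiple imputation rule described in the abstract can be sketched as follows. This is an illustrative sketch on synthetic data, not the thesis code: it substitutes scikit-learn's IterativeImputer with a Random Forest estimator for missForest, and all data and parameters are hypothetical.

```python
# Sketch: missForest-style multiple imputation with a Random Forest,
# followed by Rubin-style pooling of a scalar estimate across imputations.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[rng.random(X.shape) < 0.2] = np.nan  # ~20% values missing completely at random

# Draw M imputed data sets; the random seed is the only source of variation
# between runs of this missForest-style scheme.
M = 5
means = []
for m in range(M):
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=m),
        max_iter=5,
        random_state=m,
    )
    X_imputed = imputer.fit_transform(X)
    means.append(X_imputed[:, 0].mean())  # example estimate per imputed set

q_bar = float(np.mean(means))        # pooled point estimate (Rubin's rule)
B = float(np.var(means, ddof=1))     # between-imputation variance
print(f"pooled mean = {q_bar:.3f}, between-imputation variance B = {B:.6f}")
```

If B stays near zero across repeated runs, the pooled variance understates imputation uncertainty, which is the "imputations are not proper" phenomenon the thesis analyzes for missForest.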
dc.identifier.uri: http://hdl.handle.net/2003/39552
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-21444
dc.language.iso: en
dc.subject: Random Forest
dc.subject: Consistency
dc.subject: Statistical inference
dc.subject: Uncertainty quantification
dc.subject: Missing value imputation
dc.subject: Prediction intervals
dc.subject: Industrial application
dc.subject.ddc: 310
dc.subject.rswk: Partielle Information (partial information)
dc.subject.rswk: Automatische Klassifikation (automatic classification)
dc.subject.rswk: Regressionsmodell (regression model)
dc.subject.rswk: Random Forest
dc.title: Analyzing consistency and statistical inference in Random Forest models
dc.type: Text
dc.type.publicationtype: doctoralThesis
dcterms.accessRights: open access
eldorado.secondarypublication: false

Files

Original bundle
Name: Dissertation_BurimRamosaj.pdf
Size: 5.76 MB
Format: Adobe Portable Document Format
Description: DNB

License bundle
Name: license.txt
Size: 4.85 KB
Format: Item-specific license agreed upon to submission