Analyzing consistency and statistical inference in Random Forest models

dc.contributor.advisor: Pauly, Markus
dc.contributor.author: Ramosaj, Burim
dc.contributor.referee: Rahnenführer, Jörg
dc.date.accepted: 2020-07-10
dc.date.accessioned: 2020-09-28T05:58:56Z
dc.date.available: 2020-09-28T05:58:56Z
dc.date.issued: 2020
dc.description.abstract: This thesis pays special attention to the Random Forest method, an ensemble learning technique based on bagging and feature sub-spacing, covering three main aspects: its behavior as a prediction tool in the presence of missing values, its role in uncertainty quantification, and its use in variable screening. In the first part, we focus on the performance of Random Forest models in prediction and missing-value imputation, comparing them to other learning methods such as boosting procedures. We aim to identify potential modifications of Breiman's original Random Forest that increase the imputation performance of Random Forest based models, using the normalized root mean squared error and the proportion of false classification as evaluation measures. Our results favor a mixed model combining stochastic gradient boosting with a Random Forest based on kernel sampling. Regarding inferential statistics after imputation, we investigate whether Random Forest methods deliver valid statistical inference procedures, especially in repeated measures ANOVA. Our results indicate a heavy inflation of type-I error rates when testing for no mean time effects. We furthermore show that the between-imputation variance in Rubin's multiple imputation rule vanishes almost surely when missForest is applied repeatedly as an imputation scheme. As a consequence, imputation uncertainty is understated, leading to scenarios in which imputations are not proper. Closely related to the issue of valid statistical inference is the general topic of uncertainty quantification. Here, we focus on consistency properties of several residual variance estimators in regression models and deliver theoretical guarantees that Random Forest based estimators are consistent. Beyond prediction, Random Forest is often used as a screening method for selecting informative features in potentially high-dimensional settings.
Focusing on regression problems, we deliver a formal proof that the Random Forest internal permutation importance measure is correct on average, i.e. (asymptotically) unbiased. Simulation studies and real-life data examples from different fields support the findings of this thesis.
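The interplay of Random-Forest-based imputation and Rubin's multiple imputation rule described in the abstract can be sketched as follows. This is an illustrative sketch on synthetic data, not the thesis code: it substitutes scikit-learn's IterativeImputer with a Random Forest estimator for missForest, and all data and parameters are hypothetical.

```python
# Sketch: missForest-style multiple imputation with a Random Forest,
# followed by Rubin-style pooling of a scalar estimate across imputations.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[rng.random(X.shape) < 0.2] = np.nan  # ~20% values missing completely at random

# Draw M imputed data sets; the random seed is the only source of variation
# between runs of this missForest-style scheme.
M = 5
means = []
for m in range(M):
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=m),
        max_iter=5,
        random_state=m,
    )
    X_imputed = imputer.fit_transform(X)
    means.append(X_imputed[:, 0].mean())  # example estimate per imputed set

q_bar = float(np.mean(means))        # pooled point estimate (Rubin's rule)
B = float(np.var(means, ddof=1))     # between-imputation variance
print(f"pooled mean = {q_bar:.3f}, between-imputation variance B = {B:.6f}")
```

If B stays near zero across repeated runs, the pooled variance understates imputation uncertainty, which is the "imputations are not proper" phenomenon the thesis analyzes for missForest.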
dc.identifier.uri: http://hdl.handle.net/2003/39552
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-21444
dc.language.iso: en
dc.subject: Random Forest
dc.subject: Consistency
dc.subject: Statistical inference
dc.subject: Uncertainty quantification
dc.subject: Missing value imputation
dc.subject: Prediction intervals
dc.subject: Industrial application
dc.subject.ddc: 310
dc.subject.rswk: Partielle Information (partial information)
dc.subject.rswk: Automatische Klassifikation (automatic classification)
dc.subject.rswk: Regressionsmodell (regression model)
dc.subject.rswk: Random Forest
dc.title: Analyzing consistency and statistical inference in Random Forest models
dc.type: Text
dc.type.publicationtype: doctoralThesis
dcterms.accessRights: open access
eldorado.secondarypublication: false

Files

Original bundle
Name: Dissertation_BurimRamosaj.pdf
Size: 5.76 MB
Format: Adobe Portable Document Format
Description: DNB

License bundle
Name: license.txt
Size: 4.85 KB
Format: Item-specific license agreed upon to submission