Statistical analyses of tree-based ensembles

dc.contributor.advisor: Pauly, Markus
dc.contributor.author: Schmid, Lena
dc.contributor.referee: Groll, Andreas
dc.date.accepted: 2024-02-26
dc.date.accessioned: 2024-08-01T10:21:30Z
dc.date.available: 2024-08-01T10:21:30Z
dc.date.issued: 2023
dc.description.abstract: This thesis studies tree-based ensemble learners, with particular attention to their behavior as prediction tools for multivariate or time-dependent outcomes and to their efficient implementation. Well-known examples such as Random Forest and Extra Trees are widely used to predict univariate outcomes. For multivariate outcomes, however, the question arises whether it is better to fit univariate models separately or to follow a multivariate approach directly. Our results show that the multivariate approach is advantageous in scenarios with a high degree of dependency between the outcome components; in particular, significant performance differences between the Random Forest approaches are observed.

Regarding predictive performance for time series, we investigate whether tree-based methods offer advantages over traditional time series methods such as ARIMA, particularly in data-driven logistics, where the abundance of complex and noisy data - from supply chain transactions to customer interactions - requires accurate and timely insights. Our results indicate that machine learning methods are effective, especially in scenarios where the data-generating processes are layered with additional complexity.

Motivated by the trend towards increasingly autonomous and decentralized processes on resource-constrained devices in logistics, we explore strategies to optimize the inference time of machine learning algorithms, focusing on Random Forests and decision trees. Beyond the simple approach of enforcing shorter paths through decision trees, we also investigate hardware-oriented implementations. One optimization adapts the memory layout to prefer paths with higher probability, which is particularly beneficial when splits within tree nodes are uneven. We further present a regularization method that reduces path lengths by rewarding uneven probability distributions during decision tree training. This method proves particularly valuable for a memory-architecture-aware implementation, yielding a substantial reduction in execution time with minimal degradation in accuracy, especially for large datasets or binary classification tasks. Simulation studies and real-life data examples from different fields support the findings of this thesis.
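The univariate-versus-multivariate question from the abstract can be sketched with scikit-learn, whose RandomForestRegressor natively supports multi-output targets. This is an illustrative toy setup, not the thesis's actual experiments: the synthetic data, hyperparameters, and the correlated two-component outcome are all assumptions chosen to mimic the "high dependency between outcome components" scenario.

```python
# Illustrative sketch (not from the thesis): one multivariate Random Forest
# versus separate univariate forests on a synthetic two-component outcome.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
# Two strongly correlated outcome components (the setting where the
# multivariate approach is expected to help).
y1 = X[:, 0] + 0.1 * rng.normal(size=500)
y2 = y1 + 0.1 * rng.normal(size=500)
Y = np.column_stack([y1, y2])
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Multivariate approach: a single forest predicts both components jointly.
multi = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, Y_tr)
mse_multi = np.mean((multi.predict(X_te) - Y_te) ** 2)

# Univariate approach: one forest per outcome component, fitted separately.
preds = np.column_stack([
    RandomForestRegressor(n_estimators=100, random_state=0)
    .fit(X_tr, Y_tr[:, j])
    .predict(X_te)
    for j in range(Y_tr.shape[1])
])
mse_uni = np.mean((preds - Y_te) ** 2)
print("multivariate MSE:", mse_multi, "univariate MSE:", mse_uni)
```

Which approach wins depends on the dependency structure and noise level; the thesis's point is precisely that the gap between the two fits is largest when the outcome components are strongly dependent.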
dc.identifier.uri: http://hdl.handle.net/2003/42626
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-24462
dc.language.iso: en
dc.subject: Machine learning
dc.subject: Tree-based models
dc.subject: Random forest
dc.subject.ddc: 310
dc.subject.rswk: Maschinelles Lernen
dc.subject.rswk: Entscheidungsbaum
dc.title: Statistical analyses of tree-based ensembles
dc.type: Text
dc.type.publicationtype: PhDThesis
dcterms.accessRights: open access
eldorado.secondarypublication: false

Files

Original bundle
Name: Dissertation_Schmid.pdf
Size: 3.14 MB
Format: Adobe Portable Document Format
Description: DNB

License bundle
Name: license.txt
Size: 4.85 KB
Description: Item-specific license agreed upon to submission