Statistical analyses of tree-based ensembles

Schmid, Lena

Authors:	Schmid, Lena
Title:	Statistical analyses of tree-based ensembles
Language (ISO):	en
Abstract:	This thesis studies tree-based ensemble learners, with particular attention to their behavior as prediction tools for multivariate or time-dependent outcomes and to their efficient implementation. Well-known examples such as Random Forest and Extra Trees are often used to predict univariate outcomes. For multivariate outcomes, however, the question arises whether it is better to fit separate univariate models or to follow a multivariate approach directly. Our results show that the multivariate approach is advantageous in scenarios with a high degree of dependency between the components of the outcome. In particular, significant differences in performance between the different Random Forest approaches are observed. For time series, we investigate whether tree-based methods offer advantages in predictive performance over traditional time series methods such as ARIMA, particularly in data-driven logistics, where the abundance of complex and noisy data, from supply chain transactions to customer interactions, requires accurate and timely insights. Our results indicate that machine learning methods are effective, especially in scenarios where the data-generating process has additional layers of complexity. Motivated by the trend towards increasingly autonomous and decentralized processes on resource-constrained devices in logistics, we explore strategies to reduce the inference time of machine learning algorithms, focusing on Random Forests and decision trees. In addition to the simple approach of enforcing shorter paths through decision trees, we investigate hardware-oriented implementations. One optimization adapts the memory layout to favor paths with higher probability, which is particularly beneficial when splits within tree nodes are uneven.
We present a regularization method that reduces path lengths by rewarding uneven probability distributions during decision tree training. This method proves particularly valuable for a memory-architecture-aware implementation, resulting in a substantial reduction in execution time with minimal degradation in accuracy, especially for large datasets or binary classification datasets. Simulation studies and real-life data examples from different fields support the findings of this thesis.
Subject Headings:	Machine learning; Tree-based models; Random forest
Subject Headings (RSWK):	Maschinelles Lernen; Entscheidungsbaum
URI:	http://hdl.handle.net/2003/42626 http://dx.doi.org/10.17877/DE290R-24462
Issue Date:	2023
Appears in Collections:	Institut für Mathematische Statistik und industrielle Anwendungen

Files in This Item:

File	Description	Size	Format
Dissertation_Schmid.pdf	DNB	3.22 MB	Adobe PDF

This item is protected by original copyright
