Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Morik, Katharina | -
dc.contributor.author | Buschjäger, Sebastian | -
dc.date.accessioned | 2022-11-16T11:19:26Z | -
dc.date.available | 2022-11-16T11:19:26Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://hdl.handle.net/2003/41132 | -
dc.identifier.uri | http://dx.doi.org/10.17877/DE290R-22979 | -
dc.description.abstract | Machine learning has become an integral part of everyday life, ranging from AI-powered search queries to (partially) autonomous driving. Many of the advances in machine learning and its applications have been possible due to increases in computing power, i.e., by shrinking manufacturing sizes while maintaining or even increasing energy consumption. However, 2-3 nm manufacturing is within reach, making further miniaturization increasingly difficult, while thermal design power limits are reached at the same time, rendering entire parts of a chip unusable for certain computational loads. In this thesis, we investigate discrete classifier ensembles as a resource-efficient alternative that can be deployed to small devices requiring only small amounts of energy. Discrete classifiers are classifiers that can be applied, and oftentimes also trained, without costly floating-point operations. Hence, they are ideally suited for deployment to small devices with limited resources. The disadvantage of discrete classifiers is that their predictive performance often lags behind that of their floating-point siblings. Here, combining multiple discrete classifiers into an ensemble can improve predictive performance while keeping resource consumption manageable. This thesis studies discrete classifier ensembles from a theoretical, an algorithmic, and a practical point of view. The theoretical investigation examines the bias-variance decomposition and the double-descent phenomenon. The bias-variance decomposition of the mean-squared error is revisited and generalized to arbitrary twice-differentiable loss functions; this decomposition serves as a guiding tool throughout the thesis. Similarly, the double-descent phenomenon is studied comprehensively for the first time in the context of tree ensembles, specifically random forests. Contrary to the established literature, the experiments in this thesis indicate that there is no double descent in random forests. While the training of ensembles is well studied in the literature, their deployment to small devices is often neglected, and training ensembles directly on small devices has received even less attention. Hence, the algorithmic part of this thesis focuses on the deployment of discrete classifiers to, and the training of ensembles on, small devices. First, a novel combination of ensemble pruning (i.e., removing classifiers from the ensemble) and ensemble refinement (i.e., re-training the classifiers in the ensemble) is presented, which uses a novel proximal gradient descent algorithm to minimize a combined loss function. The resulting algorithm removes unnecessary classifiers from an already trained ensemble while simultaneously improving the performance of the remaining classifiers. Second, this algorithm is extended to the more challenging setting of online learning, in which training examples arrive one by one. The resulting shrub ensembles algorithm trains ensembles in an online fashion while maintaining a strictly bounded memory consumption. It outperforms existing state-of-the-art algorithms under resource constraints and offers competitive performance in the general case. Third, this thesis studies the deployment of decision tree ensembles to small devices by optimizing their memory layout. The key insight is that decision trees have probabilistic inference times because different observations can take different paths from the root to a leaf. By estimating the probability of visiting a particular node in the tree, one can place that node favorably in memory to improve caching behavior and thus increase inference speed without changing the model. Finally, several real-world applications of tree ensembles and Binarized Neural Networks are presented. | en
dc.language.iso | en | de
dc.subject | Machine learning | en
dc.subject | Ensemble learning | en
dc.subject | Decision tree | en
dc.subject | Resource constraints | en
dc.subject | Small devices | en
dc.subject | Embedded systems | en
dc.subject | Model deployment | en
dc.subject.ddc | 004 | -
dc.title | Ensemble learning with discrete classifiers on small devices | en
dc.type | Text | de
dc.contributor.referee | Fürnkranz, Johannes | -
dc.date.accepted | 2022-10-10 | -
dc.type.publicationtype | doctoralThesis | de
dc.subject.rswk | Maschinelles Lernen | de
dc.subject.rswk | Entscheidungsbaum | de
dc.subject.rswk | Kleingerät | de
dc.subject.rswk | Ressourcen | de
dc.subject.rswk | Eingebettetes System | de
dcterms.accessRights | open access | -
eldorado.secondarypublication | false | de
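
The abstract names the bias-variance decomposition of the mean-squared error as the guiding tool of the theoretical part. For context, a classical textbook form of that decomposition (not the generalized version derived in the thesis) assumes observations y = f(x) + eps with zero-mean noise eps of variance sigma^2 and a predictor fitted on a random training set D:

    \mathbb{E}_{D,\varepsilon}\Big[\big(y - \hat{f}_D(x)\big)^2\Big]
      = \underbrace{\big(f(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2}_{\text{bias}^2}
      + \underbrace{\mathbb{E}_D\Big[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\Big]}_{\text{variance}}
      + \underbrace{\sigma^2}_{\text{noise}}

Averaging M roughly independent ensemble members leaves the bias term unchanged while shrinking the variance term by roughly a factor of 1/M, which is the usual motivation for combining many weak, discrete classifiers.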
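The cache-aware memory layout of decision trees mentioned at the end of the abstract can be illustrated with a short sketch. The following Python code uses hypothetical data structures and a deliberately simple "hot nodes first" placement; it is not the thesis implementation, only an illustration of estimating node visit probabilities from sample data and reordering a flat node array accordingly:

    # Minimal sketch (hypothetical Node layout, not the thesis implementation):
    # estimate how often each tree node is visited, then store frequently visited
    # ("hot") nodes first so they are more likely to stay in cache.
    from dataclasses import dataclass
    from typing import List, Optional
    import numpy as np

    @dataclass
    class Node:
        feature: int = -1                   # feature index tested at this node
        threshold: float = 0.0              # split threshold
        left: int = -1                      # index of left child in the node array
        right: int = -1                     # index of right child in the node array
        prediction: Optional[int] = None    # class label, set only for leaves

    def visit_probabilities(nodes: List[Node], X: np.ndarray) -> np.ndarray:
        """Estimate the probability of visiting each node from sample observations."""
        counts = np.zeros(len(nodes))
        for x in X:
            i = 0                           # start at the root
            while True:
                counts[i] += 1
                n = nodes[i]
                if n.prediction is not None:    # reached a leaf
                    break
                i = n.left if x[n.feature] <= n.threshold else n.right
        return counts / counts.sum()

    def reorder_by_probability(nodes: List[Node], probs: np.ndarray) -> List[Node]:
        """Return a new node array with frequently visited nodes placed first."""
        # Keep the root at index 0 so traversal still starts there after reordering.
        order = [0] + [int(i) for i in np.argsort(-probs) if i != 0]
        remap = {old: new for new, old in enumerate(order)}
        laid_out = []
        for old in order:
            n = nodes[old]
            is_leaf = n.prediction is not None
            laid_out.append(Node(n.feature, n.threshold,
                                 -1 if is_leaf else remap[n.left],
                                 -1 if is_leaf else remap[n.right],
                                 n.prediction))
        return laid_out

    # Tiny example: a stump that splits on feature 0 with two leaves.
    tree = [Node(feature=0, threshold=0.5, left=1, right=2),
            Node(prediction=0), Node(prediction=1)]
    X = np.random.rand(1000, 1)
    hot_first = reorder_by_probability(tree, visit_probabilities(tree, X))

Because every observation passes through the root, the root is always the hottest node; placing frequently visited nodes contiguously near it tends to improve cache behavior without changing any prediction.
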
Appears in Collections: LS 08 Künstliche Intelligenz

Files in This Item:
File | Description | Size | Format
Dissertation.pdf | DNB | 3.89 MB | Adobe PDF


This item is protected by original copyright (rightsstatements.org)