Compilation-based explainability of tree ensembles


Date

2025

Authors

Murtovi, Alnis


Abstract

Machine learning, particularly deep learning, has achieved remarkable success in areas such as image recognition, natural language processing, and speech recognition. For structured or tabular data, however, tree ensemble approaches such as random forests and gradient boosted trees often outperform deep learning-based approaches. Despite their strong performance, tree ensembles are generally considered black boxes, because the complexity of combining multiple decision trees makes it difficult to trace the reasoning behind individual predictions. Recent advances in Explainable Artificial Intelligence have produced heuristic post-hoc explanation methods such as LIME and SHAP. These methods, however, rely only on a model's input-output behavior and therefore merely approximate how it internally arrives at its predictions. As a result, there is a pressing need for efficient and precise approaches to explaining tree ensembles, especially in domains where safety or fairness is critical. To close this interpretability gap, this thesis explores compilation-based techniques that transform an entire tree ensemble into a single, semantically equivalent structure such as a directed acyclic graph. This unified representation reveals the ensemble's underlying logic and makes it amenable to formal analysis: once compiled, the model supports the efficient generation of formal explanations as well as verification tasks such as pre- and postcondition checking and model equivalence checking. A significant advantage is that, after the one-time cost of building the unified representation, subsequent explanations can be generated efficiently, which makes the proposed solutions particularly well suited to real-time or interactive settings where many explanations are requested in sequence. The main focus of this thesis is efficiency and scalability, since existing compilation-based approaches can be very expensive on large ensembles. To address this, the thesis introduces novel compilation algorithms and optimizations that significantly reduce transformation time and memory usage while maintaining exact equivalence with the original tree ensemble. The experimental results show over an order of magnitude speedup in model compilation and multiple orders of magnitude in explanation generation compared to state-of-the-art solver-based approaches. Furthermore, the thesis presents a user-friendly, web-based tool (Forest GUMP) that allows non-experts to train, visualize, verify, and explain tree ensembles interactively. Overall, these contributions advance the field of explainable AI by delivering efficient, formally grounded, and practical solutions for tree ensemble interpretability and explainability.
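To illustrate the core idea of compilation, the following is a deliberately simplified sketch, not the thesis's actual algorithms: it assumes binary features and trees represented as nested tuples, and it merges a voting ensemble into a single decision DAG via a memoized product construction (shared substructures are reused through the cache). After this one-time compilation, every prediction, and by extension every explanation query, can be answered by traversing one structure instead of all trees.

```python
# Toy compilation of a tree ensemble into one decision DAG.
# A node is ("leaf", value) or ("node", feature, low_child, high_child);
# the ensemble predicts 1 iff the summed leaf values exceed `threshold`.
# This is an illustrative sketch, not the thesis's optimized algorithm.

def compile_ensemble(trees, threshold):
    """Merge all trees into one semantically equivalent decision DAG."""
    cache = {}  # maps a tuple of per-tree positions to a shared DAG node

    def build(states):
        if states in cache:          # reuse already-compiled substructure
            return cache[states]
        if all(n[0] == "leaf" for n in states):
            # Every tree has voted: fold the votes into a single leaf.
            node = ("leaf", int(sum(n[1] for n in states) > threshold))
        else:
            # Branch on the smallest feature any tree still tests.
            f = min(n[1] for n in states if n[0] == "node")

            def step(val):  # advance each tree that tests f; others wait
                return tuple(n[2 + val] if n[0] == "node" and n[1] == f else n
                             for n in states)

            node = ("node", f, build(step(0)), build(step(1)))
        cache[states] = node
        return node

    return build(tuple(trees))

def evaluate(dag, x):
    """Follow the compiled DAG for input x (a list of 0/1 feature values)."""
    node = dag
    while node[0] == "node":
        node = node[2 + x[node[1]]]
    return node[1]
```

For example, compiling two decision stumps on features 0 and 1 with majority voting (`threshold=1`) yields a DAG that predicts 1 only when both stumps vote 1; a single traversal then replaces evaluating both trees.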


Keywords

Explainable AI, Random forests, Gradient boosted trees, Tree ensembles, Formal explainability

Subjects based on RSWK

Explainable artificial intelligence, Random Forest, Boosting
