Towards formal explainability: Faithful distillation of deep neural networks into interpretable surrogate models

Schlüter, Maximilian

dc.contributor.advisor	Steffen, Bernhard
dc.contributor.author	Schlüter, Maximilian
dc.contributor.referee	Jansen, Nils
dc.date.accepted	2025-04-04
dc.date.accessioned	2025-07-16T05:45:38Z
dc.date.available	2025-07-16T05:45:38Z
dc.date.issued	2025
dc.description.abstract	The goal of this thesis is to make the semantic function of trained deep neural networks accessible in a compact and efficient data structure based on formal principles. Owing to their remarkable performance over the last two decades, deep neural networks are today the predominant machine learning model. From the first breakthroughs in computer vision with AlexNet to today's sophisticated large language models for natural language processing, deep neural networks have become the state of the art in machine learning. One key factor in this success is their ability to learn autonomously from data without human guidance, enabling end-to-end optimization. On the other hand, this lack of human involvement leaves the internal representations of neural networks without deliberate structure or design, which makes them hard to understand. Besides the large number of learnable parameters, this thesis identifies three key factors that make these learned intermediate representations so difficult to understand: they are distributed, non-linear, and sub-symbolic. This thesis proposes a new post-hoc approach for explaining deep neural networks based on faithful surrogate models. Through a systematic and property-preserving decomposition of piecewise linear neural networks into their linear regions, the internal structure of a network is compiled into a new surrogate model. In this way, the typical dataflow-based representation of DNNs, which is optimized for execution speed on graphics cards, is converted into a control-flow-based representation that is more suitable for formal analysis. Consequently, the new representation is free of distributed and non-linear internal representations. The surrogate model provides explanatory information from which different types of explanations can be derived, such as outcome explanations, class characterizations, and model explanations.
At the core of this approach stands an optimized data structure, a binary decision tree that combines ideas from Algebraic Decision Trees, Binary Space Partitioning Trees, and classic program optimization. By placing function composition at the center, these trees enable a modular approach to faithful distillation that is easily extensible and simplifies reasoning. Through optimizations such as infeasible path elimination, redundancies in the tree are identified and pruned. As the distilled tree mirrors the network's behavior, it can be used to analyze the network's semantic properties, such as fairness or robustness. Owing to their formal grounding, these trees integrate well with mathematical notions. Furthermore, two-dimensional slices of the input space make it possible to visualize the actual decision boundaries of a neural network, providing an ideal basis for exploring its behavior intuitively.	en
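The decomposition into linear regions and the pruned decision tree described in the abstract can be illustrated on a toy example. The following Python sketch is not the thesis's implementation. It assumes a hypothetical network with a one-dimensional input, one hidden ReLU layer, and hand-picked weights (`W1`, `B1`, `W2`, `B2`), and all function names are illustrative. It symbolically executes the network neuron by neuron and branches on the sign of each pre-activation. As a result, every root-to-leaf path corresponds to one activation pattern, and every leaf stores the affine function the network computes on that linear region. Because the input is one-dimensional, each path condition is an interval, so infeasible path elimination reduces to checking for an empty interval. With higher-dimensional inputs, this check would typically need a linear-programming feasibility test instead.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Hypothetical toy network: 1 input, 3 hidden ReLU neurons, 1 output.
# These weights are illustrative assumptions, not taken from the thesis.
W1 = [1.0, -1.0, 2.0]   # hidden weights
B1 = [0.0, 1.0, -3.0]   # hidden biases
W2 = [1.0, 1.0, -1.0]   # output weights
B2 = 0.0                # output bias


@dataclass
class Leaf:
    slope: float      # on this linear region the network computes slope * x + intercept
    intercept: float


@dataclass
class Split:
    a: float          # branch predicate: a * x + b > 0 (one neuron's pre-activation)
    b: float
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def restrict(lo: float, hi: float, a: float, b: float,
             positive: bool) -> Optional[Tuple[float, float]]:
    """Intersect [lo, hi] with {x : a*x + b > 0} (positive) or {x : a*x + b <= 0}.

    Returns None when the result has an empty interior, i.e. the path is infeasible.
    """
    if a == 0.0:
        return (lo, hi) if (b > 0.0) == positive else None
    root = -b / a
    # a*x + b > 0 means x > root if a > 0, and x < root if a < 0.
    if (a > 0.0) == positive:
        lo = max(lo, root)
    else:
        hi = min(hi, root)
    return (lo, hi) if lo < hi else None


def build(k: int, lo: float, hi: float, slope: float, intercept: float) -> Node:
    """Symbolically execute hidden neurons k, k+1, ... over the interval [lo, hi].

    slope/intercept accumulate the output contribution of the neurons already decided.
    """
    if k == len(W1):
        return Leaf(slope, intercept + B2)
    a, b = W1[k], B1[k]
    active_region = restrict(lo, hi, a, b, positive=True)
    inactive_region = restrict(lo, hi, a, b, positive=False)
    # Active ReLU passes a*x + b through, scaled by the output weight.
    active = (build(k + 1, *active_region, slope + W2[k] * a, intercept + W2[k] * b)
              if active_region else None)
    # Inactive ReLU contributes 0, so the accumulated affine function is unchanged.
    inactive = (build(k + 1, *inactive_region, slope, intercept)
                if inactive_region else None)
    # Infeasible path elimination: drop the split if only one branch is feasible.
    if active is None:
        return inactive
    if inactive is None:
        return active
    return Split(a, b, active, inactive)


def evaluate(node: Node, x: float) -> float:
    """Follow the tree's branch predicates, then apply the leaf's affine function."""
    while isinstance(node, Split):
        node = node.if_true if node.a * x + node.b > 0.0 else node.if_false
    return node.slope * x + node.intercept


def forward(x: float) -> float:
    """Ordinary dataflow evaluation of the same network, for comparison."""
    return sum(w2 * max(0.0, w1 * x + b1) for w1, b1, w2 in zip(W1, B1, W2)) + B2


def count_leaves(node: Node) -> int:
    if isinstance(node, Leaf):
        return 1
    return count_leaves(node.if_true) + count_leaves(node.if_false)


tree = build(0, -10.0, 10.0, 0.0, 0.0)   # input domain assumed to be [-10, 10]
print(count_leaves(tree))                  # prints 4
```

In this example, the three hypothetical neurons switch at x = 0, 1, and 1.5. Only 4 of the 2^3 = 8 candidate activation patterns are therefore feasible on [-10, 10], and the pruned tree has 4 leaves. On that domain, `evaluate` on the tree agrees with `forward` up to floating-point rounding, which is the faithfulness property the surrogate is meant to preserve.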
dc.identifier.uri	http://hdl.handle.net/2003/43796
dc.identifier.uri	http://dx.doi.org/10.17877/DE290R-25570
dc.language.iso	en
dc.subject	Deep neural networks	en
dc.subject	Model distillation	en
dc.subject	Decision trees	en
dc.subject	Interpretable surrogate model	en
dc.subject	Activation pattern decomposition	en
dc.subject	Symbolic execution	en
dc.subject	Rule extraction	en
dc.subject	Input space partition	en
dc.subject	Continuous piecewise linear	en
dc.subject.ddc	004
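dc.subject.rswk	Tiefes neuronales Netz (deep neural network)	de
dc.subject.rswk	Entscheidungsbaum (decision tree)	de
dc.subject.rswk	Symbolische Ausführung (symbolic execution)	de
dc.subject.rswk	Wissensextraktion (knowledge extraction)	de
dc.subject.rswk	Stückweise lineare Funktion (piecewise linear function)	de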
dc.title	Towards formal explainability: Faithful distillation of deep neural networks into interpretable surrogate models	en
dc.type	Text
dc.type.publicationtype	PhDThesis
dcterms.accessRights	open access
eldorado.dnb.deposit	true
eldorado.secondarypublication	false

Files

Original bundle

Name: Dissertation_Schlueter.pdf
Size: 5.21 MB
Format: Adobe Portable Document Format
Description: DNB

License bundle

Name: license.txt
Size: 4.82 KB
Format: Item-specific license agreed upon to submission

Collections

LS 05 Programmiersysteme (Chair for Programming Systems)