Towards formal explainability: Faithful distillation of deep neural networks into interpretable surrogate models

dc.contributor.advisor: Steffen, Bernhard
dc.contributor.author: Schlüter, Maximilian
dc.contributor.referee: Jansen, Nils
dc.date.accepted: 2025-04-04
dc.date.accessioned: 2025-07-16T05:45:38Z
dc.date.available: 2025-07-16T05:45:38Z
dc.date.issued: 2025
dc.description.abstract: The goal of this thesis is to make the semantic function of trained deep neural networks accessible in a compact and efficient data structure based on formal principles. Deep neural networks are today the predominant machine learning model, owing to their remarkable performance over the last two decades. From the first breakthroughs in computer vision with AlexNet to today's sophisticated large language models for natural language processing, deep neural networks have become the state of the art in machine learning. One key factor for this success is their ability to learn autonomously from data without human guidance, enabling end-to-end optimization. On the other hand, the lack of human involvement makes the internal structure of neural networks hard to understand, as it lacks deliberate structure and design. Besides the large number of learnable parameters, three key factors are identified that make the learned intermediate representations so difficult to understand: they are distributed, non-linear, and sub-symbolic. This thesis proposes a new post-hoc approach for explaining deep neural networks based on faithful surrogate models. Through a systematic and property-preserving decomposition of piecewise linear neural networks into their linear regions, the internal structure of neural networks is compiled into a new surrogate model. In this way, the typical representation of DNNs based on their data flow, which is optimized for execution speed on graphics cards, is converted into a representation focusing on control flow, which is more suitable for formal analysis. Consequently, the new representation is free of the distributed and non-linear internal representations. The surrogate model provides explanatory information from which different types of explanations can be derived, such as outcome explanations, class characterizations, and model explanations.
At the core of this approach stands an optimized data structure, a binary decision tree, that combines ideas from Algebraic Decision Trees, Binary Space Partitioning Trees, and classic program optimization. By placing function composition at the center, these trees enable a modular approach to faithful distillation that is easily extensible and simplifies reasoning. Optimizations such as infeasible path elimination identify and prune redundancies in the tree. As the distilled tree mirrors the network's behavior, it can be used to analyze the network's semantic properties, such as fairness or robustness. As a result of their formal grounding, these trees integrate well with mathematical notions. Furthermore, based on two-dimensional slices, it is possible to visualize the actual decision boundaries of a neural network, providing an ideal basis for exploring its behavior intuitively. (en)
dc.identifier.uri: http://hdl.handle.net/2003/43796
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-25570
dc.language.iso: en
dc.subject: Deep neural networks (en)
dc.subject: Model distillation (en)
dc.subject: Decision Trees (en)
dc.subject: Interpretable surrogate model (en)
dc.subject: Activation pattern decomposition (en)
dc.subject: Symbolic execution (en)
dc.subject: Rule extraction (en)
dc.subject: Input space partition (en)
dc.subject: Continuous piecewise linear (en)
dc.subject.ddc: 004
dc.subject.rswk: Tiefes neuronales Netz [deep neural network] (de)
dc.subject.rswk: Entscheidungsbaum [decision tree] (de)
dc.subject.rswk: Symbolische Ausführung [symbolic execution] (de)
dc.subject.rswk: Wissensextraktion [knowledge extraction] (de)
dc.subject.rswk: Stückweise lineare Funktion [piecewise linear function] (de)
dc.title: Towards formal explainability: Faithful distillation of deep neural networks into interpretable surrogate models (en)
dc.type: Text
dc.type.publicationtype: PhDThesis
dcterms.accessRights: open access
eldorado.secondarypublication: false
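
The abstract describes decomposing a piecewise linear network into its linear regions. The following is a minimal sketch of that idea only, not the thesis's implementation: for a toy ReLU network, it reads off the activation pattern at an input and composes the layers into the single affine function that the network computes on that region. The weights and the helper names (matvec, forward, activation_pattern, affine_piece, evaluate_piece) are made up for illustration.

```python
# Hypothetical toy network: 2 inputs -> 2 hidden ReLU units -> 1 output.
# All weights are invented for illustration; they come from no real model.
W1 = [[1.0, -1.0], [0.5, 2.0]]
b1 = [0.0, -1.0]
W2 = [[1.0, -2.0]]
b2 = [0.5]


def matvec(W, x):
    """Multiply matrix W (list of rows) with vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(x):
    """Ordinary data-flow evaluation of the network."""
    pre = [p + b for p, b in zip(matvec(W1, x), b1)]
    hidden = [max(0.0, p) for p in pre]
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


def activation_pattern(x):
    """Which hidden ReLUs are active at x; this identifies the linear region."""
    pre = [p + b for p, b in zip(matvec(W1, x), b1)]
    return tuple(p > 0.0 for p in pre)


def affine_piece(pattern):
    """Return (A, c) such that the network equals A x + c on the region of `pattern`."""
    # An inactive ReLU outputs zero, so zero out its row of W1 and entry of b1.
    W1m = [[w if on else 0.0 for w in row] for row, on in zip(W1, pattern)]
    b1m = [b if on else 0.0 for b, on in zip(b1, pattern)]
    # Compose the two affine layers: W2 (W1m x + b1m) + b2.
    n_in, n_hid, n_out = len(W1[0]), len(W1), len(W2)
    A = [[sum(W2[i][k] * W1m[k][j] for k in range(n_hid)) for j in range(n_in)]
         for i in range(n_out)]
    c = [sum(W2[i][k] * b1m[k] for k in range(n_hid)) + b2[i] for i in range(n_out)]
    return A, c


def evaluate_piece(A, c, x):
    """Evaluate the affine function A x + c."""
    return [a + ci for a, ci in zip(matvec(A, x), c)]


if __name__ == "__main__":
    x = [2.0, 1.0]
    pattern = activation_pattern(x)
    A, c = affine_piece(pattern)
    print(pattern, A, c, forward(x), evaluate_piece(A, c, x))
```

In a tree-shaped surrogate of the kind the abstract outlines, each sign test on a pre-activation would plausibly act as a decision node and each affine piece as a leaf; the sketch only checks that the affine piece agrees with the network at the sampled input.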

Files

Original bundle
Name: Dissertation_Schlueter.pdf
Size: 5.21 MB
Format: Adobe Portable Document Format
Description: DNB
License bundle
Name: license.txt
Size: 4.82 KB
Format: Item-specific license agreed to upon submission