Aggressive aggregation

Loading...
Thumbnail Image

Date

2021

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Among the first steps in a compilation pipeline is the construction of an Intermediate Representation (IR), an in-memory representation of the input program. Any attempt to program optimisation, both in terms of size and running time, has to operate on this structure. There may be one or multiple such IRs, however, most compilers use some form of a Control Flow Graph (CFG) internally. This representation clearly aims at general-purpose programming languages, for which it is well suited and allows for many classical program optimisations. On the other hand, a growing structural difference between the input program and the chosen IR can lose or obfuscate information that can be crucial for effective optimisation. With today’s rise of a multitude of different programming languages, Domain-Specific Languages (DSLs), and computing platforms, the classical machine-oriented IR is reaching its limits and a broader variety of IRs is needed. This realisation yielded, e.g., Multi-Level Intermediate Representation (MLIR), a compiler framework that facilitates the creation of a wide range of IRs and encourages their reuse among different programming languages and the corresponding compilers. In this modern spirit, this dissertation explores the potential of Algebraic Decision Diagrams (ADDs) as an IR for (domain-specific) program optimisation. The data structure remains the state of the art for Boolean function representation for more than thirty years and is well-known for its optimality in size and depth, i.e. running time. As such, it is ideally suited to represent the corresponding classes of programs in the role of an IR. We will discuss its application in a variety of different program domains, ranging from DSLs to machine-learned programs and even to general-purpose programming languages. Two representatives for DSLs, a graphical and a textual one, prove the adequacy of ADDs for the program optimisation of modelled decision services. The resulting DSLs facilitate experimentation with ADDs and provide valuable insight into their potential and limitations: input programs can be aggregated in a radical fashion, at the risk of the occasional exponential growth. With the aggregation of large Random Forests into a single aggregated ADD, we bring this potential to a program domain of practical relevance. The results are impressive: both running time and size of the Random Forest program are reduced by multiple orders of magnitude. It turns out that this ADD-based aggregation can be generalised, even to generaliii purpose programming languages. The resulting method achieves impressive speedups for a seemingly optimal program: the iterative Fibonacci implementation. Altogether, ADDs facilitate effective program optimisation where the input programs allow for a natural transformation to the data structure. In these cases, they have proven to be an extremely powerful tool for the optimisation of a program’s running time and, in some cases, of its size. The exploration of their potential as an IR has only started and deserves attention in future research.

Description

Table of contents

Keywords

Domain-specific languages, Algebraic decision diagrams, Binary decision diagrams, Random forests, Explainability, Aggregation, Program optimisation

Citation