LS 08 Künstliche Intelligenz


Recent Submissions

  • Item
    Evaluable explainability and applications to 3D vision
    (2025) Tan, Hanxiao; Müller, Emmanuel; Neider, Daniel
    With the breakthroughs in the performance of deep neural networks, they are now applied in a wide range of fields, including several with high security requirements. However, these black-box models carry potential risks, the most threatening of which is their opaque decision-making process. The recent rise of explainability studies on black-box models is a promising research direction for enhancing the trustworthiness of such models. Nevertheless, existing explainability studies are still limited in two ways: they are difficult to evaluate objectively due to the lack of ground truth, and the vast majority of relevant studies are constrained to a particular data format and lack extensibility. The two main parts of this dissertation address these two limitations. In the first half, we aim to improve the evaluability of explainability methods. We optimize the choice of baseline involved in the explanation evaluations and in part of the explainability approaches so as to satisfy the definition of uninformativeness. In addition, we complement the explanation evaluation metrics from three novel perspectives, namely robustness to parameter perturbations, generalizability, and sensitivity consistency. In the latter half, we extend the applicability of explainability approaches to the field of 3D computer vision, thereby enhancing the trustworthiness of point cloud models. We first extend the perturbation-based approach to point clouds and provide online toolkits to facilitate practical implementation. Subsequently, we propose two activation-maximization-based global explainability approaches for point clouds, which visualize input instances that are representative of specific categories. Moreover, we propose a non-DNN point cloud classifier that utilizes multi-scale fractal windows to extract distributional information and makes predictions via random forests, which significantly enhances explainability compared to DNNs. Further, we adversarially analyze the decision sensitivity of point cloud models with the help of saliency maps generated by explainability methods. Finally, we analyze how a model learns 3D geometric features by analyzing the distribution of activations in its intermediate layers. Extensive experiments demonstrate that the proposed methods contribute to explainability evaluation and its adaptability to point clouds.
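    A minimal sketch of the perturbation-based idea extended to point clouds: each point is in turn collapsed onto the cloud's centroid, and the resulting drop in the class score is taken as that point's saliency. The `model` callable is a hypothetical stand-in for an arbitrary point cloud classifier; the actual method and toolkits are more involved.

    ```python
    import numpy as np

    def point_saliency(model, cloud, target_class):
        """Perturbation-based saliency for a point cloud (toy sketch).

        model : callable mapping an (n, 3) array to class probabilities
        cloud : (n, 3) array of xyz coordinates
        """
        base_score = model(cloud)[target_class]
        centroid = cloud.mean(axis=0)
        saliency = np.zeros(len(cloud))
        for i in range(len(cloud)):
            perturbed = cloud.copy()
            perturbed[i] = centroid          # "remove" point i by collapsing it
            saliency[i] = base_score - model(perturbed)[target_class]
        return saliency                      # high value = influential point
    ```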
  • Item
    Explainable and interpretable time series forecasting and predictive maintenance
    (2025) Jakobs, Matthias; Liebig, Thomas; Gama, João
    Time-series data is ubiquitous in numerous application domains today, including safety-critical settings such as medical and industrial scenarios. However, real-world data can change its characteristics over time, a phenomenon referred to as concept drift. Concept drift is particularly troubling for time-series forecasting tasks, where the goal is to predict future time-series values from known past values. Traditional forecasting methods assume that these characteristics are fixed in order to achieve high predictive performance. In this thesis, we inspect methods for adapting to changing concepts in time-series forecasting, using both complex deep learning methods and simple, interpretable models. To do so, we utilize the online model selection and online ensemble pruning frameworks for selecting among pools of different models in an online manner. Throughout, we focus on aspects of explainability and interpretability, as these are crucial for applying forecasting methods in practice. Additionally, we investigate a real-world scenario for predictive maintenance, one application area where explainable and interpretable time-series analysis methods are crucial for providing insights to practitioners and technicians alike.
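    A minimal sketch of online model selection under concept drift: each candidate forecaster is scored on a sliding window of recent errors, and whichever is currently best produces the next forecast. The model interface and window size are illustrative assumptions, not the thesis's exact framework.

    ```python
    from collections import deque

    import numpy as np

    class OnlineModelSelector:
        def __init__(self, models, window=50):
            self.models = models                 # each exposes .predict(history)
            self.errors = [deque(maxlen=window) for _ in models]

        def forecast(self, history):
            # choose the model with the lowest mean error on the recent window;
            # untested models score 0, so each gets tried at least once
            means = [np.mean(e) if e else 0.0 for e in self.errors]
            best = int(np.argmin(means))
            return self.models[best].predict(history)

        def update(self, history, actual):
            # once the true value arrives, score every model on it
            for model, errs in zip(self.models, self.errors):
                errs.append(abs(model.predict(history) - actual))
    ```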
  • Item
    Towards understanding and avoiding limitations of convolutions on graphs
    (2025) Roth, Andreas; Liebig, Thomas; Kriege, Nils Morten
    Many challenging tasks involve data with an inherent graph structure. Examples of such graphs include molecules, road networks, and transaction records. As such data becomes increasingly available, the need for methods that can effectively leverage graph-structured data grows accordingly. While message-passing neural networks (MPNNs) have shown promising results, their impact on real-world applications remains limited. Although various phenomena have been identified as possible causes that limit the performance of MPNNs, their theoretical foundations remain poorly understood, leading to fragmented research efforts. In this thesis, we provide an in-depth theoretical analysis and identify several key properties limiting the performance of MPNNs. Building on these findings, we propose several frameworks that address these shortcomings. We identify and analyze two properties exhibited by many MPNNs: shared component amplification (SCA), where each message-passing iteration amplifies the same components across all feature channels, and component dominance (CD), where a single component increasingly dominates the amplification as more message-passing steps are applied. These properties lead to the observable phenomenon of rank collapse of node representations, which we identify as a generalization of the established over-smoothing phenomenon. By generalizing and decomposing over-smoothing, we enable a deeper understanding of MPNNs, more targeted solutions, the identification of the relevance of each property, and more precise communication within the field. To avoid SCA, we show that utilizing multiple computational graphs or edge relations is necessary. We propose the multi-relational split (MRS) framework, which transforms any existing MPNN into a variant that employs multiple edge relations. We identify edge-splitting properties that enable the resulting MPNN to avoid SCA. Additionally, we define the spectral graph convolution for multiple feature channels (MIMO-GC), which naturally uses multiple computational graphs. We propose localized MIMO-GCs (LMGC) as a framework for MPNNs that approximate the MIMO-GC and inherit its beneficial properties. We show that LMGCs can avoid SCA and be injective on multisets. To address CD, we demonstrate the close connection between MPNNs and the PageRank algorithm. This connection allows us to transfer insights and modifications from PageRank to MPNNs. Based on personalized PageRank, we propose a variant of MPNNs that similarly allows for infinitely many message-passing iterations while preserving the initial node features. Collectively, these results contribute to a more complete theoretical understanding of MPNNs, allowing future research to be more targeted. Our findings also establish frameworks for constructing MPNNs that exhibit favorable properties.
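    The personalized-PageRank connection can be illustrated with a propagation scheme in the spirit of APPNP: repeated smoothing over the graph combined with a teleport term that re-injects the initial node features, so the iteration can be applied arbitrarily often without washing them out. A generic sketch, not the thesis's exact variant:

    ```python
    import numpy as np

    def ppr_propagate(adj, features, alpha=0.1, iters=100):
        """Personalized-PageRank-style feature propagation (sketch).

        adj      : (n, n) adjacency matrix of the graph
        features : (n, d) initial node features
        alpha    : teleport probability; preserves the initial features
        """
        a_hat = adj + np.eye(len(adj))                    # add self-loops
        a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
        h = features.astype(float).copy()
        for _ in range(iters):       # remains stable as iters grows large
            h = (1 - alpha) * a_hat @ h + alpha * features
        return h
    ```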
  • Item
    Advancing the sustainability of machine learning and artificial intelligence via labeling and meta-learning
    (2025) Fischer, Raphael; Liebig, Thomas; Webb, Geoffrey I.
    Artificial Intelligence (AI), driven primarily by advances in machine learning, is having a transformative impact on our world. While the availability of AI offers numerous technological opportunities, it also poses significant challenges to our society, economy, and environment. Despite the imperative need for sustainable development, the research community and service providers are mostly focused on scale and predictive quality, while neglecting the importance of resource efficiency and transparency. This observation is particularly problematic in the context of AI-as-a-service, as modern AI practitioners can neither be expected to understand intricate performance trade-offs nor to make sustainable decisions. This dissertation addresses the challenge of advancing AI sustainability by establishing more transparency and aiding informed decision making, while considering diverse stakeholder perspectives. The thesis first introduces fundamental concepts of AI and discusses the vast research landscape in which it is situated. It then presents three central contributions based on respective scientific publications: (1) a methodology and software framework for sustainable and trustworthy reporting (STREP), (2) concepts and evaluations for the high-level labeling of AI models, and (3) a novel take on meta-learning and automated machine learning that is resource-aware and user-centric. After critically reviewing the shortcomings of current reporting, the STREP methods are introduced and applied to investigate performance trade-offs and reporting biases regarding AI models and hardware. Inspired by consumer communication systems such as energy labels, concepts for labeling AI models are proposed and validated through interdisciplinary thematic analysis. Finally, the thesis extends the general idea of meta-learning to perform automated model selection while accounting for multiple performance dimensions and user-defined priorities. The accompanying software repository provides additional benefits to readers and practitioners, offering generalized implementations of the central STREP methodology and an interactive exploration tool for all experiments. The thesis concludes with a critical discussion of its findings, limitations, and directions for future research on AI sustainability. By proposing means for bridging knowledge gaps and explicitly considering resource efficiency during model creation, this work promotes sustainable development in the evolving AI landscape.
  • Item
    Analysis of high-dimensional data in the context of intrinsic dimensionality
    (2025) Thordsen, Erik; Schubert, Erich; Aumüller, Martin
    Many modern machine learning applications and data analyses make use of or produce large amounts of high-dimensional data. While it is commonly known that the performance of many algorithms degrades with increasing dimensionality, both in terms of speed and quality, the performance on real-world datasets is oftentimes much better than expected. In fact, real-world datasets tend to occupy only a lower-dimensional manifold in the available observed space, either due to the underlying generative process or the sparsity of the data. To describe this phenomenon, the concept of Intrinsic Dimensionality (ID) was introduced, which describes the minimum number of latent variables needed to produce the observed manifold. In pursuit of estimating the ID of non-linear manifolds, small localities of the data are often considered, giving rise to the concept of Local Intrinsic Dimensionality (LID), which, aside from estimating a global ID, also allows for a spatially resolved analysis of the data. Since the LID eludes direct observation beyond two or three dimensions, it has to be estimated from the data. In this thesis, we explore multiple new approaches to estimate the LID of data in Euclidean spaces, investigate their analytical and empirical properties, and compare them to existing approaches both qualitatively and quantitatively. One of these approaches, the Angle-Based Intrinsic Dimensionality (ABID) estimator, has very useful theoretical properties. We therefore provide an exemplary extension of ABID to vector fields as a potential control mechanism in algorithms such as gradient descent, showcasing how LID estimation can be useful beyond point clouds. We also investigate if and how LID estimation approaches can be applied to non-Euclidean spaces. While the theory of LID estimation in non-Euclidean spaces remains largely unresolved, we provide a prospective view on the future of the field and anecdotal evidence for possible future applications of LID.
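    The angle-based idea can be sketched from one standard fact: for isotropically distributed directions in d dimensions, the expected squared cosine of the angle between two random vectors is 1/d. Inverting the mean squared cosine over the directions from a point to its k nearest neighbors therefore gives a local ID estimate. This simplified estimator only illustrates the principle and is not the exact ABID formula.

    ```python
    import numpy as np

    def angle_based_lid(point, neighbors):
        """Estimate local intrinsic dimensionality from neighbor angles (sketch).

        point     : (d,) query point
        neighbors : (k, d) its k nearest neighbors, k >= 2
        """
        v = neighbors - point
        v = v / np.linalg.norm(v, axis=1, keepdims=True)   # unit directions
        cos = v @ v.T                                      # pairwise cosines
        mask = ~np.eye(len(v), dtype=bool)                 # distinct pairs only
        # E[cos^2] = 1/d for isotropic directions, so invert the mean
        return 1.0 / np.mean(cos[mask] ** 2)
    ```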
  • Item
    Measuring and resource-efficient optimization of clustering quality
    (2025) Lenssen, Lars Filipp; Schubert, Erich; Zimek, Arthur
    Cluster analysis is a fundamental task in exploratory data mining, widely used to uncover hidden structures within datasets across various fields. It has broad applications, from identifying subgroups in gene expression data for disease research to segmenting customer bases in industry. Over time, a diverse range of clustering methods has been developed to handle the complex structure of different data domains. Despite these advances, key challenges remain, particularly in evaluating the quality of clustering results and optimizing the performance of clustering algorithms. This work presents the Average Medoid Silhouette (AMS), an improved version of the Average Silhouette Width (ASW), and introduces the FastMSC and FasterMSC algorithms, which optimize the AMS directly. The DynMSC algorithm is also proposed to simplify determining the optimal number of clusters. For categorical data, the Average Relative Entropy Score (ARES) and Minimum Relative Entropy Contrast (MREC) are introduced, forming the basis of the CatRED algorithm, an agglomerative hierarchical method applied in information systems research.
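    The key quantity can be evaluated in a few lines: writing d1 and d2 for a point's distances to its nearest and second-nearest medoid, its medoid silhouette simplifies to 1 - d1/d2, and the AMS is the mean over all points. A sketch of the evaluation for fixed medoids; FastMSC, FasterMSC, and DynMSC optimize this quantity directly and efficiently.

    ```python
    import numpy as np

    def average_medoid_silhouette(X, medoids):
        """AMS of a clustering given medoid coordinates (evaluation sketch).

        X       : (n, d) data points
        medoids : (k, d) medoid coordinates, k >= 2
        """
        dist = np.linalg.norm(X[:, None, :] - medoids[None, :, :], axis=2)
        dist.sort(axis=1)                  # d1 = nearest, d2 = second nearest
        d1, d2 = dist[:, 0], dist[:, 1]
        return np.mean(1.0 - d1 / d2)      # in [0, 1]; higher is better
    ```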
  • Item
    Feature selection on quantum computers
    (2023-02-20) Mücke, Sascha; Heese, Raoul; Müller, Sabine; Wolter, Moritz; Piatkowski, Nico
    In machine learning, fewer features reduce model complexity. Carefully assessing the influence of each input feature on the model quality is therefore a crucial preprocessing step. We propose a novel feature selection algorithm based on a quadratic unconstrained binary optimization (QUBO) problem, which allows selecting a specified number of features based on their importance and redundancy. In contrast to iterative or greedy methods, our direct approach yields higher-quality solutions. QUBO problems are particularly interesting because they can be solved on quantum hardware. To evaluate our proposed algorithm, we conduct a series of numerical experiments using a classical computer, a quantum gate computer, and a quantum annealer. Our evaluation compares our method to a range of standard methods on various benchmark datasets. We observe competitive performance.
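    A toy sketch of how such a QUBO can be assembled: feature importances enter the diagonal with a negative sign, pairwise redundancies enter the off-diagonal with a positive sign, and a quadratic penalty encodes the constraint of selecting exactly k features. The coefficients are illustrative rather than the paper's exact formulation; tiny instances can be solved by enumeration, with quantum hardware taking over for larger ones.

    ```python
    import itertools

    import numpy as np

    def feature_selection_qubo(importance, redundancy, k, penalty=10.0):
        """Build Q so that minimizing x^T Q x over binary x selects k features.

        importance : (n,)  relevance of each feature (higher = better)
        redundancy : (n,n) symmetric pairwise redundancy (e.g. |correlation|)
        """
        n = len(importance)
        # off-diagonal: redundancy plus cross terms of (sum(x) - k)^2
        q = penalty * np.ones((n, n)) + redundancy
        # diagonal: reward importance, linear part of the cardinality penalty
        np.fill_diagonal(q, -importance + penalty * (1 - 2 * k))
        return q

    def solve_by_enumeration(q):
        """Exact solver for tiny n; a stand-in for quantum hardware."""
        n = len(q)
        best = min((np.array(x) @ q @ np.array(x), x)
                   for x in itertools.product([0, 1], repeat=n))
        return np.array(best[1])
    ```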
  • Item
    Joint leaf-refinement and ensemble pruning through L1 regularization
    (2023-03-15) Buschjäger, Sebastian; Morik, Katharina
    Ensembles are among the state-of-the-art in many machine learning applications. With the ongoing integration of ML models into everyday life, e.g., in the form of the Internet of Things, the deployment and continuous application of models become increasingly important issues. Therefore, small models that offer good predictive performance and use small amounts of memory are required. Ensemble pruning is a standard technique for removing unnecessary classifiers from a large ensemble; it reduces the overall resource consumption and sometimes improves the performance of the original ensemble. Similarly, leaf-refinement is a technique that improves the performance of a tree ensemble by jointly re-learning the probability estimates in the leaf nodes of the trees, thereby allowing for smaller ensembles while preserving their predictive performance. In this paper, we develop a new method that combines both approaches into a single algorithm. To do so, we introduce L1 regularization into the leaf-refinement objective, which allows us to prune and refine trees at the same time. In an extensive experimental evaluation, we show that our approach not only offers statistically significantly better performance than the state-of-the-art but also offers a better accuracy-memory trade-off. We conclude our experimental evaluation with a case study showing the effectiveness of our method in a real-world setting.
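    The mechanism can be sketched as proximal gradient descent with one L1-regularized weight per tree: each gradient step improves the fit, and the soft-thresholding prox of the L1 term drives entire trees to weight zero, pruning them. A toy regression version on precomputed per-tree predictions; the actual method additionally refines the leaf estimates themselves, and all hyperparameters here are illustrative.

    ```python
    import numpy as np

    def prune_and_refine(tree_preds, y, lam=0.1, lr=0.01, steps=1000):
        """Joint ensemble pruning + refinement via L1 proximal gradient (sketch).

        tree_preds : (m, n) predictions of m trees on n training points
        y          : (n,) regression targets
        """
        m, n = tree_preds.shape
        w = np.full(m, 1.0 / m)                  # one weight per tree
        for _ in range(steps):
            residual = w @ tree_preds - y
            w -= lr * (tree_preds @ residual / n)            # gradient of 0.5*MSE
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # prox of L1
        kept = np.flatnonzero(w)                 # trees with w != 0 survive
        return w, kept
    ```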
  • Item
    Smaller world models for reinforcement learning
    (2023-08-10) Robine, Jan; Uelwer, Tobias; Harmeling, Stefan
    Model-based reinforcement learning algorithms try to learn an agent by training a model that simulates the environment. However, such models tend to be quite large, which can itself be a burden. In this paper, we address the question of how to design a model with fewer parameters than previous model-based approaches while achieving the same performance in the 100k-interactions regime. For this purpose, we create a world model that combines a vector-quantized variational autoencoder to encode observations and a convolutional long short-term memory to model the dynamics. This is connected to a model-free proximal policy optimization agent trained purely on simulated experience from this world model. Detailed experiments on the Atari environments show that it is possible to reach performance comparable to the SimPLe method with a significantly smaller world model. A series of ablation studies justifies our design choices and gives additional insights.
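    The distinctive compression step is the vector quantization inside the autoencoder: every latent vector is snapped to its nearest entry in a small learned codebook, so the dynamics model only has to predict discrete codes. A minimal numpy sketch of the lookup; codebook learning, straight-through gradients, the ConvLSTM dynamics, and the PPO agent are omitted.

    ```python
    import numpy as np

    def vector_quantize(latents, codebook):
        """Replace each latent vector by its nearest codebook entry.

        latents  : (n, d) encoder outputs
        codebook : (k, d) learned embedding vectors
        """
        # squared Euclidean distances between all latents and codebook entries
        d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        codes = d2.argmin(axis=1)            # discrete token per latent
        return codes, codebook[codes]        # indices and quantized vectors
    ```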
  • Item
    Explainable online ensemble of deep neural network pruning for time series forecasting
    (2022-08-02) Saadallah, Amal; Jakobs, Matthias; Morik, Katharina
    The complex and evolving nature of time series data makes forecasting one of the most challenging tasks in machine learning. Typical methods for forecasting are designed to model time-evolving dependencies between data observations. However, it is generally accepted that none of them is universally valid for every application. Therefore, learning heterogeneous ensembles that combine a diverse set of forecasters appears to be a promising way to tackle this task. While several approaches in the context of time series forecasting have focused on how to combine individual models in an ensemble, ranging from simple and enhanced averaging tactics to applying meta-learning methods, few works have tackled the task of ensemble pruning, i.e., selecting which individual models take part in the ensemble. In addition, in the classical ML literature, ensemble pruning techniques are mostly restricted to operating in a static manner. To deal with changes in the relative performance of models as well as changes in the data distribution, we employ gradient-based saliency maps for online ensemble pruning of deep neural networks. This method consists of generating saliency maps of individual models' performance that are subsequently used to prune the ensemble, taking into account both accuracy and diversity. In addition, the saliency maps can be exploited to provide suitable explanations for why specific models were selected to construct the ensemble that acts as the forecaster at a certain time interval or instant. An extensive empirical study on many real-world datasets demonstrates that our method achieves excellent or on-par results in comparison to state-of-the-art approaches as well as several baselines. Our code is available on GitHub (https://github.com/MatthiasJakobs/os-pgsm/tree/ecml_journal_2022).
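    The accuracy/diversity trade-off behind the pruning step can be illustrated with a simple greedy selector over a recent evaluation window: repeatedly add the model that best combines low recent error with low error correlation to the models already chosen. This plain stand-in is not the paper's gradient-based saliency-map procedure; the window, scoring, and weighting are illustrative assumptions.

    ```python
    import numpy as np

    def greedy_prune(window_errors, size, beta=0.5):
        """Select an ensemble trading off accuracy and diversity (sketch).

        window_errors : (m, w) recent errors of m models over a window of w steps
        beta          : weight of the diversity term
        """
        selected = [int(np.argmin(window_errors.mean(axis=1)))]  # best model first
        while len(selected) < size:
            scores = []
            for i in range(len(window_errors)):
                if i in selected:
                    scores.append(np.inf)
                    continue
                acc = window_errors[i].mean()
                div = np.mean([np.corrcoef(window_errors[i],
                                           window_errors[j])[0, 1]
                               for j in selected])
                scores.append(acc + beta * div)  # low error, low correlation
            selected.append(int(np.argmin(scores)))
        return selected
    ```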
  • Item
    Approaching phase retrieval with deep learning
    (2023) Uelwer, Tobias; Harmeling, Stefan; Dickscheid, Timo
    Phase retrieval is the process of reconstructing images from magnitude measurements alone. The problem is particularly challenging as most of the information about the image is contained in the missing phase. An important phase retrieval problem is Fourier phase retrieval, where the magnitudes of the Fourier transform are given. This problem is relevant in many areas of science, e.g., in X-ray crystallography, astronomy, microscopy, array imaging, and optics. In addition to Fourier phase retrieval, we also take a closer look at two additional phase retrieval problems: Fourier phase retrieval with a reference image and compressive Gaussian phase retrieval. Most methods for phase retrieval, e.g., the error-reduction algorithm or Fienup's hybrid input-output algorithm, are optimization-based algorithms that solely minimize an error function to reconstruct the image. These methods usually make strong assumptions about the measured magnitudes which do not always hold in practice. Thus, they only work reliably for easy instances of the phase retrieval problems but fail drastically for difficult instances. With the recent advances in the development of graphics processing units (GPUs), deep neural networks (DNNs) have become fashionable again and have led to breakthroughs in many research areas. In this thesis, we show how DNNs can be applied to solve the more difficult instances of phase retrieval problems when training data is available. On the one hand, we show how supervised learning can be used to greatly improve the reconstruction quality when training images and their corresponding measurements are available. We analyze the ability of these methods to generalize to out-of-distribution data. On the other hand, we take a closer look at an existing unsupervised method that relies on generative models. Unsupervised methods are agnostic toward the measurement process, which is particularly useful for Gaussian phase retrieval. We apply this method to the Fourier phase retrieval problem and demonstrate how the reconstruction performance can be further improved with different initialization schemes. Furthermore, we demonstrate how optimizing intermediate representations of the underlying generative model can help to overcome the limited range of the model and, thus, to reach better solutions. Finally, we show how backpropagation can be used to learn reference images using a modification of the well-established error-reduction algorithm and discuss whether learning a reference image is always efficient. As is common in machine learning research, we evaluate all methods on benchmark image datasets, as this allows for easy reproducibility of the experiments and comparability to related methods. To better understand how the methods work, we perform extensive ablation experiments and also analyze the influence of measurement noise and missing measurements.
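    For reference, the classical error-reduction baseline mentioned above alternates between enforcing the measured Fourier magnitudes and enforcing image-space constraints such as non-negativity; the learned approaches in this thesis target exactly the instances where this loop stagnates. A minimal numpy version:

    ```python
    import numpy as np

    def error_reduction(magnitudes, iters=500):
        """Fourier phase retrieval via the error-reduction algorithm.

        magnitudes : measured |FFT(image)|, same shape as the image
        """
        rng = np.random.default_rng(0)
        x = rng.random(magnitudes.shape)                # random initial image
        for _ in range(iters):
            f = np.fft.fft2(x)
            f = magnitudes * np.exp(1j * np.angle(f))   # keep phase, fix magnitude
            x = np.real(np.fft.ifft2(f))
            x = np.maximum(x, 0.0)                      # image-space constraint
        return x
    ```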
  • Item
    Explainable adaptation of time series forecasting
    (2022) Saadallah, Amal; Morik, Katharina; Hammer, Barbara
    A time series is a collection of data points captured over time, commonly found in many fields such as healthcare, manufacturing, and transportation. Accurately predicting the future behavior of a time series is crucial for decision-making, and several Machine Learning (ML) models have been applied to solve this task. However, changes in the time series, known as concept drift, can affect model generalization to future data, thus requiring online adaptive forecasting methods. This thesis aims to extend the State-of-the-Art (SoA) in the ML literature for time series forecasting by developing novel online adaptive methods. The first part focuses on online time series forecasting, including a framework for selecting time series variables and developing ensemble models that are adaptive to changes in time series data and model performance. Empirical results show the usefulness and competitiveness of the developed methods and their contribution to the explainability of both the model selection and the ensemble pruning processes. In the second part, the thesis contributes to the literature on online ML model-based quality prediction for three Industry 4.0 applications: NC-milling, bolt installation in the automotive industry, and Surface Mount Technology (SMT) in electronics manufacturing. The thesis shows how process simulation can be used to generate additional knowledge and how such knowledge can be integrated efficiently into the ML process. The thesis also presents two applications of explainable model-based quality prediction and their impact on smart industry practices.
  • Item
    Machine learning for acquiring knowledge in astro-particle physics
    (2022) Bunse, Mirko; Morik, Katharina; Sebastiani, Fabrizio
    This thesis explores the fundamental aspects of machine learning which are involved in acquiring knowledge in the research field of astro-particle physics. This research field substantially relies on machine learning methods, which reconstruct the properties of astro-particles from the raw data that specialized telescopes record. These methods are typically trained from resource-intensive simulations, which reflect the existing knowledge about the particles—knowledge that physicists strive to expand. We study three fundamental machine learning tasks which emerge from this goal. First, we address ordinal quantification, the task of estimating the prevalences of ordered classes in sets of unlabeled data. This task emerges from the need to test the agreement of astro-physical theories with the class prevalences that a telescope observes. To this end, we unify existing methods on quantification, propose an alternative optimization process, and develop regularization techniques to address ordinality in quantification problems, both in and outside of astro-particle physics. These advancements provide more accurate reconstructions of the energy spectra of cosmic gamma ray sources and, hence, support physicists in drawing conclusions from their telescope data. Second, we address learning under class-conditional label noise. In particular, we focus on a novel setting, in which one of the class-wise noise rates is known and one is not. This setting emerges from a data acquisition protocol through which astro-particle telescopes simultaneously observe a region of interest and several background regions. We enable learning under this type of label noise with algorithms for consistent, noise-aware decision thresholding. These algorithms yield binary classifiers which outperform the existing state-of-the-art in gamma-hadron classification with the FACT telescope. Moreover, unlike the state-of-the-art, our classifiers are entirely trained from the real telescope data and thus do not require any resource-intensive simulation. Third, we address active class selection, the task of actively finding those proportions of classes which optimize the classification performance. In astro-particle physics, this task emerges from the simulation, which can produce training data in any desired class proportions. We clarify the implications of this setting from two theoretical perspectives, one of which provides us with bounds on the resulting classification performance. We employ these bounds in a certificate of model robustness, which declares a set of class proportions for which the model is accurate with a high probability. We also employ these bounds in an active strategy for class-conditional data acquisition. Our strategy uniquely considers existing uncertainties about those class proportions that have to be handled during the deployment of the classifier, while being theoretically well-justified.
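    One classical quantification method covered by such a unification is Adjusted Classify & Count: classify the unlabeled set, then correct the observed positive rate using the classifier's true- and false-positive rates estimated on validation data. A binary sketch; the ordinal, regularized methods of the thesis generalize well beyond it.

    ```python
    import numpy as np

    def adjusted_classify_and_count(preds, tpr, fpr):
        """Estimate the positive-class prevalence in an unlabeled sample.

        preds : binary predictions of a classifier on the unlabeled data
        tpr   : true-positive rate of the classifier (from validation data)
        fpr   : false-positive rate of the classifier (from validation data)
        """
        observed = np.mean(preds)                    # raw "classify and count"
        corrected = (observed - fpr) / (tpr - fpr)   # invert the confusion model
        return float(np.clip(corrected, 0.0, 1.0))
    ```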
  • Item
    Some representation learning tasks and the inspection of their models
    (2022) Pfahler, Lukas; Morik, Katharina; Hotho, Andreas
    Today, the field of machine learning encompasses a wide range of tasks with a wide range of supervision sources, ranging from traditional classification tasks with neatly labeled data, over data with noisy labels, to data with no labels at all, where we have to rely on other forms of supervision, like self-supervision. In the first part of this thesis, we design machine learning tasks for applications where we do not immediately have access to neatly labeled training data. First, we design unsupervised representation learning tasks for training embedding models for mathematical expressions that allow retrieval of related formulas. We train convolutional neural networks, transformer models, and graph neural networks to embed formulas from scientific articles into a real-valued vector space using contextual similarity tasks as well as self-supervised tasks. We base our studies on a novel dataset that consists of over 28 million formulas that we have extracted from scientific articles published on arXiv.org. We represent the formulas in different input formats—images, sequences, or trees—depending on the embedding model. We compile an evaluation dataset with annotated search queries from several different disciplines and showcase the usefulness of our approach for deploying a search engine for mathematical expressions. Second, we investigate machine learning tasks in astrophysics. Prediction models are currently trained on simulated data, with hand-crafted features, and using multiple single-task models. In contrast, we build a single multi-task convolutional neural network that works directly on telescope images and uses convolution layers to learn suitable feature representations automatically. We design loss functions for each task and propose a novel way to combine the different loss functions to account for their different scales and behaviors. Next, we explore another form of supervision that does not rely on simulated training data but learns from actual telescope recordings. Through the framework of noisy label learning, we propose an approach for learning gamma-hadron classifiers that outperforms existing classifiers trained on simulated, fully labeled data. Our method is general: it can be used for training models in scenarios that fit our noise assumption of class-conditional label noise with exactly one known noise probability. In the second part of this work, we develop methods to inspect models and gain trust in their decisions. We focus on large, non-linear models that can no longer be understood in their entirety through plain inspection of their trainable parameters. We investigate three approaches for establishing trust in models. First, we propose a method to highlight influential input nodes for similarity computations performed by graph neural networks. We test this approach with our embedding models for the retrieval of related formulas and show that it can help understand the similarity scores computed by the models. Second, we investigate explanation methods that provide explanations based on the training process that produced the model. This way, we provide explanations that are not merely an approximation of the computation of the prediction function, but an actual investigation into why the model learned to produce an output grounded in the data. We propose two different methods for tracking the training process and show how they can be easily implemented within existing deep learning frameworks.
    Third, we contribute a method to verify the adversarial robustness of random forest classifiers. Our method is based on knowledge distillation of a random forest model into a decision tree model. We bound the approximation error of using the decision tree as a proxy for the given random forest model and use these bounds to provide guarantees on the adversarial robustness of the random forest. Consequently, our robustness guarantees are approximate, but we can provably control the quality of our results using a hyperparameter.
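    A stripped-down sketch of the distillation step: fit a single decision tree to the random forest's predictions and measure their empirical disagreement, which plays the role of the approximation error that is bounded formally in the thesis before deriving robustness guarantees.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    def distill_forest(forest: RandomForestClassifier, X, max_depth=None):
        """Distill a random forest into one decision tree (sketch)."""
        teacher_labels = forest.predict(X)
        student = DecisionTreeClassifier(max_depth=max_depth)
        student.fit(X, teacher_labels)
        disagreement = np.mean(student.predict(X) != teacher_labels)
        return student, disagreement    # small disagreement -> tighter proxy
    ```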
  • Item
    Ensemble learning with discrete classifiers on small devices
    (2022) Buschjäger, Sebastian; Morik, Katharina; Fürnkranz, Johannes
    Machine learning has become an integral part of everyday life, ranging from applications in AI-powered search queries to (partially) autonomous driving. Many of the advances in machine learning and its applications have been possible due to increases in computation power, i.e., by reducing manufacturing sizes while maintaining or even increasing energy consumption. However, 2-3 nm manufacturing is within reach, making further miniaturization increasingly difficult, while thermal design power limits are simultaneously being reached, rendering entire parts of the chip useless for certain computational loads. In this thesis, we investigate discrete classifier ensembles as a resource-efficient alternative that can be deployed to small devices that require only small amounts of energy. Discrete classifiers are classifiers that can be applied -- and oftentimes also trained -- without the need for costly floating-point operations. Hence, they are ideally suited for deployment to small devices with limited resources. The disadvantage of discrete classifiers is that their predictive performance often lags behind that of their floating-point siblings. Here, the combination of multiple discrete classifiers into an ensemble can help to improve the predictive performance while keeping resource consumption manageable. This thesis studies discrete classifier ensembles from a theoretical point of view, an algorithmic point of view, and a practical point of view. In the theoretical investigation, the bias-variance decomposition and the double-descent phenomenon are examined. The bias-variance decomposition of the mean-squared error is revisited and generalized to an arbitrary twice-differentiable loss function, which serves as a guiding tool throughout the thesis. Similarly, the double-descent phenomenon is -- for the first time -- studied comprehensively in the context of tree ensembles and specifically random forests. Contrary to established literature, the experiments in this thesis indicate that there is no double descent in random forests. While the training of ensembles is well-studied in the literature, their deployment to small devices is often neglected. Additionally, the training of ensembles on small devices has not received much attention so far. Hence, the algorithmic part of this thesis focuses on the deployment of discrete classifiers and the training of ensembles on small devices. First, a novel combination of ensemble pruning (i.e., removing classifiers from the ensemble) and ensemble refinement (i.e., re-training of classifiers in the ensemble) is presented, which uses a novel proximal gradient descent algorithm to minimize a combined loss function. The resulting algorithm removes unnecessary classifiers from an already trained ensemble while improving the performance of the remaining classifiers at the same time. Second, this algorithm is extended to the more challenging setting of online learning, in which the algorithm receives training examples one by one. The resulting shrub ensembles algorithm allows the training of ensembles in an online fashion while maintaining a strictly bounded memory consumption. It outperforms existing state-of-the-art algorithms under resource constraints and offers competitive performance in the general case. Last, this thesis studies the deployment of decision tree ensembles to small devices by optimizing their memory layout.
    The key insight here is that decision trees have a probabilistic inference time because different observations can take different paths from the root to a leaf. By estimating the probability of visiting a particular node in the tree, one can place it favorably in memory to maximize cache efficiency and thus increase performance without changing the model. Finally, several real-world applications of tree ensembles and Binarized Neural Networks are presented.
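    A toy illustration of the layout idea: count how often each node of a tree is visited on sample data, then store the node records in descending visit probability so that hot paths sit close together in memory. The array-based tree representation below is a hypothetical simplification; the thesis models cache behavior more carefully.

    ```python
    import numpy as np

    def visit_counts(left, right, feature, threshold, X):
        """Count node visits in an array-based decision tree.

        left, right        : child index per node (-1 marks a leaf)
        feature, threshold : split definition per node
        X                  : (n_samples, n_features) sample data
        """
        counts = np.zeros(len(left), dtype=int)
        for x in X:
            node = 0
            while True:
                counts[node] += 1
                if left[node] == -1:             # reached a leaf
                    break
                if x[feature[node]] <= threshold[node]:
                    node = left[node]
                else:
                    node = right[node]
        return counts

    # layout: order = np.argsort(-counts) puts frequently visited nodes first;
    # child indices are then remapped via the inverse permutation
    ```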
  • Item
    Randomized outlier detection with trees
    (2020-12-15) Buschjäger, Sebastian; Honysz, Philipp-Jan; Morik, Katharina
    Isolation forest (IF) is a popular outlier detection algorithm that isolates outlier observations from regular observations by building multiple random isolation trees. The average number of comparisons required to isolate a given observation can then be used as a measure of its outlierness. Multiple extensions of this approach have been proposed in the literature, including the extended isolation forest (EIF) as well as the SCiForest. However, we find a lack of theoretical explanation of why IF, EIF, and SCiForest offer such good practical performance. In this paper, we present a theoretical framework that views these approaches from a distributional viewpoint. Using this viewpoint, we show that isolation-based approaches first accurately approximate the data distribution and then approximate the coefficients of mixture components via the average path length. Using this framework, we derive the generalized isolation forest (GIF), which also trains random isolation trees but moves beyond combining them via the average path length. That is, GIF splits the data into multiple sub-spaces by sampling random splits, as the original IF variants do, and directly estimates the mixture coefficients of a mixture distribution to score the outlierness of entire regions of data. In an extensive evaluation, we compare GIF with 18 state-of-the-art outlier detection methods on 14 different datasets. We show that GIF outperforms three competing tree-based methods and performs competitively with other nearest-neighbor approaches while having a lower runtime. Finally, we present a case study that uses GIF to detect transaction fraud in financial data.
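    For reference, the path-length scoring that GIF moves beyond: the average depth at which the trees isolate an observation is normalized by the expected path length c(n) of an unsuccessful binary search tree lookup and turned into a score in (0, 1], where values close to 1 indicate outliers.

    ```python
    import numpy as np

    def c_factor(n):
        """Expected path length of an unsuccessful search in a BST of n points."""
        if n <= 1:
            return 0.0
        return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n

    def isolation_score(avg_path_length, n):
        """Classic isolation forest outlier score (the baseline GIF replaces)."""
        return 2.0 ** (-avg_path_length / c_factor(n))
    ```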
  • Item
    Geospatial IoT—the need for event-driven architectures in contemporary spatial data infrastructures
    (2018-09-25) Rieke, Matthes; Bigagli, Lorenzo; Herle, Stefan; Jirka, Simon; Kotsev, Alexander; Liebig, Thomas; Malewski, Christian; Paschke, Thomas; Stasch, Christoph
    The nature of contemporary spatial data infrastructures lies in the provision of geospatial information in an on-demand fashion. Although recent applications have identified the need to react to real-time information in a time-critical way, research efforts in the field of the geospatial Internet of Things in particular have identified substantial gaps in this context, ranging from a lack of standardisation for event-based architectures to the meaningful handling of real-time information as “events”. This manuscript presents work in the field of event-driven architectures as part of spatial data infrastructures, with a particular focus on sensor networks and the devices capturing in-situ measurements. The current landscape of spatial data infrastructures is outlined and used as the basis for identifying existing gaps that keep certain geospatial applications from using real-time information. We present a selection of approaches—developed in different research projects—to overcome these gaps. Although designed for specific application domains, these approaches share commonalities as well as orthogonal solutions and can form the foundation of an overall event-driven spatial data infrastructure.
  • Item
    How is a data-driven approach better than random choice in label space division for multi-label classification?
    (2016-07-30) Szymanski, Piotr; Kajdanowicz, Tomasz; Kersting, Kristian
    We propose using five data-driven community detection approaches from social networks to partition the label space in the task of multi-label classification, as an alternative to the random partitioning into equal subsets performed by RAkELd. We evaluate modularity maximization (using fast-greedy and leading-eigenvector approximations), infomap, walktrap, and label propagation algorithms. For this purpose, we propose constructing a label co-occurrence graph (in both weighted and unweighted versions) based on the training data and performing community detection to partition the label set. Each partition then constitutes a label space for a separate multi-label classification sub-problem. As a result, we obtain an ensemble of multi-label classifiers that jointly covers the whole label space. Based on the binary relevance and label powerset classification methods, we compare community-detection-based label space divisions against random baselines on 12 benchmark datasets over five evaluation measures. We discover that data-driven approaches are more efficient and more likely to outperform RAkELd than binary relevance or label powerset is, in every evaluated measure. For all measures apart from Hamming loss, data-driven approaches are significantly better than RAkELd (α = 0.05), and at least one data-driven approach is more likely to outperform RAkELd than a priori methods in the case of RAkELd’s best performance. This is the largest RAkELd evaluation published to date, with 250 samplings per value for 10 values of the RAkELd parameter k on 12 datasets.
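    The construction can be sketched in a few lines with networkx: build the weighted label co-occurrence graph from the training label matrix, detect communities, and train one multi-label classifier (e.g., label powerset) per resulting label subset. Label propagation is shown here; the paper also evaluates fast-greedy, leading-eigenvector, infomap, and walktrap detection.

    ```python
    import itertools

    import numpy as np
    from networkx import Graph
    from networkx.algorithms.community import asyn_lpa_communities

    def partition_label_space(Y):
        """Partition labels via communities of the co-occurrence graph (sketch).

        Y : (n_samples, n_labels) binary label matrix of the training set
        """
        g = Graph()
        g.add_nodes_from(range(Y.shape[1]))
        for a, b in itertools.combinations(range(Y.shape[1]), 2):
            w = int(np.sum(Y[:, a] * Y[:, b]))      # co-occurrence count
            if w > 0:
                g.add_edge(a, b, weight=w)
        communities = asyn_lpa_communities(g, weight="weight")
        return [sorted(c) for c in communities]     # one sub-problem per community

    # each returned label subset then gets its own label-powerset classifier
    ```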
  • Item
    A mathematical theory of making hard decisions: model selection and robustness of matrix factorization with binary constraints
    (2018) Heß, Sibylle Charlotte; Morik, Katharina; Siebes, Arno P. J. M.
    One of the first and most fundamental tasks in machine learning is to group observations within a dataset. Given a notion of similarity, finding those instances which are outstandingly similar to each other has manifold applications. Recommender systems and topic analysis in text data are examples which are most intuitive to grasp. The interpretation of the groups, called clusters, is facilitated if the assignment of samples is definite. Especially in high-dimensional data, denoting a degree to which an observation belongs to a specified cluster requires subsequent processing of the model to filter the most important information. We argue that a good summary of the data provides hard decisions on the following question: how many groups are there, and which observations belong to which clusters? In this work, we contribute to the theoretical and practical background of clustering tasks, addressing one or both aspects of this question. Our overview of state-of-the-art clustering approaches details the challenges of our ambition to provide hard decisions. Based on this overview, we develop new methodologies for two branches of clustering: one concerns the derivation of nonconvex clusters, known as spectral clustering; the other addresses the identification of biclusters, a set of samples together with similarity-defining features, via Boolean matrix factorization. One of the main challenges in both considered settings is robustness to noise. Assuming that the issue of robustness is controllable by means of theoretical insights, we take a closer look at those aspects of established clustering methods which lack a theoretical foundation. In the scope of Boolean matrix factorization, we propose a versatile framework for the optimization of matrix factorizations subject to binary constraints. So far, Boolean factorizations in particular have been computed by intuitive methods implementing greedy heuristics, which lack quality guarantees for the obtained solutions. In contrast, we propose to build upon recent advances in nonconvex optimization theory. This enables us to provide convergence guarantees to local optima of a relaxed objective, requiring only approximately binary factor matrices. By means of this new optimization scheme, PAL-Tiling, we propose two approaches to automatically determine the number of clusters. One is based on information theory, employing the minimum description length principle; the other is a novel statistical approach, controlling the false discovery rate. The flexibility of our framework PAL-Tiling enables the optimization of novel factorization schemes. In a different context, where every data point belongs to a pre-defined class, a characterization of the classes may be obtained by Boolean factorizations. However, there are cases where this traditional factorization scheme is not sufficient. Therefore, we propose the integration of another factor matrix, reflecting class-specific differences within a cluster. Our theoretical considerations are complemented by empirical evaluations, showing how our methods combine theoretical soundness with practical advantages.
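    The optimization idea can be sketched as alternating proximal gradient steps on a relaxed factorization: gradient steps on ||A - XY^T||^2 alternate with a step that keeps the factors in [0, 1] and increasingly penalizes non-binary entries, with rounding at the end. This is a heavily simplified stand-in for PAL-Tiling, without its convergence machinery or automatic rank selection; step sizes and the penalty ramp are illustrative assumptions.

    ```python
    import numpy as np

    def relaxed_bmf(A, rank, steps=500, lr=1e-3, lam=0.1):
        """Binary-constrained matrix factorization, relaxed (sketch)."""
        rng = np.random.default_rng(0)
        X = rng.random((A.shape[0], rank))
        Y = rng.random((A.shape[1], rank))
        for t in range(steps):
            R = X @ Y.T - A                     # residual of the relaxed objective
            X -= lr * (R @ Y)
            Y -= lr * (R.T @ X)
            push = lam * lr * (t / steps)       # ramp up the binary penalty
            for F in (X, Y):
                F += push * (2 * F - 1)         # descend x*(1-x): push toward {0,1}
                np.clip(F, 0.0, 1.0, out=F)
        return (X > 0.5).astype(int), (Y > 0.5).astype(int)
    ```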
  • Item
    Exponential families on resource-constrained systems
    (2018) Piatkowski, Nico Philipp; Morik, Katharina; Ermon, Stefano
    This work is about the estimation of exponential family models on resource-constrained systems. Our main goal is learning probabilistic models on devices with highly restricted storage, arithmetic, and computational capabilities—so-called ultra-low-power devices. Enhancing the learning capabilities of such devices opens up opportunities for intelligent ubiquitous systems in all areas of life, from medicine and robotics to home automation, to mention just a few. We investigate the inherent resource consumption of exponential families, review existing techniques, and devise new methods to reduce the resource consumption. The resource consumption, however, must not be reduced at all cost. Exponential families possess several desirable properties that must be preserved: Any probabilistic model encodes a conditional independence structure—our methods keep this structure intact. Exponential family models are theoretically well-founded. Instead of merely finding new algorithms based on intuition, our models are formalized within the framework of exponential families and derived from first principles. We do not introduce new assumptions which are incompatible with the formal derivation of the base model, and our methods do not rely on properties of particular high-level applications. To reduce the memory consumption, we combine and adapt reparametrization and regularization in an innovative way that facilitates the sparse parametrization of high-dimensional non-stationary time series. The procedure allows us to load models into memory-constrained systems which would otherwise not fit. We provide new theoretical insights and prove that the uniform distance between the data-generating process and our reparametrized solution is bounded. To reduce the arithmetic complexity of the learning problem, we derive the integer exponential family, based on the very definition of sufficient statistics and maximum entropy estimation. New integer-valued inference and learning algorithms are proposed, based on variational inference, proximal optimization, and regularization. The weaker the underlying system, the larger the benefit of this technique; e.g., probabilistic inference on a state-of-the-art ultra-low-power microcontroller can be accelerated by a factor of 250. While our integer inference is fast, the underlying message passing relies on the variational principle, which is inexact and has unbounded error on general graphs. Since exact inference and other existing methods with bounded error exhibit exponential computational complexity, we employ near minimax optimal polynomial approximations to yield new stochastic algorithms for approximating the partition function and the marginal probabilities. Changing the polynomial degree allows us to control the complexity and the error of our new stochastic method. We provide an error bound that is parametrized by the number of samples, the polynomial degree, and the norm of the model’s parameter vector. Moreover, important intermediate quantities can be precomputed and shared with the weak computational device to reduce the resource requirements of our method even further. All new techniques are empirically evaluated on synthetic and real-world data, and the results confirm the properties predicted by our theoretical derivation. Our novel techniques allow a broader range of models to be learned on resource-constrained systems and imply several new research possibilities.
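    For reference, the model class in question: an exponential family has densities of the form below, with base measure h, sufficient statistics φ, natural parameters θ, and log-partition function A(θ). The reductions above (sparse reparametrization, integer-valued parameters, polynomial approximation of the partition function) all operate on components of this formula.

    ```latex
    p_\theta(x) = h(x)\, \exp\!\big( \langle \theta, \phi(x) \rangle - A(\theta) \big),
    \qquad
    A(\theta) = \log \int h(x)\, \exp\!\big( \langle \theta, \phi(x) \rangle \big) \,\mathrm{d}x
    ```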