Chair of Data Science and Data Engineering
Recent Submissions
Item: Rankings and importance scores as multi-facets of explainable machine learning (2024)
Balestra, Chiara; Müller, Emmanuel; De Bie, Tijl

Rankings represent the natural way to assess the importance of a finite set of items. Ubiquitous in real-world applications and machine-learning methods, they mostly derive from automated or human-based importance score assignments. Many fields involving rankings, such as recommender systems, feature selection, and anomaly detection, overlap with human-derived scoring systems, such as candidate selection and operational risk assessments. Rankings are notoriously hard to evaluate; challenges arise from biases, fairness issues, and the way rankings are derived and assessed. This thesis revolves around deriving importance scores and rankings as solutions in various contexts and applications. Starting from unsupervised feature importance scores based on an unconventional use of Shapley values for unlabeled data, it then turns to a more applied setting with an ad hoc unsupervised methodology for reducing the dimensionality of collections of gene sets. We then consider feature importance scores in a time-dependent context, focusing on detecting correlational concept drift in the univariate dimensions of unlabeled streaming data. Throughout, the work seeks to improve abstract notions of trustworthiness and reliability, with an eye on the consistency of evaluations and methods. In this direction, we add insights into using saliency importance score assignments for interpreting time series classification methods and define desirable mathematical properties for ranking evaluation metrics. Furthermore, we use Shapley values to interpret deep unsupervised anomaly detection methods based on feature bagging.
Lastly, we introduce some current and future challenges related to fairness issues in rank aggregation, as well as possible extensions of this work.

Item: Unsupervised temporal anomaly detection (2024)
Li, Bin; Müller, Emmanuel; Gama, João

Anomaly detection has become essential across diverse domains. In real-world applications such as sensor records and network logs, data is usually collected sequentially. Consequently, a major challenge in anomaly detection is handling volatile sequential abnormal events in real time. Recent research on time series has advanced considerably, leveraging the rapid development of deep models such as recurrent neural networks and transformers. However, most existing deep models focus on static time series while neglecting the dynamic streaming nature of real-world deployments. A critical issue is the potential occurrence of distributional drift in streaming data, after which pre-trained models become invalid. Furthermore, as machine learning models are applied in safety-critical fields such as autonomous vehicles and medical diagnosis, the trustworthiness of model predictions becomes a growing concern. An ideal anomaly detector is expected to both predict and interpret abnormal events. This dissertation focuses on the intersection of time series and data stream anomaly detection, as well as their interpretability. We first develop a contrastive-learning-based self-supervised approach for time series anomaly detection, contributing to the effective representation learning of time series anomalies without labels. Subsequently, we investigate a novel concept drift detection approach for identifying correlation changes in data streams. We also propose a state-transition-aware online anomaly detection framework for data streams. Finally, we delve into the necessary properties of time series interpreters, including cohesiveness, consistency, and robustness.
We also showcase an example-based interpreter for reconstruction-based anomaly detection models, which provides intuitive and contrastive explanations of the reasons behind anomalies. The proposed approaches are rigorously evaluated on a range of popular real-world benchmark datasets and simulations.

Item: Verification of unsupervised neural networks (2023)
Böing, Benedikt; Müller, Emmanuel; König, Barbara

Neural networks are at the forefront of machine learning, being responsible for achievements such as AlphaGo. As they are deployed in more and more environments, even in safety-critical ones such as health care, we are naturally interested in assuring their reliability. However, the discovery of so-called adversarial attacks on supervised neural networks demonstrated that tiny distortions in the input space can lead to misclassifications and thus to potentially catastrophic errors: patients could be diagnosed wrongly, or a car might confuse stop signs and traffic lights. Ideally, we would therefore like to guarantee that these types of attacks cannot occur. In this thesis, we extend the research on reliable neural networks to the realm of unsupervised learning. This includes defining proper notions of reliability, as well as analyzing and adapting unsupervised neural networks with respect to these notions. Our definitions of reliability depend on the underlying neural networks and the problems they are meant to solve. In all cases, however, we aim for guarantees over a continuous input space containing infinitely many points. We therefore go beyond the traditional setting of testing against a finite dataset and require specialized tools to actually check a given network for reliability. We demonstrate how neural network verification can be leveraged for this purpose. Using neural network verification, however, entails a major challenge: it does not scale to large networks.
To overcome this limitation, we design a novel training procedure yielding networks that are both more reliable according to our definition and more amenable to neural network verification. By exploiting the piecewise affine structure of our networks, we can locally simplify them and thus decrease verification runtime significantly. We also take a perspective that complements a neural network's training by exploring how to repair non-reliable neural network ensembles. With this thesis, we paradigmatically show both the necessity and the complications of unsupervised neural network verification. It aims to pave the way for further research and toward the safe usage of these simple-to-build yet difficult-to-understand models.

Item: Event impact analysis for time series (2022)
Scharwächter, Erik; Müller, Emmanuel; Jentsch, Carsten

Time series arise in a variety of application domains, whenever data points are recorded over time and stored for subsequent analysis. A critical question is whether the occurrence of events like natural disasters, technical faults, or political interventions leads to changes in a time series, for example, temporary deviations from its typical behavior. The vast majority of existing research on this topic focuses on the specific impact of a single event on a time series, while methods that generically capture the impact of a recurring event are scarce. In this thesis, we fill this gap by introducing a novel framework for event impact analysis in the case of randomly recurring events. We develop a statistical perspective on the problem and provide a generic notion of event impacts based on a statistical independence relation. The main problem we address is establishing the presence of event impacts in stationary time series using statistical independence tests. Tests for event impacts should be generic, powerful, and computationally efficient.
We develop two algorithmic test strategies for event impacts that satisfy these properties. The first is based on coincidences between events and peaks in the time series, while the second is based on multiple marginal associations. We also discuss a selection of follow-up questions, including ways to measure, model, and visualize event impacts, and the relationship between event impact analysis and anomaly detection in time series. Finally, we provide a first method for studying event impacts in nonstationary time series. We evaluate our methodological contributions on several real-world datasets and study their performance in large-scale simulation studies.
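The coincidence idea behind the first test strategy in the last abstract can be illustrated with a simple permutation test. This is only a hedged sketch, not the thesis's actual procedure: the function name, the inputs (precomputed peak and event indices), and the uniform re-drawing of event times under the null are all illustrative assumptions.

```python
import numpy as np

def coincidence_pvalue(peaks, events, n_len, window=5, n_perm=1000, seed=0):
    """Permutation test for association between recurring events and peaks.

    A coincidence is an event followed by at least one peak within
    `window` time steps. The null distribution is simulated by
    re-drawing the event times uniformly at random (illustrative null).
    """
    rng = np.random.default_rng(seed)
    is_peak = np.zeros(n_len, dtype=bool)
    is_peak[np.asarray(peaks)] = True

    def n_coincidences(ev):
        # Count events with at least one peak in [t, t + window].
        return sum(is_peak[t:t + window + 1].any() for t in ev)

    observed = n_coincidences(events)
    null = [n_coincidences(rng.choice(n_len, size=len(events), replace=False))
            for _ in range(n_perm)]
    # One-sided p-value with add-one smoothing.
    p_value = (1 + sum(c >= observed for c in null)) / (n_perm + 1)
    return observed, p_value
```

If every event is reliably followed by a peak, the observed coincidence count is maximal and the permutation p-value becomes small; for events placed independently of the peaks, it stays large.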
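Similarly, the unconventional use of Shapley values for unsupervised feature importance mentioned in the first abstract can be sketched with a toy example. The value function below (total absolute pairwise correlation of a feature subset) is a hypothetical stand-in, not the thesis's method, and exact enumeration over all subsets is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(X, value_fn):
    """Exact Shapley value of each feature under a set-valued score.

    value_fn(X, subset) scores a feature subset; the enumeration over
    all coalitions is exponential in d, so this is for small d only.
    """
    _, d = X.shape
    phi = np.zeros(d)
    for i in range(d):
        others = [f for f in range(d) if f != i]
        for k in range(d):
            for S in combinations(others, k):
                # Shapley weight of a coalition of size k.
                w = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += w * (value_fn(X, S + (i,)) - value_fn(X, S))
    return phi

def total_abs_correlation(X, subset):
    """Toy unsupervised value function: summed absolute pairwise
    correlation among the selected features (0 for fewer than 2)."""
    if len(subset) < 2:
        return 0.0
    C = np.corrcoef(X[:, list(subset)], rowvar=False)
    iu = np.triu_indices_from(C, k=1)
    return float(np.abs(C[iu]).sum())
```

By the efficiency axiom, the scores sum to the value of the full feature set, so they distribute a global (here unsupervised) quality measure over the individual features.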