Some representation learning tasks and the inspection of their models
Date
2022
Abstract
Today, the field of machine learning encompasses a wide range of tasks with a wide range
of supervision sources, ranging from traditional classification tasks with neatly labeled
data, through data with noisy labels, to data with no labels at all, where we have to rely
on other forms of supervision, such as self-supervision. In the first part of this thesis, we
design machine learning tasks for applications where we do not immediately have access
to neatly labeled training data.
First, we design unsupervised representation learning tasks for training embedding
models for mathematical expressions that allow retrieval of related formulas. We train
convolutional neural networks, transformer models, and graph neural networks to embed
formulas from scientific articles into a real-valued vector space, using contextual similarity
tasks as well as self-supervised tasks. We base our studies on a novel dataset
of over 28 million formulas that we have extracted from scientific articles
published on arXiv.org. We represent the formulas in different input formats (images,
sequences, or trees), depending on the embedding model. We compile an evaluation
dataset with annotated search queries from several different disciplines and showcase the
usefulness of our approach for deploying a search engine for mathematical expressions.
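The abstract does not fix a concrete training objective; a common way to implement such a contextual similarity task is a contrastive loss over pairs of embeddings. A minimal PyTorch sketch, assuming batches of embeddings for formulas and for contextually related formulas, with in-batch negatives (the loss form is an assumption, not necessarily the thesis's exact objective):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor_emb, positive_emb, temperature=0.1):
    """InfoNCE-style contrastive loss: pull each formula embedding toward
    the embedding of a contextually related formula, push it away from all
    other formulas in the batch (in-batch negatives)."""
    anchor = F.normalize(anchor_emb, dim=1)
    positive = F.normalize(positive_emb, dim=1)
    logits = anchor @ positive.t() / temperature  # pairwise cosine similarities
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)        # diagonal entries are positives
```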
Second, we investigate machine learning tasks in astrophysics. Prediction models in
this field are currently trained on simulated data, with hand-crafted features, using multiple
single-task models. In contrast, we build a single multi-task convolutional neural network that
works directly on telescope images and uses convolution layers to learn suitable feature
representations automatically. We design loss functions for each task and propose a
novel way to combine the different loss functions to account for their different scales and
behaviors.
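The combination scheme itself is a contribution of the thesis and is not spelled out in the abstract; as an illustration of the problem it addresses, a widely used baseline weights each task loss by a learnable log-variance (uncertainty weighting in the style of Kendall et al., 2018). A sketch with hypothetical task names:

```python
import torch
import torch.nn as nn

class WeightedMultiTaskLoss(nn.Module):
    """Combine per-task losses with learnable log-variance weights, so that
    tasks whose losses live on different scales do not dominate the total."""
    def __init__(self, num_tasks):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total

# usage (task names are assumptions, not from the thesis):
# criterion = WeightedMultiTaskLoss(num_tasks=3)
# total = criterion([energy_loss, direction_loss, classification_loss])
```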
Next, we explore another form of supervision that does not rely on simulated
training data but learns from actual telescope recordings. Through the framework of
noisy label learning, we propose an approach for learning gamma-hadron classifiers that
outperforms existing classifiers trained on simulated, fully labeled data. Our method is
general: it can be used to train models in any scenario that fits our noise assumption of
class-conditional label noise with exactly one known noise probability.
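The abstract states only the noise assumption; a textbook estimator for this setting is the unbiased loss correction of Natarajan et al. (2013). A binary sketch, assuming (hypothetically) that hadron recordings leak into the gamma class with one known probability:

```python
import torch
import torch.nn.functional as F

def noise_corrected_bce(logits, noisy_labels, rho_pos=0.0, rho_neg=0.1):
    """Unbiased binary cross-entropy under class-conditional label noise.

    rho_pos = P(noisy label 0 | true label 1),
    rho_neg = P(noisy label 1 | true label 0).
    In the one-known-rate setting, one rate is fixed (here rho_pos = 0)
    and the other is the known noise probability."""
    loss_as_1 = F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits), reduction="none")
    loss_as_0 = F.binary_cross_entropy_with_logits(
        logits, torch.zeros_like(logits), reduction="none")
    y = noisy_labels.float()
    corrected = (
        y * ((1 - rho_neg) * loss_as_1 - rho_pos * loss_as_0)
        + (1 - y) * ((1 - rho_pos) * loss_as_0 - rho_neg * loss_as_1)
    ) / (1 - rho_pos - rho_neg)
    return corrected.mean()
```

In expectation over the noise process, this corrected loss equals the clean loss, which is what allows training on noisy telescope labels at all; whether the thesis uses this particular correction is not stated in the abstract.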
In the second part of this work, we develop methods to inspect models and gain
trust in their decisions. We focus on large, non-linear models that can no longer be
understood in their entirety through plain inspection of their trainable parameters. We
investigate three approaches for establishing trust in models.
First, we propose a method to highlight influential input nodes for similarity computations
performed by graph neural networks. We test this approach with our embedding
models for retrieval of related formulas and show that it can help understand the similarity
scores computed by the models.
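The attribution method itself is not detailed in the abstract; for intuition, even a simple gradient-times-input scheme over node features yields per-node influence scores for a pairwise similarity. A sketch assuming a hypothetical encoder that maps a PyTorch-Geometric-style graph with node feature matrix x to a single embedding vector:

```python
import torch
import torch.nn.functional as F

def node_influence(encoder, graph_a, graph_b):
    """Attribute a pairwise similarity score to the input nodes of graph_a
    via gradient x input: nodes whose features move the cosine similarity
    the most receive the largest influence scores."""
    graph_a.x.requires_grad_(True)           # track gradients w.r.t. node features
    sim = F.cosine_similarity(encoder(graph_a), encoder(graph_b), dim=-1)
    sim.sum().backward()
    # aggregate feature-wise gradients into one score per node
    return (graph_a.x.grad * graph_a.x).sum(dim=-1).abs()
```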
Second, we investigate explanation methods that derive explanations from the
training process that produced the model. This way, we provide explanations that are
not merely an approximation of the prediction function's computation, but an actual
investigation into why the model learned to produce an output, grounded in the training
data. We propose two different methods for tracking the training process and show how
they can be easily implemented within existing deep learning frameworks.
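The two tracking methods are contributions of the thesis; the abstract claims only that they integrate easily with existing frameworks. As an illustration of that claim, a hypothetical PyTorch training loop that records per-example losses over the course of training, so a prediction can later be related to how the fit to individual training points evolved:

```python
import torch

def train_with_tracking(model, loader, optimizer, loss_fn, epochs=10):
    """Track the training process by logging every example's loss at every
    epoch. Assumes `loader` yields (indices, (inputs, targets)) and that
    `loss_fn` returns per-example losses (reduction='none')."""
    history = {}  # example index -> list of per-epoch losses
    for _ in range(epochs):
        for idx, (x, y) in loader:
            optimizer.zero_grad()
            losses = loss_fn(model(x), y)   # one loss value per example
            losses.mean().backward()
            optimizer.step()
            for i, l in zip(idx.tolist(), losses.detach().tolist()):
                history.setdefault(i, []).append(l)
    return history
```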
Third, we contribute a method to verify the adversarial robustness of random forest
classifiers. Our method is based on knowledge distillation of a random forest model into
a decision tree model. We bound the approximation error of using the decision tree as
a proxy for the given random forest and use these bounds to provide guarantees
on the adversarial robustness of the random forest. Consequently, our robustness
guarantees are approximate, but we can provably control the quality of our results
using a hyperparameter.
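The error bounds and the resulting robustness certificate are the thesis's contribution and are not reproduced here; the distillation step itself can be sketched with scikit-learn, where the proxy tree's depth plays the role of a quality-controlling hyperparameter (dataset and parameter values below are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Distill: fit a single tree on the forest's predicted labels; its depth
# controls how closely the proxy can match the forest.
proxy = DecisionTreeClassifier(max_depth=8, random_state=0)
proxy.fit(X, forest.predict(X))

# Empirical disagreement between proxy and forest -- the quantity the
# thesis bounds analytically before certifying robustness on the proxy.
disagreement = (proxy.predict(X) != forest.predict(X)).mean()
print(f"proxy/forest disagreement: {disagreement:.3f}")
```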
Keywords
Maschinelles Lernen, Künstliche Intelligenz, Deep Learning, Trustworthy AI
Subjects based on RSWK
Maschinelles Lernen, Künstliche Intelligenz, Deep learning