Machine learning for acquiring knowledge in astro-particle physics
Loading...
Date
2022
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
This thesis explores the fundamental aspects of machine learning, which are involved with acquiring knowledge in the research field of astro-particle physics. This research field substantially relies on machine learning methods, which reconstruct the properties of astro-particles from the raw data that specialized telescopes record. These methods are typically trained from resource-intensive simulations, which reflect the existing knowledge about the particles—knowledge that physicists strive to expand. We study three fundamental machine learning tasks, which emerge from this goal.
First, we address ordinal quantification, the task of estimating the prevalences of ordered classes in sets of unlabeled data. This task emerges from the need for testing the agreement of astro-physical theories with the class prevalences that a telescope observes. To this end, we unify existing methods on quantification, propose an alternative optimization process, and develop regularization techniques to address ordinality in quantification problems, both in and outside of astro-particle physics. These advancements provide more accurate reconstructions of the energy spectra of cosmic gamma ray sources and, hence, support physicists in drawing conclusions from their telescope data.
Second, we address learning under class-conditional label noise. More particularly, we focus on a novel setting, in which one of the class-wise noise rates is known and one is not. This setting emerges from a data acquisition protocol, through which astro-particle telescopes simultaneously observe a region of interest and several background regions. We enable learning under this type of label noise with algorithms for consistent, noise-aware decision thresholding. These algorithms yield binary classifiers, which outperform the existing state-of-the-art in gamma hadron classification with the FACT telescope. Moreover, unlike the state-of-the-art, our classifiers are entirely trained from the real telescope data and thus do not require any resource-intensive simulation.
Third, we address active class selection, the task of actively finding those proportions of classes which optimize the classification performance. In astro-particle physics, this task emerges from the simulation, which produces training data in any desired class proportions. We clarify the implications of this setting from two theoretical perspectives, one of which provides us with bounds of the resulting classification performance. We employ these bounds in a certificate of model robustness, which declares a set of class proportions for which the model is accurate with a high probability. We also employ these bounds in an active strategy for class-conditional data acquisition. Our strategy uniquely considers existing uncertainties about those class proportions that have to be handled during the deployment of the classifier, while being theoretically well-justified.
Description
Table of contents
Keywords
Machine learning, Supervised learning, Ordinal quantification, Class-conditional label noise, Active class selection, Cherenkov telescope, Gamma-ray astronomy, Simulation, Data analysis, Big data