Wöhler, Christian; Wilhelm, Thorsten
2020-07-07; 2019
URI: http://hdl.handle.net/2003/39195
DOI: http://dx.doi.org/10.17877/DE290R-21113

Abstract:
In this thesis, a contribution to explainable artificial intelligence is made. More specifically, the aspect of artificial intelligence which focusses on recreating human perception is tackled from a previously neglected direction. One facet of human perception is the building of a mental model of the extents of the semantic objects that appear in the field of view. When this task is performed by an algorithm, it is termed image segmentation. Recent methods in this area are mostly trained in a supervised fashion on data sets of ground-truth segmentations that are as extensive as possible. Furthermore, semantic segmentation is almost exclusively tackled with Deep Neural Networks (DNNs). Both trends pose several issues.

First, the annotations have to be acquired somehow. This is especially inconvenient if, for instance, a new sensor becomes available, new domains are explored, or different quantities become of interest. In each case, the cumbersome and potentially costly labelling of the raw data has to be redone. While annotating an image with keywords can be achieved in a reasonable amount of time, annotating every pixel of an image with its ground-truth class is orders of magnitude more time-consuming. The quality of the labels is an issue as well, because fine-grained structures like hair, grass, or the boundaries of biological cells have to be outlined exactly in image segmentation in order to derive meaningful conclusions.

Second, DNNs are discriminative models: they simply learn to separate the features of the respective classes. While this works exceptionally well if enough data is provided, quantifying the uncertainty with which a prediction is made is then not directly possible. To allow this, the models have to be designed differently, namely by generatively modelling the distribution of the features instead of learning the boundaries between classes.

Hence, image segmentation is tackled from a generative perspective in this thesis. By utilizing mixture models, which belong to the class of generative models, the quantification of uncertainty becomes an implicit property. Additionally, the dire need for annotations can be reduced, because mixture models are conveniently estimated in the unsupervised setting.

The starting point is the computation of the upper bounds attainable with commonly used probability distributions. This knowledge is then used to build a novel probability distribution, based on flexible marginal distributions and a copula which models the dependence structure of multiple features. This modular approach allows great flexibility and shows excellent performance at image segmentation. After deriving the upper bounds, different ways to reach them in an unsupervised fashion are presented; including the probable locations of edges in the unsupervised model estimation greatly increases the performance. The proposed models surpass state-of-the-art accuracies in the generative and unsupervised setting and are on par with many discriminative models. The analyses are conducted following the Bayesian paradigm, which allows computing uncertainty estimates of the model parameters.

Finally, a novel approach combining a discriminative DNN and a local appearance model in a weakly supervised setting is presented. This combination yields a generative semantic segmentation model with minimal annotation effort.
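To make the generative idea concrete, the following is a minimal sketch of unsupervised segmentation with a plain Gaussian mixture model, in which the posterior responsibilities directly yield a pixel-wise uncertainty map. The feature choice (colour channels plus pixel coordinates), the component count, and the Gaussian mixture itself are illustrative assumptions; the models developed in the thesis are richer, copula-based distributions.

# Minimal sketch: unsupervised segmentation with a Gaussian mixture
# model (GMM). Features, component count, and the GMM itself are
# illustrative assumptions, not the exact model of the thesis.
import numpy as np
from sklearn.mixture import GaussianMixture

def segment(image, n_components=3, seed=0):
    h, w, c = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # One feature vector per pixel: colour channels plus (y, x) position.
    feats = np.column_stack([image.reshape(-1, c), yy.ravel(), xx.ravel()])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full",
                          random_state=seed).fit(feats)
    resp = gmm.predict_proba(feats)             # posterior responsibilities
    labels = resp.argmax(axis=1).reshape(h, w)  # hard segmentation
    # Normalised entropy of the responsibilities as uncertainty:
    # 0 = certain assignment, 1 = maximally uncertain.
    entropy = -(resp * np.log(resp + 1e-12)).sum(axis=1)
    uncertainty = (entropy / np.log(n_components)).reshape(h, w)
    return labels, uncertainty

labels, uncertainty = segment(np.random.rand(64, 64, 3))

No labels enter the estimation: the mixture is fitted by maximum likelihood (EM) on the raw features, and the uncertainty map comes for free from the posterior.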
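One way to read the upper bounds, assumed here purely for illustration, is as supervised reference accuracies: fit the per-class feature distributions with the ground-truth labels and measure the accuracy of the resulting Bayes classifier; an unsupervised fit of the same model family can at best reach this value. The sketch below does this with per-class Gaussians on synthetic data; both the Gaussian model family and the data are assumptions.

# Hedged sketch: a supervised performance bound for a generative model
# family. Per-class Gaussians are fitted WITH the ground-truth labels;
# the Bayes-rule accuracy then bounds what an unsupervised fit of the
# same family can reach. Model family and data are illustrative only.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (500, 2)),   # class 0 features
               rng.normal(3.0, 1.0, (500, 2))])  # class 1 features
y = np.repeat([0, 1], 500)

log_post = np.column_stack([
    multivariate_normal(X[y == k].mean(axis=0),
                        np.cov(X[y == k].T)).logpdf(X)
    + np.log(np.mean(y == k))                    # log class prior
    for k in (0, 1)
])
bound = np.mean(log_post.argmax(axis=1) == y)
print(f"supervised (upper-bound) accuracy: {bound:.3f}")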
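The copula construction rests on Sklar's theorem: a continuous joint distribution factorises into its marginals and a copula. In density form, for d features,

f(x_1, \dots, x_d) = c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr) \prod_{j=1}^{d} f_j(x_j),

where f_j and F_j are the density and cumulative distribution function of the j-th marginal, and the copula density c carries the entire dependence structure. Marginals and copula can therefore be chosen and estimated separately, which is exactly the modularity described above. The abstract does not name the concrete marginal families or copula, so any specific pairing (for instance, a Gaussian copula with kernel-density marginals) should be read as one possible instantiation.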
Language: en
Keywords: Image segmentation; Mixture models; Unsupervised learning; Deep learning; Uncertainty quantification
DDC: 620
Title: Uncertainty-based image segmentation with unsupervised mixture models
Type: Text
Keywords (German): Bildsegmentierung (image segmentation); Unüberwachtes Lernen (unsupervised learning); Deep learning; Quantifizierung (quantification)