Segmentation-free word spotting with bag-of-features hidden Markov models

Author: Rothacker, Leonard
Date: 2019-08-22
Handle: http://hdl.handle.net/2003/38186
DOI: 10.17877/DE290R-20165

The method proposed in this thesis makes document images searchable with minimal manual effort. It works in the query-by-example scenario, where the user selects an exemplary occurrence of the query word in a document image. Afterwards, an entire collection of document images is searched automatically. The major challenge is to detect relevant words and to sort them according to their similarity to the query. However, recognizing text in historic document images is extremely challenging. Different historic document collections have highly irregular visual appearances due to non-standardized layouts or the large variability of handwritten script. An automatic text recognizer requires huge amounts of annotated samples from the collection, which are usually not directly available. In order to search document images with just a single example of the query word, the information that is available about the problem domain is integrated at various levels. Bag-of-features representations are powerful image representations that can be adapted to the data automatically. The query word is represented with a hidden Markov model. This statistical sequence model is well suited to the sequential structure of text. An important assumption is that the visual variability of the text within a single collection is limited. For example, this is typically the case if the documents have been written by only a few writers. Furthermore, the proposed method requires only minimal heuristic assumptions about the visual appearance of text. This is achieved by processing document images as a whole, without requiring a given segmentation of the images at word level or at line level. The detection of potentially relevant document regions is based on similarity to the query. It is not required to recognize words in general. Variability in word size can be handled by the hidden Markov model.
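To illustrate how a hidden Markov model can score a document region against a query and absorb differences in word size, the following is a minimal sketch and not the model used in the thesis. It assumes a strongly simplified setting: each frame of a region is reduced to a single visual-word index, the query model is a linear (left-to-right) model with one discrete emission distribution per state, and all states share the same stay and advance probabilities. The function name `viterbi_log_score` and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def viterbi_log_score(observations, log_emissions, log_stay, log_advance):
    """Best-path log score of a visual-word sequence under a linear HMM.

    observations:  list of visual-word indices, one per frame (assumption)
    log_emissions: one list of log probabilities over visual words per state
    The path has to start in the first state and end in the last state.
    """
    n_states = len(log_emissions)
    neg_inf = float("-inf")

    # Initialisation: only the first state may emit the first frame.
    scores = [neg_inf] * n_states
    scores[0] = log_emissions[0][observations[0]]

    for obs in observations[1:]:
        new_scores = []
        for s in range(n_states):
            stay = scores[s] + log_stay
            advance = scores[s - 1] + log_advance if s > 0 else neg_inf
            new_scores.append(max(stay, advance) + log_emissions[s][obs])
        scores = new_scores

    return scores[-1]


# Hypothetical two-state query model over three visual words.
emissions = [[0.8, 0.1, 0.1],   # state 0 prefers visual word 0
             [0.1, 0.1, 0.8]]   # state 1 prefers visual word 2
log_emissions = [[math.log(p) for p in row] for row in emissions]
log_half = math.log(0.5)

# Two regions of different width that both follow the query's structure,
# and one region with the visual words in reverse order.
short_match = viterbi_log_score([0, 2], log_emissions, log_half, log_half)
long_match = viterbi_log_score([0, 0, 2, 2], log_emissions, log_half, log_half)
mismatch = viterbi_log_score([2, 2, 0, 0], log_emissions, log_half, log_half)
```

Because every state may either repeat or hand over to its successor, sequences of different lengths can be aligned to the same model. In this toy setting the reversed sequence receives a lower score than the sequence of equal length that matches the query's structure.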
In order to make the computationally costly application of the sequence model feasible in practice, regions are first retrieved according to approximate similarity with an efficient model decoding algorithm. Since the approximate approach retrieves regions with high recall, re-ranking these regions with the sequence model leads to highly accurate word spotting results. In addition, the method can be extended to textual queries, i.e., query-by-string, if annotated samples become available. The method is evaluated on five benchmark datasets. In the segmentation-free query-by-example scenario where no annotated sample set is available, the method outperforms all other methods that have been evaluated on any of these five benchmarks. If only a small dataset of annotated samples is available, the performance in the query-by-string scenario is competitive with the state of the art.

Language: en
Keywords: Word spotting; Document image analysis; Image retrieval; Computer vision
Subject classification: 004
Type: doctoral thesis
Subject headings: Image analysis (Bildanalyse); Machine vision (Maschinelles Sehen)