From handwritten words to cuneiform signs retrieval using semantic attributes

Date

2024


Abstract

Archives all over the world store a vast number of manuscripts that contain invaluable information on cultural heritage. Extracting the content of these manuscripts requires considerable human effort, so the ability to sift through extensive collections quickly is particularly valuable for historical documents. To this end, pattern recognition methods have been used to automatically transform handwritten texts from document images into machine-readable formats. Well-known approaches transcribe a text by recognizing individual letters or words. While these approaches achieve outstanding performance on machine-printed documents, their results for handwritten texts often fall short of expectations, especially for collections of historical documents, whose texts typically exhibit various degradations and large variability in writing styles.

Faced with these challenges, document analysis approaches have increasingly adopted retrieval-based methods: instead of recognizing individual letters or words, they evaluate similarities between a query and segmented parts of document images. The resulting retrieval list contains document parts sorted in descending order of similarity to the query, letting the user inspect the list's contents and determine relevant items. Since words in handwritten documents are often the focus of interest, the task of retrieving them is called word spotting in the document image analysis community.

This thesis presents a methodology for segmentation-based word spotting in handwritten documents. To obtain a sorted retrieval list, word images and textual strings must first be transformed into numerical representations, whose similarity is then evaluated by a particular measure. This thesis utilizes the highly successful word embedding technique pyramidal histogram of characters (PHOC).
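The PHOC embedding decomposes a string in a pyramidal scheme and marks the occurrence of each character in each region with a binary value. The following is a minimal sketch of that idea, not the thesis code: the lowercase Latin alphabet, the levels (1, 2, 3), and the midpoint-based region assignment are illustrative assumptions.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase Latin characters only
LEVELS = (1, 2, 3)                 # assumption: pyramid levels used for the split

def phoc(word):
    """Binary PHOC vector: for each pyramid level, the string is split into
    that many regions, and each region gets one indicator per alphabet
    character (1 if the character occurs in that region)."""
    vec = []
    n = len(word)
    for level in LEVELS:
        for region in range(level):
            lo, hi = region / level, (region + 1) / level
            indicators = [0] * len(ALPHABET)
            for i, ch in enumerate(word):
                mid = (i + 0.5) / n  # midpoint of the character's span
                if lo <= mid < hi and ch in ALPHABET:
                    indicators[ALPHABET.index(ch)] = 1
            vec.extend(indicators)
    return vec

v = phoc("beyond")
print(len(v))  # (1+2+3) * 26 = 156 binary attributes
```

Because the same algorithm maps any textual string to a fixed-length binary vector, query strings and (predicted) word-image representations become directly comparable.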
This embedding is inspired by semantic attributes: textual strings are decomposed into a pyramidal scheme, and the occurrences of individual characters are indicated by binary values. The similarity between two numerical word representations is assessed through a novel measure named probabilistic retrieval model (PRM). This model evaluates the probability of matching two binary-valued semantic attribute representations under the assumption that PHOC attributes are Bernoulli distributed. The similarities obtained from the PRM depend strongly on the quality of the predicted attributes. While representations of textual strings are obtained directly through the PHOC algorithm, handwritten word images must be transformed first. This thesis uses convolutional neural networks (CNNs) to predict PHOC representations for corresponding word images. Over the last decade, these networks have consistently achieved state-of-the-art results in various computer vision tasks and are nowadays the de facto standard models for image classification.

The presented method applies the statistical framework of generalized linear models (GLMs) to derive the binary cross-entropy loss (BCEL) as the suitable loss function for the PRM similarity measure. The BCEL is then used to train CNN models to estimate binary attribute probabilities accurately. A significant advantage is the direct connection between the BCEL and the PRM: minimizing the BCEL is equivalent to maximizing the PRM similarity between two equal PHOC representations.

This word-spotting methodology is adapted to an ancient writing system called cuneiform. Cuneiform signs are formed by characteristic wedge-shaped impressions. Norbert Gottstein proposed a representation based on alphanumeric expressions that describe a cuneiform sign according to its wedge impressions.
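The Bernoulli assumption behind the PRM, and its link to the BCEL, can be sketched as follows. Under independent Bernoulli attributes, the log-probability of a binary query given predicted attribute probabilities is exactly the negative binary cross-entropy, so minimizing the BCEL during training maximizes the PRM score for matching representations. The function name and toy values below are illustrative, not from the thesis.

```python
import math

def prm_log_score(query_bits, attr_probs, eps=1e-9):
    """Log-probability of the binary query under independent Bernoulli
    attributes (the PRM assumption). Note this is the negative binary
    cross-entropy between query_bits and attr_probs."""
    s = 0.0
    for q, p in zip(query_bits, attr_probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        s += q * math.log(p) + (1 - q) * math.log(1.0 - p)
    return s

# Ranking a toy retrieval list: a higher log-score means more similar.
query = [1, 0, 1, 1]                               # binary query attributes
candidates = {"img_a": [0.9, 0.1, 0.8, 0.7],       # predicted probabilities
              "img_b": [0.2, 0.6, 0.4, 0.5]}
ranked = sorted(candidates,
                key=lambda k: prm_log_score(query, candidates[k]),
                reverse=True)
print(ranked)  # → ['img_a', 'img_b']
```

The score is maximal (close to zero) when the predicted probabilities equal the binary query, which mirrors the equivalence between minimizing the BCEL and maximizing the PRM similarity stated above.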
These alphanumeric expressions are used to define a set of semantic attributes named Gottstein representation. This thesis extends the holistic Gottstein representation with spatial information: wedge positions are encoded by a pyramidal segmentation, yielding binary attribute representations that indicate wedge semantics and their approximate position within a cuneiform sign. Two approaches to decomposing cuneiform signs according to predefined pyramidal schemes are presented. The first divides cuneiform signs horizontally and vertically, following a grid-like pattern, and is called spatial pyramid Gottstein (SPG) representation. The second is based on annotations describing wedge constellations in their sequential order; these sign encodings are assigned to pyramidal splits by applying the PHOC algorithm, and the resulting representation is referred to as temporal pyramid Gottstein (TPG). With these representations, signs can be expressed by their wedge constellations and types, which enables cuneiform sign spotting in a novel retrieval scenario named query-by-expression (QbX).

This thesis proposes three core contributions: the design of the probabilistic retrieval model as a novel similarity measure, the derivation of the binary cross-entropy as the suitable loss function for the PRM, and two cuneiform sign representations based on wedge impressions encoded as binary semantic attributes. These contributions are evaluated on six benchmarks in total: four contain handwritten text documents, and two contain images of cuneiform tablets. The experiments show that combining the PRM and BCEL achieves state-of-the-art results and even exceeds the performance of other combinations of similarity measures and loss functions. Representing cuneiform signs by their wedge impressions enables the user to query a database without visual examples. Moreover, the pyramidal decomposition provides a more detailed description of cuneiform signs, leading to increased retrieval performance.
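The TPG idea described above reuses the PHOC machinery on wedge sequences instead of character strings. The sketch below illustrates this, assuming a hypothetical four-letter wedge alphabet and two pyramid levels; the actual wedge classes, level choices, and function name are placeholders, not the thesis definitions.

```python
WEDGE_TYPES = "abcd"  # assumption: four wedge classes stand in for the
                      # Gottstein wedge types (e.g. vertical, horizontal,
                      # diagonal, Winkelhaken)
LEVELS = (1, 2)       # assumption: pyramid levels for the temporal split

def tpg(wedge_sequence):
    """Temporal pyramid Gottstein sketch: the annotated wedge sequence is
    split like a PHOC string; each region gets one binary indicator per
    wedge type (1 if that wedge type occurs in the region)."""
    vec = []
    n = len(wedge_sequence)
    for level in LEVELS:
        for region in range(level):
            lo, hi = region / level, (region + 1) / level
            bits = [0] * len(WEDGE_TYPES)
            for i, w in enumerate(wedge_sequence):
                if lo <= (i + 0.5) / n < hi and w in WEDGE_TYPES:
                    bits[WEDGE_TYPES.index(w)] = 1
            vec.extend(bits)
    return vec

print(len(tpg("aabd")))  # (1+2) * 4 = 12 binary attributes
```

A query-by-expression lookup then needs no visual example: the user types a wedge expression, it is embedded with the same pyramidal scheme, and the PRM ranks the sign images against it.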

Keywords

Word spotting, Handwritten words, Cuneiform, Retrieval, Semantic attributes
