Generation of training data for handwritten text recognition using latent diffusion models
Abstract
Handwritten documents have long served as the predominant medium for preserving and transmitting information. Numerous works aim to make such documents automatically searchable and to extract information from them. In addition, a research field has emerged that deals with the generation of handwritten words and documents. Handwritten text generation (HTG) is the task of generating a realistic-looking image that depicts a given string in handwriting of a desired style. The major challenges are the correctness and readability of the generated word and the faithful imitation of the desired style. Generative models such as generative adversarial networks and diffusion models have been adopted to address this task. HTG offers a promising opportunity to generate annotated training data for other document analysis models. This is particularly interesting when training data is generated for a new target dataset for which no annotated examples are available. This thesis proposes an HTG system based on a latent diffusion model for generating training data for handwritten text recognition models. To generate images for an unseen target dataset, a pretrained masked autoencoder extracts style encodings from a set of example images. Together with embeddings of the string to be generated, these encodings condition the generation process via classifier-free guidance. To improve generation quality for styles from the target dataset, two semi-supervised training schemes for the HTG model are presented. These schemes enable the model to leverage information about new styles either from examples that are annotated only with writer IDs or from examples without any annotation. The resulting HTG system is used to generate a synthetic dataset containing samples with handwriting styles similar to those in the target dataset.
A handwriting recognition model is then trained on this stylized synthetic dataset. The experimental results demonstrate that the proposed HTG model can be applied successfully to generate training data for a handwriting recognition model. They show that a recognition model can be trained successfully using only generated samples, even when the HTG model is trained on a dataset other than the target dataset. Furthermore, the experiments demonstrate that including unlabeled samples from the target dataset through the proposed semi-supervised training schemes considerably improves the recognition model trained on the generated data. In summary, the HTG system presented in this thesis offers a promising approach to generating training data for unseen datasets and can facilitate the training of other document analysis models.
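The abstract names classifier-free guidance as the conditioning mechanism but does not give its formulation. As a rough illustration only, the sketch below shows the commonly used form of the technique: the denoiser is evaluated once with the condition and once with a null condition, and the two noise predictions are extrapolated with a guidance scale. The function names, the `guidance_scale` and `drop_prob` parameters, and the null-embedding convention are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np


def cfg_combine(eps_cond, eps_uncond, guidance_scale):
    """Blend conditional and unconditional noise predictions.

    Assumed formulation: eps = eps_uncond + s * (eps_cond - eps_uncond).
    With s = 0 the condition is ignored, with s = 1 the plain conditional
    prediction is returned, and s > 1 pushes further toward the condition.
    """
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def drop_condition(cond, null_cond, drop_prob, rng):
    """Training-time condition dropout (hypothetical helper).

    With probability drop_prob the condition (for example a style encoding
    or a text embedding) is replaced by a null embedding, so that a single
    network learns both the conditional and the unconditional prediction.
    """
    return null_cond if rng.random() < drop_prob else cond


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps_c = rng.normal(size=(4, 8))  # stand-in for a conditional prediction
    eps_u = rng.normal(size=(4, 8))  # stand-in for an unconditional prediction
    guided = cfg_combine(eps_c, eps_u, guidance_scale=3.0)
    print(guided.shape)
```

In a sampler, `cfg_combine` would be called at every denoising step in place of the raw conditional prediction; how the thesis combines the style and text conditions within that step is not stated in the abstract.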
Keywords
Document analysis, Handwriting generation, Artificial intelligence, Neural networks
Keywords (RSWK)
Dokumentanalyse (document analysis), Handschrift (handwriting), Künstliche Intelligenz (artificial intelligence), Neuronales Netz (neural network)
