Generation of training data for handwritten text recognition using latent diffusion models

dc.contributor.advisor: Fink, Gernot A.
dc.contributor.author: Brandenbusch, Kai Ingo Wilhelm
dc.contributor.referee: Harmeling, Stefan
dc.date.accepted: 2026-04-16
dc.date.accessioned: 2026-07-02T10:37:26Z
dc.date.issued: 2026
dc.description.abstract: Handwritten documents have served as the predominant medium for the preservation and transmission of information. Numerous works aim to make such documents automatically searchable and to extract information from them. In addition, a research field has emerged that deals with the generation of handwritten words and documents. The task of handwritten text generation (HTG) is to generate a realistic-looking image that depicts a given string in handwriting of a desired style. The correctness and readability of the generated word and the imitation of the desired style pose the major challenges. Generative models such as generative adversarial networks and diffusion models have been adopted to approach this task. HTG offers the promising opportunity to generate annotated training data for other document analysis models. This is particularly interesting when training data is generated for a new target dataset for which no annotated examples are available. In this thesis, an HTG system based on a latent diffusion model is proposed for generating training data for handwritten text recognition models. To generate images for an unseen target dataset, a pretrained masked autoencoder is used to extract style encodings from a set of example images. Together with embeddings of the string to be generated, these encodings condition the generation process using classifier-free guidance. To enhance the generation quality for styles from the target dataset, two semi-supervised training schemes for the HTG model are presented. These training schemes enable the model to leverage information about new styles either from examples that are annotated only with writer IDs or from examples without any annotation. The resulting HTG system is used to generate a synthetic dataset containing samples with handwriting styles similar to those in the target dataset.
A handwriting recognition model is then trained on this stylized synthetic dataset. The experimental results demonstrate the successful application of the proposed HTG model to the generation of training data for a handwriting recognition model. Even if the HTG model is trained on a dataset other than the target dataset, a recognition model can be trained successfully using only generated training samples. Furthermore, the experiments demonstrate that including unlabeled samples from the target dataset through the proposed semi-supervised training schemes considerably improves the recognition model trained on the generated data. In summary, the HTG system presented in this thesis offers a promising approach toward the generation of training data for unseen datasets and can facilitate the training of other document analysis models. [en]
dc.identifier.uri: http://hdl.handle.net/2003/44957
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-26724
dc.language.iso: en
dc.subject: Document analysis [en]
dc.subject: Handwriting generation [en]
dc.subject: Artificial intelligence [en]
dc.subject: Neural networks [en]
dc.subject.ddc: 620
dc.subject.ddc: 670
dc.subject.rswk: Dokumentanalyse [de]
dc.subject.rswk: Handschrift [de]
dc.subject.rswk: Künstliche Intelligenz [de]
dc.subject.rswk: Neuronales Netz [de]
dc.title: Generation of training data for handwritten text recognition using latent diffusion models [en]
dc.type: Text
dc.type.publicationtype: PhDThesis
dcterms.accessRights: open access
eldorado.dnb.deposit: true
eldorado.secondarypublication: false

Files

Original bundle

Name: Dissertation_Brandenbusch.pdf
Size: 2.05 MB
Format: Adobe Portable Document Format
Description: DNB
