Eldorado Collection:
http://hdl.handle.net/2003/27341
2024-03-29T08:33:17Z
Subword-based Stochastic Segment Modeling for Offline Arabic Handwriting Recognition
http://hdl.handle.net/2003/27564
Title: Subword-based Stochastic Segment Modeling for Offline Arabic Handwriting Recognition
Authors: Cao, Huaigu; Manohar, Vasant; Natarajan, Prem; Prasad, Rohit; Subramanian, Krishna
Abstract: In this paper, we describe several experiments in which we use a stochastic segment model (SSM) to improve offline handwriting recognition (OHR) performance. We use the SSM to re-rank (re-score) multiple decoder hypotheses. A probabilistic multi-class SVM is trained to model the stochastic segments obtained by force-aligning transcriptions with the underlying image. To train the SVM, we extract from the stochastic segments multiple features that are sensitive to a larger context span. Our experiments show that using confidence scores from the trained SVM within the SSM framework can significantly improve OHR performance. We also show that OHR performance can be improved by combining character-based and parts-of-Arabic-words (PAW)-based SSMs.
2011-01-12T00:00:00Z
Arabic Handwritten Alphanumeric Character Recognition using Fuzzy Attributed Turning Functions
http://hdl.handle.net/2003/27563
Title: Arabic Handwritten Alphanumeric Character Recognition using Fuzzy Attributed Turning Functions
Authors: Mahmoud, Sabri; Parvez, Mohammad Tanvir
Abstract: In this paper, we present a novel method for recognizing unconstrained handwritten Arabic alphanumeric characters. The algorithm binarizes the character image, smooths it and extracts its contour. A novel approach for polygonal approximation of handwritten character contours is then applied. Direction and length features are extracted from the polygonal approximation, and these features are used to build character models in the training phase. For recognition, we introduce Fuzzy Attributed Turning Functions (FATF) and define a dissimilarity measure based on FATF for comparing polygonal shapes. Experimental results demonstrate the effectiveness of our algorithm for recognizing handwritten Arabic characters. We obtained about 98% accuracy for Arabic handwritten characters and more than 97% accuracy for handwritten Arabic numerals.
2011-01-12T00:00:00Z
Arabic Handwriting Synthesis
http://hdl.handle.net/2003/27562
Title: Arabic Handwriting Synthesis
Authors: Al-Muhtaseb, Husni; Elarian, Yousef; Ghouti, Lahouari
Abstract: Training and testing data for optical character recognition are cumbersome to obtain. If large amounts of data can be produced from small amounts, much time and effort can be saved. This paper presents an approach to synthesizing Arabic handwriting. We segment word images into labeled characters and then use these characters to synthesize arbitrary words. The synthesized text should look natural; hence, we define criteria for deciding what is acceptable as natural-looking.
For evaluation, text synthesized under the natural-looking constraint is compared with text synthesized without it.
2011-01-12T00:00:00Z
A Lexicon of Connected Components for Arabic Optical Text Recognition
http://hdl.handle.net/2003/27561
Title: A Lexicon of Connected Components for Arabic Optical Text Recognition
Authors: Elarian, Yousef; Idris, Fayez
Abstract: Arabic is a cursive script whose characters are not easy to segment. Hence, we suggest a unit that is discrete in nature, viz. the connected component, for Arabic text recognition. Any system that uses such a unit needs a lexicon listing the valid Arabic connected components. Here, we produce and analyze a comprehensive lexicon of connected components.
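The following simplified sketch shows what such a unit looks like in text form. It is not the authors' tokenizer. It approximates connected components by parts of Arabic words (PAWs): a word breaks into a new part wherever a letter cannot join the letter that follows it. The sketch makes three assumptions. First, the input is undiacritized Unicode text. Second, only the right-joining letters (ا آ أ إ ؤ ة د ذ ر ز و) and the non-joining hamza (ء) act as part boundaries. Third, dots are ignored, even though they form separate components in an image. The names `split_into_parts` and `component_lexicon` are hypothetical.

```python
# Simplified assumption: letters that join only to the preceding letter
# (Unicode joining type R) end the current connected part.
RIGHT_JOINING = set("اآأإؤةدذرزو")
# Letters that join to neither neighbour (joining type U), e.g. hamza.
NON_JOINING = set("ء")


def split_into_parts(word: str) -> list[str]:
    """Split an undiacritized Arabic word into its connected parts (PAWs)."""
    parts: list[str] = []
    current = ""
    for ch in word:
        if ch in NON_JOINING:
            # Hamza stands alone: close the running part, then emit hamza by itself.
            if current:
                parts.append(current)
                current = ""
            parts.append(ch)
            continue
        current += ch
        if ch in RIGHT_JOINING:
            # The next letter cannot attach to this one, so the part ends here.
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts


def component_lexicon(words: list[str]) -> set[str]:
    """Reduce a word list to the set of distinct connected parts it contains."""
    return {part for word in words for part in split_into_parts(word)}
```

For example, `split_into_parts("مدرسة")` yields `["مد", "ر", "سة"]`. Many different words share the same parts, so collecting parts as `component_lexicon` does is one simple way to shrink a word lexicon into a component lexicon.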
A lexicon can be extracted from corpora or synthesized from morphemes. We follow both approaches and merge their results. In addition, generating a lexicon of connected components involves extra tokenization and point-normalization steps that keep the size of the lexicon tractable. We produce a lexicon of surface words, reduce it to a lexicon of connected components, and finally reduce that to a lexicon of point-normalized connected components. The lexicon of point-normalized connected components contains 684,743 entries, a decrease of 97.17% from the word lexicon.
2011-01-12T00:00:00Z
Writer Identification of Arabic Handwritten Digits
http://hdl.handle.net/2003/27560
Title: Writer Identification of Arabic Handwritten Digits
Authors: Awaida, Sameh; Mahmoud, Sabri
Abstract: This paper addresses writer identification from Arabic handwritten digits. In addition to the writer identifiability of the digits, the paper presents digit recognition. The digit image is divided into grids based on the distribution of black pixels in the image. Several types of features are extracted from the grid segments: gradient, curvature, density, horizontal and vertical run lengths, stroke, and concavity features. K-Nearest Neighbor and Nearest Mean classifiers are used. The analysis and experiments use a database of 70,000 Arabic handwritten digit samples written by 700 writers.
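The following minimal sketch illustrates the classification step as a ranking of candidate writers. It assumes each digit sample has already been reduced to a fixed-length feature vector. The paper's grid-based gradient, curvature and other features are not reproduced here, and the function names and toy data are hypothetical.

- `rank_writers_nn` orders writers by their closest training sample (a 1-NN ranking).
- `rank_writers_mean` orders writers by the distance to each writer's mean vector (a nearest-mean classifier).
- `top_k_accuracy` counts a query as correct when its true writer appears among the first k candidates.

```python
import math
from collections import defaultdict
from typing import Callable, Sequence

Vector = Sequence[float]
Sample = tuple[str, Vector]  # (writer_id, feature_vector); hypothetical layout


def rank_writers_nn(query: Vector, train: list[Sample]) -> list[str]:
    """Rank writers by the Euclidean distance of their closest training sample."""
    best: dict[str, float] = {}
    for writer, vec in train:
        d = math.dist(query, vec)
        if d < best.get(writer, math.inf):
            best[writer] = d
    return sorted(best, key=best.__getitem__)


def rank_writers_mean(query: Vector, train: list[Sample]) -> list[str]:
    """Rank writers by the Euclidean distance to their mean feature vector."""
    groups: dict[str, list[Vector]] = defaultdict(list)
    for writer, vec in train:
        groups[writer].append(vec)
    means = {w: [sum(col) / len(vs) for col in zip(*vs)] for w, vs in groups.items()}
    return sorted(means, key=lambda w: math.dist(query, means[w]))


def top_k_accuracy(
    test: list[Sample],
    train: list[Sample],
    k: int,
    ranker: Callable[[Vector, list[Sample]], list[str]] = rank_writers_nn,
) -> float:
    """Fraction of test samples whose true writer is among the top-k ranked writers."""
    hits = sum(1 for writer, vec in test if writer in ranker(vec, train)[:k])
    return hits / len(test)
```

Consider toy 2-D data in which one query from writer C lies closer to writer A's samples. That query is a top-1 miss but a top-2 hit. The same effect explains why top-5 and top-10 figures exceed top-1 accuracy.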
The identifiability of isolated and combined digits is tested. The analysis of the results indicates that Arabic digits 3 (٣), 4 (٤), 8 (٨), and 9 (٩) are more identifiable than the other digits, while Arabic digits 0 (٠) and 1 (١) are the least identifiable. In addition, the paper shows that combining a writer's digits increases the discriminative power of Arabic handwritten digits. When the features of all digits are combined, K-NN gives the best accuracy in text-independent writer identification, with top-1, top-5, and top-10 accuracies of 88.14%, 94.81%, and 96.48%, respectively.
2011-01-12T00:00:00Z
A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLV
http://hdl.handle.net/2003/27559
Title: A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLV
Authors: Dhouib, Mariem Miledi; Kanoun, Slim
Abstract: This paper presents a contribution to printed Arabic recognition. Specifically, we are interested in recognizing printed decomposable Arabic words. The proposed system follows the analytical approach. It segments words into characters and generates letter hypotheses, then word hypotheses, which it checks by lexical verification against a pre-established dictionary of the language. Thanks to this lexical verification, our proposed system SPARLV is able to produce valid word hypotheses.
2011-01-12T00:00:00Z
Towards Feature Learning for HMM-based Offline Handwriting Recognition
http://hdl.handle.net/2003/27556
Title: Towards Feature Learning for HMM-based Offline Handwriting Recognition
Authors: Fink, Gernot A.; Hammerla, Nils Y.; Plötz, Thomas; Vajda, Szilárd
Abstract: Statistical modelling techniques for automatic reading systems rely substantially on the availability of compact and meaningful feature representations. State-of-the-art feature extraction for offline handwriting recognition is usually based on heuristic approaches that describe either basic geometric properties or statistical distributions of raw pixel values. Although these approaches work well on average, fundamental insights into the nature of handwriting are still desired. In this paper we present a novel approach for automatically extracting appearance-based representations of offline handwriting data. Within the framework of deep belief networks, which are built from Restricted Boltzmann Machines, we develop a two-stage method for feature learning and optimization. Using two standard corpora of Arabic and Roman handwriting data, we demonstrate across script boundaries that automatically learned features achieve recognition results comparable to those of state-of-the-art handcrafted features. Given these promising results, we discuss the potential of feature learning for future reading systems.
2011-01-12T00:00:00Z
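The Restricted Boltzmann Machine building block mentioned in the last abstract can be sketched as a Bernoulli RBM trained with one-step contrastive divergence (CD-1). This is the generic textbook training rule, not the authors' two-stage method. The sketch assumes binary input vectors (for example, hypothetical binarized sliding-window frames of a text-line image) and uses NumPy. The class and method names are illustrative.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class BernoulliRBM:
    """Binary-binary RBM trained with CD-1 (generic sketch, not the paper's method)."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0) -> None:
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v: np.ndarray) -> np.ndarray:
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h: np.ndarray) -> np.ndarray:
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0: np.ndarray, lr: float = 0.1) -> float:
        """One CD-1 update on a batch (rows = samples); returns reconstruction MSE."""
        ph0 = self.hidden_probs(v0)
        # Sample binary hidden states from their conditional probabilities.
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        pv1 = self.visible_probs(h0)  # reconstruction
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        # Positive (data) statistics minus negative (reconstruction) statistics.
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += lr * (v0 - pv1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)
        return float(((v0 - pv1) ** 2).mean())

    def transform(self, v: np.ndarray) -> np.ndarray:
        """Hidden-unit probabilities, usable as learned feature vectors."""
        return self.hidden_probs(v)
```

After training, the rows returned by `transform` could replace handcrafted frame features as input to an HMM recognizer. Stacking several such layers gives a deep belief network.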