Eldorado Collection:

http://hdl.handle.net/2003/27341
2024-06-30T14:32:24Z

Subword-based Stochastic Segment Modeling for Offline Arabic Handwriting Recognition
http://hdl.handle.net/2003/27564
Authors: Cao, Huaigu; Manohar, Vasant; Natarajan, Prem; Prasad, Rohit; Subramanian, Krishna
Abstract: In this paper, we describe several experiments in which we use a stochastic segment model (SSM) to improve offline handwriting recognition (OHR) performance. We use the SSM to re-rank (re-score) multiple decoder hypotheses. Then, a probabilistic multi-class SVM is trained to model stochastic segments obtained from force-aligning transcriptions with the underlying image. To train the SVM, we extract multiple features from the stochastic segments that are sensitive to a larger context span. Our experiments show that using confidence scores from the trained SVM within the SSM framework can significantly improve OHR performance. We also show that OHR performance can be improved by using a combination of character-based and parts-of-Arabic-words (PAW)-based SSMs.
2011-01-12T00:00:00Z

Arabic Handwritten Alphanumeric Character Recognition using Fuzzy Attributed Turning Functions
http://hdl.handle.net/2003/27563
Authors: Mahmoud, Sabri; Parvez, Mohammad Tanvir
Abstract: In this paper, we present a novel method for recognition of unconstrained handwritten Arabic alphanumeric characters. The algorithm binarizes the character image, smooths it, and extracts its contour. A novel approach for polygonal approximation of handwritten character contours is applied. Direction and length features are extracted from the polygonal approximation. These features are used to build character models in the training phase.
For recognition, we introduce Fuzzy Attributed Turning Functions (FATF) and define a dissimilarity measure based on FATF for comparing polygonal shapes. Experimental results demonstrate the effectiveness of our algorithm for recognition of handwritten Arabic characters. We have obtained around 98% accuracy for Arabic handwritten characters and more than 97% accuracy for handwritten Arabic numerals.
2011-01-12T00:00:00Z

Arabic Handwriting Synthesis
http://hdl.handle.net/2003/27562
Authors: Al-Muhtaseb, Husni; Elarian, Yousef; Ghouti, Lahouari
Abstract: Training and testing data for optical character recognition are cumbersome to obtain. If large amounts of data can be produced from small amounts, much time and effort can be saved. This paper presents an approach to synthesize Arabic handwriting. We segment word images into labeled characters and then use these in synthesizing arbitrary words. The synthesized text should look natural; hence, we define some criteria to decide what is acceptable as natural-looking. For evaluation, text synthesized with the natural-looking constraint is compared to text synthesized without it.
2011-01-12T00:00:00Z

A Lexicon of Connected Components for Arabic Optical Text Recognition
http://hdl.handle.net/2003/27561
Authors: Elarian, Yousef; Idris, Fayez
Abstract: Arabic is a cursive script that lacks the ease of character segmentation. Hence, we suggest a unit that is discrete in nature, viz. the connected component, for Arabic text recognition. A lexicon listing valid Arabic connected components is necessary for any system that uses such a unit. Here, we produce and analyze a comprehensive lexicon of connected components. A lexicon can be extracted from corpora or synthesized from morphemes. We follow both approaches and merge their results.
In addition, generating a lexicon of connected components involves extra tokenization and point-normalization steps to keep the size of the lexicon tractable. We produce a lexicon of surface words, reduce it to a lexicon of connected components, and finally to a lexicon of point-normalized connected components. The lexicon of point-normalized connected components contains 684,743 entries, a decrease of 97.17% from the word lexicon.
2011-01-12T00:00:00Z

Writer Identification of Arabic Handwritten Digits
http://hdl.handle.net/2003/27560
Authors: Awaida, Sameh; Mahmoud, Sabri
Abstract: This paper addresses the identification of Arabic handwritten digits. In addition to digit identifiability, the paper presents digit recognition. The digit image is divided into grids based on the distribution of the black pixels in the image. Several types of features (viz. gradient, curvature, density, horizontal and vertical run lengths, stroke, and concavity features) are extracted from the grid segments. K-Nearest Neighbor and Nearest Mean classifiers are used. A database of 70,000 Arabic handwritten digit samples written by 700 writers is used in the analysis and experiments. The identifiability of isolated and combined digits is tested. The analysis of the results indicates that Arabic digits 3 (٣), 4 (٤), 8 (٨), and 9 (٩) are more identifiable than other digits, while Arabic digits 0 (٠) and 1 (١) are the least identifiable. In addition, the paper shows that combining the writer’s digits increases the discriminative power of Arabic handwritten digits. Combining the features of all digits, K-NN provided the best accuracy in text-independent writer identification, with a top-1 result of 88.14%, a top-5 result of 94.81%, and a top-10 result of 96.48%.
2011-01-12T00:00:00Z

A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLV
http://hdl.handle.net/2003/27559
Authors: Dhouib, Mariem Miledi; Kanoun, Slim
Abstract: This paper presents a contribution to printed Arabic recognition, specifically the recognition of decomposable printed Arabic words. The proposed system takes an analytical approach: it segments words into characters to generate letter hypotheses, and then word hypotheses, which are verified lexically against a pre-established dictionary of the language. Thanks to this lexical verification, our proposed system, SPARLV, is able to propose valid word hypotheses.
2011-01-12T00:00:00Z

Towards Feature Learning for HMM-based Offline Handwriting Recognition
http://hdl.handle.net/2003/27556
Authors: Fink, Gernot A.; Hammerla, Nils Y.; Plötz, Thomas; Vajda, Szilárd
Abstract: Statistical modelling techniques for automatic reading systems substantially rely on the availability of compact and meaningful feature representations. State-of-the-art feature extraction for offline handwriting recognition is usually based on heuristic approaches that describe either basic geometric properties or statistical distributions of raw pixel values. Although these approaches work well on average, more fundamental insights into the nature of handwriting are still desired. In this paper we present a novel approach for the automatic extraction of appearance-based representations of offline handwriting data. Given the framework of deep belief networks -- Restricted Boltzmann Machines -- a two-stage method for feature learning and optimization is developed.
On two standard corpora of Arabic and Roman handwriting data, it is demonstrated across script boundaries that automatically learned features achieve recognition results comparable to state-of-the-art handcrafted features. Given these promising results, the potential of feature learning for future reading systems is discussed.
2011-01-12T00:00:00Z
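
The feature-learning idea in the last abstract can be illustrated with a minimal sketch. This is not the authors' two-stage method: it assumes a generic Bernoulli restricted Boltzmann machine trained with one-step contrastive divergence (CD-1) on synthetic 16-pixel binary patterns. The names `RBM`, `make_toy_data`, and `cd1_step`, as well as all sizes and hyperparameters, are hypothetical choices made only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_toy_data(n_samples=200, flip_prob=0.05, seed=1):
    """Synthetic stand-in for image patches (an assumption, not real data):
    two 16-pixel binary prototypes with a few randomly flipped pixels."""
    rng = np.random.default_rng(seed)
    prototypes = np.zeros((2, 16))
    prototypes[0, :8] = 1.0   # "left half on"
    prototypes[1, 8:] = 1.0   # "right half on"
    data = prototypes[rng.integers(0, 2, size=n_samples)]
    flips = rng.random(data.shape) < flip_prob
    return np.abs(data - flips)  # flip the selected pixels


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine (illustrative only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        # Hidden activation probabilities; these serve as learned features.
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        """One full-batch parameter update with CD-1."""
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)   # one-step reconstruction
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

    def reconstruction_error(self, v):
        recon = self.visible_probs(self.hidden_probs(v))
        return float(np.mean((v - recon) ** 2))


data = make_toy_data()
rbm = RBM(n_visible=16, n_hidden=4)
err_before = rbm.reconstruction_error(data)
for _ in range(500):
    rbm.cd1_step(data)
err_after = rbm.reconstruction_error(data)

# One 4-dimensional learned feature vector per input pattern.
features = rbm.hidden_probs(data)
print(f"reconstruction error: {err_before:.3f} -> {err_after:.3f}")
```

In a recognition pipeline, feature vectors such as `features` would take the place of handcrafted descriptors as input to a downstream recognizer. How the abstract's system performs that step, and its second optimization stage, is not specified in this listing and is not modeled here.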