2010 - First International Workshop on Frontiers in Arabic Handwriting Recognition

Permanent URI for this collection

http://hdl.handle.net/2003/27341

Arabic script is of substantial importance to hundreds of millions of people worldwide. Numerous applications in different areas of automated document processing require robust recognition techniques for the analysis of Arabic handwriting. Substantial challenges lie in the high variance in character appearance, the highly idiosyncratic omission of vowels, common ambiguities in writing, and the omnipresence of touching characters. Robust Arabic handwriting recognition therefore demands massive research efforts. To provide a place to meet and discuss current issues related to Arabic handwriting recognition, FAHR 2010 - the 1st International Workshop on Frontiers in Arabic Handwriting Recognition - will be held in conjunction with ICPR 2010. It aims at bringing together researchers -- both from academia and industry -- as well as practitioners in the field of Arabic handwriting recognition. Given its unique location in the city of Istanbul, Turkey, at the border between the Arab and the Western world, researchers working in the field are invited to participate in FAHR 2010.

Subword-based Stochastic Segment Modeling for Offline Arabic Handwriting Recognition
(2011-01-12) Cao, Huaigu; Manohar, Vasant; Natarajan, Prem; Prasad, Rohit; Subramanian, Krishna
In this paper, we describe several experiments in which we use a stochastic segment model (SSM) to improve offline handwriting recognition (OHR) performance. We use the SSM to re-rank (re-score) multiple decoder hypotheses. Then, a probabilistic multi-class SVM is trained to model stochastic segments obtained by force-aligning transcriptions with the underlying image. To train the SVM, we extract multiple features from the stochastic segments that are sensitive to a larger context span. Our experiments show that using confidence scores from the trained SVM within the SSM framework can significantly improve OHR performance. We also show that OHR performance can be improved by using a combination of character-based and parts-of-Arabic-words (PAW)-based SSMs.
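The re-ranking step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each decoder hypothesis already carries two log-scores, one from the decoder and one from a segment model; the function name `rerank`, the interpolation weight, and the toy hypotheses are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

def rerank(hypotheses: List[Tuple[str, float, float]],
           weight: float = 0.5) -> List[Tuple[str, float]]:
    """Re-rank (text, decoder_log_score, segment_log_score) tuples.

    The combined score is a weighted sum of the two log-scores; `weight`
    is a hypothetical interpolation parameter, not a value from the paper.
    """
    scored = [(text, (1 - weight) * dec + weight * seg)
              for text, dec, seg in hypotheses]
    # Higher (less negative) combined log-score ranks first.
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy n-best list: the decoder prefers "kitab", the segment model "katib".
nbest = [("kitab", -4.0, -9.0), ("katib", -4.5, -3.0), ("kutub", -6.0, -5.0)]
best_text, best_score = rerank(nbest)[0]
```

With `weight=0.0` the ranking falls back to the decoder scores alone, which makes it easy to check how much the second score changes the result.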
Arabic Handwritten Alphanumeric Character Recognition using Fuzzy Attributed Turning Functions
(2011-01-12) Mahmoud, Sabri; Parvez, Mohammad Tanvir
In this paper, we present a novel method for recognition of unconstrained handwritten Arabic alphanumeric characters. The algorithm binarizes the character image, smooths it and extracts its contour. A novel approach for polygonal approximation of handwritten character contours is applied. Direction and length features are extracted from the polygonal approximation. These features are used to build character models in the training phase. For recognition, we introduce Fuzzy Attributed Turning Functions (FATF) and define a dissimilarity measure based on FATF for comparing polygonal shapes. Experimental results demonstrate the effectiveness of our algorithm for recognition of handwritten Arabic characters. We have obtained around 98% accuracy for Arabic handwritten characters and more than 97% accuracy for handwritten Arabic numerals.
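As background for the direction and length features described above, the following is a minimal sketch of a plain (non-fuzzy) turning function for a closed polygon. It is not the FATF method itself, whose details the abstract does not give; the function name and the unit-square example are illustrative assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def turning_function(polygon: List[Point]) -> List[Tuple[float, float]]:
    """Return (normalized arc length, cumulative edge direction) per edge.

    This is the classical turning function of a closed polygon, shown only
    as background; it is not the fuzzy attributed variant from the paper.
    """
    n = len(polygon)
    edges = [(polygon[(i + 1) % n][0] - polygon[i][0],
              polygon[(i + 1) % n][1] - polygon[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    perimeter = sum(lengths)

    steps = []
    angle = math.atan2(edges[0][1], edges[0][0])  # direction of the first edge
    travelled = 0.0
    for i in range(n):
        if i > 0:
            prev, cur = edges[i - 1], edges[i]
            # Signed turn between consecutive edges (positive = left turn).
            cross = prev[0] * cur[1] - prev[1] * cur[0]
            dot = prev[0] * cur[0] + prev[1] * cur[1]
            angle += math.atan2(cross, dot)
        steps.append((travelled / perimeter, angle))
        travelled += lengths[i]
    return steps

# Counter-clockwise unit square: four edges, each turning left by 90 degrees.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
steps = turning_function(square)
```

Two polygons can then be compared by measuring the distance between their step functions, which is the general idea that a dissimilarity measure on turning functions builds on.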
Arabic Handwriting Synthesis
(2011-01-12) Al-Muhtaseb, Husni; Elarian, Yousef; Ghouti, Lahouari
Training and testing data for optical character recognition are cumbersome to obtain. If large amounts of data can be produced from small amounts, much time and effort can be saved. This paper presents an approach to synthesize Arabic handwriting. We segment word images into labeled characters and then use these in synthesizing arbitrary words. The synthesized text should look natural; hence, we define some criteria to decide on what is acceptable as natural-looking. For evaluation, text synthesized using the natural-looking constraint is compared to text synthesized without it.
A Lexicon of Connected Components for Arabic Optical Text Recognition
(2011-01-12) Elarian, Yousef; Idris, Fayez
Arabic is a cursive script that lacks the ease of character segmentation. Hence, we suggest a unit that is discrete in nature, viz. the connected component, for Arabic text recognition. A lexicon listing valid Arabic connected components is necessary for any system that is to use such a unit. Here, we produce and analyze a comprehensive lexicon of connected components. A lexicon can be extracted from corpora or synthesized from morphemes. We follow both approaches and merge their results. In addition, generating a lexicon of connected components involves extra tokenization and point-normalization steps to keep the size of the lexicon tractable. We produce a lexicon of surface-words, reduce it into a lexicon of connected components, and finally into a lexicon of point-normalized connected components. The lexicon of point-normalized connected components contains 684,743 entries, a decrease of 97.17% from the word-lexicon.
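The two figures quoted above imply an approximate size for the word-lexicon, which the abstract does not state. The sketch below inverts the percent-decrease formula; the result is only an estimate derived from the rounded 97.17% figure, and the function name is hypothetical.

```python
def original_size(reduced: int, percent_decrease: float) -> float:
    """Invert percent_decrease = 100 * (original - reduced) / original."""
    return reduced / (1 - percent_decrease / 100)

# 684,743 entries after a 97.17% decrease (both figures from the abstract).
implied_word_lexicon = original_size(684_743, 97.17)
print(f"{implied_word_lexicon:,.0f}")
```

This gives a word-lexicon on the order of 24 million entries; because the percentage is rounded to two decimals, the exact value cannot be recovered from the abstract.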
Writer Identification of Arabic Handwritten Digits
(2011-01-12) Awaida, Sameh; Mahmoud, Sabri
This paper addresses writer identification using Arabic handwritten digits. In addition to digit identifiability, the paper presents digit recognition. The digit image is divided into grids based on the distribution of the black pixels in the image. Several types of features are extracted (viz. gradient, curvature, density, horizontal and vertical run lengths, stroke, and concavity features) from the grid segments. K-Nearest Neighbor and Nearest Mean classifiers are used. A database of 70,000 Arabic handwritten digit samples written by 700 writers is used in the analysis and experiments. The identifiability of isolated and combined digits is tested. The analysis of the results indicates that Arabic digits 3 (٣), 4 (٤), 8 (٨), and 9 (٩) are more identifiable than other digits, while Arabic digits 0 (٠) and 1 (١) are the least identifiable. In addition, the paper shows that combining the writer’s digits increases the discriminability power of Arabic handwritten digits. Combining the features of all digits, K-NN provided the best accuracy in text-independent writer identification, with a top-1 result of 88.14%, a top-5 result of 94.81%, and a top-10 result of 96.48%.
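The top-1, top-5, and top-10 figures above are top-k accuracies: a query counts as correct if the true writer appears among the k best-ranked candidates. A minimal sketch of that evaluation with a nearest-neighbor ranking follows; the feature vectors, writer labels, and function names are toy assumptions and do not reproduce the paper's features or classifier settings.

```python
import math
from typing import Dict, List, Sequence, Tuple

def rank_writers(query: Sequence[float],
                 gallery: Dict[str, List[Sequence[float]]]) -> List[str]:
    """Rank writers by the distance from the query to their closest sample."""
    best = {writer: min(math.dist(query, sample) for sample in samples)
            for writer, samples in gallery.items()}
    return sorted(best, key=best.get)

def top_k_accuracy(queries: List[Tuple[str, Sequence[float]]],
                   gallery: Dict[str, List[Sequence[float]]],
                   k: int) -> float:
    """Fraction of queries whose true writer is among the k nearest writers."""
    hits = sum(true_writer in rank_writers(features, gallery)[:k]
               for true_writer, features in queries)
    return hits / len(queries)

# Toy 2-D feature vectors for three hypothetical writers.
gallery = {"A": [(0.0, 0.0), (0.1, 0.2)], "B": [(1.0, 1.0)], "C": [(2.0, 0.0)]}
queries = [("A", (0.1, 0.1)), ("B", (0.6, 0.4)), ("C", (1.9, 0.2))]

top1 = top_k_accuracy(queries, gallery, k=1)
top2 = top_k_accuracy(queries, gallery, k=2)
```

In this toy data the query for writer B lies slightly closer to a sample of writer A, so it is missed at k=1 but recovered at k=2, which is why top-k accuracy can only stay the same or grow as k increases.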
A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLV
(2011-01-12) Dhouib, Mariem Miledi; Kanoun, Slim
This paper presents a contribution to printed Arabic recognition. Specifically, we are interested in the recognition of printed decomposable Arabic words. The proposed system uses an analytical approach: it segments words into characters to generate letter hypotheses and then word hypotheses, which are checked by lexical verification against a pre-established dictionary of the language. Thanks to this lexical verification, our proposed system, SPARLV, is able to produce valid word hypotheses.
Towards Feature Learning for HMM-based Offline Handwriting Recognition
(2011-01-12) Fink, Gernot A.; Hammerla, Nils Y.; Plötz, Thomas; Vajda, Szilárd
Statistical modelling techniques for automatic reading systems substantially rely on the availability of compact and meaningful feature representations. State-of-the-art feature extraction for offline handwriting recognition is usually based on heuristic approaches that describe either basic geometric properties or statistical distributions of raw pixel values. Although these approaches work well on average, more fundamental insights into the nature of handwriting are still desired. In this paper we present a novel approach for the automatic extraction of appearance-based representations of offline handwriting data. Given the framework of deep belief networks -- Restricted Boltzmann Machines -- a two-stage method for feature learning and optimization is developed. Using two standard corpora of Arabic and Roman handwriting data, it is demonstrated across script boundaries that automatically learned features achieve recognition results comparable to state-of-the-art handcrafted features. Given these promising results, the potential of feature learning for future reading systems is discussed.
