Devanagari Character Dataset
Santosh K. C., INRIA Nancy Grand Est Research Centre, LORIA, Campus Scientifique, BP 239, 54506 Vandoeuvre-lès-Nancy Cedex, France. E-mail: Santosh.KC@inria.fr
On-line handwriting, Devanagari, On-line Character Recognition
This dataset of on-line handwritten Devanagari characters comprises 1800 samples of 36 character classes, collected from 25 native writers. Each writer was asked to provide two samples per class (36 classes × 25 writers × 2 samples = 1800).
No specific directions, constraints, or instructions were given to the writers, so that the database captures completely natural handwriting.
For data collection we used a simple WACOM Graphire tablet (ET0405A-U), which captures the pen-tip position as 2D coordinates.
Metadata and Technical Details
Each character is stored in a separate plain-text file of comma-separated values. A file is about 4 KB on average; the actual size depends on the number and length of the strokes that make up the character.
The dataset is organised into 36 folders, one per character class. Each class folder contains 50 samples: two per writer, named userX_1 and userX_2.
The digitizer records pen movement as a series of strokes. A stroke is the sequence of coordinates (pen-tip positions) captured from a pen-down event to the following pen-up event.
For simplicity, the special value [-1.0, -1.0] is inserted to mark the end of each stroke, which makes it easy to count and separate the strokes of a complete character; a two-stroke character, for example, contains one such marker after each of its two strokes. Note, however, that a run of consecutive [-1.0, -1.0] values can also be recorded when the writer's hand trembles or when the pen tip hovers just above the tablet surface. Pre-processing is left to the end user.
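The description fixes the encoding (plain text, comma-separated) but not the exact line layout. A minimal Python sketch, assuming the numbers appear in x, y order separated by commas and/or whitespace (so it works whether a line holds one point or many), could read one file into a list of (x, y) pairs. The function names `parse_points` and `load_character` are hypothetical.

```python
import re
from typing import List, Tuple

Point = Tuple[float, float]


def parse_points(text: str) -> List[Point]:
    """Parse the text of one character file into a flat list of (x, y) pairs.

    Assumption: the file is a sequence of numbers separated by commas and/or
    whitespace, in x, y order. Stroke terminators (-1.0, -1.0) are kept as-is.
    """
    # Split on any run of commas or whitespace and drop empty tokens.
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    values = [float(t) for t in tokens]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values (x, y pairs)")
    # Pair consecutive values: (v0, v1), (v2, v3), ...
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]


def load_character(path: str) -> List[Point]:
    """Read one character file from disk and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_points(f.read())
```

For instance, `parse_points("10.0,20.0\n11.0,21.5\n-1.0,-1.0\n")` yields three pairs, the last one being the stroke terminator.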
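Because each file name encodes the writer and the sample index, a writer-independent train/test split (no writer appearing in both sets) can be built from the folder layout alone. The sketch below assumes names of the form `user<number>_<1|2>`, optionally followed by a file extension; that pattern and the helper names `index_dataset` and `split_by_writer` are hypothetical, so the regular expression may need adjusting to the actual files.

```python
import os
import re
from typing import List, Set, Tuple

# Hypothetical file-name pattern: "userX_1" / "userX_2", where X is the writer
# number, optionally followed by an extension such as ".txt".
SAMPLE_NAME = re.compile(r"^user(\d+)_([12])(?:\.\w+)?$")

Entry = Tuple[str, int, int, str]  # (class_name, writer_id, sample_index, path)


def index_dataset(root: str) -> List[Entry]:
    """List every sample file under root/<class_folder>/, sorted by name."""
    entries: List[Entry] = []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for fname in sorted(os.listdir(class_dir)):
            m = SAMPLE_NAME.match(fname)
            if m:  # ignore files that do not follow the naming scheme
                entries.append((class_name, int(m.group(1)), int(m.group(2)),
                                os.path.join(class_dir, fname)))
    return entries


def split_by_writer(entries: List[Entry],
                    test_writers: Set[int]) -> Tuple[List[Entry], List[Entry]]:
    """Put every sample of a writer in test_writers into the test set."""
    train = [e for e in entries if e[1] not in test_writers]
    test = [e for e in entries if e[1] in test_writers]
    return train, test
```

With the full dataset, each class should contribute 50 entries (25 writers × 2 samples), which is a quick sanity check on the download.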
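Separating a character into strokes then amounts to cutting the point sequence at each terminator. This Python sketch (function name hypothetical) collapses consecutive terminators so that runs caused by tremor or a hovering pen never produce empty strokes; the optional `min_points` threshold for discarding very short fragments is an assumed pre-processing heuristic, not part of the dataset specification.

```python
from typing import List, Tuple

Point = Tuple[float, float]
PEN_UP: Point = (-1.0, -1.0)  # stroke terminator used in this dataset


def split_strokes(points: List[Point], min_points: int = 1) -> List[List[Point]]:
    """Split a point sequence into strokes at each (-1.0, -1.0) terminator.

    Consecutive terminators are collapsed, so no empty strokes are produced.
    Strokes with fewer than min_points points are dropped (assumed heuristic).
    Trailing points without a final terminator still form a stroke.
    """
    strokes: List[List[Point]] = []
    current: List[Point] = []
    for p in points:
        if p == PEN_UP:
            # Close the current stroke only if it is non-empty and long enough.
            if current and len(current) >= min_points:
                strokes.append(current)
            current = []
        else:
            current.append(p)
    if current and len(current) >= min_points:
        strokes.append(current)
    return strokes
```

The stroke count of a character is then simply `len(split_strokes(points))`.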
Related Publications
- Santosh K.C., Cholwich Nattee, Bart Lamiroy, 'Spatial Similarity based Stroke Number and Order Free Clustering', IAPR, 12th International Conference on Frontiers in Handwriting Recognition (ICFHR), Kolkata, India, 2010
- Santosh K.C., Cholwich Nattee, 'Template-based Nepali Handwritten Alphanumeric Character Recognition', Thammasat International Journal of Science and Technology (TIJSAT), Thailand, Vol. 12, No. 1, pp. 20-30, 2007
- Santosh K.C., Cholwich Nattee, 'Stroke Number and Order Free Handwriting Recognition for Nepali', 9th Pacific Rim International Conference on Artificial Intelligence (PRICAI), Springer Lecture Notes in Computer Science (LNCS), Subseries: Lecture Notes in Artificial Intelligence (LNAI), Guilin, China, Vol. 4099, pp. 990-994, August 7-11, 2006
- Santosh K.C., Cholwich Nattee, 'Structural Approach on Writer Independent Nepalese Natural Handwriting Recognition', IEEE, International Conference on Cybernetics & Intelligent Systems (CIS), Bangkok, Thailand, pp. 711-716, June 7-9, 2006
- Santosh K.C., Cholwich Nattee, 'Effect of Pre-processing and Feature Selection in Recognition for Nepali', International Conference on Knowledge, Information, Creativity, and Support Systems (KICSS), Ayuthya, Thailand, pp. 139-146, August 1-4, 2006
Downloads
- Devanagari Characters Dataset (1.0 MB)
- Read-Me File (0.2 MB)
- Sample file (character CHA) (3.0 KB)