CVL-Database - An Off-line Database for Writer Retrieval, Writer Identification and Word Spotting
Markus Diem Stefan Fiel Florian Kleber Robert Sablatnig email@example.com
CVL Database is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License
This database may be used for non-commercial research purposes only. If you publish material based
on this database, we request you to include a reference to the publication listed below.
Writer Identification, Word Spotting, Cursive Handwriting
The CVL Database is a public database for writer retrieval, writer identification and word
spotting. The database consists of 7 different handwritten texts (1 German and 6 English texts)
and 309 different writers. For each text, an RGB color image (300 dpi) comprising the handwritten
text and the printed text sample is available, as well as a cropped version (handwritten text only). A
unique ID identifies the writer, whereas the bounding boxes for each single word are stored in an
XML file.
The CVL Database consists of images with cursively handwritten German and English texts which have
been chosen from literary works. All pages have a unique writer ID and the text number
(separated by a dash) at the upper right corner, followed by the printed sample text. The text is
placed between two horizontal separators. Beneath the printed text, individuals were asked
to write the text using a ruled undersheet to prevent curled text lines. The layout follows the
style of the database.
Samples of the following texts have been used:
- Edwin A. Abbott - Flatland: A Romance of Many Dimensions (92 words).
- William Shakespeare - Macbeth (49 words).
- Wikipedia - Mailüfterl (73 words, under CC Attribution-ShareAlike License).
- Charles Darwin - Origin of Species (52 words).
- Johann Wolfgang von Goethe - Faust. Eine Tragödie (50 words).
- Oscar Wilde - The Picture of Dorian Gray (66 words).
- Edgar Allan Poe - The Fall of the House of Usher (78 words).
Metadata and Technical Details
All pages have a unique writer id and the text number (separated by a dash) at the upper right
corner, followed by the printed sample text. The text is placed between two horizontal
separators. The files are named according to the unique writer ID and the text number. In addition,
text lines and words are extracted. Their filenames follow the same convention, with the text line
number and word number respectively added at the end. For word images, the ground truth (GT) entry is the last
part of the filename. The bounding boxes for each single word are stored in an XML file according
to the unique ID.
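The naming convention above can be sketched as a small parser. This is only an illustration under stated assumptions: the description does not spell out the exact separator or file extension, so the pattern `<writer>-<text>[-<line>[-<word>-<GT>]].<ext>` with dash-separated fields, as well as the names `CvlName` and `parse_cvl_name`, are hypothetical and should be checked against the downloaded files.

```python
import re
from typing import NamedTuple, Optional


class CvlName(NamedTuple):
    """Fields recovered from a page, line, or word image filename."""
    writer_id: str
    text_no: int
    line_no: Optional[int]
    word_no: Optional[int]
    transcription: Optional[str]  # GT entry, present for word images only


# ASSUMPTION: fields are dash-separated and followed by a file extension, e.g.
#   page:  0001-1.tif
#   line:  0001-1-3.tif
#   word:  0001-1-3-2-Flatland.tif
_NAME_RE = re.compile(
    r"^(?P<writer>\d+)-(?P<text>\d+)"
    r"(?:-(?P<line>\d+)(?:-(?P<word>\d+)-(?P<gt>.+))?)?"
    r"\.[A-Za-z0-9]+$"
)


def parse_cvl_name(filename: str) -> CvlName:
    """Split a filename into writer ID, text, line, word, and GT entry."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    line = match.group("line")
    word = match.group("word")
    return CvlName(
        writer_id=match.group("writer"),  # kept as text to preserve leading zeros
        text_no=int(match.group("text")),
        line_no=int(line) if line is not None else None,
        word_no=int(word) if word is not None else None,
        transcription=match.group("gt"),
    )
```

For example, `parse_cvl_name("0001-1-3-2-Flatland.tif")` would yield writer `"0001"`, text 1, line 3, word 2, and the GT entry `"Flatland"`, provided the assumed layout holds. Keeping the writer ID as a string avoids losing leading zeros when grouping images by writer.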
Ground Truth Data
Markus Diem, Stefan Fiel, Florian Kleber and Robert Sablatnig, CVL-Database: An Off-line Database
for Writer Retrieval, Writer Identification and Word Spotting, In Proc. of the 12th Int.
Conference on Document Analysis and Recognition (ICDAR) 2013, forthcoming.
Please refer to http://caa.tuwien.ac.at/cvl/research/cvl-database/index.html to download the files
from the original dataset's site.