Ground Truth for LRDE DBD text line localization

From TC11
Revision as of 17:16, 30 May 2013 by Liwicki (talk | contribs)


Created: 2010-08-03
Last updated: 2013-05-30

Keywords

scanned, magazine, documents, text line localization


Description

Text line localization information was produced by applying text line localization algorithms. Each text line is assigned a size category according to its x-height, using the following rule: 0 < small <= 30 < medium <= 55 < large < +inf

  • 123 large text line localizations (clean).
  • 320 medium text line localizations (clean).
  • 9551 small text line localizations (clean).
  • 123 large text line localizations (original).
  • 320 medium text line localizations (original).
  • 9551 small text line localizations (original).
  • 123 large text line localizations (scanned).
  • 320 medium text line localizations (scanned).
  • 9551 small text line localizations (scanned).
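
The size rule stated in the description can be sketched as a small function. This is only an illustration of the thresholds given on this page; the function name size_category is hypothetical and is not part of the dataset or its tools.

```python
def size_category(x_height: float) -> str:
    """Map an x-height to 'small', 'medium' or 'large'.

    Thresholds follow the rule on this page:
    0 < small <= 30 < medium <= 55 < large < +inf
    The function name is hypothetical (not part of the dataset).
    """
    if x_height <= 0:
        # The rule only covers strictly positive x-heights.
        raise ValueError("x-height must be positive")
    if x_height <= 30:
        return "small"
    if x_height <= 55:
        return "medium"
    return "large"
```

Note that both boundaries are inclusive on the lower category: an x-height of exactly 30 is small, and exactly 55 is medium.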

The text lines dataset covers only a subset of the full-document dataset. It is generated from the binarization of the full-document images. Text line localizations are stored as bounding box coordinates in text files.
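
This page does not specify the layout of the bounding box text files. As a minimal sketch, assuming purely for illustration that each line holds one box as four whitespace-separated integers in the order x_min y_min x_max y_max, the files could be read as follows. The names Box and parse_boxes, the field order, and the separator are all assumptions and should be checked against the actual files.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    # ASSUMED field order; the page does not document the real layout.
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def parse_boxes(text: str) -> List[Box]:
    """Parse one bounding box per line from the contents of a text file.

    Assumes four whitespace-separated integers per line (hypothetical
    format). Lines that do not have exactly four fields are skipped.
    """
    boxes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # blank or non-conforming line
        boxes.append(Box(*(int(field) for field in fields)))
    return boxes
```

Under these assumptions, the height of a box is y_max - y_min; this is the height of the whole line box and is not the same quantity as the x-height used for the size categories.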


Purpose of the three document qualities:

  • Original: evaluate the binarization quality on perfect documents mixing text and images.
  • Clean: evaluate the binarization quality on perfect documents with text only.
  • Scanned: evaluate the binarization quality on slightly degraded documents with text only.

Related Dataset

Related Tasks

  • none

Submitted Files

