Table Ground Truth for the UW3 and UNLV datasets

From TC11

Created: 2010-05-13
Last updated: 2013-07-15

Contact Author

Asif Shahab
German Research Center for Artificial Intelligence (DFKI)
Trippstadter Straße 122
D-67663 Kaiserslautern
Tel: +49 631 20575 143


Keywords

Table structure recognition, Benchmarking table recognition algorithms, Table ground truth, Table recognition dataset, Evaluation framework for table structure recognition systems


Screenshot of the T-truth software.
The different levels of Ground Truth information.

This collection contains table structure ground truth data (rows, columns, cells, etc.) for document images containing tables in the UNLV and UW3 datasets.

The ground truth that we provide is stored in XML format, which records row and column boundaries, bounding boxes of cells, and additional attributes such as row-spanning and column-spanning cells. Each XML ground truth file has the same basename as the corresponding image in the respective dataset.
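
The shared basename makes it straightforward to pair each image with its ground truth file. The following is a minimal sketch, assuming the images and XML files have been unpacked into two local directories. The function name, the directory layout, and the default `.png` extension are assumptions made for illustration and are not prescribed by the dataset.

```python
from pathlib import Path


def pair_by_basename(image_dir, xml_dir, image_ext=".png"):
    """Map each image path to the XML file that shares its basename.

    Images without a matching XML file are skipped.
    """
    xml_by_stem = {p.stem: p for p in Path(xml_dir).glob("*.xml")}
    pairs = {}
    for image in sorted(Path(image_dir).glob(f"*{image_ext}")):
        xml = xml_by_stem.get(image.stem)
        if xml is not None:
            pairs[image] = xml
    return pairs
```

Only images that actually have a ground truth file appear in the result, which matches the fact that ground truth is provided for a subset of each original dataset.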

These XML files can then be used to generate color-encoded ground truth images in PNG format, which can be used directly by the pixel-accurate benchmarking framework described in [1]. Generating the 16-bit color-encoded ground truth images requires the ground truth XML file and the word bounding box OCR results file. We provide these OCR result files for all the images in the dataset; each file is named after the basename of the corresponding image file in the original dataset.
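
The actual rendering procedure is described in [1] and is not reproduced here. As a rough illustration of why both inputs are needed, the sketch below gives each OCR word box the color of the ground truth cell that contains its center. The center-containment rule, the tuple layouts, and the function name are assumptions made for this example, not the documented behavior of the framework.

```python
def label_words(cells, words):
    """Assign each word box the color of the cell containing its center.

    cells: list of ((x0, y0, x1, y1), (r, g, b)) pairs.
    words: list of (x0, y0, x1, y1) word bounding boxes.
    Returns one color per word, or None when no cell contains the center.
    """
    labels = []
    for wx0, wy0, wx1, wy1 in words:
        cx, cy = (wx0 + wx1) / 2, (wy0 + wy1) / 2
        color = None
        for (x0, y0, x1, y1), cell_color in cells:
            # Half-open intervals so a center on a shared edge matches one cell.
            if x0 <= cx < x1 and y0 <= cy < y1:
                color = cell_color
                break
        labels.append(color)
    return labels
```

Words that fall outside every cell are returned as None, so a caller can decide separately how to treat text outside the marked tables.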

We used the T-Truth tool, also provided below, to prepare the ground truth information. The tool is easy to use and is described in [1]. We trained a user to operate the T-Truth tool and asked him to prepare the ground truth for the target images from the above datasets. The ground truth for each image is stored in an XML file. The ground truths were manually validated by another expert using the preview edit mode of the T-Truth tool, and improper ground truths were corrected. This validation was repeated several times to ensure the accuracy of the ground truth.

Tables in the UNLV dataset

The original dataset contains 2889 pages of scanned document images from a variety of sources (magazines, newspapers, business letters, annual reports, etc.). The scanned images are provided at 200 and 300 DPI resolution in bitonal, grey, and fax formats. The original dataset also comes with ground truth data containing manually marked zones; zone types are provided in text format.

Closer examination of the dataset reveals that there are no marked table zones in the fax images, so this subset is not considered here. The grey images are all also present in bitonal format, so we concentrated on bitonal documents at a resolution of 300 DPI for the preparation of ground truth. We selected those images for which table zones have been marked in the original ground truth. There are around 427 such images. We provide table structure ground truth for these document images.

Tables in the UW-3 dataset

The original dataset consists of 1600 skew-corrected English document images with manually edited ground-truth of entity bounding boxes. These bounding boxes enclose page frame, text and non-text zones, textlines, and words. The type of each zone (text, math, table, half-tone, ...) is also marked. There are around 120 document images containing at least one marked table zone. We provide table structure ground truth for these document images.

Technical Information

<toggledisplay> Below is an example XML file that demonstrates the syntax used, with inline comments as necessary.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<GroundTruth InputFile="0110_099.png">
Each table begins with a Table tag. (x0,y0) represents the top-left position and
(x1,y1) represents the bottom-right position.
<Table x0="270" x1="2280" y0="1653" y1="2580">
Row boundaries are stored here; y0 and y1 store the boundary position.
<Row x0="270" x1="2280" y0="1784" y1="1784"/>
<Row x0="270" x1="2280" y0="1916" y1="1916"/>
<Row x0="270" x1="2280" y0="1968" y1="1968"/>
<Row x0="270" x1="2280" y0="2012" y1="2012"/>
<Row x0="270" x1="2280" y0="2090" y1="2090"/>
<Row x0="270" x1="2280" y0="2168" y1="2168"/>
<Row x0="270" x1="2280" y0="2216" y1="2216"/>
<Row x0="270" x1="2280" y0="2268" y1="2268"/>
<Row x0="270" x1="2280" y0="2336" y1="2336"/>
<Row x0="270" x1="2280" y0="2416" y1="2416"/>
<Row x0="270" x1="2280" y0="2464" y1="2464"/>
<Row x0="270" x1="2280" y0="2520" y1="2520"/>
Column boundaries are stored here; x0 and x1 store the boundary position.
<Column x0="710" x1="710" y0="1653" y1="2580"/>
<Column x0="1152" x1="1152" y0="1653" y1="2580"/>
<Column x0="1512" x1="1512" y0="1653" y1="2580"/>
<Column x0="1992" x1="1992" y0="1653" y1="2580"/>
Cell bounding boxes are stored here. Don't-care cells are those that have been merged to form a row-spanning or column-spanning cell. The startRow, endRow combination defines a row-spanning cell and the startCol, endCol combination defines a column-spanning cell. The 16-bit color-coded values are stored in (R,G,B) format in the value string of the Cell tag. These cells are the main information used by the evaluation framework provided with the dataset.
<Cell dontCare="false" endCol="0" endRow="0" startCol="0" startRow="0" x0="270" x1="710" y0="1653" y1="1784">(64769,257,257)</Cell>
<Cell dontCare="false" endCol="1" endRow="0" startCol="1" startRow="0" x0="710" x1="1152" y0="1653" y1="1784">(64769,257,514)</Cell>
<Cell dontCare="false" endCol="2" endRow="0" startCol="2" startRow="0" x0="1152" x1="1512" y0="1653" y1="1784">(64769,257,771)</Cell>
<Cell dontCare="false" endCol="3" endRow="0" startCol="3" startRow="0" x0="1512" x1="1992" y0="1653" y1="1784">(64769,257,1028)</Cell>
<Cell dontCare="false" endCol="4" endRow="0" startCol="4" startRow="0" x0="1992" x1="2280" y0="1653" y1="1784">(64769,257,1285)</Cell>
<Cell dontCare="false" endCol="0" endRow="1" startCol="0" startRow="1" x0="270" x1="710" y0="1784" y1="1916">(64769,514,257)</Cell>
<Cell dontCare="false" endCol="1" endRow="1" startCol="1" startRow="1" x0="710" x1="1152" y0="1784" y1="1916">(64769,514,514)</Cell>
<Cell dontCare="false" endCol="2" endRow="1" startCol="2" startRow="1" x0="1152" x1="1512" y0="1784" y1="1916">(64769,514,771)</Cell>
<Cell dontCare="false" endCol="3" endRow="1" startCol="3" startRow="1" x0="1512" x1="1992" y0="1784" y1="1916">(64769,514,1028)</Cell>
<Cell dontCare="false" endCol="4" endRow="1" startCol="4" startRow="1" x0="1992" x1="2280" y0="1784" y1="1916">(64769,514,1285)</Cell>
<Cell dontCare="false" endCol="0" endRow="2" startCol="0" startRow="2" x0="270" x1="710" y0="1916" y1="1968">(64769,771,257)</Cell>
<Cell dontCare="false" endCol="1" endRow="2" startCol="1" startRow="2" x0="710" x1="1152" y0="1916" y1="1968">(64769,771,514)</Cell>
<Cell dontCare="false" endCol="2" endRow="2" startCol="2" startRow="2" x0="1152" x1="1512" y0="1916" y1="1968">(64769,771,771)</Cell>
<Cell dontCare="false" endCol="3" endRow="2" startCol="3" startRow="2" x0="1512" x1="1992" y0="1916" y1="1968">(64769,771,1028)</Cell>
<Cell dontCare="false" endCol="4" endRow="2" startCol="4" startRow="2" x0="1992" x1="2280" y0="1916" y1="1968">(64769,771,1285)</Cell>
<Cell dontCare="false" endCol="0" endRow="3" startCol="0" startRow="3" x0="270" x1="710" y0="1968" y1="2012">(64769,1028,257)</Cell>
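
The cell records above can be read with a standard XML parser. The following is a minimal sketch using Python's built-in `xml.etree.ElementTree`. The `SAMPLE` string is a shortened variant of the listing above with the inline comments left out and closing tags added so that it parses; the function names and the dictionary layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened variant of the listing above (closing tags added for parsing).
SAMPLE = """<GroundTruth InputFile="0110_099.png">
  <Table x0="270" x1="2280" y0="1653" y1="2580">
    <Row x0="270" x1="2280" y0="1784" y1="1784"/>
    <Column x0="710" x1="710" y0="1653" y1="2580"/>
    <Cell dontCare="false" endCol="0" endRow="0" startCol="0" startRow="0" x0="270" x1="710" y0="1653" y1="1784">(64769,257,257)</Cell>
    <Cell dontCare="false" endCol="1" endRow="1" startCol="1" startRow="1" x0="710" x1="1152" y0="1784" y1="1916">(64769,514,514)</Cell>
  </Table>
</GroundTruth>"""


def parse_cells(xml_text):
    """Return one dict per <Cell> with integer geometry and the color triple."""
    root = ET.fromstring(xml_text)
    cells = []
    for cell in root.iter("Cell"):
        a = cell.attrib
        # The element text looks like "(R,G,B)"; strip the parentheses first.
        r, g, b = (int(v) for v in cell.text.strip("() \n").split(","))
        cells.append({
            "bbox": tuple(int(a[k]) for k in ("x0", "y0", "x1", "y1")),
            "rows": (int(a["startRow"]), int(a["endRow"])),
            "cols": (int(a["startCol"]), int(a["endCol"])),
            "dont_care": a["dontCare"] == "true",
            "color": (r, g, b),
        })
    return cells


def span(cell):
    """Return (row_span, col_span); a value above 1 marks a spanning cell."""
    (r0, r1), (c0, c1) = cell["rows"], cell["cols"]
    return (r1 - r0 + 1, c1 - c0 + 1)
```

In the cells listed above, the second and third color components equal 257 × (startRow + 1) and 257 × (startCol + 1). This is only an observation about this sample, not a documented encoding rule, so the sketch stores the triple as given rather than deriving it.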



Related Datasets

  • UNLV Dataset (currently not available online)
  • UW-3 Dataset (currently not available online)

Related Tasks


References

  1. Asif Shahab, Faisal Shafait, Thomas Kieninger and Andreas Dengel, "An Open Approach towards the benchmarking of table structure recognition systems", Proceedings of DAS’10, pp. 113-120, June 9-11, 2010, Boston, MA, USA.

Submitted Files

Version 1.0
