MSRA Text Detection 500 Database (MSRA-TD500)



{| style="width: 100%"
|-
| align="right" |

{|
|-
| '''Created: '''2012-10-26
|-
| {{Last updated}}
|}

|}

=Contact Author=
Cong Yao
Huazhong University of Science and Technology
Email: yaocong2010@gmail.com

=Current Version=
1.0

=Keywords=
OCR, Real Scene, Urban Scene, Scene Text, Word Spotting, Scene Text Recognition, Scene Text Detection, Scene Text Localization

=Description=
[[Image:MSRA-TD500 Example.jpg|400px|thumb|right| Figure 1. Typical images from MSRA-TD500. The red rectangles mark text lines labelled as difficult (due to blur or occlusion).]]
The MSRA Text Detection 500 Database (MSRA-TD500) was collected and publicly released as a benchmark for evaluating text detection algorithms. Its purpose is to track recent progress in detecting text in natural images, especially text of arbitrary orientation.

MSRA-TD500 contains 500 natural images taken with a pocket camera in indoor (office and mall) and outdoor (street) scenes. The indoor images mainly show signs, doorplates, and caution plates, while the outdoor images mostly show guide boards and billboards against complex backgrounds. Image resolutions range from 1296x864 to 1920x1280.
The dataset is challenging because of both the diversity of the text and the complexity of the backgrounds. The text may vary in language (Chinese, English, or a mixture of both), font, size, color, and orientation. The backgrounds may contain vegetation (e.g. trees and grass) and repeated patterns (e.g. windows and bricks) that are hard to distinguish from text.

The dataset is divided into two parts: a training set and a test set. The training set contains 300 images randomly selected from the original dataset, and the remaining 200 images form the test set. All images are fully annotated. The basic annotation unit is the text line (see Figure 1) rather than the word, as used in the ICDAR datasets, because it is hard to partition Chinese text lines into individual words based on their spacing; even for English text lines, word partitioning is non-trivial without high-level information.

=Metadata and Ground Truth Data=
[[Image:MSRA-TD500 GT Sample1.jpg|400px|thumb|right| Figure 2. Ground truth generation. (a) Human annotations: annotators locate and bound each text line with a four-vertex polygon (red dots and yellow lines). (b) Ground truth rectangles (green), generated automatically by fitting a minimum area rectangle to each polygon.]]

The procedure of ground truth generation is shown in Figure 2. Because existing evaluation methods for text detection are designed for horizontal text only, we proposed a new evaluation protocol (see [[#References|[1]]] for details). The protocol uses minimum area rectangles (green rectangles in Figure 2 (b)) because they fit oriented text much more tightly than axis-aligned rectangles (red rectangles in Figure 2 (b)).
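The Python sketch below illustrates why minimum area rectangles fit oriented text more tightly than axis-aligned boxes. It is not the annotation tool used for this dataset; it assumes an annotation is a list of four (x, y) vertices and relies on the standard observation that an optimal enclosing rectangle has one side collinear with an edge of the polygon's convex hull. The function names are illustrative.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(polygon):
    """Return (area, angle, corners) of the minimum area rectangle enclosing polygon.

    Only directions of convex hull edges are tried, since an optimal
    rectangle has one side collinear with a hull edge.
    """
    hull = convex_hull(polygon)
    best = None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate hull points by -theta so that this edge becomes horizontal.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if best is None or area < best[0]:
            # Rotate the rectangle corners back into image coordinates.
            corners = [(u * c - v * s, u * s + v * c)
                       for u, v in ((min(us), min(vs)), (max(us), min(vs)),
                                    (max(us), max(vs)), (min(us), max(vs)))]
            best = (area, theta, corners)
    return best


def axis_aligned_area(polygon):
    """Area of the axis-aligned bounding box of polygon, for comparison."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))
```

For a square rotated by 45 degrees with vertices (0, 1), (1, 0), (2, 1), (1, 2), min_area_rect reports an area of about 2, while the axis-aligned box has area 4. The gap grows for long, thin, slanted text lines.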

In particular, to accommodate difficult text (too small, occluded, blurry, or truncated) that is hard for text detection algorithms, each text line considered difficult is given an additional “difficult” label (note the red rectangles in Figure 1). Missed detections of such difficult text are not penalized.
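One possible reading of "missed detections are not penalized" for recall is sketched below in Python: a difficult text line that no detection matched is simply left out of the denominator, while matched lines, difficult or not, count as hits. This is an illustrative assumption, not the official scoring code; see [[#References|[1]]] for the actual protocol. The GroundTruth class is hypothetical, and the matching step that sets matched is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class GroundTruth:
    difficult: bool  # the "difficult" label from the ground truth file
    matched: bool    # whether some detection was matched to this text line


def recall(gts: Iterable[GroundTruth]) -> float:
    """Recall in which missed difficult text lines are not penalized."""
    gts = list(gts)
    hits = sum(1 for g in gts if g.matched)
    # A missed difficult line is dropped instead of counting as a miss.
    counted = sum(1 for g in gts if g.matched or not g.difficult)
    # No countable ground truth: treat recall as 1.0 (an arbitrary convention).
    return hits / counted if counted else 1.0
```

With one matched and one missed normal line plus one matched and one missed difficult line, this gives 2/3 rather than 2/4.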

==Format of the ground truth files==
[[Image:MSRA-TD500 GT Sample2.jpg|400px|thumb|right| Figure 3. Illustration of the ground truth file format. The index field can be ignored. The difficult label is “1” if the text is labeled as “difficult” and “0” otherwise.]]
Each image in the database has a corresponding ground truth file in which each line describes one text line. The format of the ground truth files is illustrated in Figure 3.
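Figure 3 defines the actual file layout. The Python sketch below assumes, without confirmation from the text above, that each line holds seven whitespace-separated fields: index, difficult flag, x and y of the top-left corner of the unrotated rectangle, width, height, and a rotation angle in radians about the rectangle center. The rotation direction is also an assumption. Check both against Figure 3 and the downloaded files before relying on this; the names TextLine, parse_gt_line, corners, and load_gt are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple


class TextLine(NamedTuple):
    difficult: bool
    x: float       # assumed: left edge of the unrotated rectangle
    y: float       # assumed: top edge of the unrotated rectangle
    width: float
    height: float
    angle: float   # assumed: rotation in radians about the rectangle center


def parse_gt_line(line: str) -> TextLine:
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    _index, difficult, x, y, w, h, angle = fields  # the index field is ignored
    return TextLine(difficult == "1", float(x), float(y),
                    float(w), float(h), float(angle))


def corners(t: TextLine) -> List[Tuple[float, float]]:
    """Four corners of the rotated rectangle, rotating about its center."""
    cx, cy = t.x + t.width / 2, t.y + t.height / 2
    c, s = math.cos(t.angle), math.sin(t.angle)
    half_w, half_h = t.width / 2, t.height / 2
    out = []
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)):
        out.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return out


def load_gt(path: str) -> List[TextLine]:
    """Parse every non-empty line of one ground truth file."""
    with open(path, encoding="utf-8") as f:
        return [parse_gt_line(line) for line in f if line.strip()]
```

For a hypothetical line such as "0 1 10 20 100 30 0", parse_gt_line returns a difficult text line whose corners are (10, 20), (110, 20), (110, 50), and (10, 50).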
----

=Related Tasks=
==Text Detection in Natural Images==
* ''Purpose:'' To localize text in natural images and estimate its extent.
* ''Importance:'' Understanding text embedded in natural scenes is important because it enables many applications, for instance image understanding, image and video search, geo-location, and navigation.
* ''Evaluation Protocol:'' The evaluation protocol is described in detail in [[#References|[1]]].
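A hedged Python sketch of a match test for oriented rectangles follows. It assumes a criterion of the kind described in [[#References|[1]]]: a detection matches a ground truth rectangle when their orientations differ by less than pi/8 and the overlap of the two rectangles, each rotated about its own center to horizontal, exceeds 0.5. Both thresholds, the overlap definition, and the choice to compare angles modulo pi are assumptions to verify against [1]; RotRect and the function names are illustrative.

```python
import math
from typing import NamedTuple


class RotRect(NamedTuple):
    cx: float
    cy: float
    w: float
    h: float
    angle: float  # radians


ANGLE_THRESHOLD = math.pi / 8  # assumed threshold, check [1]
OVERLAP_THRESHOLD = 0.5        # assumed threshold, check [1]


def horizontal_iou(a: RotRect, b: RotRect) -> float:
    """IoU of the axis-aligned boxes obtained by rotating a and b to angle 0 about their centers."""
    ix = max(0.0, min(a.cx + a.w / 2, b.cx + b.w / 2) - max(a.cx - a.w / 2, b.cx - b.w / 2))
    iy = max(0.0, min(a.cy + a.h / 2, b.cy + b.h / 2) - max(a.cy - a.h / 2, b.cy - b.h / 2))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def angle_difference(a: float, b: float) -> float:
    """Smallest difference between two orientations, treated as periodic in pi."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def is_match(det: RotRect, gt: RotRect) -> bool:
    return (angle_difference(det.angle, gt.angle) < ANGLE_THRESHOLD
            and horizontal_iou(det, gt) > OVERLAP_THRESHOLD)
```

Rotating each rectangle to horizontal before measuring overlap keeps the overlap computation a simple box intersection, while the separate angle test rejects detections with the wrong orientation.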

=References=
# C. Yao, X. Bai, W. Liu, Y. Ma and Z. Tu. Detecting Texts of Arbitrary Orientations in Natural Images. CVPR 2012 [http://www.iapr-tc11.org/dataset/MSRA-TD500/Detecting_Texts_of_Arbitrary_Orientations_in_Natural_Images.pdf (PDF)].

=Download=
==Version 1.0==
* [http://www.iapr-tc11.org/dataset/MSRA-TD500/MSRA-TD500.zip The complete MSRA-TD500 dataset along with ground truth files] (98 MB)

----
This page is editable only by [[IAPR-TC11:Reading_Systems#TC11_Officers|TC11 Officers ]].
