Chem-Infty Dataset: A ground-truthed dataset of Chemical Structure Images

From TC11
Revision as of 18:02, 27 January 2011 by Dimos (talk | contribs)


Created: 2010-06-28
Last updated: 2011-01-27

Contact Author

Koji Nakagawa(kn[at],
Faculty of Mathematics, 
Kyushu University, 
Akio Fujiyoshi(fujiyosi[at], 
Department of Computer and Information Sciences, 
Ibaraki University, 
Masakazu Suzuki(suzuki[at], 
Faculty of Mathematics, 
Kyushu University, 


Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.1 Japan License

Current Version



Optical Chemical Structure Recognition, Graphical Documents, Symbols



This dataset consists of chemical structure images and their chemical meaning (see the ground truth section). The 5727 chemical images were randomly collected from Japanese published patent applications from 2008.

  • Number of samples in the dataset: 869
  • File format: TIFF images, both binary and greyscale.
  • File name convention: the image files and their metadata files are named as follows:
    • 2008XXXXXX_N_chem.tif: a TIFF file
    • 2008XXXXXX_N_chem.sdf: the metadata of 2008XXXXXX_N_chem.tif
    • The string '2008XXXXXX' is the patent ID, and 'N' indicates the N-th element of the multi-page TIFF file (see Reference [1]).
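
The naming convention above can be checked programmatically. The following is a minimal sketch, not part of the dataset distribution. It assumes that 'XXXXXX' stands for six digits and that 'N' is a decimal number, neither of which this page states explicitly. The names `parse_name` and `metadata_name` are hypothetical helpers introduced for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the patent ID is "2008" followed by six digits, and N is a
# decimal number. The page only gives the pattern 2008XXXXXX_N_chem.
_NAME_RE = re.compile(r"(2008\d{6})_(\d+)_chem\.(tif|sdf)")


class ChemInftyName(NamedTuple):
    patent_id: str  # e.g. "2008123456"
    index: int      # N: position within the multi-page TIFF file
    ext: str        # "tif" (image) or "sdf" (metadata)


def parse_name(filename: str) -> Optional[ChemInftyName]:
    """Split a file name into its parts, or return None if it does not match."""
    m = _NAME_RE.fullmatch(filename)
    if m is None:
        return None
    return ChemInftyName(m.group(1), int(m.group(2)), m.group(3))


def metadata_name(image_name: str) -> Optional[str]:
    """Return the .sdf name paired with a .tif name, or None if not an image."""
    parsed = parse_name(image_name)
    if parsed is None or parsed.ext != "tif":
        return None
    # Swap only the extension so the original spelling of N is preserved.
    return image_name[: -len(".tif")] + ".sdf"
```

For example, `metadata_name("2008123456_3_chem.tif")` gives `"2008123456_3_chem.sdf"`, where the patent ID `2008123456` is a made-up value. File names that do not follow the pattern return `None` rather than raising an error, so a directory listing can be filtered without special handling.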

When you use or distribute this dataset, please inform the authors of your contact information (Name, Affiliation, E-mail address).

Disclaimer: Although the authors tried their best to provide an error-free dataset, there might be some incorrect data. If you encounter any such errors, please report them back to the authors so that the data can be updated.

Related Datasets

Related Ground Truth Data

Related Tasks

None defined


  1. Koji Nakagawa, Akio Fujiyoshi, and Masakazu Suzuki. Ground-Truthed Dataset of Chemical Structure Images in Japanese Published Patent Applications. In Proceedings of the 9th International Workshop on Document Analysis Systems (DAS'2010), pp. 455-462, June 9-11, 2010, Boston, MA, USA.
  2. Akio Fujiyoshi, Koji Nakagawa, and Masakazu Suzuki. Robust Recognition Method of Chemical Structure Images for Japanese Published Patent Applications. Available as a short paper on the web page of the 9th International Workshop on Document Analysis Systems (DAS'2010), June 9-11, 2010, Boston, MA, USA.
  3. CTfile Formats Specification

Submitted Files

Version 1.0
