DAS-Discussion: Datasets, Benchmarks, Competition, and Continuity of Research

DAS Working Subgroup Meeting: Datasets and Benchmarks

Authors:

Bart Lamiroy (Secretary) – Université de Lorraine

Further Participants:

Elisa Barney Smith – Boise State University
Abdel Belaïd – Université de Lorraine
John Fletcher – Canon
Liangcai Gao – Peking University
Albert Gordo – CVC Barcelona
Masakazu Iwamura – Osaka Prefecture University
Dan Lopresti – Lehigh University
Tomohsa Matsushita – Tokyo University of Agriculture and Technology
Jean-Yves Ramel – Université de Tours
Marc-Peter Schambach – Siemens
Ray Smith (Moderator) – Google Inc.

Context

The goal of this discussion group is to address the availability, use and dissemination of benchmarks, datasets and ground truth in order to promote objective and reproducible assessment of document analysis methods, as well as collaboration and exchange of research results in the document analysis domain. The guiding idea is that “what you measure is what improves”, yet it is difficult to obtain reliable measures of the global progress of the state-of-the-art.

Topic Discussion History

For the evolution of this topic across earlier DAS editions, we refer the interested reader to the TC-11 website. In 2010 the discussion focused on making datasets and other reference material available to the community: how to provide centralized access to it, how to credit and value contributors, and how to maintain a level of control (data curation, availability over time, …) that would ensure that the data and algorithms remain usable and useful for as long as possible. The reported discussions were concerned with the feasibility of these concepts rather than their impact, and focused on the TC-11 data collection initiative and the DAE platform (http://dae.cse.lehigh.edu).

Discussion Topics

During the DAS 2012 edition, the following potential discussion topics were identified after a short brainstorming session, ranked in order of (subjectively) perceived importance:

1. When is a problem stated? Should CFPs be more specific about which topics to address and how they should (or could) be measured? How does this relate to hosting competitions? How does it interact with whole or end-to-end evaluation systems?
2. What are the fundamental reasons for the perceived difficulties in sharing data sets? (public vs. copyright vs. privacy)
3. Would it be a good idea to integrate the availability of data sets and reports of benchmarking more formally into the acceptance criteria for publications?
4. Is there a risk of data sets directing research? Is this good or bad?
5. Open binaries/open source?

When is a Problem Stated?

The discussion panel members consider this question an essential preliminary to D. Lopresti and G. Nagy's paper “When is a Problem Solved” (ICDAR 2011). It relates to the initially identified issue of how difficult it is to measure the contribution of individual research results to the improvement of the global state-of-the-art. Stating a problem is related to measuring some level of achievement, and is therefore directly tied to expressing ground truth. One may conjecture that a problem is stated when, on the one hand, there is consensus on the ground truth and, on the other hand, there is a data set collection of statistically proven significance. Measurement of advancement toward solving a stated problem would then consist of:

a track record of results over time,
best practices defined by the community.

This means that the evolution of the best practices (and the track record of the results) could give a more precise view of the improvement of the agreed-upon state-of-the-art. It also means that the community needs to comment on and annotate the reference data sets, and that individual research results may need to be evaluated within the scope of broader criteria (e.g. their contribution to an end-to-end application evaluation).

The general consensus of the discussion panel is that there might be an interest in experimenting with a more formal approach to managing conference tracks and acceptance criteria for particular events or publications, by clearly stating (at the time of the CFP) the benchmark against which contributions need to be measured. This could consist of:

specific problem statements,
hosting competitions in direct relation with the track or conference and creating strong incentives for all submissions to compete,
ensuring continuity of data sets, ground truth, and algorithm availability year after year,
requiring that reviewers have reasonable access to the data sets and have the means of checking the reported results.

However, it is extremely important to stress that this should never be the sole criterion for acceptance and publication of papers, since there is a significant risk of limiting innovative non-mainstream approaches and the emergence of investigations into new problems (previously not considered, or considered uninteresting). This is discussed in one of the items developed below.
