ETCHA Sketches: Lessons Learned from Collecting Sketch Data

Mike Oltmans, Christine Alvarado, and Randall Davis

We present ETCHA Sketches, an Experimental Test Corpus of Hand Annotated Sketches, with the goal of facilitating the development of a standard test corpus for sketch understanding research. To date, we have collected sketches from four domains: circuit diagrams, family trees, floor plans, and geometric configurations. We have also labeled many of the strokes in these data sets with geometric primitive labels (e.g., line, arc, polyline, polygon, and ellipse). We found that labeling data accurately is a more complex task than might be anticipated. The complexity arises because labeled data can be used for different purposes, each with different requirements, and because some strokes are ambiguous and can legitimately be placed in multiple categories. We discuss several labeling methods and some properties of the sketches that became apparent while collecting and labeling the data. The data sets are available online at

