Sung-Hyuk Cha and Sargur N. Srihari, State University of New York at Buffalo
The writer identification problem is stated as follows. There are m writing exemplars from each of n people, where n is very large. Given a writing exemplar x of an unknown writer, the task is to determine whether x was written by any of the n writers and, if so, to identify the writer. A writer identifier that relies on an inductive hypothesis must be backed by statistical proof: the individuality of handwriting has to be validated statistically through feature measurement, quantification, and statistical analysis. Various parametric and non-parametric techniques exist for the multi-category classification problem, whose classifier is called a polychotomizer, but they assume that the number of classes is finite and small. Because the number of classes here is enormously large, almost infinite, these techniques are of no use and the problem seems insurmountable. For this reason, we propose transforming the large, intractable polychotomizer into a simple dichotomizer, a classifier that places a pattern in one of only two categories: distances between two writings by the same author, and distances between two writings by different authors. In this model, the problem is restated as follows: given two randomly selected handwritten documents, determine whether they were written by the same person, subject to two types of confusion error probabilities. Experiments with 571 writers, three sample documents per writer, and 11 feature distances yield 97% accuracy, with 3.5% type I and 2.1% type II errors.
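
The transformation from polychotomizer to dichotomizer can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the feature values, the absolute-difference distance, the helper names `pairwise_distance_samples` and `predict_same_writer`, and the threshold rule are all assumptions chosen for illustration. The point is only that every pair of documents becomes one distance vector with a binary label, so the number of classes stays at two no matter how many writers there are.

```python
from itertools import combinations


def pairwise_distance_samples(docs):
    """Turn labelled documents into two-class distance samples.

    docs: list of (writer_id, feature_vector) tuples.
    Returns a list of (distance_vector, label) pairs, where label is
    1 for a same-writer pair and 0 for a different-writer pair.
    """
    samples = []
    for (w1, f1), (w2, f2) in combinations(docs, 2):
        # Assumption: per-feature absolute difference as the distance.
        dist = [abs(a - b) for a, b in zip(f1, f2)]
        samples.append((dist, 1 if w1 == w2 else 0))
    return samples


def predict_same_writer(dist, threshold=0.5):
    """Hypothetical decision rule: small total distance means same writer."""
    return 1 if sum(dist) < threshold else 0


# Toy data (made up): 2 writers, 2 documents each, 3 features per document.
docs = [
    ("A", [0.10, 0.50, 0.30]),
    ("A", [0.12, 0.48, 0.33]),
    ("B", [0.80, 0.20, 0.90]),
    ("B", [0.78, 0.25, 0.88]),
]

samples = pairwise_distance_samples(docs)
same = [d for d, y in samples if y == 1]
diff = [d for d, y in samples if y == 0]
predictions = [predict_same_writer(d) for d, _ in samples]
labels = [y for _, y in samples]
```

In practice the threshold rule would be replaced by a trained two-class classifier over the distance vectors, and the two confusion error rates would be estimated on held-out pairs.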