Philip M. McCarthy, Gwyneth A. Lewis, David F. Dufty, Danielle S. McNamara
Computer scientists, linguists, stylometricians, and cognitive scientists have successfully divided corpora into modes, domains, genres, registers, and authors. The limitations of these successes, however, often result from an insufficient set of indices with which the corpora are analyzed. In this paper, we use Coh-Metrix, a computational tool that analyzes text on over 200 indices of cohesion and difficulty. We demonstrate how, with the benefit of statistical analysis, texts can be analyzed for subtle yet meaningful differences. We report evidence that authors within the same register can be computationally distinguished, despite indications that stylistic markers can also shift significantly over time.
Subjects: 13. Natural Language Processing; 13.1 Discourse
Submitted: Feb 9, 2006