AAAI Publications, Twenty-Second International FLAIRS Conference

Computational Considerations in Correcting User-Language
Adam M. Renner, Philip M. McCarthy, Danielle S. McNamara

Last modified: 2009-03-17


This study evaluates the robustness of established computational indices used to assess text relatedness in user-language. The original User-Language Paraphrase Corpus (ULPC) was compared to a corrected version in which each paraphrase was corrected for typographical and grammatical errors. Error correction significantly affected the values of each of five computational indices, which indicated greater similarity between the target sentence and the corrected paraphrase than between the target sentence and the original paraphrase. Misspelled target words accounted for a large proportion of these differences. This study also evaluated potential effects on the correlations between the computational indices and human ratings of the paraphrases. The indices computed on the corrected paraphrases agreed with trained human raters no more and no less than the indices computed on the original paraphrases, errors included. The results suggest that although correcting errors may optimize certain computational indices, such corrections are not necessary when comparing the indices to expert ratings.
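To illustrate why misspelled target words can account for much of the difference, here is a minimal Python sketch. It assumes a simple Jaccard word-overlap score as a stand-in relatedness index. This score is a hypothetical illustration, not one of the five indices evaluated in the study (the abstract does not name them), and the example sentences are invented.

```python
import re


def tokens(text: str) -> set[str]:
    # Crude tokenizer (an assumption for illustration): lowercase the text
    # and keep runs of letters/apostrophes as word types.
    return set(re.findall(r"[a-z']+", text.lower()))


def word_overlap(target: str, paraphrase: str) -> float:
    """Hypothetical relatedness index: Jaccard overlap of word types, |A & B| / |A | B|."""
    a, b = tokens(target), tokens(paraphrase)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented example: the original paraphrase misspells two target words.
target = "The heart pumps blood through the arteries."
original = "The hart sends blod to the arteries."
corrected = "The heart sends blood to the arteries."

if __name__ == "__main__":
    print(f"original:  {word_overlap(target, original):.2f}")   # 0.20
    print(f"corrected: {word_overlap(target, corrected):.2f}")  # 0.50
```

Because a purely lexical score treats "hart" and "heart" as unrelated strings, correcting the two misspellings alone raises the overlap from 0.20 to 0.50 in this toy case. Indices that abstract away from exact word forms may be less sensitive to such errors.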
