Smokey: Automatic Recognition of Hostile Messages

Ellen Spertus

Abusive messages (flames) can be both a source of frustration and a waste of time for Internet users. This paper describes some approaches to flame recognition, including a prototype system, Smokey. Smokey builds a 47-element feature vector based on the syntax and semantics of each sentence, combining the vectors for the sentences within each message. Quinlan’s C4.5 decision-tree generator was run on a training set of 720 messages to determine feature-based rules, which correctly categorized 64% of the flames and 98% of the non-flames in a separate test set of 460 messages. Additional techniques for greater accuracy and user customization are also discussed.
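
The feature-vector step described above can be illustrated with a minimal sketch. It is not Smokey's implementation: the three rules below are hypothetical stand-ins for the 47 features (which this abstract does not list), the sentence splitter is a naive regular expression, and combining sentence vectors by element-wise summation is an assumption, since the abstract says only that the vectors are combined. The resulting message vectors are the kind of input a decision-tree learner such as C4.5 would be trained on.

```python
import re
from typing import Callable, List

# Hypothetical rules standing in for Smokey's 47 features (assumption:
# the real features are not given in this abstract).
RULES: List[Callable[[str], bool]] = [
    # Sentence refers to the reader in the second person.
    lambda s: bool(re.search(r"\byou\b", s, re.IGNORECASE)),
    # Sentence ends with an exclamation mark.
    lambda s: s.rstrip().endswith("!"),
    # Sentence contains a word from a tiny, made-up insult list.
    lambda s: bool(re.search(r"\b(idiot|stupid)\b", s, re.IGNORECASE)),
]


def split_sentences(message: str) -> List[str]:
    """Naively split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", message.strip())
    return [p for p in parts if p]


def sentence_vector(sentence: str) -> List[int]:
    """One 0/1 entry per rule for a single sentence."""
    return [int(rule(sentence)) for rule in RULES]


def message_vector(message: str) -> List[int]:
    """Combine sentence vectors by element-wise sum (an assumed choice)."""
    vectors = [sentence_vector(s) for s in split_sentences(message)]
    if not vectors:
        return [0] * len(RULES)
    return [sum(column) for column in zip(*vectors)]


if __name__ == "__main__":
    print(message_vector("Are you sure? You idiot!"))
```

With summation, a message's vector counts how many of its sentences triggered each rule, so longer messages can accumulate larger values; a maximum or a logical OR would be alternative ways to combine the sentence vectors.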

