Peifeng Li, Junhui Li, Qiaoming Zhu
This paper proposes a hierarchical approach to categorizing emails with the maximum entropy (ME) model, based on their contents and properties. The approach works in two phases. First, it divides emails into two sets: a legitimate set and a spam set. Then it categorizes the emails within each set, using a different feature selection method for each set. The paper also describes the pre-processing, the construction of features, and the adaptation of the ME model to email categorization used in building the categorizer. Experimental results show that the hierarchical approach is more efficient than the previous approach and that feature selection is an important factor affecting the precision of email categorization.
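The two-phase idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the class names `MaxEnt` and `HierarchicalCategorizer`, the helpers `tokens`, `drop_stopwords` and `keep_all`, the toy emails, and the sub-category labels are hypothetical, and the ME model is trained here with plain gradient ascent on binary word features, which the abstract does not specify.

```python
import math
from collections import defaultdict


class MaxEnt:
    """Tiny maximum entropy (multinomial logistic regression) classifier
    over binary features. Illustrative only; trained by gradient ascent."""

    def __init__(self, epochs=200, lr=0.5):
        self.epochs = epochs
        self.lr = lr
        self.weights = defaultdict(float)  # (label, feature) -> weight
        self.labels = []

    def _probs(self, feats):
        # p(y | x) is proportional to exp(sum of weights of active features)
        scores = {y: sum(self.weights[(y, f)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for feats, gold in zip(docs, labels):
                p = self._probs(feats)
                for f in feats:
                    for y in self.labels:
                        # observed feature count minus expected feature count
                        grad[(y, f)] += (1.0 if y == gold else 0.0) - p[y]
            for key, g in grad.items():
                self.weights[key] += self.lr * g / len(docs)
        return self

    def predict(self, feats):
        p = self._probs(feats)
        return max(p, key=p.get)


def tokens(text):
    """Hypothetical pre-processing: lowercase, split on whitespace."""
    return set(text.lower().split())


# Hypothetical stand-ins for the two (unspecified) feature selection methods.
STOPWORDS = {"for", "and", "on", "with", "your", "no", "now"}


def drop_stopwords(feats):
    return {f for f in feats if f not in STOPWORDS}


def keep_all(feats):
    return feats


class HierarchicalCategorizer:
    """Phase 1: legitimate vs. spam. Phase 2: one sub-categorizer per set,
    each with its own feature selection function."""

    def __init__(self, select_legit, select_spam):
        self.gate = MaxEnt()
        self.sub = {"legitimate": MaxEnt(), "spam": MaxEnt()}
        self.select = {"legitimate": select_legit, "spam": select_spam}

    def fit(self, emails):
        # emails: list of (text, top_label, sub_label)
        self.gate.fit([tokens(t) for t, _, _ in emails],
                      [top for _, top, _ in emails])
        for top, model in self.sub.items():
            subset = [(t, s) for t, tl, s in emails if tl == top]
            model.fit([self.select[top](tokens(t)) for t, _ in subset],
                      [s for _, s in subset])
        return self

    def predict(self, text):
        top = self.gate.predict(tokens(text))
        sub = self.sub[top].predict(self.select[top](tokens(text)))
        return top, sub


# Toy data, invented for this sketch.
emails = [
    ("meeting agenda for project review", "legitimate", "work"),
    ("project deadline and budget report", "legitimate", "work"),
    ("family dinner on sunday evening", "legitimate", "personal"),
    ("birthday party with family friends", "legitimate", "personal"),
    ("cheap pills discount pharmacy offer", "spam", "pharma"),
    ("discount pharmacy pills no prescription", "spam", "pharma"),
    ("win lottery prize claim money now", "spam", "scam"),
    ("claim your lottery money prize today", "spam", "scam"),
]

clf = HierarchicalCategorizer(drop_stopwords, keep_all).fit(emails)
print(clf.predict("project budget meeting"))
print(clf.predict("lottery prize money"))
```

Keeping the spam/legitimate gate separate from the per-set categorizers is what lets each set use its own feature selection function, which is the design point the abstract emphasizes.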
Subjects: 13. Natural Language Processing; 1. Applications
Submitted: Feb 8, 2007