Ping Chen, Hisham Al-Mubaid
The huge volume of unstructured text available online drives an increasing need for automated techniques that analyze and extract knowledge from these repositories of information. Resolving ambiguity in such text is an important step for any subsequent analysis task. In this paper, we present a new method for one type of ambiguity resolution: term disambiguation. The method is based on machine learning and can be viewed as a context-based classification approach. In our experiments, we apply it to gene and protein name disambiguation. We have extensively evaluated our method using around 600,000 Medline abstracts and three different classifiers. The results show that our technique achieves high accuracy, precision, and recall, and outperforms recently published results on this problem. The paper describes the method and the experimental design in detail. We plan to apply our technique to the general domain of word sense disambiguation in the future.
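The abstract characterizes the method only as context-based classification, where the words surrounding an ambiguous term are the features used to decide its sense. The sketch below illustrates that general idea. It is not the authors' implementation: the abstract does not name its three classifiers or its feature set. Here a multinomial Naive Bayes classifier with add-one smoothing is used purely as a stand-in. The window size of 3, the placeholder term `XYZ1`, the helper names `context_window` and `context_of`, and the four training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_window(tokens, index, size=3):
    """Return up to `size` lowercased tokens on each side of tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return [t.lower() for t in left + right]


def context_of(sentence, term, size=3):
    """Hypothetical helper: context words around the first occurrence of `term`."""
    tokens = sentence.split()
    return context_window(tokens, tokens.index(term), size)


class NaiveBayesDisambiguator:
    """Stand-in classifier: multinomial Naive Bayes over context-word counts."""

    def __init__(self):
        self.class_counts = Counter()            # sense label -> number of examples
        self.word_counts = defaultdict(Counter)  # sense label -> context-word counts
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (context_words, sense_label) pairs."""
        for words, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        """Return the sense with the highest log posterior for these context words."""
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior P(sense)
            denom = sum(self.word_counts[label].values()) + v
            for w in words:
                # Add-one smoothing: unseen words get a count of 1 instead of 0.
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: is XYZ1 used as a gene name or a protein name?
TRAIN = [
    ("expression of the XYZ1 gene was upregulated in tumors", "gene"),
    ("mutations in the XYZ1 gene promoter region", "gene"),
    ("the XYZ1 protein binds to DNA in vitro", "protein"),
    ("purified XYZ1 protein was phosphorylated by the kinase", "protein"),
]

clf = NaiveBayesDisambiguator().fit(
    (context_of(sentence, "XYZ1"), label) for sentence, label in TRAIN
)
print(clf.predict(context_of("a point mutation in the XYZ1 gene", "XYZ1")))  # gene
print(clf.predict(context_of("we purified the XYZ1 protein from yeast", "XYZ1")))  # protein
```

With realistic data, the context features would come from the Medline abstracts and any off-the-shelf classifier could replace the Naive Bayes stand-in. The window size is a tunable assumption, not a value taken from the paper.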
Subjects: 12. Machine Learning and Discovery; 13. Natural Language Processing
Submitted: Jan 23, 2006