Hsi-Chuan Hung, Richard Tzong-Han Tsai, Wen-Lian Hsu
In this paper, we focus on identifying biomedical abstracts related to protein-protein interactions. We propose a novel feature representation, contextual-bag-of-words, to exploit named entity information. Our method outperforms well-known methods that use named entity information as additional features. We further improve performance by extracting reliable and informative instances from unlabeled and likely-positive data to serve as additional training data.
Subjects: 1.10 Information Retrieval; 13. Natural Language Processing
Submitted: Apr 11, 2007