Yoshihiro Ohta, Yasunori Yamamoto, Tomoko Okazaki, Ikuo Uchiyama, and Toshihisa Takagi
We designed a system that acquires domain-specific knowledge from human-written biological papers; we call this system IFBP (Information Finding from Biological Papers). IFBP is divided into three phases: Information Retrieval (IR), Information Extraction (IE), and Dictionary Construction (DC). We propose a query modification method that uses an automatically constructed thesaurus for IR, and a statistical keyword prediction method for IE. A dictionary of domain-specific terms, which is one of the central knowledge sources for knowledge acquisition, is also constructed automatically in the DC phase. IFBP is currently used to construct the Transcription Factor DataBase (TFDB) and shows good performance. Because the knowledge base construction model adopted in IFBP runs entirely automatically, the system can be easily ported across domains.
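To make the idea of query modification with an automatically constructed thesaurus more concrete, the following is a minimal sketch and not the IFBP implementation. It assumes, purely for illustration, that the thesaurus is built from document-level term co-occurrence and that a query is modified by appending related terms. The function names `build_thesaurus` and `expand_query`, the `min_cooccurrence` threshold, and the toy documents are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_thesaurus(documents, min_cooccurrence=2):
    """Relate two terms if they appear together in at least
    `min_cooccurrence` documents (an assumed, simplistic criterion)."""
    pair_counts = Counter()
    for doc in documents:
        # Count each unordered term pair once per document.
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))

    thesaurus = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            thesaurus[a].add(b)
            thesaurus[b].add(a)
    return dict(thesaurus)


def expand_query(query_terms, thesaurus):
    """Append thesaurus neighbours of each query term, keeping the
    original terms first and avoiding duplicates."""
    expanded = list(query_terms)
    for term in query_terms:
        for related in sorted(thesaurus.get(term, ())):
            if related not in expanded:
                expanded.append(related)
    return expanded


# Toy corpus (hypothetical sentences, lowercased, whitespace-tokenized).
documents = [
    "gal4 binds dna",
    "gal4 activates transcription",
    "gal4 transcription factor",
    "dna repair",
]
thesaurus = build_thesaurus(documents)
print(expand_query(["gal4"], thesaurus))
```

In this toy corpus only "gal4" and "transcription" co-occur in two documents, so a query for "gal4" gains the single term "transcription". A real system would need proper tokenization, stop-word handling, and a weighting scheme, none of which are specified in the abstract.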