Abstract:
Information extraction (IE) systems have been tailored to extract fixed target information from documents in a fixed language. To be truly useful to information analysts, such systems must let users define the target information and must handle source documents in multiple languages. We map out the path toward such open-target multilingual IE systems, identifying the technological breakthroughs needed along the way. We also discuss a Japanese-English named entity extraction system under development, which represents the next step along this path.