Abstract:
Despite the tremendous amount of information on Wikipedia, only a very small fraction of it is structured. Most of the information is embedded in unstructured text, and extracting it is a nontrivial challenge. In this paper, we propose a full pipeline built on top of DeepDive to successfully extract meaningful relations from the Wikipedia text corpus. We evaluated the system by extracting company-founder and family relations from the text. In total, we extracted more than 140,000 distinct relations with an average precision above 90%.
DOI:
10.1609/icwsm.v10i2.14836