Proceedings:
Book One
Volume / Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 19
Track:
Learning
Abstract:
A given entity (representing a person, a location, or an organization) may be mentioned in text in multiple, ambiguous ways. Understanding natural language requires identifying whether different mentions of a name, within and across documents, represent the same entity. We present two machine learning approaches to this problem, which we call the "Robust Reading" problem. Our first approach is a discriminative approach, trained in a supervised way. Our second approach is a generative model, at the heart of which is a view on how documents are generated and how names (of different entity types) are "sprinkled" into them. In its most general form, our model assumes: (1) a joint distribution over entities (e.g., a document that mentions President Kennedy is more likely to mention Oswald or the White House than Roger Clemens); (2) an author model, which assumes that at least one mention of an entity in a document is easily identifiable and then generates other mentions via (3) an appearance model, which governs how mentions are transformed from the representative mention. We show that both approaches perform very accurately, in the range of 90%-95% F
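The three assumptions of the generative model can be read as a sampling procedure. The following is a minimal sketch of that reading, not the authors' implementation: the toy entity sets, their probabilities, the function names, and the three appearance transformations (keep the name, keep the last token, use initials) are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy data: these sets and weights are illustrative assumptions,
# not values taken from the paper.
ENTITY_SETS = [
    (("John F. Kennedy", "Lee Harvey Oswald", "White House"), 0.7),
    (("Roger Clemens",), 0.3),
]


def sample_entities(rng):
    """(1) Joint distribution: draw a set of entities that tend to co-occur."""
    sets, weights = zip(*ENTITY_SETS)
    return rng.choices(sets, weights=weights, k=1)[0]


def representative_mention(entity):
    """(2) Author model: one mention per entity is easily identifiable.
    Assumption for this sketch: it is simply the full canonical name."""
    return entity


def appearance_variant(representative, rng):
    """(3) Appearance model: transform the representative into another mention.
    The three transformations here are illustrative only."""
    tokens = representative.split()
    choice = rng.choice(["same", "last", "initials"])
    if choice == "last":
        return tokens[-1]
    if choice == "initials":
        return "".join(token[0] for token in tokens)
    return representative


def generate_document_mentions(rng, extra_mentions=2):
    """Return {entity: [representative, variant, ...]} for one synthetic document."""
    mentions = {}
    for entity in sample_entities(rng):
        rep = representative_mention(entity)
        variants = [appearance_variant(rep, rng) for _ in range(extra_mentions)]
        mentions[entity] = [rep] + variants
    return mentions


if __name__ == "__main__":
    document = generate_document_mentions(random.Random(0))
    for entity, entity_mentions in document.items():
        print(entity, "->", entity_mentions)
```

Under this reading, resolving mentions amounts to inverting the procedure: given the observed mentions, infer which underlying entity most plausibly produced each one.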