Abstract:
Performance evaluations of NLP systems have been designed and conducted that require systems to extract certain prespecified information about events and entities. A single text may describe multiple events and entities, and the evaluation task requires the system to resolve references in order to produce the expected output. We describe an early attempt to use the results from an information extraction evaluation to provide insight into the relationship between the difficulty of discourse processing and performance on the information extraction task. We then discuss an upcoming noun phrase coreference evaluation that has been designed independently of any other evaluation task in order to establish a clear performance benchmark on a small set of discourse phenomena.