A significant obstacle to scientific progress in machine reading is the lack of an objective evaluation method. Precision and recall, while largely quantitative, are typically measured against some gold standard or ground truth — itself usually a human-annotated corpus. For more complex tasks, such as inter-document coreference resolution, or open-ended tasks such as machine reading, relying on a ground truth is often (if not always) impractical. Yet a data-driven approach still requires techniques for evaluation. To address this, we present a new approach to the evaluation of linguistic analysis, implemented in a tool we have developed called COALA. The approach requires establishing a baseline system that produces some results; evaluations are then performed by incrementally changing that system and manually comparing the results. To reduce the load on the human evaluator, our tool implements an intelligent, task-specific diff between the two result sets, allowing the evaluator to focus only on the changes and evaluate them.
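The core idea — diffing the outputs of a baseline run and a modified run so the evaluator judges only what changed — can be sketched as follows. This is a minimal illustration, not COALA's actual implementation; the tuple format for annotations and the function name are assumptions made for the example.

```python
# Illustrative sketch (not COALA's implementation): comparative evaluation
# by diffing two system runs, so the human evaluator inspects only changes.

def annotation_diff(baseline, changed):
    """Return annotations lost, gained, and unchanged between two runs.

    Each run's output is modeled as a set of hashable annotations,
    e.g. (document_id, mention, entity_id) tuples (an assumed format).
    """
    baseline, changed = set(baseline), set(changed)
    return {
        "lost": baseline - changed,       # produced only by the baseline
        "gained": changed - baseline,     # produced only by the changed system
        "unchanged": baseline & changed,  # needs no re-evaluation
    }

# Example: a modified coreference system drops one link and adds another.
base = {("doc1", "Obama", "e1"), ("doc1", "he", "e1")}
new = {("doc1", "Obama", "e1"), ("doc1", "the president", "e1")}
diff = annotation_diff(base, new)
```

Here the evaluator would judge only the two changed annotations rather than re-examining all of the output, which is what makes incremental, ground-truth-free evaluation tractable.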