Abstract:
The ability of digital storytelling agents to evaluate their output is important for ensuring high-quality human-agent interactions. However, evaluating stories remains an open problem. Past evaluative techniques are either model-specific (they measure features of the model but do not evaluate the generated stories) or require direct human feedback, which is resource-intensive. We introduce a number of story features that correlate with human judgments of stories and present algorithms that can measure these features. We find that this approach yields a proxy for human-subject studies for researchers evaluating story generation systems.
DOI:
10.1609/aiide.v14i1.13021