AAAI Publications, Third Annual Symposium on Combinatorial Search

The Logic of Benchmarking: A Case Against State-of-the-Art Performance
Wheeler Ruml

Last modified: 2010-08-25


This note marshals arguments for three points. First, it is better to test on small benchmark instances than to solve the largest possible ones. Small instances ease replication and allow a more diverse set of instances to be tested; there are few conclusions one can draw from running on large benchmarks that cannot also be drawn from running on small ones. Second, experimental evaluation should focus on understanding algorithm behavior and forming predictive models, rather than on achieving state-of-the-art performance on toy problems. Third, it is more important to develop search techniques that are robust across multiple domains than ones that give state-of-the-art performance in only a single domain. Robust techniques are more likely to be useful to others.


Keywords: heuristic search, empirical methods, benchmarking, methodology
