Standing on the Feet of Giants — Reproducibility in AI

  • Odd Erik Gundersen, Norwegian University of Science and Technology

Abstract

A recent study implies that research presented at top artificial intelligence conferences is not documented well enough for the research to be reproduced. My objective was to investigate whether the quality of the documentation is the same for industry and academic research or whether differences exist. My hypothesis was that industry and academic research presented at top artificial intelligence conferences is equally well documented. A total of 325 research papers reporting empirical studies, presented at the International Joint Conferences on Artificial Intelligence and the Association for the Advancement of Artificial Intelligence conferences, were surveyed. Of these, 268 came from academia, 47 were collaborations, and 10 came from industry. A set of 16 variables, which specify how well the research is documented, was reviewed for each paper, and each variable was analyzed individually. Three reproducibility metrics were used to assess the documentation quality of each paper. The findings indicate that academic research scores higher than industry research and collaborations on all three reproducibility metrics. Academic research also scores highest on 15 of the 16 surveyed variables. The difference is statistically significant for 3 of the 16 variables, but for none of the reproducibility metrics. The conclusion is that the results on the reproducibility metrics are not statistically significant but still indicate that my hypothesis should probably be refuted. This is surprising, as the conferences use double-blind peer review and all research is judged according to the same standards.

Published
2019-12-20
How to Cite
Gundersen, O. E. (2019). Standing on the Feet of Giants — Reproducibility in AI. AI Magazine, 40(4), 9-23. https://doi.org/10.1609/aimag.v40i4.5185
Section
Special Topic Articles