Making Evaluation Robust but Robust to What?

Authors

Chris Drummond

Abstract:

Generalization is at the core of evaluation: we estimate the performance of a model on data we have never seen but expect to encounter later. Our current evaluation procedures assume that the data already seen is a random sample of the domain from which all future data will be drawn. Unfortunately, in practical situations this is rarely the case. Changes in the underlying probabilities will occur, and we must evaluate how robust our models are to such differences. This paper takes the position that models should be robust in two senses. First, small changes in the joint probabilities should not cause large changes in performance. Second, when the dependencies between attributes and the class are constant and only the marginals change, simple adjustments should be sufficient to restore a model's performance. This paper is intended to generate debate on how measures of robustness might become part of our normal evaluation procedures. Certainly, some clear demonstrations of robustness would improve our confidence in our models' practical merits.
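
As an illustration of the second sense of robustness, the sketch below shows one well-known kind of "simple adjustment". It is not taken from the paper. It assumes that only the class marginals change between training and deployment, that the class-conditional distributions p(x | y) stay fixed, and that the model outputs calibrated class posteriors. The function name `adjust_posteriors` and all numbers are hypothetical.

```python
def adjust_posteriors(posteriors, train_priors, deploy_priors):
    """Re-weight class posteriors p(y | x) for new class priors.

    Assumption (not from the paper): p(x | y) is unchanged, so by Bayes' rule
    the new posterior is proportional to p(y | x) * p_deploy(y) / p_train(y).
    """
    weighted = [
        p * new / old
        for p, old, new in zip(posteriors, train_priors, deploy_priors)
    ]
    total = sum(weighted)
    # Renormalize so the adjusted posteriors sum to one.
    return [w / total for w in weighted]


# Hypothetical example: a model trained on balanced classes is deployed
# where the first class is far more common.
posterior = [0.6, 0.4]
adjusted = adjust_posteriors(
    posterior, train_priors=[0.5, 0.5], deploy_priors=[0.9, 0.1]
)
unchanged = adjust_posteriors(
    posterior, train_priors=[0.5, 0.5], deploy_priors=[0.5, 0.5]
)
```

When the deployment priors equal the training priors, the adjustment leaves the posteriors as they were. When they differ, the posterior moves toward the class that has become more common. Comparing performance before and after such an adjustment is one possible way to probe this kind of robustness.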
