When Does Imbalanced Data Require More than Cost-Sensitive Learning?

Authors

Dragos Margineantu

Abstract:

Most classification algorithms expect the frequency of examples from each class to be roughly the same. However, this is rarely the case for real-world data, where the class probability distribution is very often nonuniform (or imbalanced). For these applications, the main problem is usually that the costs of misclassifying examples belonging to rare classes differ significantly from the costs of misclassifying examples from classes represented in a higher proportion in the data. Cost-sensitive learning studies and provides methods for the design and evaluation of classification algorithms for arbitrary cost functions. This paper outlines an issue that can occur in the imbalanced data setting but, to our knowledge, has not been studied in the cost-sensitive learning literature: the situation in which the class probability distribution on the training data differs significantly from the class probability distribution on the test data. We will present a brief overview of cost-sensitive learning methods applied to imbalanced data, and we will extend the existing theoretical results to the setting in which training and test class priors are different.
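
To make the setting concrete, the following is a minimal sketch of why differing class priors matter for cost-sensitive decisions. It is not taken from the paper: it uses the standard prior-ratio re-weighting of posteriors followed by a minimum-expected-cost decision, and it assumes that only the class priors change between training and test data (the class-conditional distributions stay the same) and that the test priors are known. The function names, priors, and cost values are hypothetical and chosen only for illustration.

```python
import numpy as np


def adjust_posterior(train_posterior, train_priors, test_priors):
    """Re-weight posteriors estimated under the training priors so that
    they reflect the test priors (assumes only the priors differ)."""
    p = np.asarray(train_posterior, dtype=float)
    ratio = np.asarray(test_priors, dtype=float) / np.asarray(train_priors, dtype=float)
    unnormalized = p * ratio
    return unnormalized / unnormalized.sum(axis=-1, keepdims=True)


def min_cost_decision(posterior, cost):
    """Pick the class with the lowest expected cost.
    cost[i, j] is the cost of predicting class i when the true class is j."""
    expected_cost = np.asarray(posterior, dtype=float) @ np.asarray(cost, dtype=float).T
    return np.argmin(expected_cost, axis=-1)


# Hypothetical two-class problem: class 0 is common, class 1 is rare.
train_priors = [0.5, 0.5]    # assumed: training set was rebalanced
test_priors = [0.99, 0.01]   # assumed: rare class is 1% at test time
cost = [[0.0, 10.0],         # predicting 0 when truth is 1 costs 10
        [1.0, 0.0]]          # predicting 1 when truth is 0 costs 1

posterior = [0.3, 0.7]       # classifier output under the training priors

naive = min_cost_decision(posterior, cost)
adjusted = min_cost_decision(
    adjust_posterior(posterior, train_priors, test_priors), cost
)
print(naive, adjusted)
```

In this illustrative case the cost matrix alone selects the rare class, while the same cost matrix applied to the prior-adjusted posterior selects the common class, which is one way to see that cost-sensitive decisions depend on the priors under which the probabilities were estimated.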
