Feature Extraction for Massive Data Mining

Authors

V. Seshadri and Raguram Sasisekharan, AT&T Bell Laboratories; Sholom M. Weiss, Rutgers University

Track:

All Contents


Abstract:

Techniques for learning from data typically require data to be in standard form. Measurements must be encoded in a numerical format such as binary true-or-false features, numerical features, or possibly numerical codes. In addition, for classification, a clear goal for learning must be specified. While some databases may readily be arranged in standard form, many others may be combinations of numerical fields or text, with thousands of possibilities for each data field, and multiple instances of the same field specification. A significant portion of the effort in real-world data mining applications involves defining, identifying, and encoding the data into suitable features. In this paper, we describe an automatic feature extraction procedure, adapted from modern text categorization techniques, that maps very large databases into manageable datasets in standard form. We describe a commercial application of this procedure to mining a collection of very large databases of home appliance service records for a major international retailer.
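
As a rough illustration of what mapping records into standard form can look like, the sketch below builds a dictionary of frequent (field, token) pairs and encodes each record as binary true-or-false features. It is a minimal sketch under stated assumptions, not the procedure from the paper: selecting entries by record frequency is a common text categorization device swapped in here for illustration, and the record layout, the field names, and the helpers `tokens`, `build_dictionary`, and `encode` are all hypothetical.

```python
from collections import Counter

def tokens(record):
    """Return the set of (field, token) pairs in a record.

    A record is a list of (field, value) pairs, so the same field
    may appear more than once (assumed layout, for illustration only).
    """
    return {(field, tok)
            for field, value in record
            for tok in str(value).lower().split()}

def build_dictionary(records, top_k):
    """Keep the top_k (field, token) pairs ranked by record frequency."""
    counts = Counter()
    for record in records:
        # A pair counts once per record, however often it repeats inside it.
        counts.update(tokens(record))
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked[:top_k]]

def encode(record, dictionary):
    """Map a record to a fixed-length binary feature vector."""
    present = tokens(record)
    return [1 if pair in present else 0 for pair in dictionary]

# Hypothetical service records, invented for this example.
records = [
    [("part", "belt"), ("part", "motor"), ("note", "noisy drum")],
    [("part", "belt"), ("note", "drum not turning")],
    [("part", "pump"), ("note", "leak")],
]

dictionary = build_dictionary(records, top_k=3)
vectors = [encode(r, dictionary) for r in records]
```

Every record, whatever its original mix of fields and free text, ends up as a vector of the same length, which is the property a standard-form learning method needs. A classification goal (the label column) would still have to be specified separately.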
