Maximal Association Rules: A New Tool for Mining for Keyword Co-Occurrences in Document Collections

Authors

Ronen Feldman

Yonatan Aumann

Amihood Amir

Amir Zilberstein

Willi Kloesgen

Abstract:

Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured document collections. This paper describes a new method for computing co-occurrence frequencies of the various keywords labeling the documents. The method is based on computing maximal association rules. Regular associations are based on the notion of frequent sets: sets of attributes that appear in many records. By analogy, maximal associations are based on the notion of frequent maximal sets. Conceptually, a frequent maximal set is a set of attributes that appear alone, or maximally, in many records. To define "maximality" we use an underlying taxonomy, T, of the attributes. This allows us to obtain the "interesting" correlations between attributes from different categories. Frequent maximal sets are also useful for efficiently finding association rules that include negated attributes. We provide an experimental evaluation of our methodology on the Reuters-21578 document collection.
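
To make the contrast between frequent sets and frequent maximal sets concrete, the following is a minimal sketch and not the paper's algorithm. The abstract does not give the formal definition, so the sketch assumes one plausible reading: a set is maximal in a record when it is exactly the set of that record's attributes that fall in a single category of the taxonomy T. The function names, the toy taxonomy, and the toy records are all hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical taxonomy T: category name -> attributes in that category.
TAXONOMY: Dict[str, FrozenSet[str]] = {
    "countries": frozenset({"usa", "canada", "france"}),
    "topics": frozenset({"corn", "fish"}),
}

# Hypothetical records: each document is the set of keywords labeling it.
RECORDS: List[FrozenSet[str]] = [
    frozenset({"usa", "canada", "corn"}),
    frozenset({"usa", "canada", "fish"}),
    frozenset({"usa", "canada", "france", "corn"}),
    frozenset({"usa", "corn"}),
    frozenset({"usa", "canada"}),
]


def support(itemset: FrozenSet[str], records: List[FrozenSet[str]]) -> int:
    """Regular support: number of records that contain every attribute in itemset."""
    return sum(1 for record in records if itemset <= record)


def maximal_support(
    itemset: FrozenSet[str],
    category: FrozenSet[str],
    records: List[FrozenSet[str]],
) -> int:
    """Assumed maximal support: number of records whose attributes from
    `category` are exactly `itemset` (the set appears "alone" in that category)."""
    return sum(1 for record in records if (record & category) == itemset)


pair = frozenset({"usa", "canada"})
regular = support(pair, RECORDS)
maximal = maximal_support(pair, TAXONOMY["countries"], RECORDS)
print(regular, maximal)
```

Under this reading, the third toy record counts toward the regular support of {usa, canada} but not toward its maximal support, because it also mentions a third country from the same category.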
