Genre Classification of Web Documents

Authors

Elizabeth S Boese

Adele Howe

Proceedings:

Book One

Volume

Issue:

Proceedings of the AAAI Conference on Artificial Intelligence, 20

Track:

Student Abstracts

Downloads:

Download PDF

Abstract:

Retrieving relevant documents over the Web is an overwhelming task when search engines return thousands of Web documents. Sifting through these documents is time-consuming and sometimes leads to an unsuccessful search. One problem is that most search engines rely on matching a query to documents based solely on topical keywords. However, many users of search engines have a particular genre in mind for the desired documents. The genre of a document concerns aspects of the document such as the style or readability, presentation layout, and meta-content such as words in the title or the existence of graphs or photos. By including genre in Web searches, we hypothesize that Web document retrieval could greatly improve accuracy by better matching documents to the user’s information needs. Before implementing a search engine capable of discriminating on both genre and topic, a feasibility analysis of genre classification is needed. Our previous research achieved 91% classification accuracy across ten genres, whereas similar research range between 60 and 85% accuracy. However, the ten genres used in our research were mostly distinct and only exemplar Web documents (consisting of only one genre) were chosen. This paper discusses our current work which involves an in-depth analysis of maintaining high accuracy rates among genres that are very similar.

AAAI

Proceedings of the AAAI Conference on Artificial Intelligence, 20

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.