Computationally Efficient Cluster Representation in Molecular Sequence Megaclassification

Authors

D. J. States

N. Harris

and L. Hunter

Proceedings:

Proceedings of the First International Conference on Intelligent Systems for Molecular Biology, 1993

Abstract:

Molecular sequence megaclassification is a technique for automated protein sequence analysis and annotation. Implementation of the method has been limited by the need to store and randomly access a database of all the sequence pair similarities. More than 80,000 protein sequences are now present in the public databases, and the pair similarity data table for the full protein sequence database requires over 1 gigabyte of storage. In this paper we present a computationally efficient representation of groups based on a graph theory approach where sequence clusters are described by a minimal spanning tree of highest scoring similarity pairs. This representation allows a classification of N proteins to be stored in order(N) memory. The use of this minimal spanning tree representation simplifies analysis of groups, the description of group characteristics, and the manual correction of artifacts resulting from false hits. The new tree representation also introduces new possibilities for artifact generation in sequence classification. Methods for detecting and removing these artifacts are discussed.
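
The storage argument in the abstract can be illustrated with a small sketch. The abstract does not say how the spanning tree is built, so the code below assumes a Kruskal-style construction with a union-find structure: similarity pairs are visited from highest to lowest score, and a pair is kept only if it joins two sequences that are not yet connected. The function names, the `threshold` parameter, and the toy scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_cluster_forest(n, scored_pairs, threshold):
    """Return the kept tree edges as (score, i, j) tuples.

    Assumption: Kruskal-style selection over pairs sorted by descending
    score. At most n - 1 edges are kept, so storage is linear in n.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for score, i, j in sorted(scored_pairs, reverse=True):
        if score < threshold:
            break  # every remaining pair scores lower still
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two groups
            forest.append((score, i, j))
    return forest


def clusters_from_forest(n, forest):
    """Recover the clusters as connected components of the stored edges."""
    adjacent = defaultdict(list)
    for _, i, j in forest:
        adjacent[i].append(j)
        adjacent[j].append(i)

    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = []
        while stack:
            v = stack.pop()
            members.append(v)
            for w in adjacent[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(sorted(members))
    return groups


# Toy data (hypothetical scores): six sequences, five similarity pairs.
pairs = [(90, 0, 1), (85, 1, 2), (80, 0, 2), (40, 2, 3), (95, 3, 4)]
forest = build_cluster_forest(6, pairs, threshold=30)
clusters = clusters_from_forest(6, forest)

# Dropping a suspected false hit (here the weak 2-3 link) splits the group.
pruned = [edge for edge in forest if edge != (40, 2, 3)]
clusters_after_pruning = clusters_from_forest(6, pruned)
```

In this toy case the pair (80, 0, 2) is discarded because sequences 0 and 2 are already linked through sequence 1, which is why the full pair table is not needed to describe the group. The pruning step hints at how a single false hit might be corrected by hand: deleting one tree edge separates the two sets of sequences it had joined.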
