Predicting Location and Structure of β-Sheet Regions Using Stochastic Tree Grammars

Authors

Hiroshi Mamitsuka and Naoki Abe

Proceedings:

Proceedings Of The Second International Conference On Intelligent Systems For Molecular Biology

Abstract:

We describe and demonstrate the effectiveness of a method of predicting protein secondary structures, β-sheet regions in particular, using a class of stochastic tree grammars as a representational language for their amino acid sequence patterns. The family of stochastic tree grammars we use, the Stochastic Ranked Node Rewriting Grammars (SRNRG), is one of the rare families of stochastic grammars that are expressive enough to capture the kind of long-distance dependencies exhibited by the sequences of β-sheet regions, and at the same time enjoy relatively efficient processing. We applied our method to real data obtained from the HSSP database, and the results are encouraging: using an SRNRG trained on data from a particular protein, our method was able to predict the location and structure of β-sheet regions in a number of different proteins whose sequences are less than 25 per cent homologous to the training sequences. The learning algorithm we use is an extension of the Inside-Outside algorithm for stochastic context-free grammars, but with a number of significant modifications. First, we restricted the grammars used to members of the linear subclass of SRNRG, and devised simpler and faster algorithms for this subclass. Second, we reduced the alphabet size (i.e., the number of amino acids) by clustering the amino acids according to their physicochemical properties, gradually through the iterations of the learning algorithm. Finally, we parallelized our parsing algorithm to run on a highly parallel computer, a 32-processor CM-5, and were able to obtain a nearly linear speed-up. We emphasize that our prediction method already goes beyond what is possible with homology-based approaches. We also stress that our method can predict the structure as well as the location of β-sheet regions, which was not possible with previous inverse protein folding methods.
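
For orientation, the Inside-Outside algorithm named in the abstract builds on the "inside" computation for ordinary stochastic context-free grammars. The following is a minimal sketch of that baseline computation only, assuming a grammar in Chomsky normal form; it is not the authors' SRNRG algorithm. The function name, the rule encoding, and the toy grammar over a two-letter alphabet (`h` and `p`, standing in for a hypothetical reduced amino acid alphabet) are all illustrative assumptions.

```python
from collections import defaultdict


def inside_probability(sequence, unary_rules, binary_rules, start="S"):
    """Probability that `start` derives `sequence` under a stochastic CFG.

    Assumed (hypothetical) rule encoding, Chomsky normal form:
      unary_rules:  {(A, terminal): prob}  for rules A -> terminal
      binary_rules: {(A, B, C): prob}      for rules A -> B C
    """
    n = len(sequence)
    # inside[(i, j)][A] = probability that A derives sequence[i..j] inclusive
    inside = defaultdict(lambda: defaultdict(float))

    # Base case: spans of length 1 are produced by terminal rules.
    for i, symbol in enumerate(sequence):
        for (lhs, terminal), prob in unary_rules.items():
            if terminal == symbol:
                inside[(i, i)][lhs] += prob

    # Recursive case: combine two adjacent sub-spans with a binary rule,
    # summing over every split point k and every applicable rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for (lhs, left, right), prob in binary_rules.items():
                    p_left = inside[(i, k)].get(left, 0.0)
                    p_right = inside[(k + 1, j)].get(right, 0.0)
                    if p_left and p_right:
                        inside[(i, j)][lhs] += prob * p_left * p_right

    return inside[(0, n - 1)].get(start, 0.0)


# Toy grammar (illustrative only): S -> S S (0.3) | 'h' (0.5) | 'p' (0.2)
toy_unary = {("S", "h"): 0.5, ("S", "p"): 0.2}
toy_binary = {("S", "S", "S"): 0.3}
```

With this toy grammar, `inside_probability("hp", toy_unary, toy_binary)` sums over the single parse of the two-letter string, while a three-letter string such as `"hph"` sums over both of its bracketings. A context-free grammar of this kind can only pair up nested positions; the abstract's point is that SRNRG goes beyond this to capture the dependencies found in β-sheet regions.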
