Gene Recognition in Cyanobacterium Genomic Sequence Data Using the Hidden Markov Model

Authors

Tetsushi Yada and Makoto Hirosawa

Proceedings:

Proceedings Of The Fourth International Conference On Intelligent Systems For Molecular Biology

Track:

Contents

Abstract:

We have developed a hidden Markov model (HMM) to detect the protein-coding regions within one megabase of contiguous sequence data from the genome of the cyanobacterium Synechocystis sp. strain PCC6803, registered in the GenBank database as eight entries. Coding regions in each database entry were detected using an HMM whose parameters were determined from the statistics of the remaining entries. This HMM has states that model the di-codons and their frequencies within coding regions, and states that model the base content of the intergenic regions. Cross-validation showed that the HMM recognized 92.1% of the coding regions assigned in the sequence annotation. In addition, it suggested 9.t potential new coding regions longer than 90 bases. The recognition accuracy calculated at the level of individual bases was 90.7% for the coding regions and 88.1% for the intergenic regions. This corresponds to a correlation coefficient for coding region recognition of 0.784. A comparison of its prediction accuracy with that of GeneMark showed that, on average, the HMM has the same level of prediction accuracy as GeneMark. Since the HMM can be extended to utilize information such as Shine-Dalgarno (SD) sequences, we expect its prediction accuracy to be enhanced. A positive correlation was observed between the prediction rate of the coding regions and the G+C content at the third position of the codon. This suggests that the prediction rate of coding regions in the cyanobacterial sequence could be enhanced by extending the present HMM into one that reflects a classification of coding regions based on G+C content.
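
The base-level figures quoted in the abstract can be illustrated with a minimal Python sketch. It assumes that the correlation coefficient is the usual one computed from the 2x2 table of true versus predicted labels (the form commonly used in gene-finding evaluations); the abstract does not state its formula. The function names, the 'C'/'N' label encoding, and the toy sequences are hypothetical and are not taken from the paper.

```python
from math import sqrt


def base_level_metrics(true_labels, predicted_labels):
    """Compare two per-base label strings: 'C' = coding, 'N' = intergenic.

    Returns (coding accuracy, intergenic accuracy, correlation coefficient).
    The label encoding is an assumption made for this illustration.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label strings must have the same length")

    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == "C" and p == "C":
            tp += 1  # coding base predicted as coding
        elif t == "N" and p == "C":
            fp += 1  # intergenic base predicted as coding
        elif t == "N" and p == "N":
            tn += 1  # intergenic base predicted as intergenic
        elif t == "C" and p == "N":
            fn += 1  # coding base predicted as intergenic
        else:
            raise ValueError("labels must be 'C' or 'N'")

    coding_acc = tp / (tp + fn) if (tp + fn) else 0.0
    intergenic_acc = tn / (tn + fp) if (tn + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return coding_acc, intergenic_acc, cc


def gc3_content(coding_sequence):
    """Fraction of G or C among the third codon positions of an in-frame sequence."""
    third_positions = coding_sequence.upper()[2::3]
    if not third_positions:
        return 0.0
    return sum(base in "GC" for base in third_positions) / len(third_positions)


if __name__ == "__main__":
    # Toy data, not from the paper.
    print(base_level_metrics("CCCCNNNN", "CCCNNNNC"))
    print(gc3_content("ATGGCCAAA"))
```

Because the correlation coefficient depends on the proportion of coding to intergenic bases as well as on the two accuracies, it cannot be recovered from the 90.7% and 88.1% figures alone; the sketch therefore works from raw per-base labels.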

