Using hidden Markov models and observed evolution to annotate viral genomes.

Stephen McCauley, Jotun Hein.

Abstract

MOTIVATION: Single-stranded RNA (ssRNA) viral genomes are generally constrained in length and use overlapping reading frames to exploit the coding potential available within those length restrictions. This overlapping coding leads to complex evolutionary constraints on the genome. In regions that code for more than one protein, mutations that are silent in one reading frame generally change the protein encoded in another. To maximize coding flexibility in all reading frames, overlapping regions are often compositionally biased towards amino acids that are 6-fold degenerate with respect to the 64-codon alphabet. Previous methods have used this fact in an ad hoc manner to look for overlapping genes by motif matching. In this paper, the distinctive nucleotide compositional patterns of overlapping regions are incorporated into a probabilistic hidden Markov model (HMM) framework, which is used to annotate ssRNA viral genomes. The work focuses on single-sequence annotation. A description is given of how the HMM is parameterized while annotating within a missing-data framework. A phylogenetic HMM (Phylo-HMM) extension, applied to 14 aligned HIV2 sequences, is also presented. This evolutionary extension illustrates the potential of the Phylo-HMM framework for ssRNA viral genome annotation.
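The compositional bias described above can be illustrated with a small sketch. In the standard genetic code, the 6-fold degenerate amino acids are leucine, serine and arginine. The function names `six_fold_fraction` and `frame_profile` are hypothetical and are not taken from the paper; the code only counts codons per forward reading frame and is not the authors' HMM.

```python
# Codons of the three amino acids that are 6-fold degenerate
# in the standard genetic code (DNA alphabet).
SIX_FOLD_CODONS = {
    "TTA", "TTG", "CTT", "CTC", "CTA", "CTG",  # Leu
    "TCT", "TCC", "TCA", "TCG", "AGT", "AGC",  # Ser
    "CGT", "CGC", "CGA", "CGG", "AGA", "AGG",  # Arg
}


def six_fold_fraction(seq, frame):
    """Fraction of complete codons in a forward frame (0, 1 or 2)
    that encode Leu, Ser or Arg. Returns 0.0 if no complete codon fits."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    hits = sum(codon in SIX_FOLD_CODONS for codon in codons)
    return hits / len(codons)


def frame_profile(seq):
    """Six-fold fraction for each of the three forward reading frames."""
    return [six_fold_fraction(seq, frame) for frame in range(3)]
```

Applied to sliding windows along a genome, a profile like this gives a crude view of where more than one frame is enriched for Leu, Ser or Arg codons, which is the kind of signal the abstract says earlier motif-matching approaches used ad hoc.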
RESULTS: The single sequence annotation (SSA) procedure is applied to 14 different strains of the HIV2 virus. Further results on other ssRNA viral genomes are presented to illustrate the performance of the method more generally. The results of the SSA method are encouraging, but there is still room for improvement. Since there is overwhelming evidence that comparative methods can improve coding sequence (CDS) annotation, the SSA method is extended to a Phylo-HMM to incorporate evolutionary information. The Phylo-HMM extension is applied to the same set of 14 HIV2 sequences, which are pre-aligned. The performance improvement that results from including the evolutionary information is illustrated.
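As a rough illustration of how an HMM can label positions of a single sequence, the sketch below runs standard Viterbi decoding over a toy two-state model. Everything specific here is an assumption made for illustration: the state names `single` and `overlap`, the per-nucleotide emissions, and all probability values are invented and are not the parameterization used in the paper, which the abstract does not give.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most probable state path for a non-empty observation sequence (log space)."""
    # best[s]: log-probability of the best path that ends in state s
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict of back-pointers per position after the first
    for symbol in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][symbol]
            pointers[s] = p
        back.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(v) for j, v in row.items()} for k, row in table.items()}


# Toy parameters: purely illustrative values, NOT taken from the paper.
STATES = ("single", "overlap")
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = to_log({
    "single": {"single": 0.9, "overlap": 0.1},
    "overlap": {"single": 0.1, "overlap": 0.9},
})
LOG_EMIT = to_log({
    "single": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "overlap": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
})
```

Calling `viterbi("AATTCGGC", STATES, LOG_START, LOG_TRANS, LOG_EMIT)` returns one state label per nucleotide. A Phylo-HMM would replace the per-nucleotide emission with the likelihood of an alignment column under a state-specific evolutionary model, which this sketch does not attempt.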

Year:  2006        PMID: 16613911     DOI: 10.1093/bioinformatics/btl092

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  4 in total

1.  Is there a twelfth protein-coding gene in the genome of influenza A? A selection-based approach to the detection of overlapping genes in closely related sequences.

Authors:  Niv Sabath; Jeffrey S Morris; Dan Graur
Journal:  J Mol Evol       Date:  2011-12-21       Impact factor: 2.395

Review 2.  Functional viral metagenomics and the next generation of molecular tools.

Authors:  Thomas Schoenfeld; Mark Liles; K Eric Wommack; Shawn W Polson; Ronald Godiska; David Mead
Journal:  Trends Microbiol       Date:  2009-11-05       Impact factor: 17.079

Review 3.  Overlapping genes in natural and engineered genomes.

Authors:  Bradley W Wright; Mark P Molloy; Paul R Jaschke
Journal:  Nat Rev Genet       Date:  2021-10-05       Impact factor: 59.581

4.  A method for the simultaneous estimation of selection intensities in overlapping genes.

Authors:  Niv Sabath; Giddy Landan; Dan Graur
Journal:  PLoS One       Date:  2008-12-22       Impact factor: 3.240
