Marwa Al Arab, Christian Höner Zu Siederdissen, Kifah Tout, Abdullah H Sahyoun, Peter F Stadler, Matthias Bernt.
Abstract
Mitochondrial genome sequences are available in large numbers, and new sequences are being published at an increasing pace. Fast, automatic, consistent, and high-quality annotations are a prerequisite for downstream analyses. We therefore present an automated pipeline for fast de novo annotation of mitochondrial protein-coding genes. The annotation is based on enhanced phylogeny-aware hidden Markov models (HMMs). The pipeline builds taxon-specific enhanced multiple sequence alignments (MSAs) of already annotated sequences, and the corresponding HMMs, using an approximation of the phylogeny. The MSAs are enhanced by fixing unannotated frameshifts, purging wrong sequences, and removing non-conserved columns from both ends. A comparison with reference annotations highlights the high quality of the results. The frameshift correction method predicts a large number of frameshifts, many of which were previously unknown. A detailed analysis of the frameshifts in nad3 of the Archosauria-Testudines group has been conducted.
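One of the enhancement steps named in the abstract, the removal of non-conserved columns from both ends of an MSA, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the conservation score (fraction of rows sharing the most common non-gap residue), the threshold value, and the function names `column_conservation` and `trim_alignment_ends` are all assumptions chosen for illustration.

```python
from collections import Counter


def column_conservation(column, gap="-"):
    """Fraction of rows that share the most common non-gap residue.

    Hypothetical scoring rule; the abstract does not say how
    conservation is measured.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(column)


def trim_alignment_ends(msa, threshold=0.5):
    """Drop leading and trailing columns scoring below `threshold`.

    `msa` is a list of equal-length aligned strings. Interior columns
    are never removed, only the two flanks.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all rows of an alignment must have the same length")

    scores = [column_conservation([row[i] for row in msa]) for i in range(length)]

    # Advance from the left until the first sufficiently conserved column.
    start = 0
    while start < length and scores[start] < threshold:
        start += 1

    # Retreat from the right until the last sufficiently conserved column.
    end = length
    while end > start and scores[end - 1] < threshold:
        end -= 1

    return [row[start:end] for row in msa]


# Toy protein alignment (made-up data, for illustration only).
msa = [
    "-MTNIRK",
    "AMTNIR-",
    "--TNIRQ",
]
trimmed = trim_alignment_ends(msa, threshold=0.6)
```

With the toy alignment above, the first and last columns fall below the assumed threshold of 0.6 and are dropped, while the conserved core is kept unchanged. A real pipeline would likely use a more careful conservation measure and operate on alignment objects rather than plain strings.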
Keywords: Annotation; Hidden Markov models; Metazoa; Mitochondrial DNA; Protein coding genes
Year: 2016 PMID: 27693569 DOI: 10.1016/j.ympev.2016.09.024
Source DB: PubMed Journal: Mol Phylogenet Evol ISSN: 1055-7903 Impact factor: 4.286