
A "polyORFomic" analysis of prokaryote genomes using disabled-homology filtering reveals conserved but undiscovered short ORFs.

Paul M Harrison, Nicholas Carriero, Yang Liu, Mark Gerstein.

Abstract

Prokaryote gene annotation is complicated by large numbers of short open reading frames (ORFs) that arise naturally from genetic code design. Historically, many hypothetical ORFs have been annotated as genes in microbes, usually with an arbitrary length threshold (e.g. greater than 100 codons). Given the use of such thresholds, what is the extent of genuine undiscovered short genes in the current sampling of prokaryote genomes? To rigorously assess the potential under-annotation of short ORFs with homology, we exhaustively compared the polyORFome (all possible ORFs in 64 prokaryotes, namely 53 bacteria and 11 archaea, plus budding yeast) to itself and to all known proteins. The novelty of our analysis is that, firstly, sequence comparisons both to and between annotated and un-annotated ORFs are considered, and, secondly, a two-step disabled-homology filter is applied to set aside putative pseudogenes and spurious ORFs. We find that un-annotated homologous short ORFs (uhORFs) correspond to a small but non-negligible fraction of the annotated prokaryote proteomes (0.5-3.8%, depending on selection criteria). Moreover, the disabled-homology filter indicates that about a third of uhORFs correspond to putative pseudogenes or spurious ORFs. Our analysis shows that the use of annotation length thresholds is unnecessary, as there are manageable numbers of short ORF homologies conserved (without disablements) across microbial genomes. Data on uhORFs are available from http://pseudogene.org/polyo
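The ORF-enumeration step that underlies a "polyORFome" can be illustrated with a short Python sketch. It assumes that an ORF runs from an ATG start to the first in-frame stop (TAA, TAG or TGA, the stops of the standard and bacterial/archaeal translation tables), scans all six reading frames, and splits the results at a 100-codon cutoff like the one the abstract cites. The names `find_orfs` and `split_by_threshold` are hypothetical, alternative starts such as GTG and TTG are ignored, and the authors' own ORF definition and pipeline may differ. The disabled-homology filter is not modelled here.

```python
# Minimal sketch (assumption): ORF = ATG ... first in-frame stop codon, on both strands.
# This is an illustration, not the authors' polyORFome pipeline.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 1):
    """Return (strand, start, end, n_codons) tuples for every ORF in six frames.

    Coordinates are 0-based and half-open on the strand that was scanned
    (for "-" they refer to the reverse complement). n_codons excludes the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the open ATG in this frame, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF; later ATGs are ignored
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs


def split_by_threshold(orfs, threshold: int = 100):
    """Split ORFs into (short, long) lists; long means more than `threshold` codons."""
    short = [o for o in orfs if o[3] <= threshold]
    long_ = [o for o in orfs if o[3] > threshold]
    return short, long_
```

For example, `find_orfs("ATGAAATAG")` returns `[("+", 0, 9, 2)]`, a two-codon ORF that a 100-codon annotation threshold would discard.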


Year:  2003        PMID: 14583187     DOI: 10.1016/j.jmb.2003.09.016

Source DB:  PubMed          Journal:  J Mol Biol        ISSN: 0022-2836            Impact factor:   5.469


Related articles: 13 in total (10 listed below)

1.  A high productivity/low maintenance approach to high-performance computation for biomedicine: four case studies.

Authors:  Nicholas Carriero; Michael V Osier; Kei-Hoi Cheung; Perry L Miller; Mark Gerstein; Hongyu Zhao; Baolin Wu; Scott Rifkin; Joseph Chang; Heping Zhang; Kevin White; Kenneth Williams; Martin Schultz
Journal:  J Am Med Inform Assoc       Date:  2004-10-18       Impact factor: 4.497

2.  Dynamic behavior of an intrinsically unstructured linker domain is conserved in the face of negligible amino acid sequence conservation.

Authors:  Gary W Daughdrill; Pranesh Narayanaswami; Sara H Gilmore; Agniezka Belczyk; Celeste J Brown
Journal:  J Mol Evol       Date:  2007-08-25       Impact factor: 2.395

3.  Computational Methods for Pseudogene Annotation Based on Sequence Homology.

Authors:  Paul M Harrison
Journal:  Methods Mol Biol       Date:  2021

4.  The evolutionary fate of MULE-mediated duplications of host gene fragments in rice.

Authors:  Nikoleta Juretic; Douglas R Hoen; Michael L Huynh; Paul M Harrison; Thomas E Bureau
Journal:  Genome Res       Date:  2005-09       Impact factor: 9.043

5.  Genomic evidence for non-random endemic populations of decaying exons from mammalian genes.

Authors:  David Delima Morais; Paul M Harrison
Journal:  BMC Genomics       Date:  2009-07-13       Impact factor: 3.969

6.  Regulation of gene expression by macrolide-induced ribosomal frameshifting.

Authors:  Pulkit Gupta; Krishna Kannan; Alexander S Mankin; Nora Vázquez-Laslop
Journal:  Mol Cell       Date:  2013-11-14       Impact factor: 17.970

7.  Gene discovery by genome-wide CDS re-prediction and microarray-based transcriptional analysis in phytopathogen Xanthomonas campestris.

Authors:  Lian Zhou; Frank-Jörg Vorhölter; Yong-Qiang He; Bo-Le Jiang; Ji-Liang Tang; Yuquan Xu; Alfred Pühler; Ya-Wen He
Journal:  BMC Genomics       Date:  2011-07-12       Impact factor: 3.969

8.  Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs.

Authors:  Bradford C Powell; Clyde A Hutchison
Journal:  BMC Bioinformatics       Date:  2006-01-19       Impact factor: 3.169

9.  The abundance of short proteins in the mammalian proteome.

Authors:  Martin C Frith; Alistair R Forrest; Ehsan Nourbakhsh; Ken C Pang; Chikatoshi Kai; Jun Kawai; Piero Carninci; Yoshihide Hayashizaki; Timothy L Bailey; Sean M Grimmond
Journal:  PLoS Genet       Date:  2006-04-28       Impact factor: 5.917

10.  Genome mining for methanobactins.

Authors:  Grace E Kenney; Amy C Rosenzweig
Journal:  BMC Biol       Date:  2013-02-26       Impact factor: 7.431
