
Identification of prokaryotic small proteins using a comparative genomic approach.

Josue Samayoa, Fitnat H Yildiz, Kevin Karplus.

Abstract

MOTIVATION: Accurate prediction of genes encoding small proteins (on the order of 50 amino acids or fewer) remains an open problem in bioinformatics. Some of the best gene-prediction methods rely on either sequence-composition analysis or sequence similarity to known protein-coding sequences. These methods often fail for small proteins, however, either because few small protein-coding genes have been experimentally verified or because statistics computed on short sequences have limited significance. Our approach is based on the hypothesis that true small proteins are under selective pressure to encode their particular amino acid sequence, both for ease of translation by the ribosome and for structural stability. This stability can be achieved either independently or as part of a larger protein complex. Given this assumption, small proteins should display conserved local protein structure properties, much like larger proteins. Our method incorporates neural-net predictions for three local structure alphabets into a comparative genomic approach, using a genomic alignment of 22 closely related bacterial genomes, to predict whether a given open reading frame (ORF) encodes a small protein.
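Candidate small ORFs have to be enumerated before any of them can be scored. The following is a minimal sketch of that step, not the authors' code: `find_small_orfs` is a hypothetical helper, and it assumes ATG as the only start codon, TAA/TAG/TGA as stop codons, and an illustrative window of 10 to 50 codons. The abstract does not state the paper's actual ORF definition; note also that the bacterial genetic code (translation table 11) permits alternative starts such as GTG and TTG, which this sketch ignores.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_small_orfs(genome: str, min_codons: int = 10,
                    max_codons: int = 50) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: list (strand, start, end) for ATG...stop ORFs.

    Coordinates are 0-based and half-open on the given strand's sequence
    (the reverse strand is indexed on its reverse complement). Length is
    counted in codons, excluding the stop codon. Only the first ATG after
    each stop is used, so each reported ORF is the longest one ending at
    that stop codon.
    """
    forward = genome.upper()
    orfs = []
    for strand, seq in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None  # first ATG seen since the last stop in this frame
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


# Toy sequence: one 13-codon ORF (ATG + 12 x GCT) followed by TAA.
toy = "CC" + "ATG" + "GCT" * 12 + "TAA" + "GG"
print(find_small_orfs(toy))  # [('+', 2, 44)]
```

A real pipeline would also need to decide how to handle ORFs nested inside or overlapping annotated genes; the sketch simply reports every qualifying stop-to-stop region.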
RESULTS: We applied this method to the complete genome of Escherichia coli strain K12 and evaluated its performance on a set of 60 experimentally verified small proteins from this organism. Of 11 407 possible ORFs, 6 of the top 10 and 27 of the top 100 predictions belonged to the set of 60 experimentally verified small proteins, and 35 of the 60 true small proteins ranked within the top 200 predictions. We compared our method with Glimmer, using both a default Glimmer protocol and a modified small-ORF Glimmer protocol with a lower minimum size cutoff. The default Glimmer protocol identified 16 of the true small proteins (all within the top 200 predictions) but made no prediction for 34 of them because of its size cutoffs. The small-ORF Glimmer protocol made predictions for all the experimentally verified small proteins, but only 9 of the 60 true small proteins ranked within its top 200 predictions.
CONTACT: jsamayoa@jhu.edu
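The results are reported as the number of experimentally verified small proteins that fall within the top k ranked predictions. A minimal sketch of that bookkeeping, assuming predictions are available as a mapping from ORF identifier to score with higher scores ranked first; `hits_in_top_k` is a hypothetical helper, and the identifiers and scores are invented toy values, not data from the paper.

```python
from typing import Dict, Iterable, List, Set


def hits_in_top_k(scores: Dict[str, float], true_ids: Set[str],
                  ks: Iterable[int]) -> Dict[int, int]:
    """Count how many known true ORFs rank within the top k, for each k.

    ORFs are sorted by descending score, with ties broken by identifier so
    the ranking is deterministic. A true ORF that received no score (for
    example, one excluded by a size cutoff) never appears in the ranking
    and therefore can never count as a hit.
    """
    ranked: List[str] = sorted(scores, key=lambda orf: (-scores[orf], orf))
    return {k: sum(1 for orf in ranked[:k] if orf in true_ids) for k in ks}


# Toy example with made-up ORF IDs and scores.
toy_scores = {"orf1": 0.95, "orf2": 0.90, "orf3": 0.40, "orf4": 0.85, "orf5": 0.10}
toy_true = {"orf1", "orf4", "orf5", "orf9"}  # orf9 was never scored
print(hits_in_top_k(toy_scores, toy_true, ks=[1, 3, 5]))  # {1: 1, 3: 2, 5: 3}
```

Under this framing, a figure such as "27 of the top 100" corresponds to the value for k = 100 on the authors' ranked list, and the unscored `orf9` mirrors how size cutoffs in the default Glimmer protocol left some true small proteins without any prediction.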

Year:  2011        PMID: 21551138      PMCID: PMC3117347          DOI: 10.1093/bioinformatics/btr275

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


References (28 in total)

1.  Efficiency of the neighbor-joining method in reconstructing deep and shallow evolutionary relationships in large phylogenies.

Authors:  S Kumar; S R Gadagkar
Journal:  J Mol Evol       Date:  2000-12       Impact factor: 2.395

2.  GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

Authors:  J Besemer; A Lomsadze; M Borodovsky
Journal:  Nucleic Acids Res       Date:  2001-06-15       Impact factor: 16.971

3.  On the total number of genes and their length distribution in complete microbial genomes.

Authors:  M Skovgaard; L J Jensen; S Brunak; D Ussery; A Krogh
Journal:  Trends Genet       Date:  2001-08       Impact factor: 11.639

4.  Endogenous production of antimicrobial peptides in innate immunity and human disease. (Review)

Authors:  Richard L Gallo; Victor Nizet
Journal:  Curr Allergy Asthma Rep       Date:  2003-09       Impact factor: 4.806

5.  Distinguishing the ORFs from the ELFs: short bacterial genes and the annotation of genomes.

Authors:  Howard Ochman
Journal:  Trends Genet       Date:  2002-07       Impact factor: 11.639

6.  Aligning multiple genomic sequences with the threaded blockset aligner.

Authors:  Mathieu Blanchette; W James Kent; Cathy Riemer; Laura Elnitski; Arian F A Smit; Krishna M Roskin; Robert Baertsch; Kate Rosenbloom; Hiram Clawson; Eric D Green; David Haussler; Webb Miller
Journal:  Genome Res       Date:  2004-04       Impact factor: 9.043

7.  Improved microbial gene identification with GLIMMER.

Authors:  A L Delcher; D Harmon; S Kasif; O White; S L Salzberg
Journal:  Nucleic Acids Res       Date:  1999-12-01       Impact factor: 16.971

8.  Degenerative minimalism in the genome of a psyllid endosymbiont.

Authors:  M A Clark; L Baumann; M L Thao; N A Moran; P Baumann
Journal:  J Bacteriol       Date:  2001-03       Impact factor: 3.490

9.  Small membrane proteins found by comparative genomics and ribosome binding site models.

Authors:  Matthew R Hemm; Brian J Paul; Thomas D Schneider; Gisela Storz; Kenneth E Rudd
Journal:  Mol Microbiol       Date:  2008-12       Impact factor: 3.501

10.  EasyGene--a prokaryotic gene finder that ranks ORFs by statistical significance.

Authors:  Thomas Schou Larsen; Anders Krogh
Journal:  BMC Bioinformatics       Date:  2003-06-03       Impact factor: 3.169

