
Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.

M Borodovsky, K E Rudd, E V Koonin.

Abstract

The unannotated regions of the Escherichia coli genomic DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences with a combined length of 359,279 base pairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: (i) prediction of expressed open reading frames (ORFs) using the GeneMark method, which is based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and (ii) a search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by both GeneMark and BLAST; these comprise 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. Seventy-three putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark but have no significant BLAST hits are nevertheless likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions, including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
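The percentages quoted in the abstract can be checked from the reported counts. The short Python sketch below is an illustration added here, not part of the original analysis; the variable names are hypothetical, and the "GeneMark only" and "BLAST only" counts are derived values that assume the 182 jointly supported ORFs are exactly the overlap between the 354 GeneMark predictions and the 208 BLAST hits.

```python
# Counts as reported in the abstract.
genemark_orfs = 354   # putative expressed ORFs predicted by GeneMark
blast_orfs = 208      # ORFs with significant protein similarity (BLASTX/TBLASTN)
both = 182            # ORFs supported by both GeneMark and BLAST
conserved = 73        # putative genes in ancient conserved protein families


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


share_of_genemark = pct(both, genemark_orfs)     # fraction of GeneMark hits also found by BLAST
share_of_blast = pct(both, blast_orfs)           # fraction of BLAST hits also found by GeneMark
conserved_share = pct(conserved, genemark_orfs)  # conserved families among GeneMark predictions

# Derived counts (assumption: 'both' is the intersection of the two sets).
genemark_only = genemark_orfs - both
blast_only = blast_orfs - both

print(f"Both methods: {share_of_genemark}% of GeneMark hits, {share_of_blast}% of BLAST hits")
print(f"Conserved families: {conserved_share}% of GeneMark predictions")
print(f"GeneMark only: {genemark_only} ORFs; BLAST only: {blast_only} ORFs")
```

Under that assumption, the ORFs predicted by GeneMark without BLAST support are the group the abstract argues are still likely to be real genes.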

Year: 1994    PMID: 7984428    PMCID: PMC308528    DOI: 10.1093/nar/22.22.4756

Source DB: PubMed    Journal: Nucleic Acids Res    ISSN: 0305-1048    Impact factor: 16.971


