Monica Riley, Takashi Abe, Martha B Arnaud, Mary K B Berlyn, Frederick R Blattner, Roy R Chaudhuri, Jeremy D Glasner, Takashi Horiuchi, Ingrid M Keseler, Takehide Kosuge, Hirotada Mori, Nicole T Perna, Guy Plunkett, Kenneth E Rudd, Margrethe H Serres, Gavin H Thomas, Nicholas R Thomson, David Wishart, Barry L Wanner.
Abstract
The goal of this group project has been to coordinate and bring up-to-date information on all genes of Escherichia coli K-12. Annotation of the genome of an organism entails identification of genes, the boundaries of genes in terms of precise start and end sites, and description of the gene products. Known and predicted functions were assigned to each gene product on the basis of experimental evidence or sequence analysis. Since both kinds of evidence are constantly expanding, no annotation is complete at any moment in time. This is a snapshot analysis based on the most recent genome sequences of two E.coli K-12 bacteria. An accurate and up-to-date description of E.coli K-12 genes is of particular importance to the scientific community because experimentally determined properties of its gene products provide fundamental information for annotation of innumerable genes of other organisms. Availability of the complete genome sequence of two K-12 strains allows comparison of their genotypes and mutant status of alleles.
Year: 2006 PMID: 16397293 PMCID: PMC1325200 DOI: 10.1093/nar/gkj405
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Information gathered on genes of E.coli K-12 and their sources
| Column heading | Column content | Sources of information |
|---|---|---|
| Feature | Type of genetic element (e.g. CDS, RNA, pseudogenes) | WP |
| Locustag K-12 | New K-12 specific gene identifier (ECK number) | WP |
| Gene Name K-12 | Name in Demerec format | WP, CGSC |
| Locus Name K-12 | Name, including non-Demerec conforming format | WP, CGSC, EcoGene, Entrez, GenoBase, GenProtEC, personal communications |
| Synonyms of Locus Name | Other names of same locus | CGSC, EcoGene, Entrez, GenProtEC |
| Locus Tag MG1655 | Identifier in MG1655 (b number) | WP, GenBank™ |
| Left nucleotide MG1655 | Left boundary of gene | WP, GenBank™ |
| Right nucleotide MG1655 | Right boundary of gene | WP, GenBank™ |
| Direction of transcription MG1655 | Direction described as clockwise (+) or counterclockwise (−) | WP, GenBank™ |
| Comment on gene boundary MG1655 | | WP |
| Locus Tag W3110 | Identifier in W3110 (JW number) | WP, GenoBase |
| Left nucleotide W3110 | Left boundary of gene | WP |
| Right nucleotide W3110 | Right boundary of gene | WP |
| Direction of transcription W3110 | Direction described as clockwise (+) or counterclockwise (−) | WP |
| Comment on gene boundary W3110 | | WP |
| Type of gene product | Code for class of molecule in | GenProtEC |
| Gene product description | Name of encoded protein, RNA or site | WP, ASAP, BLAST |
| Comment gene product description | More detail on description and function of gene product | WP |
| Evidence | Basis for assignment of function, E (experimental) or C (computational prediction) | WP |
| Literature | Literature citations, PMID or abbreviated format if unavailable | GenProtEC, CCDB, EcoGene, PubMed |
| Cell location | Location of gene product based on evaluation of literature and computational predictions | WP, EchoBASE, HMMTOP |
| Context (genetic element) | Location of gene within a genetic element such as prophage, IS | WP, Entrez |
| Enzyme nomenclature | EC number | IUBMB |
| Cofactor | | EcoCyc |
| Protein complex | Name of complex with component units listed | EcoCyc |
| Transporter classification | Superfamily assignment from Transport Classification Database | TCDB |
| Transcription regulator family | Self-explanatory | EcoCyc, RegulonDB |
| Proteases | Known and predicted in MEROPS database | MEROPS |
| Signal peptide predictions | | SignalP |
| Signal peptide cleavage sites | | EcoGene |
| No. of transmembrane segments 1 | Predicted with HMMTOP | HMMTOP |
| No. of transmembrane segments 2 | Predicted with TMHMM | TMHMM |
| TM protein C-term location | Experimentally based determination of location of the C-terminal end of transmembrane proteins as in or out of the cytoplasm | Publication |
| Transcriptional unit(s) regulated | Gene(s) transcriptionally regulated, known and predicted | EcoCyc, RegulonDB |
| Operons with attenuation regulation | Genes predicted to be regulated by transcriptional attenuation | Attenuator website |
| Fused genes | Genes identified as encoding more than one function as a result of gene fusion | GenProtEC |
| Structure (PDB) id | Structure identifier from the Protein Data Bank | PDB |
| COG assignment | Sequence similarity to cluster of orthologous groups | COG |
| SCOP assignment | Sequence similarity to SCOP superfamily structural domains | Superfamily |
| PFAM assignment | Sequence similarity to PFAM families and domains | Pfam |
| TIGRFAM assignment | Sequence similarity to TIGRFAM protein families | TIGRFAM |
| GO cellular component | Mapping of location prediction to GO terms (this study) | WP, GO |
| GO cellular process | Mapping of function to GO terms | WP, MultiFun2GO |
| GO molecular function | Mapping of function to GO terms | WP, MultiFun2GO |
WP: Workshop participants.
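One way to picture a single row of the gene table described above is as a small typed record. The sketch below is illustrative only: the class name `GeneRecord`, its field names, and the `span_nt` helper are hypothetical, only a handful of the listed columns are modelled, and the inclusive-coordinate convention used for the span is an assumption rather than something stated in the table. The example values are placeholders, not a real gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical record holding a few columns of the gene table."""

    locustag_k12: str      # ECK number
    gene_name_k12: str     # name in Demerec format
    locus_tag_mg1655: str  # b number
    locus_tag_w3110: str   # JW number
    left_mg1655: int       # left boundary of gene
    right_mg1655: int      # right boundary of gene
    direction_mg1655: str  # "+" clockwise, "-" counterclockwise
    evidence: str          # "E" experimental, "C" computational prediction

    def __post_init__(self):
        # Reject values outside the codes listed in the table.
        if self.direction_mg1655 not in ("+", "-"):
            raise ValueError("direction must be '+' or '-'")
        if self.evidence not in ("E", "C"):
            raise ValueError("evidence must be 'E' or 'C'")
        if self.left_mg1655 > self.right_mg1655:
            raise ValueError("left boundary must not exceed right boundary")

    def span_nt(self) -> int:
        # Assumption: both boundaries are inclusive nucleotide positions.
        return self.right_mg1655 - self.left_mg1655 + 1


# Placeholder values for illustration; this is not a real E.coli gene.
example = GeneRecord(
    locustag_k12="ECK0000",
    gene_name_k12="abcD",
    locus_tag_mg1655="b0000",
    locus_tag_w3110="JW0000",
    left_mg1655=1001,
    right_mg1655=1300,
    direction_mg1655="+",
    evidence="C",
)
print(example.span_nt())
```

Validating the direction and evidence codes at construction time keeps malformed rows from propagating into later comparisons between the two strains.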
Figure 1. Gene fissions, fusions and an inversion resulting from 1 nt indel corrections. Of 78 frameshift corrections, two 1 nt indels led to fissions (splitting) of genes (A and B), 23 resulted in gene fusions, similar to the example in (C), and 1 led to an inversion (D) (4). (D) The original annotation of the rpiB region showed a gene called phnQ, whose sequence is not conserved. A 1 nt insertion created a CDS for a conserved protein (yjdP) in the opposite orientation. While phnQ was originally thought to be a downstream gene in the large phosphonate (phn) operon (39), mutational studies later revealed no role for it in phosphonate metabolism (40).
Genes not common to strains MG1655 and W3110
| Type of gene product | W3110 | MG1655 |
|---|---|---|
| Pseudogene | 6 | |
| IS | 17 | 2 |
| Prophage genes | | 9 |
| Total | 23 | 11 |
Numbers and types of known and predicted gene products of E.coli K-12¹
| Code | Gene product type | Number | Percentage |
|---|---|---|---|
| e | Enzyme | 1094 | 33.3 |
| pe | Enzyme, predicted | 390 | |
| t | Transporter | 337 | 13.3 |
| pt | Transporter, predicted | 254 | |
| r | Regulator | 241 | 9.1 |
| pr | Regulator, predicted | 164 | |
| m | Membrane | 43 | 5.7 |
| pm | Membrane, predicted | 210 | |
| f | Factor | 150 | 4.7 |
| pf | Factor, predicted | 60 | |
| s | Structural component | 89 | 2.8 |
| ps | Structural component, predicted | 37 | |
| c | Carrier | 77 | 2.7 |
| pc | Carrier, predicted | 42 | |
| n | RNA | 156 | 3.5 |
| lp | Lipoprotein | 46 | 1.0 |
| cp | Cell process | 56 | 1.3 |
| l | Leader peptide | 11 | 0.3 |
| su | Pseudogenes in common | 74 | 1.6 |
| i | Site (oriC) | 1 | <0.1 |
| h | Phage/IS in common (including 15 pseudogenes) | 304 | 6.8 |
| d | Partial information | 146 | 3.3 |
| o | Unknown function | 471 | 10.6 |
| | Total | 4453 | 100.0 |
¹ Genes in common to strains MG1655 and W3110.
² The percentage is calculated from the sum of known and predicted gene types.
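The combined percentages in the table can be reproduced from the counts by summing each known type with its predicted counterpart and dividing by the table total. The following is a minimal sketch of that arithmetic; the dictionary `counts` is copied from the table, while the names `counts`, `TOTAL` and `combined_percentage` are introduced here for illustration only.

```python
# Counts per gene product code, copied from the table above.
counts = {
    "e": 1094, "pe": 390,
    "t": 337, "pt": 254,
    "r": 241, "pr": 164,
    "m": 43, "pm": 210,
    "f": 150, "pf": 60,
    "s": 89, "ps": 37,
    "c": 77, "pc": 42,
    "n": 156, "lp": 46, "cp": 56, "l": 11,
    "su": 74, "i": 1, "h": 304, "d": 146, "o": 471,
}

# Sum of all rows; matches the table total of 4453.
TOTAL = sum(counts.values())


def combined_percentage(*codes):
    """Percentage of all gene products covered by the given codes."""
    return 100.0 * sum(counts[code] for code in codes) / TOTAL


# Known plus predicted enzymes: (1094 + 390) / 4453
print(round(combined_percentage("e", "pe"), 1))  # 33.3
# Known plus predicted transporters: (337 + 254) / 4453
print(round(combined_percentage("t", "pt"), 1))  # 13.3
```

Passing a single code, for example `combined_percentage("n")`, gives the percentage for a type that has no predicted counterpart.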
Figure 2. Status of annotation of E.coli gene products. The gene products present in both MG1655 and W3110, 4452 in total excluding oriC, are categorized according to their function assignment. Evidence code and gene type assignments available in Supplementary Table 1 were used to group the gene products. The annotation groups include gene products whose function is experimentally determined (2403, 54.1%), predicted by computational analysis (1425, 32%), or unknown (616, 13.9%). The gene products of unknown function are further separated into those containing a conserved domain (145, 3.3%) and those with (233, 5.3%) or without (238, 5.3%) a detectable homolog in the sequence databases.