Yan-Cong Zhang^1, Yan Zhang^2, Bi-Ru Zhu^1, Bo-Wen Zhang^1, Chuan Ni^3, Da-Yong Zhang^1, Ying Huang^4, Erli Pang^1, Kui Lin^1.
Abstract
Escherichia coli lab strains K-12 GM4792 Lac(+) and GM4792 Lac(-) carry opposite lactose markers, which are useful for distinguishing evolved lines because they produce colonies of different colors. These two closely related strains were chosen as ancestors for our ongoing experimental evolution studies. Here, we describe the genome sequences, annotation, and features of GM4792 Lac(+) and GM4792 Lac(-). GM4792 Lac(+) has a 4,622,342-bp chromosome with 4,061 protein-coding genes and 83 RNA genes. Similarly, the genome of GM4792 Lac(-) consists of a 4,621,656-bp chromosome containing 4,043 protein-coding genes and 74 RNA genes. Genome comparison reveals that the differences between GM4792 Lac(+) and GM4792 Lac(-) are minimal and confined to the targeted lac region. Moreover, a previous competition experiment indicated that the two strains are identical or nearly identical in survivability, except for lactose utilization in a nitrogen-limited environment. Therefore, at both the genetic and the phenotypic level, GM4792 Lac(+) and GM4792 Lac(-), with their opposite neutral markers, are ideal systems for future experimental evolution studies.
Keywords: Escherichia coli K12; Experimental evolution; GM4792; Genome comparison; Gram-negative; Lactose; Variant analysis
Year: 2015 PMID: 26664654 PMCID: PMC4675052 DOI: 10.1186/s40793-015-0114-x
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Fig. 1 Scanning electron micrograph of E. coli strain GM4792 Lac+
Classification and general features of Escherichia coli strain K-12 GM4792 according to the MIGS recommendations [58]
| MIGS ID | Property | Term | Evidence code^a |
|---|---|---|---|
| | Classification | Domain Bacteria | TAS [ |
| | | Phylum Proteobacteria | TAS [ |
| | | Class Gammaproteobacteria | TAS [ |
| | | Order Enterobacteriales | TAS [ |
| | | Family Enterobacteriaceae | TAS [ |
| | | Genus Escherichia | TAS [ |
| | | Species Escherichia coli | TAS [ |
| | | Strain: GM4792 | TAS [ |
| | Gram stain | Negative | IDA, TAS [ |
| | Cell shape | Rod | TAS [ |
| | Motility | Motile | TAS [ |
| | Sporulation | None | IDA, TAS [ |
| | Temperature range | 10–45 °C | NAS |
| | Optimum temperature | 37 °C | IDA, TAS [ |
| | pH range; optimum | 5.5–8.0; 7 | IDA, TAS [ |
| | Carbon source | Peptides | IDA, TAS [ |
| MIGS-6 | Habitat | Not reported | |
| MIGS-6.3 | Salinity | Not reported | |
| MIGS-22 | Oxygen requirement | Facultative anaerobe | TAS [ |
| MIGS-15 | Biotic relationship | Human specimen | NAS |
| MIGS-14 | Pathogenicity | Non-pathogenic | NAS |
| MIGS-4 | Geographic location | Not reported | |
| MIGS-5 | Sample collection | October 7, 2007 | |
| MIGS-4.1 | Latitude | Not reported | |
| MIGS-4.2 | Longitude | Not reported | |
| MIGS-4.4 | Altitude | Not reported | |
^a Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property of the species, or on anecdotal evidence). These evidence codes are from the Gene Ontology project [46]. Some key taxonomic references missing from this table are given in Additional file 3.
Project information
| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | High-quality draft |
| MIGS-28 | Libraries used | Two paired-end libraries (180 bp and 380 bp) and two mate-pair libraries (2,000 bp and 6,000 bp) |
| MIGS-29 | Sequencing platforms | Illumina HiSeq 2000 |
| MIGS-31.2 | Fold coverage | ~330× for GM4792 Lac+ and ~370× for GM4792 Lac- (180-bp library); ~100× (other libraries) |
| MIGS-30 | Assemblers | ALLPATHS-LG Release 42411 [ |
| MIGS-32 | Gene calling method | RATT, Prodigal v2.5 [ |
| | Locus tag | U068 for Lac+ and U069 for Lac- |
| | GenBank ID | CP011342 for Lac+ and CP011343 for Lac- |
| | GenBank date of release | Jun 6, 2015 |
| | GOLD ID | Gi0059689 for GM4792 Lac+ and Gi0059688 for GM4792 Lac- |
| | BioProject | PRJNA224130 for GM4792 Lac+ and PRJNA224131 for GM4792 Lac- |
| | SRA IDs | GM4792 Lac+: SRR2596368, SRR2537294, SRR2619692, SRR2619693; GM4792 Lac-: SRR2529478, SRR1039666, SRR2529494, SRR2533204 |
| MIGS-13 | Source material identifier | GM4792 |
| | Project relevance | Experimental evolution, Tree of Life |
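Fold coverage, as in the MIGS-31.2 row above, is conventionally the total number of sequenced bases divided by the genome size, C = N·L/G (N reads of length L over a genome of G bp). The Python sketch below applies that relation to the assembled Lac+ chromosome length. The 100-bp read length is a hypothetical value chosen for illustration; read lengths are not reported in this section, so the read count it produces is not a figure from the study.

```python
import math


def fold_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean fold coverage C = N * L / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def reads_for_coverage(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Reads of the given length needed to reach target_coverage (rounded up)."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_LAC_PLUS = 4_622_342  # GM4792 Lac+ chromosome length (bp), from the genome statistics
ASSUMED_READ_LENGTH = 100    # hypothetical read length (bp); not reported in the source

n = reads_for_coverage(330, ASSUMED_READ_LENGTH, GENOME_LAC_PLUS)
print(n, round(fold_coverage(n, ASSUMED_READ_LENGTH, GENOME_LAC_PLUS), 2))
```

Under that assumed read length, ~330× would correspond to roughly 15.25 million reads; a different read length scales this number inversely.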
Genome statistics
| Attribute | Value^b | % of total^a,b | Value^c | % of total^a,c |
|---|---|---|---|---|
| Genome size (bp) | 4,622,342 | 100.00 | 4,621,656 | 100.00 |
| DNA coding (bp) | 3,888,159 | 84.12 | 3,873,721 | 83.82 |
| DNA G + C (bp) | 2,348,605 | 50.81 | 2,348,022 | 50.80 |
| DNA scaffolds | 1 | | 1 | |
| Total genes | 4,144 | 100.00 | 4,117 | 100.00 |
| Protein coding genes | 4,061 | 98.00 | 4,043 | 98.20 |
| RNA genes | 83 | 2.00 | 74 | 1.80 |
| Pseudo genes | 0 | 0.00 | 0 | 0.00 |
| Genes in internal clusters | 2,036 | 49.13 | 2,027 | 49.23 |
| Genes with function prediction | 3,922 | 94.64 | 3,900 | 94.73 |
| Genes assigned to COGs | 3,592 | 88.45 | 3,580 | 88.55 |
| Genes with Pfam domains | 3,838 | 92.62 | 3,818 | 92.74 |
| Genes with signal peptides | 410 | 9.89 | 408 | 9.91 |
| Genes with transmembrane helices | 1,058 | 25.53 | 1,048 | 25.46 |
| CRISPR repeats | 2 | | 2 | |
^a The total is based on either the size of the genome in base pairs or the total number of genes in the annotated genome
^b The genome statistics for GM4792 Lac+
^c The genome statistics for GM4792 Lac-
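Per footnote a, each percentage is a value divided by the genome size (for base-pair attributes) or by the total number of genes. The short Python check below recomputes a few Lac+ entries from the table. It also shows that the "Genes assigned to COGs" figure (88.45 %) is reproduced only when the denominator is the number of protein-coding genes (4,061) rather than total genes (4,144), so that row appears to use a different base than the footnote describes. The helper name `percent` is illustrative only.

```python
# Lac+ values copied from the genome statistics table
GENOME_SIZE_BP = 4_622_342
TOTAL_GENES = 4_144
PROTEIN_CODING_GENES = 4_061


def percent(value: int, total: int) -> float:
    """Percentage of total, rounded to two decimals as in the table."""
    return round(100 * value / total, 2)


rows = {
    "DNA coding (bp)": percent(3_888_159, GENOME_SIZE_BP),
    "DNA G + C (bp)": percent(2_348_605, GENOME_SIZE_BP),
    "Protein coding genes": percent(4_061, TOTAL_GENES),
    "Genes with Pfam domains": percent(3_838, TOTAL_GENES),
    # Same count, two candidate denominators
    "Genes assigned to COGs (of total genes)": percent(3_592, TOTAL_GENES),
    "Genes assigned to COGs (of protein-coding genes)": percent(3_592, PROTEIN_CODING_GENES),
}

for name, pct in rows.items():
    print(f"{name}: {pct:.2f} %")
```

The first four rows match the table (84.12, 50.81, 98.00, 92.62); for COGs, only the protein-coding denominator gives the tabulated 88.45 %, versus 86.68 % against total genes.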
Number of genes associated with general COG functional categories
| Code | Value^b | % of total^a,b | Value^c | % of total^a,c | Description |
|---|---|---|---|---|---|
| J | 258 | 6.35 | 259 | 6.41 | Translation, ribosomal structure and biogenesis |
| A | 2 | 0.05 | 2 | 0.05 | RNA processing and modification |
| K | 335 | 8.25 | 337 | 8.34 | Transcription |
| L | 160 | 3.94 | 160 | 3.96 | Replication, recombination and repair |
| B | 0 | 0.00 | 0 | 0.00 | Chromatin structure and dynamics |
| D | 50 | 1.23 | 50 | 1.24 | Cell cycle control, cell division, chromosome partitioning |
| Y | 0 | 0.00 | 0 | 0.00 | Nuclear structure |
| V | 107 | 2.63 | 107 | 2.65 | Defense mechanisms |
| T | 251 | 6.18 | 250 | 6.18 | Signal transduction mechanisms |
| M | 286 | 7.04 | 285 | 7.05 | Cell wall/membrane/envelope biogenesis |
| N | 116 | 2.86 | 114 | 2.82 | Cell motility |
| Z | 0 | 0.00 | 0 | 0.00 | Cytoskeleton |
| W | 38 | 0.94 | 36 | 0.89 | Extracellular structures |
| U | 63 | 1.55 | 61 | 1.51 | Intracellular trafficking, secretion, and vesicular transport |
| O | 171 | 4.21 | 170 | 4.20 | Posttranslational modification, protein turnover, chaperones |
| X | 31 | 0.76 | 31 | 0.77 | Mobilome: prophages, transposons |
| C | 317 | 7.81 | 316 | 7.82 | Energy production and conversion |
| G | 437 | 10.76 | 438 | 10.83 | Carbohydrate transport and metabolism |
| E | 397 | 9.78 | 397 | 9.82 | Amino acid transport and metabolism |
| F | 108 | 2.66 | 108 | 2.67 | Nucleotide transport and metabolism |
| H | 189 | 4.65 | 188 | 4.65 | Coenzyme transport and metabolism |
| I | 133 | 3.28 | 133 | 3.29 | Lipid transport and metabolism |
| P | 263 | 6.48 | 260 | 6.43 | Inorganic ion transport and metabolism |
| Q | 77 | 1.90 | 77 | 1.90 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 319 | 7.86 | 321 | 7.94 | General function prediction only |
| S | 216 | 5.32 | 210 | 5.19 | Function unknown |
| - | 469 | 11.55 | 463 | 11.45 | Not in COGs |
^a The total is based on the total number of protein-coding genes in the genome
^b The genome statistics for GM4792 Lac+
^c The genome statistics for GM4792 Lac-
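Footnote a states that the COG percentages use the number of protein-coding genes as the denominator. The Python sketch below takes the Lac+ counts from the table, recomputes the percentages, and checks that genes in COGs plus "Not in COGs" equal the protein-coding total (3,592 + 469 = 4,061). The category counts themselves sum to 4,324, more than the 3,592 genes assigned to COGs, which is consistent with some genes being counted in more than one category. Constant and function names are illustrative.

```python
# Lac+ COG category counts copied from the table (code -> number of genes)
COG_COUNTS_LAC_PLUS = {
    "J": 258, "A": 2, "K": 335, "L": 160, "B": 0, "D": 50, "Y": 0,
    "V": 107, "T": 251, "M": 286, "N": 116, "Z": 0, "W": 38, "U": 63,
    "O": 171, "X": 31, "C": 317, "G": 437, "E": 397, "F": 108, "H": 189,
    "I": 133, "P": 263, "Q": 77, "R": 319, "S": 216,
}
PROTEIN_CODING_GENES = 4_061  # denominator per footnote a
GENES_IN_COGS = 3_592         # from the genome statistics table
NOT_IN_COGS = 469             # "Not in COGs" row


def cog_percentages(counts: dict, denominator: int) -> dict:
    """Percentage of protein-coding genes in each COG category, two decimals."""
    return {code: round(100 * n / denominator, 2) for code, n in counts.items()}


pct = cog_percentages(COG_COUNTS_LAC_PLUS, PROTEIN_CODING_GENES)
print(pct["G"], pct["K"])  # carbohydrate metabolism, transcription
print(GENES_IN_COGS + NOT_IN_COGS == PROTEIN_CODING_GENES)
print(sum(COG_COUNTS_LAC_PLUS.values()))  # larger than GENES_IN_COGS
```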
Fig. 2 Graphical circular map of the chromosome of Escherichia coli K-12 GM4792 Lac+. From the outside in, the circles represent: genes on the forward strand (colored by COG category); genes on the reverse strand (colored by COG category); RNA genes (tRNAs in red, rRNAs in purple); G + C content (peaks outside/inside the circle indicate values higher/lower than the average G + C content, respectively); and GC skew, calculated as (G - C)/(G + C) (green/purple peaks outside/inside the circle indicate values higher/lower than 0, respectively)
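The G + C content and GC-skew tracks in Fig. 2 are typically computed over sliding windows along the chromosome. The Python sketch below implements both quantities for a plain DNA string. The window and step sizes are hypothetical parameters, because the caption does not state the values used to draw the map, and the toy sequence stands in for the real chromosome. Skew, defined as (G - C)/(G + C), always lies between -1 and 1, and equals 0 when G and C counts are equal.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for an empty string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); taken as 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int):
    """Yield (start, gc_content, gc_skew) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        yield start, gc_content(w), gc_skew(w)


# Toy sequence; a real run would use the chromosome sequence (e.g. GenBank CP011342)
toy = "GGGCATATGCCCGGGG"
for start, gc, skew in sliding_windows(toy, window=8, step=4):
    print(start, round(gc, 3), round(skew, 3))
# 0 0.5 0.5
# 4 0.5 -0.5
# 8 1.0 0.25
```

For a plot like Fig. 2, each window's G + C value would be compared against the genome-wide average, while skew is plotted around 0.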
Fig. 3 Whole-genome phylogeny highlighting the position of Escherichia coli GM4792 relative to other E. coli K-12 strains. As of 10 October 2015, a total of 279 Escherichia coli K-12 genomes had been released on NCBI. To analyze the phylogenetic relationship between GM4792 and other K-12 strains, we downloaded all completely assembled and well-annotated K-12 genomes. In total, 44 Escherichia coli K-12 strains, together with Escherichia albertii KF1 as the outgroup used to root the tree, were used to infer the whole-genome phylogeny from collinear genomic segments [53, 54] (Additional file 1: Table S6). Collinear regions were identified with Sibelia v3.0.6 [55], which efficiently finds locally collinear blocks (LCBs) among large numbers of microbial genomes without alignment; the collinear regions shared by all strains were then concatenated into a supermatrix. A maximum-likelihood (ML) [56] tree was inferred from this matrix with FastTree v2.1.8 [57]. Local support was assessed with the Shimodaira-Hasegawa-like (SH-like) test using 1,000 resamples, and support values are shown as internal node labels (values below 60 % are hidden)
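The Fig. 3 caption describes keeping only the collinear regions shared by all strains and concatenating them into a supermatrix before ML inference. The Python sketch below covers only that concatenation step. It assumes each shared block has already been aligned so that it has the same length in every strain; the caption does not describe this step, and Sibelia's actual output format is not reproduced here. Strain labels, block IDs and sequences are hypothetical toy values. The result is written as FASTA, a format FastTree accepts as alignment input.

```python
from typing import Dict

# Hypothetical input: strain -> {block_id: aligned block sequence}
Blocks = Dict[str, Dict[str, str]]


def build_supermatrix(blocks: Blocks) -> Dict[str, str]:
    """Concatenate blocks present in every strain, in a fixed (sorted) block order."""
    strains = sorted(blocks)
    shared = set.intersection(*(set(blocks[s]) for s in strains))
    order = sorted(shared)
    for block_id in order:
        # Concatenation is only meaningful if each block is aligned to one length
        lengths = {len(blocks[s][block_id]) for s in strains}
        if len(lengths) != 1:
            raise ValueError(f"block {block_id} is not aligned to a common length")
    return {s: "".join(blocks[s][b] for b in order) for s in strains}


def to_fasta(matrix: Dict[str, str]) -> str:
    """Render the supermatrix as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in matrix.items())


example = {
    "GM4792_LacPlus": {"b1": "ACGT", "b2": "GG-A", "b3": "TTT"},
    "K12_strainX": {"b1": "ACGA", "b2": "GGCA"},  # lacks b3, so b3 is dropped
    "E_albertii_KF1": {"b1": "ACTT", "b2": "G-CA", "b3": "TTA"},
}
print(to_fasta(build_supermatrix(example)), end="")
```

Dropping any block missing from even one strain mirrors the caption's "shared by all strains" criterion; it trades some informative sites for a matrix without missing data.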