Eric J de Muinck, Karin Lagesen, Jan Egil Afset, Xavier Didelot, Kjersti S Rønningen, Knut Rudi, Nils Chr Stenseth, Pål Trosvik.
Abstract
BACKGROUND: Although Escherichia coli is one of the most intensely studied model organisms, many questions remain about its evolutionary biology and ecology. An important step toward a more complete understanding of E. coli biology is elucidating the relationships between gene content and adaptation to the ecological niche.
Year: 2013 PMID: 23384204 PMCID: PMC3637554 DOI: 10.1186/1471-2164-14-81
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1. List of strains used in this study with corresponding genome information
| EDM1c | 1360 | Healthy | 10 days | B2 | 712 | 7.55 | early | EMBL:ERS155053 |
| EDM3c | 1360 | Healthy | 1 year | B1 | 306 | 12.4 | late | EMBL:ERS155049 |
| EDM16c | 1870 | Healthy | 7 days | B1 | 204 | 13.75 | early | EMBL:ERS155051 |
| EDM70c | 1997 | Healthy | 10 days | B2 | 562 | 8.5 | early | EMBL:ERS155055 |
| EDM49c | 1891 | Healthy | 4 days | B2 | 163 | 14.5 | early | EMBL:ERS155056 |
| EDM101c | 1891 | Healthy | 11 days | B2 | 169 | 16 | early | EMBL:ERS155057 |
| EDM106c | 123 | Healthy | 4 days | B2 | 585 | 8 | early | EMBL:ERS155058 |
| EDM116c | 123 | Healthy | 1 year | A | 864 | 8.2 | late | EMBL:ERS155052 |
| EDM123c | 1360 | Healthy | 4 months | B2 | 669 | 8.5 | late | EMBL:ERS155054 |
| EDM530c | 123 | Healthy | 2 years | NA | 198 | 17.5 | late | EMBL:ERS155050 |
| JEA117c (Trh9)* | 117c | Healthy | 1 year | B2 | 284 | 10.1 | late | EMBL:ERS178156 |
| JEA242p (Trh52)* | 242p | Diarrhoea | 3 years | B2 | 140 | 13.2 | NC | EMBL:ERS178157 |
| JEA297p (Trh58)* | 297p | Diarrhoea | 2 years | D | 521 | 11.1 | NC | EMBL:ERS178158 |
| JEA179p (Trh39)* | 179p | Diarrhoea | 4 years | B1 | 848 | 7.8 | NC | EMBL:ERS178159 |
| JEA160c (Trh12)* | 160c | Healthy | 2 years | A | 188 | 20.1 | late | EMBL:ERS178160 |
| JEA124p (Trh29)* | 124p | Diarrhoea | 2 years | A | 800 | 8.9 | NC | EMBL:ERS178161 |
c = commensal isolate, isolated from healthy child.
p = pathogenic isolate, isolated from child with diarrhoea.
* Isolation based on positive PCR analysis for intimin gene eae.
NC = not categorized.
The genome assembly statistics are results of Newbler de novo assembly. The colonization categorizations are the ones used for gene enrichment comparison between early and late colonizers. The 10 strains named EDM were isolated from samples originating from the IMPACT study [8,10,13]. JEA strains and alternative strain IDs are from [14].
Figure 1. Overview of relative and cumulative proportions of shared gene families as the number of included genomes increases. Each pair of bars along the x-axis represents the relative (dark grey) and cumulative (light grey) proportions of gene families that are shared among the indicated number of strains. For example, the first pair of bars indicates the proportion of gene families found in only one of the 16 strains, the second pair indicates the proportion shared by two of the 16 strains, and the final pair indicates the proportion of the pan-genome that is common to all 16 strains. All duplicated gene annotations were removed for this analysis. 11.5% of annotated genes are unique to one strain while 52.4% are common to all. The total number of gene families in the pan-genome is 6152.
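The proportions described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sharing_profile`, the toy gene families and the reading of "cumulative" as a running sum from one strain upward are not taken from the study.

```python
from collections import Counter

def sharing_profile(presence):
    """Summarise how widely gene families are shared.

    presence: dict mapping gene family -> set of strains carrying it
    (hypothetical input format; every family is assumed to occur in
    at least one strain).
    Returns (n_strains, relative, cumulative), where relative[k] is the
    proportion of families present in exactly k strains.
    """
    strains = set().union(*presence.values())
    n_strains = len(strains)
    counts = Counter(len(carriers) for carriers in presence.values())
    total = len(presence)
    relative = {k: counts.get(k, 0) / total for k in range(1, n_strains + 1)}
    cumulative = {}
    running = 0.0
    for k in range(1, n_strains + 1):
        running += relative[k]  # running sum from 1 strain upward (assumption)
        cumulative[k] = running
    return n_strains, relative, cumulative

# Toy data: four gene families across three strains.
toy = {
    "famA": {"s1", "s2", "s3"},
    "famB": {"s1"},
    "famC": {"s1", "s2", "s3"},
    "famD": {"s2", "s3"},
}
n_strains, rel, cum = sharing_profile(toy)
print(rel)  # proportion of families found in exactly 1, 2, 3 strains
print(cum)
```

In this toy case half of the families are core (present in all three strains) and a quarter are unique to one strain.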
Figure 2. Heat map of total gene content comparisons. Gene presence is shown in blue and gene absence in yellow. The number of genes is depicted on the x-axis. Strains are listed in the order given by hierarchical clustering of a Manhattan distance matrix based on the gene presence/absence matrix.
Figure 3. Comparison of genome trees generated by core and pan-genomes. a. The core genome phylogeny was created using ClonalFrame. b. The pan-genome tree was created using hierarchical clustering of a Manhattan distance matrix based on the gene presence/absence matrix. The scale below the pan-genome tree indicates Manhattan distances. Both methods separated the strains into two main clades (1 and 2).
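A minimal sketch of how a pan-genome tree can be derived from a presence/absence matrix follows. The Manhattan distance between two strains is the number of gene families in which their presence/absence calls differ. The linkage rule is not stated here, so average linkage is used purely as an assumption, and the function names and toy profiles are hypothetical.

```python
from itertools import combinations

def manhattan(a, b):
    """Manhattan distance between two 0/1 presence/absence profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def average_linkage(profiles):
    """Naive agglomerative clustering with average linkage (an assumption).

    profiles: dict mapping strain name -> list of 0/1 gene calls.
    Returns the list of merges as (cluster_a, cluster_b, distance),
    with clusters represented as tuples of strain names.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []

    def dist(c1, c2):
        # Mean pairwise Manhattan distance between members of two clusters.
        total = sum(manhattan(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return merges

# Toy presence/absence profiles for four strains and five gene families.
toy = {
    "s1": [1, 1, 0, 0, 1],
    "s2": [1, 1, 0, 1, 1],
    "s3": [0, 0, 1, 1, 0],
    "s4": [0, 0, 1, 0, 0],
}
merges = average_linkage(toy)
for left, right, d in merges:
    print(left, right, d)
```

The toy strains fall into two groups, (s1, s2) and (s3, s4), which are joined last at the largest distance. For real data a library routine would be preferable to this quadratic loop.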
Table 2. Criteria used for gene enrichment analyses
| Criteria | Focal group (n strains) | Present in focal group | Absent from non-focal group |
| Criteria I, cladistic comparison | Clade1 (8) | ≥7 | ≥7 |
| | Clade2 (8) | ≥7 | ≥7 |
| Criteria II, pathogen/commensal comparison | Pathogen (4) | ≥3 | ≥9 |
| | Commensal (12) | ≥9 | ≥3 |
| Criteria III, growth rate comparison | Fast (2) | 2 | ≥6 |
| | Medium (4) | ≥3 | ≥4 |
| | Slow (4) | ≥3 | ≥4 |
| Criteria IV, colonization time comparison | Early (6) | ≥5 | ≥4 |
| | Late (6) | ≥4 | ≥5 |
| Criteria V, pathogen/commensal comparison | Pathogen (23) | ≥17 | ≥13 |
| | Commensal (17) | ≥13 | ≥17 |
| Criteria VI, pathogen/commensal comparison | Pathogen (5) | ≥4 | ≥13 |
| | Commensal (17) | ≥13 | ≥4 |
Criteria I: Criteria used for discriminating cladistic gene content enrichments. Each of the two clades contained 8 strains and enrichment required a gene to be present in at least 7 strains of one clade (focal group) while being absent from at least 7 strains in the other clade (non-focal group).
Criteria II: Criteria used for discriminating pathogen vs. commensal gene content enrichments. Since the two groups are of unequal size, a pathogen enriched gene had to be present in at least 3 of 4 pathogenic strains and absent from at least 9 of 12 commensal strains. A commensal enriched gene had to be present in at least 9 of 12 commensal strains and absent from at least 3 of 4 pathogenic strains.
Criteria III: Criteria used for discriminating growth rate related gene content enrichments. The three growth rate categories (fast, medium and slow) contained 2, 4 and 4 strains, respectively. To be considered enriched in the fast category, a gene had to be present in both fast strains and absent from at least 6 of 8 of the combined slow and medium strains. To be considered enriched in the medium category, a gene had to be present in at least 3 of 4 medium strains and absent from at least 4 of 6 of the combined slow and fast strains. To be considered enriched in the slow category, a gene had to be present in at least 3 of 4 slow strains and absent from at least 4 of 6 of the combined medium and fast strains.
Criteria IV: Criteria used for discriminating early vs. late colonizer gene content enrichments. The two groups contain 6 strains each. Since one of the strains in the early group was also isolated in the late group (EDM123c), an asymmetrical enrichment profile was designed, which required an early enriched gene to be present in at least 5 of 6 early strains and absent from at least 4 of 6 late strains. A gene enriched in the late colonizer group had to be present in at least 4 of 6 late strains and absent from at least 5 of 6 early strains.
Criteria V: Criteria used for discriminating pathogen vs. commensal gene content enrichments including 24 additional published E. coli genomes (Table 3). Since the two groups are of unequal size, a pathogen enriched gene had to be present in at least 17 of 23 pathogenic strains and absent from at least 13 of 17 commensal strains. A commensal enriched gene had to be present in at least 13 of 17 commensal strains and absent from at least 17 of 23 pathogenic strains.
Criteria VI: Criteria used for discriminating pathogen vs. commensal gene content enrichments including one additional published enteropathogenic E. coli (EPEC) genome and 5 additional published commensal isolate genomes (Table 3). Since the two groups are of unequal size, a pathogen enriched gene had to be present in at least 4 of 5 pathogenic strains and absent from at least 13 of 17 commensal strains. A commensal enriched gene had to be present in at least 13 of 17 commensal strains and absent from at least 4 of 5 pathogenic strains.
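All six criteria follow the same rule: a gene counts as enriched when it is present in at least a given number of focal strains and absent from at least a given number of non-focal strains. The sketch below applies that rule with the criteria II thresholds. The function name `enriched_genes`, the input format and the toy genes are illustrative assumptions, not the study's code or data.

```python
def enriched_genes(presence, focal, non_focal, min_present, min_absent):
    """Return genes enriched in the focal group.

    presence: dict mapping gene -> set of strains carrying it (hypothetical format).
    A gene qualifies if it is present in >= min_present focal strains
    and absent from >= min_absent non-focal strains.
    """
    focal, non_focal = set(focal), set(non_focal)
    hits = []
    for gene, carriers in presence.items():
        n_present = len(carriers & focal)
        n_absent = len(non_focal - carriers)
        if n_present >= min_present and n_absent >= min_absent:
            hits.append(gene)
    return sorted(hits)

# Toy data shaped like criteria II: 4 pathogenic and 12 commensal strains.
pathogens = [f"p{i}" for i in range(1, 5)]
commensals = [f"c{i}" for i in range(1, 13)]
toy = {
    # present in 3 of 4 pathogens, absent from 10 of 12 commensals
    "geneX": {"p1", "p2", "p3", "c1", "c2"},
    # present in only 2 of 4 pathogens and 4 of 12 commensals
    "geneY": {"p1", "p2", "c1", "c2", "c3", "c4"},
    # present in 10 of 12 commensals, absent from 3 of 4 pathogens
    "geneZ": set(commensals[:10]) | {"p1"},
}

pathogen_enriched = enriched_genes(toy, pathogens, commensals, 3, 9)
commensal_enriched = enriched_genes(toy, commensals, pathogens, 9, 3)
print(pathogen_enriched, commensal_enriched)
```

Swapping the roles of the two groups and their thresholds gives the commensal enriched list, which is how the asymmetric thresholds for groups of unequal size can be expressed with one function.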
Figure 4. Gene content enrichment comparing main clades 1 and 2. Enrichment analysis was carried out using criteria I (Table 2). Both enrichments were highly significant (p < 0.001). The distribution of possible strain group permutations is presented in Additional file 1: Figure S2.
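One way to obtain a p-value of this kind is to compare the observed number of enriched genes with the numbers obtained for every other way of assigning strains to the focal group. The exact procedure is not spelled out here, so the sketch below is an assumption: it enumerates all groupings exhaustively, which is feasible at this scale (16 strains split 8 vs. 8 gives 12870 groupings). All names and the toy data are hypothetical.

```python
from itertools import combinations

def count_enriched(presence, focal, non_focal, min_present, min_absent):
    """Number of genes present in >= min_present focal strains and
    absent from >= min_absent non-focal strains."""
    focal, non_focal = set(focal), set(non_focal)
    return sum(
        1
        for carriers in presence.values()
        if len(carriers & focal) >= min_present
        and len(non_focal - carriers) >= min_absent
    )

def exhaustive_p_value(presence, strains, focal, min_present, min_absent):
    """Fraction of all equally sized strain groupings that yield at least
    as many enriched genes as the observed focal group (assumed test)."""
    strains = sorted(strains)
    observed = count_enriched(
        presence, focal, set(strains) - set(focal), min_present, min_absent
    )
    null = []
    for group in combinations(strains, len(focal)):
        rest = set(strains) - set(group)
        null.append(count_enriched(presence, group, rest, min_present, min_absent))
    p = sum(1 for c in null if c >= observed) / len(null)
    return observed, p

# Toy data: six strains, focal group s1-s3, strict thresholds (3 present, 3 absent).
strains = ["s1", "s2", "s3", "s4", "s5", "s6"]
toy = {
    "g1": {"s1", "s2", "s3"},
    "g2": {"s1", "s2", "s3"},
    "g3": {"s4", "s5", "s6"},
    "g4": {"s1", "s4"},
}
observed, p = exhaustive_p_value(toy, strains, ["s1", "s2", "s3"], 3, 3)
print(observed, p)
```

With larger strain sets the number of groupings grows quickly, and random sampling of groupings would be the usual substitute for full enumeration.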
Figure 5. General comparison of the enrichment profiles of the strain categories. Each column is created from the gene enrichment list for each grouping (Additional files 16 to 24). Each list of gene sequences was evaluated for ontology level 3 biological process categorization using Blast2GO for SEED assignments. Grouping categories are shown on the x-axis, and the different comparisons are separated by white lines. Enrichment comparisons were performed between clade1 and clade2; pathogen and commensal; slow, medium and fast growth rates; early and late colonization. The color key indicates the enrichment scores assigned by Blast2GO for the biological processes.
Table 3. List of publicly available genomes used in gene content comparisons
| 1 | K12MG1655 | NC_000913.2 | | laboratory | [ |
| 2 | O42 | NC_017626.1 | EAEC | pathogenic | [ |
| 3 | 536 | NC_008253.1 | UPEC | pathogenic | [ |
| 4 | 55989 | NC_011748.1 | EAEC | pathogenic | [ |
| 5 | CFT073 | NC_004431.1 | UPEC | pathogenic | [ |
| 6 | E24377A | NC_009801.1 | ETEC | pathogenic | [ |
| 7 | H10407 | NC_017633.1 | ETEC | pathogenic | [ |
| 8 | HS | NC_009800.1 | | commensal | [ |
| 9 | IAI1 | NC_011741.1 | | commensal | [ |
| 10 | IAI39 | NC_011750.1 | UPEC | pathogenic | [ |
| 11 | LF82 | NC_011993.1 | AIEC | pathogenic | Direct Submission |
| 12 | UTI89 | NC_007946.1 | UPEC | pathogenic | [ |
| 13 | UMN026 | NC_011751.1 | UPEC | pathogenic | [ |
| 14 | UM146 | NC_017632.1 | AIEC | pathogenic | [ |
| 15 | SE15 | NC_013654.1 | | commensal | [ |
| 16 | SE11 | NC_011415.1 | | commensal | [ |
| 17 | ED1a | NC_011745.1 | | commensal | [ |
| 18 | S88 | NC_011742.1 | ExPEC | pathogenic | [ |
| 19 | O157:H7 | NC_002695.1 | EHEC | pathogenic | [ |
| 20 | O127:H5 | NC_011601.1 | EPEC | pathogenic | [ |
| 21 | O104:H4 | NC_018658.1 | EHEC | pathogenic | [ |
| 22 | O83:H1 | NC_017634.1 | AIEC | pathogenic | [ |
| 23 | O26:H11 | NC_013361.1 | EHEC | pathogenic | [ |
| 24 | O7:K1 | NC_017646.1 | ExPEC | pathogenic | [ |
| 25 | NA114 | NC_017644.1 | UPEC | pathogenic | Direct Submission |
EAEC – enteroaggregative E. coli, UPEC – uropathogenic E. coli, ETEC – enterotoxigenic E. coli, EPEC – enteropathogenic E. coli, AIEC – adherent-invasive E. coli, ExPEC – extraintestinal pathogenic E. coli, EHEC – enterohaemorrhagic E. coli.
Figure 6. Multiple correspondence analysis of the gene content matrix. The plot shows principal coordinates along the two main components. Each point represents a gene, and its color indicates the number of genomes in which the gene is present. The positions of the genome labels represent the relative distances of the genomes along the respective components.
Figure 7. Gene content profiles of pathogenic and commensal strains. Enrichment analysis was carried out following criteria II (Table 2). 164 genes (p = 0.02) were found to be enriched in the pathogenic group while only 33 genes (p = 0.18) were enriched in the commensal group. The complete distributions of possible gene enrichments are presented in Additional file 1: Figure S3.
Figure 8. Ratio of anaerobic to aerobic generation times in relation to anaerobic generation times of IMPACT isolates (de Muinck et al. submitted). Circled strains are those for which genome sequences are presented in this study: blue circles mark fast growers, green circles mark strains with a medium growth rate, and red circles mark slow growers. R² = 0.51, p < 0.0001.
Figure 9. Gene content profiles of slow, medium and fast growing strains. Enrichment analysis was carried out using criteria III (Table 2). The fast category had 227 enriched genes (p = 0.09), the medium category only 46 (p = 0.56), and the slow category 324 (p = 0.04). Distributions of possible enrichment profiles are shown in Additional file 1: Figure S4.
Figure 10. Gene content enrichment profiles of early and late colonizer strains. Enrichment analysis was carried out using criteria IV (Table 2). Both early and late colonizers show significant enrichments (p = 0.02 and p = 0.05, respectively). The complete distributions of possible enrichment profiles are shown in Additional file 1: Figure S5. EDM1c is an early colonizer that is clonally related to the late colonizer EDM123c. EDM123c largely maintains the early colonizer genomic profile but has lost some of the genes found in that profile.
Figure 11. Pan-genome tree of the 16 IMPACT isolates and 25 publicly available pathogenic and non-pathogenic isolates. Pathogenic isolates are labeled in red, commensals in black, and K12MG1655 in green. Leaves labeled with triangles represent strains whose genomes were sequenced as part of this study (Table 1). Leaves labeled with circles represent publicly available E. coli genomes downloaded from GenBank (Table 3). The pan-genome tree was created using hierarchical clustering of a Manhattan distance matrix based on the gene presence/absence matrix. The scale below the pan-genome tree indicates Manhattan distances.
Figure 12. Gene content profiles of enteropathogenic (EPEC) and commensal strains. This enrichment analysis includes 6 additional genomes, 5 commensals and 1 EPEC, downloaded from GenBank (Table 3). Enrichment analysis was carried out following criteria VI (Table 2). 86 genes (p < 0.01) were found to be enriched in the pathogenic group while only 3 genes (p = 0.35) were enriched in the commensal group. The complete distributions of possible gene enrichments are presented in Additional file 1: Figure S9.