Val F Lanza¹, María de Toro¹, M Pilar Garcillán-Barcia¹, Azucena Mora², Jorge Blanco², Teresa M Coque³, Fernando de la Cruz¹
Abstract
Bacterial whole genome sequencing (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements (mainly plasmids), play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, which hampers full analysis of plasmid sequences. We developed a method (called plasmid constellation networks, or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis to eliminate confounding data, and is displayed as a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with the ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids are in flux throughout the evolution of this lineage, which is wide open to plasmid exchange. MOBF12/IncF plasmids were pervasive, contributing by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages.
Year: 2014 PMID: 25522143 PMCID: PMC4270462 DOI: 10.1371/journal.pgen.1004766
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Phylogenetic tree of ST131 E. coli.
The tree is based on a 3,629,034 bp core genome (3,734 orthologous genes; 90% identity and 90% coverage) and 100 bootstrap replicates. ST131 clades are named according to [16] and further subdivided and colored according to virotypes [36]: virotype A (blue), virotype B (yellow), virotype C (pink), virotype D (green). Virotype classification is based on the presence/absence of four putative virulence factors: afaFM955459 (encoding an Afa/Dr adhesin), sat (secreted autotransporter toxin, present in PAI-CFT073-pheV), ibeA (invasion of brain endothelium) and iroN (salmochelin siderophore receptor). The commensal ST131 strain SE15 was used to root the tree (virotype non-typeable; serotype O150 in the original publication [91], but lying within the H41 cluster in the phylogenomic study of [16]). The SNP numbers shown are approximate averages of individual comparisons.
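A core genome such as the one underlying this tree keeps only the genes found in every genome above fixed identity and coverage cut-offs. The sketch below illustrates that filter with the 90% identity / 90% coverage thresholds from the legend. The `Hit` record and the `core_genes` function are hypothetical, and the hits are assumed to come from an upstream ortholog search that is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One ortholog hit (hypothetical record from an upstream search)."""
    gene: str         # orthologous gene family identifier
    genome: str       # genome in which the hit was found
    identity: float   # percent identity to the reference gene
    coverage: float   # percent of the reference gene length covered


def core_genes(hits, genomes, min_identity=90.0, min_coverage=90.0):
    """Return, sorted, the gene families with a qualifying hit in every genome."""
    found_in = {}
    for h in hits:
        if h.identity >= min_identity and h.coverage >= min_coverage:
            found_in.setdefault(h.gene, set()).add(h.genome)
    required = set(genomes)
    return sorted(g for g, present in found_in.items() if required <= present)


if __name__ == "__main__":
    hits = [
        Hit("g1", "E61BA", 99.5, 100.0),
        Hit("g1", "SE15", 97.0, 95.0),
        Hit("g2", "E61BA", 99.0, 100.0),
        Hit("g2", "SE15", 85.0, 100.0),  # fails the 90% identity cut-off
    ]
    print(core_genes(hits, ["E61BA", "SE15"]))  # ['g1']
```

Concatenating the alignments of the returned genes would then give the core-genome alignment used for tree building.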
Figure 2. PLACNET plasmid reconstruction of ST131 genome E61BA (ST9/H324/virotype D).
The network contains nodes of two colors (blue for contigs, grey for reference genomes). Reference nodes all have the same size, whereas contig node size is proportional to contig length. Contig outlines are yellow for contigs containing RIP proteins, red for contigs containing relaxases and green for contigs containing both. Edges are either solid (scaffold links) or dotted (homologous references). Edge lengths are set arbitrarily by the Cytoscape layout algorithm. The upper left panel shows the network output (original network) produced by the automatic reference search, scaffold links and protein tagging rules. The original network was converted into a pruned network by eliminating contigs smaller than 200 bp and duplicating specific hubs (red arrows). Two contigs could not be assigned for lack of scaffold links: a 2,953 bp contig (putative DNA primase + lytic transglycosylase) and a 1,301 bp contig (TrbI + TraB-partial). Closed plasmids (e.g., pE61BA_2, 24,447 bp) are shown with a black outline in the final PLACNET network.
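The network described in this legend can be thought of as an undirected graph whose nodes are contigs and reference genomes, with scaffold links between contigs and homology edges from contigs to references. The minimal sketch below, assuming plain Python dictionaries and pairs as input, drops contigs shorter than 200 bp and returns connected components as candidate plasmid or chromosome groups. The function names and input shapes are hypothetical and this is not the PLACNET code; hub duplication and the rest of the expert pruning are not modeled.

```python
def build_network(contig_lengths, scaffold_links, reference_hits, min_length=200):
    """Build an undirected contig/reference graph as an adjacency dict.

    contig_lengths: {contig_id: length_bp}
    scaffold_links: iterable of (contig_a, contig_b) pairs (solid edges)
    reference_hits: iterable of (contig_id, reference_id) pairs (dotted edges)
    Contigs shorter than min_length are pruned together with their edges.
    """
    kept = {c for c, length in contig_lengths.items() if length >= min_length}
    # Every kept contig is a node, even if it has no edges (unassigned contig).
    adj = {("contig", c): set() for c in kept}
    for a, b in scaffold_links:
        if a in kept and b in kept:
            adj[("contig", a)].add(("contig", b))
            adj[("contig", b)].add(("contig", a))
    for c, ref in reference_hits:
        if c in kept:
            adj[("contig", c)].add(("ref", ref))
            adj.setdefault(("ref", ref), set()).add(("contig", c))
    return adj


def components(adj):
    """Connected components via iterative DFS; each is a candidate replicon group."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            group.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(group)
    return groups
```

With contigs c1 (5,000 bp), c2 (3,000 bp), c3 (150 bp) and c4 (4,000 bp), links c1-c2 and c2-c3, and hits c1→pRef and c4→chr, c3 is pruned and two groups remain: {c1, c2, pRef} and {c4, chr}.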
Summary of plasmid content.
| Genome | Strain virotype | MOBF12/IncF | Other plasmids (Phage-related/RepFIB, MOBP12/IncI, MOBP6/IncI2, MOBF11/IncN, MOBP3/IncX, MOBP11/IncP1, MOBC12, MOBP5/ColE1-like, MOBQu, MOBQ12 or no-MOB) |
| FV9873 | A | pFV9873_5 (91.4 Kb; ΔTraI) | pFV9873_4 (33.3 Kb); pFV9873_1 (4.1 Kb); pFV9873_6 (5.2 Kb); pFV9873_2 (2.2 Kb); pFV9873_3 (4.6 Kb) |
| BIDMC38 | A | pBIDMC38_5 (123 Kb) | pBIDMC38_1 (11.8 Kb); pBIDMC38_4 (4.2 Kb); pBIDMC38_2 (5.3 Kb); pBIDMC38_3 (1.6 Kb) |
| E35BA | B | pE35BA_2+3 (211 Kb) | IME_E35BA (14.2 Kb); pE35BA_1 (4.1 Kb) |
| E2022 | C | pE2022_2 (103 Kb) | pE2022_1 (98.3 Kb); pE2022_3 (35.0 Kb); pE2022_4 (4.1 Kb); pE2022_5 (2.2 Kb) |
| BIDMC20B | C | pBIDMC20B_1 (128 Kb; ΔTraI) | pBIDMC20B_2 (109 Kb) |
| BWH24 | C | pBWH24_1 (123 Kb; ΔTraI) | pBWH24_2 (109 Kb); pBWH24_3 (60.3 Kb) |
| JJ1886 | C | pJJ1886-5 (110 Kb; ΔTraI) | pJJ1886-4 (55.9 Kb); pJJ1886-3 (5.6 Kb); pJJ1886-2 (5.2 Kb); pJJ1886-1 (1.6 Kb) |
| E61BA | D | pE61BA_1 (137 Kb) | pE61BA_7 (37.9 Kb); pE61BA_4 (18.3 Kb); pE61BA_2 (24.5 Kb); pE61BA_5 (6.5 Kb); pE61BA_6 (6.9 Kb); pE61BA_3 (5.5 Kb) |
| HVH177 | D | pHVH177_1 (78.6 Kb) | |
| SE15 | Commensal | pECSF1 (122 Kb) | |
| References | - | pEK516 (64.5 Kb; ΔTra); pEK499 (117 Kb; ΔTra); pJIE186-2 | pEK204 (93.7 Kb); pKC394 (53.2 Kb); pKC396 (44.2 Kb); pNDM-ECS01 (41.2 Kb); pECN580 (64.9 Kb); pJIE143 (34.3 Kb) |
Plasmid references: pEK516 ([93]; EU935738); pEK499 ([93]; EU935739); pJIE186-2 ([56]; NC_020271); pGUE-NDM (in [119]; JQ364967); pEK204 ([93]; EU935740); pKC394 ([120]; HM138652); pKC396 ([120]; HM138653); pNDM-ECS01 (Unpublished; KJ413946); pECN580 ([121]; KF914891); pJIE143 ([94]; JN194214).
pBIDMC20B_1 and pBWH24_1 plasmids lacked the REL domain of the TrwC protein.
Plasmid pJIE186-2 was isolated from strain JIE186 [94], although GenBank accession NC_020271 assigns it to strain EC958. Strain JIE186 also contains plasmid pJIE186-1, which was not included in this study because it is not available in public databases.
No correlation with RIP typing methods.
Figure 3. Hierarchical clustering dendrogram of ST131 plasmids.
The UPGMA dendrogram was based on protein cluster analysis using 60% sequence identity and 80% coverage. Plasmid names are colored according to their clade, taking into account ST, fimH allele and virotype, following the color code shown at the upper right. The five plasmid names in black correspond to previously sequenced plasmids from ST131 strains. Different color backgrounds are shown to emphasize branches of related plasmids. To the right of the dendrogram, four columns show, respectively, plasmid size, MOB type, RIP type and Inc type.
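UPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of its members' distances. The sketch below applies it to plasmid protein-cluster profiles, i.e. the set of protein-cluster IDs each plasmid carries after clustering at 60% identity / 80% coverage. Using the Jaccard distance on these presence/absence sets is an assumption for illustration, since the legend does not state the distance used, and `jaccard_distance` and `upgma` are hypothetical names.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets of protein-cluster IDs."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def upgma(profiles):
    """UPGMA tree (nested tuples of names) from {plasmid: set of cluster IDs}."""
    names = list(profiles)
    trees = {i: name for i, name in enumerate(names)}   # cluster id -> subtree
    sizes = {i: 1 for i in trees}                        # cluster id -> n leaves
    dist = {(i, j): jaccard_distance(profiles[names[i]], profiles[names[j]])
            for i, j in combinations(range(len(names)), 2)}  # keys have i < j
    next_id = len(names)

    def d(i, j):
        return dist[(i, j) if i < j else (j, i)]

    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of current clusters
        for k in trees:
            if k not in (i, j):
                # Size-weighted average linkage (the "A" in UPGMA).
                dist[(k, next_id)] = (d(i, k) * sizes[i] + d(j, k) * sizes[j]) / (
                    sizes[i] + sizes[j])
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))
```

For example, two plasmids sharing three of five clusters (distance 0.4) are joined first, and a third plasmid with no shared clusters is attached last.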
Figure 4. Hierarchical clustering dendrogram of ST131 plasmids and relevant references.
The left dendrogram shows the complete tree, with references. Dendrogram construction and color codes are as in Fig. 3. The right dendrogram expands the MOBF12/IncF branch, with new background colors highlighting plasmid groups within this branch that are mentioned in the text.
Figure 5. MOBF12/IncF plasmid analysis.
Protein cluster analysis was performed with kClust software (parameters: 30% identity, 50% coverage) on the set of 14 plasmids shown in Table 4. Plasmid pGUE-NDM [119] was excluded from this comparison because it is only distantly related to the others (see dendrogram in Fig. 4). A total of 354 protein clusters were obtained and annotated against the NCBI protein database (BLASTP). The reference protein of each cluster was manually inspected and classified into one of three groups (comparative analysis shown with BRIG): (i) backbone and metabolic proteins (panel A); (ii) virulence and antibiotic resistance proteins (panel B); and (iii) ISs and hypothetical proteins (not shown).
Resistance genes and virulence determinants in MOBF12 plasmids.
| Plasmid | Strain virotype | Size (Kb) | RIP (FAB formula) | Antibiotic resistance genes | Virulence genes |
| pFV9873_5 | A | 91.4 | RepFIIA-RepFIA (F2:A1:B-) | | None-detected |
| pBIDMC38_5 | A | 123 | RepFIIA-RepFIA (F2:A1:B-) | | |
| pE2022_2 | C | 103 | RepFIIA-RepFIA (F2:A1:B-) | | |
| pBIDMC20B_1 | C | 128 | RepFIIA-RepFIA (F2:A1:B-) | | |
| pBWH24_1 | C | 123 | RepFIIA-RepFIA (F2:A1:B-) | | |
| pJJ1886-5 | C | 110 | RepFIIA-RepFIA (F2:A1:B-) | | None-detected |
| pEK499 | - | 117 | RepFIIA-RepFIA (F2:A1:B-) | | None-detected |
| pEK516 | - | 64.5 | RepFIIA (F2:A-:B-) | | None-detected |
| pGUE-NDM | - | 87.0 | RepFIIA (F2:A-:B-) | | |
| pE61BA_1 | D | 137 | RepFIIA-RepFIB (F2:A-:B1) | | |
| pECSF1 | Commensal | 122 | RepFIIA-RepFIB (F29:A-:B10) | None-detected | |
| pE35BA_2+3 | B | 211 | RepFIA-RepFIB (F-:A2:B1) | | |
| pJIE186-2 | - | 138 | RepFIB (F-:A-:B1) | None-detected | |
| pHVH177_1 | D | 78.6 | RepFIB (F-:A-:B31) | None-detected | |
FAB formula according to http://pubmlst.org/plasmid/classification scheme [9].
According to the ARG-annot database (>90% amino acid identity) [http://en.mediterranee-infection.com].
According to our in-house database (>90% amino acid identity).
aac(6′)-Ib-cr-like carries the additional Glu72Gly mutation.
The original paper [93] reports dfrA7 rather than dfrA17. However, inspection of its amino acid sequence indicates that it is a DfrA17 protein.
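The FAB formulas in the RIP column give one allele per IncF replicon, in the order F, A and B, with "-" marking an absent replicon; in the table above, F corresponds to RepFIIA, A to RepFIA and B to RepFIB (for example, F2:A1:B- is listed as RepFIIA-RepFIA). A small parser such as the hypothetical `parse_fab` below could be used to check that each RIP entry agrees with its formula. The letter-to-replicon mapping is taken from this table, and the sketch only handles the F, A and B tokens that appear here.

```python
import re

# Locus letter in a FAB formula -> replicon label used in the RIP column above.
REPLICON = {"F": "RepFIIA", "A": "RepFIA", "B": "RepFIB"}
TOKEN = re.compile(r"([FAB])(\d+|-)")


def parse_fab(formula):
    """Map each replicon to its allele number, or None when marked '-'."""
    alleles = {}
    for token in formula.split(":"):
        match = TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognized FAB token: {token!r}")
        locus, allele = match.groups()
        alleles[REPLICON[locus]] = None if allele == "-" else int(allele)
    return alleles


def rip_label(formula):
    """Join the replicons that are present, in formula order, e.g. 'RepFIIA-RepFIA'."""
    return "-".join(r for r, allele in parse_fab(formula).items() if allele is not None)


if __name__ == "__main__":
    print(rip_label("F-:A2:B1"))     # RepFIA-RepFIB
    print(parse_fab("F29:A-:B10"))   # {'RepFIIA': 29, 'RepFIA': None, 'RepFIB': 10}
```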
Genomes assembled in this study.
| Strain | ID | No. of libraries | Read length (bp) | No. of contigs | Total length (bp) | N50 (bp) | k-mer |
| HVH177 | SRS399685 | 2 | 101 | 81 | 5035548 | 242711 | 83 |
| BIDMC20B | SRS420795 | 3 | 101 | 115 | 5311918 | 209342 | 91 |
| BWH24 | SRS420803 | 2 | 101 | 114 | 5369063 | 192138 | 83 |
| BIDMC38 | SRS420798 | 4 | 101 | 135 | 5226831 | 190988 | 91 |
| FV9873 | ERS450218 | 3 | 71 | 262 | 5160060 | 153685 | 55 |
| E35BA | ERS450219 | 3 | 71 | 419 | 5243070 | 159702 | 57 |
| E2022 | ERS450220 | 2 | 71 | 346 | 5296607 | 159635 | 53 |
| E61BA | ERS450221 | 3 | 71 | 246 | 5168482 | 198396 | 57 |
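The N50 column reports the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that calculation, assuming a plain list of contig lengths in bp (the function name is hypothetical):

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (bp); 0 for an empty list."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    return 0


if __name__ == "__main__":
    print(n50([100, 200, 300, 400]))  # 300, since 400 + 300 = 700 >= 1000 / 2
```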
Human E. coli ST131 genomes analyzed in this work.
| Strain | Accession | Location | Collection date | Isolation source | Plasmid name (Accession number) | Reference |
| HVH-177 | PRJNA186205 | Denmark | 2003 | Blood | pHVH177_1 | PRJNA186413 |
| BIDMC20B | PRJNA202031 | USA | - | Urine | pBIDMC20B_1 and _2 | PRJNA202876 |
| BWH24 | PRJNA201983 | USA | - | - | pBWH24_1 to _3 | PRJNA202876 |
| BIDMC38 | PRJNA202050 | USA | 2012 | - | pBIDMC38_1 to _5 | PRJNA202876 |
| FV9873 | PRJEB6262 | Spain | 2007 | Urine | pFV9873_1 to _6 | This study |
| E35BA | PRJEB6262 | Spain | 2008 | Urine | pE35BA_1 to _3, IME_E35BA | This study |
| E2022 | PRJEB6262 | Spain | 2006 | Urine | pE2022_1 to _5 | This study |
| E61BA | PRJEB6262 | Spain | 2008 | Abscess | pE61BA_1 to _7 | This study |
| SE15 | AP009378 | Japan | - | Feces | pECSF1 (AP009379) | |
| JJ1886 | NC_022648.1 | USA | - | Urine | pJJ1886_1 to _5 (NC_022649; NC_022650; NC_022651; NC_022661; NC_022662) | |
PRJNA and PRJEB6262 accession numbers correspond to SRA datasets. AP009378 and NC_022648.1 correspond to finished genomes.
Plasmids derived from this study are named according to Table 1.
Figure 6. PLACNET flow diagram.
The diagram represents the PLACNET workflow for analyzing an Illumina bacterial genome dataset. It can be separated into two sub-processes: network delineation and plasmid analysis. Network delineation consists of contig assembly, determination of scaffold interactions, reference search for homologous genomes and plasmid protein prediction. Plasmid analysis consists of building a dendrogram of plasmid protein profiles, which identifies the most relevant reference sequences, followed by plasmid cluster analysis, which compares query plasmids with their closest references. Plasmid analysis is a feedback process that helps to resolve uncertainties and results in a final definition of plasmid and chromosome content.
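The plasmid cluster analysis step compares each query plasmid with its closest references. As an illustration only, the sketch below ranks reference plasmids by the Jaccard similarity of their protein-cluster sets. The similarity measure, the `rank_references` name and the input format are assumptions, not the PLACNET implementation.

```python
def jaccard_similarity(a, b):
    """Shared fraction of protein clusters; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_references(query, references, top=3):
    """Return the `top` (name, similarity) pairs, most similar first.

    query: set of protein-cluster IDs for the query plasmid
    references: {reference_name: set of protein-cluster IDs}
    Ties are broken alphabetically so the ranking is reproducible.
    """
    scored = [(name, jaccard_similarity(query, clusters))
              for name, clusters in references.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]
```

In a feedback loop like the one in the diagram, the top-ranked references would be the ones inspected next to decide whether ambiguous contigs belong to a plasmid or to the chromosome.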