Santiago Redondo-Salvo, Roger Bartomeus-Peñalver, Luis Vielva, Kaitlin A Tagg, Hattie E Webb, Raúl Fernández-López, Fernando de la Cruz.
Abstract
BACKGROUND: Plasmids are mobile genetic elements, key in the dissemination of antibiotic resistance, virulence determinants and other adaptive traits in bacteria. Obtaining a robust method for plasmid classification is necessary to better understand the genetics and epidemiology of many pathogens. Until now, plasmid classification systems focused on specific traits, which limited their precision and universality. The definition of plasmid taxonomic units (PTUs), based on average nucleotide identity metrics, allows the generation of a universal plasmid classification scheme, applicable to all bacterial taxa. Here we present COPLA, a software tool able to assign plasmids to known and novel PTUs, based on their genomic sequence.
Keywords: Antibiotic resistance genes; Average nucleotide identity; Horizontal gene transfer; Plasmid; Plasmid epidemiology
Year: 2021 PMID: 34332528 PMCID: PMC8325299 DOI: 10.1186/s12859-021-04299-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
COPLA accuracy for a set of 1000 plasmids
| Outcome | | | | All samples |
|---|---|---|---|---|
| PTU correctly assigned | 230 (89%) | 88 (92%) | 107 (94%) | 935 (94%) |
| Cluster reconstructed (a) | 4 (2%) | 5 (5%) | 3 (3%) | 31 (3%) |
| Total correct predictions | 234 (91%) | 93 (97%) | 110 (96%) | 966 (97%) |
1000 plasmids were randomly removed from the reference dataset (RefSeq84), consisting of 9894 plasmids. COPLA was run using each of these plasmids as query. The table indicates the number of cases (and percentage) for each prediction outcome for all samples and selected bacterial orders. The individual results can be found in Additional file 2.
(a) Because 1000 plasmids were removed from the RefSeq84 sHSBM network, some PTUs containing 4–5 members fell below the threshold for PTU definition (at least four members). When a plasmid was assigned to one of these clusters, no PTU assignment followed (or a “new PTU” assignment resulted), yet the result was in fact correct, since the plasmid was assigned to the correct cluster.
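The threshold effect described in the footnote can be illustrated with a minimal sketch. The function name `still_a_ptu`, the constant `MIN_PTU_SIZE`, and the example sizes are hypothetical and chosen only for illustration; the four-member minimum is the only value taken from the text. This is not COPLA code.

```python
MIN_PTU_SIZE = 4  # minimum cluster size for a PTU, as stated in the footnote


def still_a_ptu(original_size: int, removed: int) -> bool:
    """Return True if a cluster keeps PTU status after `removed` members are held out."""
    if removed < 0 or removed > original_size:
        raise ValueError("removed must be between 0 and original_size")
    return original_size - removed >= MIN_PTU_SIZE


# Hypothetical examples: a 5-member PTU survives losing one member but not two;
# a 4-member PTU loses its status as soon as a single member is held out.
for size, removed in [(5, 1), (5, 2), (4, 1), (20, 3)]:
    print(size, removed, still_a_ptu(size, removed))
```

Under this reading, a held-out plasmid placed back into a cluster that dropped below four members cannot receive a PTU name even though the cluster itself is the right one, which is why those cases are counted as correct.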
Benchmark for 1000 new plasmids of RefSeq200 dataset for the most abundant bacterial orders
| Outcome | | | | All samples |
|---|---|---|---|---|
| PTU assigned | ||||
| Cases | 259 (63%) | 40 (46%) | 25 (30%) | 408 (41%) |
| Score | [0.98 ± 0.06] | [0.94 ± 0.11] | [0.96 ± 0.1] | [0.97 ± 0.08] |
| New PTU | ||||
| Cases | 19 (5%) | 6 (7%) | 2 (2%) | 41 (4%) |
| Score | [0.84 ± 0.22] | [0.89 ± 0.17] | [0.75 ± 0.35] | [0.83 ± 0.24] |
| Not assigned | ||||
| Cases | 131 (32%) | 41 (47%) | 55 (67%) | 551 (55%) |
| Score | [0.99 ± 0.05] | [1 ± 0.01] | [1 ± 0.0] | [1 ± 0.05] |
Number of cases (and percentage) for each prediction outcome. The mean and standard deviation of the prediction scores for each outcome class are also given (in square brackets). More detailed results are in Additional file 4.
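As a quick consistency check, the percentages can be recomputed from the counts. The sketch below uses the one column whose counts sum to 1000, taken here to be the "All samples" column (an assumption, since the column labels are incomplete in this record); the variable names are illustrative.

```python
# Counts from the column of the RefSeq200 benchmark table that sums to 1000.
counts = {"PTU assigned": 408, "New PTU": 41, "Not assigned": 551}

total = sum(counts.values())
# Percentage of each outcome, rounded to the nearest integer as in the table.
shares = {outcome: round(100 * n / total) for outcome, n in counts.items()}

print(total)
for outcome, pct in shares.items():
    print(f"{outcome}: {counts[outcome]} ({pct}%)")
```

The recomputed shares (41%, 4% and 55%) match the values printed in the table.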
Fig. 1 Score distribution for 1000 plasmids sampled from RefSeq200, not present in the COPLA reference database (RefSeq84). The figure displays a semilogarithmic plot of the number of plasmids resulting in each given score
Fig. 2 Representative prediction outcomes. The query plasmid is represented by the node with the red inner circle. For all other nodes, the color of the inner circle represents the PTU assigned in the reference database (i.e. using only RefSeq84 plasmids). The outer ring colors represent the PTU assigned by COPLA. Yellow represents the PTU assigned to the query, green corresponds to nodes belonging to a different PTU, and grey represents nodes with no PTU assigned. Case 1: the query represents a singleton. Case 2: the query belongs to a cluster with one or two members. A PTU cannot be assigned. Case 3: the query belongs to a cluster with three members. COPLA predicts a “new putative PTU”. Case 4: the query links together isolated plasmids to organize a 4-member cluster. COPLA predicts a “new putative PTU”. Case 5: the query clusters with the members of a known PTU. COPLA predicts that the query belongs to that PTU. Case 6: the query links peripherally to a cluster corresponding to a known PTU. However, either the number of connections is not enough to fulfill the intercluster density rule, or the size of the query is not compatible with that of the PTU (see “Building the PTU reference catalog” in Implementation). The COPLA output indicates that no PTU can be assigned to the query. Case 7: the query links peripherally to a cluster corresponding to a known PTU. The query organizes a subcluster of four members that does not fulfill the rules to integrate in the PTU. The COPLA output predicts a “new putative PTU”. Case 8: as in Case 7, the query organizes a subcluster that does not fulfill the rules to integrate in the PTU. Furthermore, it drags one member of the PTU to the new cluster. The COPLA output predicts a “new putative PTU”. Case 9: the query significantly alters the structure of a known PTU. The COPLA output predicts a “new putative PTU”. It also warns that “query is related to PTU-… plasmids”. See additional details and explanations in the main text (Discussion)
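The outcomes in Cases 1 to 8 can be summarized in a deliberately simplified sketch. It is not COPLA's implementation, which operates on the sHSBM network. The function `predict_outcome` and its parameters are hypothetical, and it assumes that `cluster_size` counts the query together with the members of its own cluster or subcluster. Case 9 and its warning message are left out.

```python
from typing import Optional

MIN_PTU_SIZE = 4  # a cluster needs at least four members to define a PTU


def predict_outcome(
    cluster_size: int,
    known_ptu: Optional[str] = None,
    fulfils_ptu_rules: bool = False,
) -> str:
    """Simplified reading of the figure's cases (assumption, not COPLA code).

    cluster_size      -- members of the query's own (sub)cluster, query included
    known_ptu         -- name of a known PTU the query clusters or links with
    fulfils_ptu_rules -- True if the density and size rules for joining are met
    """
    if known_ptu is not None and fulfils_ptu_rules:
        return known_ptu  # Case 5: the query joins the known PTU
    if cluster_size >= MIN_PTU_SIZE:
        return "new putative PTU"  # Cases 3, 4, 7 and 8
    return "not assigned"  # Cases 1, 2 and 6


print(predict_outcome(1))  # singleton query
print(predict_outcome(4))  # query completes a four-member cluster
print(predict_outcome(6, known_ptu="PTU-X", fulfils_ptu_rules=True))
print(predict_outcome(1, known_ptu="PTU-X"))  # peripheral link, rules not met
```

Here `"PTU-X"` is a placeholder name. Collapsing the density and size rules into one boolean keeps the sketch short but hides the detail described under “Building the PTU reference catalog”.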