Ignacio Ferrés, Gregorio Iraola.
Abstract
Multilocus sequence typing (MLST) is a standard tool in population genetics and bacterial epidemiology that assesses the genetic variation present in a small number of housekeeping genes (typically seven) distributed along the genome. The method assigns arbitrary integer identifiers to the allelic variants at these loci, which makes it possible to compare bacterial isolates efficiently using allele-based methods. The increasing availability of whole-genome sequences for hundreds to thousands of strains of the same bacterial species now allows MLST schemes to be applied and extended by automatically extracting allele information from the genomes. The PubMLST database is the most comprehensive resource of described schemes available for a wide variety of species. Here we present MLSTar, the first R package that allows users to (i) connect to the PubMLST database to select a target scheme, (ii) screen a desired set of genomes to assign alleles and sequence types, and (iii) interact with other widely used R packages to analyze the data and produce graphical representations. We applied MLSTar to more than 2,500 bacterial genomes from different species; it showed high accuracy and performance comparable to previously published command-line tools. MLSTar can be freely downloaded from http://github.com/iferres/MLSTar.
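The allele-numbering idea described in the abstract can be illustrated with a minimal Python sketch. It is not MLSTar's implementation: the scheme below is a hypothetical toy with three loci and made-up sequences, and exact string matching stands in for the sequence search a real tool would perform. The names `ALLELES`, `PROFILES`, `call_alleles`, and `call_st` are assumptions introduced only for this example.

```python
# Hypothetical toy scheme: three loci (real schemes typically use seven)
# and made-up sequences. Each known allele sequence maps to an arbitrary integer.
LOCI = ("locusA", "locusB", "locusC")
ALLELES = {
    "locusA": {"ATGGCA": 1, "ATGGCT": 2},
    "locusB": {"TTGACC": 1, "TTGACA": 2},
    "locusC": {"GGCATA": 1},
}
# Each known combination of allele numbers maps to a sequence type (ST).
PROFILES = {(1, 1, 1): 1, (2, 1, 1): 2, (1, 2, 1): 3}


def call_alleles(genome_loci):
    """Return one allele number per locus, or None where the sequence is unknown."""
    return tuple(ALLELES[locus].get(genome_loci.get(locus)) for locus in LOCI)


def call_st(allele_profile):
    """Return the ST for a complete, known allele profile, otherwise None."""
    if None in allele_profile:
        return None
    return PROFILES.get(allele_profile)


isolate = {"locusA": "ATGGCT", "locusB": "TTGACC", "locusC": "GGCATA"}
profile = call_alleles(isolate)
print(profile, call_st(profile))
```

Because isolates are reduced to short tuples of integers, comparing two isolates becomes a comparison of allele numbers rather than of full sequences.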
Keywords: Bacterial genomes; MLST; Microbial genomics; Multilocus genotyping; PubMLST; R package
Year: 2018 PMID: 29922519 PMCID: PMC6005169 DOI: 10.7717/peerj.5098
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Main steps in MLSTar workflow.
Accuracy of MLSTar against reference alleles and STs obtained from BIGSdb, measured as the percentage of correct calls in seven-locus MLST schemes from 11 different pathogens comprising a total of 3,021 genomes.
| Species | Genomes | Locus 1 | Locus 2 | Locus 3 | Locus 4 | Locus 5 | Locus 6 | Locus 7 | ST |
|---|---|---|---|---|---|---|---|---|---|
| | 66 | 96.7 | 96.7 | 96.7 | 96.7 | 96.7 | 95 | 96.7 | 95 |
| | 72 | 94.4 | 94.4 | 94.5 | 95.3 | 94.4 | 95.2 | 99.4 | 93.1 |
| | 79 | 97.5 | 96.2 | 98.7 | 97.5 | 98.7 | 97.5 | 97.5 | 93.7 |
| | 115 | 98.3 | 100 | 100 | 100 | 100 | 96.5 | 98.2 | 93.9 |
| | 176 | 100 | 99 | 100 | 100 | 100 | 100 | 100 | 99 |
| | 225 | 98.7 | 96 | 93 | 96 | 96.9 | 95.6 | 96 | 93 |
| | 258 | 99.2 | 99.6 | 99.2 | 99.2 | 99.2 | 99.6 | 99.6 | 98.1 |
| | 284 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| | 604 | 96.4 | 98.8 | 98.1 | 98.3 | 98.1 | 98.3 | 98.8 | 95.9 |
| | 847 | 98.6 | 97.4 | 99.3 | 99.2 | 97.3 | 99.1 | 98.7 | 94.9 |
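The accuracy values reported here are percentages of correct calls against reference alleles and STs. A minimal Python sketch of that calculation follows. The function name `percent_correct`, the genome identifiers, and the choice to count a missing call as incorrect are assumptions made for illustration, not details taken from the paper.

```python
def percent_correct(calls, reference):
    """Percentage of genomes whose call equals the reference value.

    Both arguments map a genome identifier to an allele number or ST.
    Assumption: a genome with no call is counted as incorrect.
    """
    if not reference:
        raise ValueError("reference must contain at least one genome")
    correct = sum(1 for genome, ref in reference.items() if calls.get(genome) == ref)
    return 100.0 * correct / len(reference)


# Made-up example: four genomes, one of which is miscalled.
reference_st = {"g1": 5, "g2": 7, "g3": 7, "g4": 1}
called_st = {"g1": 5, "g2": 7, "g3": 8, "g4": 1}
print(percent_correct(called_st, reference_st))
```

The same function applies to a single locus (allele numbers) or to whole profiles (STs). An ST call is correct only if all seven alleles are correct, which is consistent with the ST column never exceeding the per-locus values in the same row.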
Figure 2. Phylogeny based on ribosomal alleles.
Staphylococcus aureus (red) and Streptococcus agalactiae (blue) genomes from the BIGSdb (n = 356) were characterized using the universal rMLST scheme (based on 53 ribosomal genes). The phylogenetic tree was automatically generated with the plot.mlst() function using the Neighbor-Joining algorithm from a distance matrix obtained from allele patterns.
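How a distance matrix can be derived from allele patterns is sketched below in Python. The caption does not state which distance measure plot.mlst() uses, so the proportion of loci with differing allele numbers is an assumption, as are the function names and the toy four-locus profiles. The Neighbor-Joining step itself is omitted; the resulting matrix is the kind of input such an algorithm takes.

```python
def allele_distance(p, q):
    """Proportion of loci at which two allele profiles differ (assumed measure)."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(p, q)) / len(p)


def distance_matrix(profiles):
    """Return sorted isolate names and the symmetric pairwise distance matrix."""
    names = sorted(profiles)
    matrix = [[allele_distance(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Made-up profiles over four loci (the rMLST scheme in the figure uses 53).
toy_profiles = {
    "iso1": (1, 1, 1, 1),
    "iso2": (1, 1, 2, 1),
    "iso3": (3, 2, 2, 1),
}
names, matrix = distance_matrix(toy_profiles)
for name, row in zip(names, matrix):
    print(name, row)
```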
Figure 3. Comparison of MLSTar performance.
(A) Comparison of MLSTar, MLSTcheck, and mlst using a dataset of 10 Salmonella genomes assembled de novo at variable coverage depths. (B) Comparison of MLSTar, MLSTcheck, and mlst running times on a single CPU with an increasing number of genomes.