Rediat Tewolde1, Timothy Dallman2, Ulf Schaefer1, Carmen L Sheppard3, Philip Ashton2, Bruno Pichon4, Matthew Ellington4, Craig Swift2, Jonathan Green1, Anthony Underwood1.
Abstract
Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short-read sequence data derived from Whole Genome Sequencing (WGS). This paper compares the reliability of MLST results derived from WGS data, using mapping- and assembly-based approaches, with that of the conventional method, using 323 bacterial genomes of diverse species. The sensitivity of the two WGS-based methods was further investigated with 26 mixed and 29 low-coverage genomic data sets from Salmonella enteridis and Streptococcus pneumoniae. Of the 323 samples, full MLST profiles were derived for 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) by the conventional method, the assembly-based approach and the mapping-based approach, respectively. Among the samples typed by the conventional method (92.9%) and both WGS methods, concordance was 100%. From the 55 mixed and low-coverage genomes, full MLST profiles were derived for 89.1% (n = 49) and 67.3% (n = 37) by the mapping- and assembly-based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. Of the two WGS-based methods, the mapping-based approach was the more sensitive. In addition, the mapping-based approach described here derives quality metrics, which are difficult to determine quantitatively using the conventional and WGS assembly-based approaches.
Keywords: Assembly-based approach; Mapping-based approach; Multilocus sequence typing; Whole genome sequencing
Year: 2016 PMID: 27602279 PMCID: PMC4991843 DOI: 10.7717/peerj.2308
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
WGS-based MLST results derived from Salmonella isolates mixed with other bacteria.
| K-mer identification of primary sample | K-mer identification of secondary sample | ST derived from WGS-mapping approach (MOST) | ST derived from WGS-assembly based approach (BIGSdb) |
|---|---|---|---|
| | | 48 | Undetermined |
| | | 15 | Undetermined |
| | | 19 | 19 |
| | | 198 | Undetermined |
| | | 897 | 897 |
| | | 46 | 46 |
| | | 16 | Undetermined |
| | | 414 | 414 |
| | | Novel allele | 11 |
| | | 515 | 515 |
| | | 16 | Undetermined |
| | | 543 | 543 |
| | | 34 | Undetermined |
| | | 34 | Undetermined |
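The ST columns of the table above can be tallied to check how many of the 14 mixed Salmonella samples each WGS method typed. The following is a minimal sketch: the `rows` list is transcribed by hand from the table, and the helper `is_full_profile` is a hypothetical name that simply treats any purely numeric result as a full MLST profile (an assumption for illustration, not the authors' definition).

```python
# (MOST result, BIGSdb result) for each of the 14 rows, in table order.
rows = [
    ("48", "Undetermined"), ("15", "Undetermined"), ("19", "19"),
    ("198", "Undetermined"), ("897", "897"), ("46", "46"),
    ("16", "Undetermined"), ("414", "414"), ("Novel allele", "11"),
    ("515", "515"), ("16", "Undetermined"), ("543", "543"),
    ("34", "Undetermined"), ("34", "Undetermined"),
]


def is_full_profile(result: str) -> bool:
    """Assumption: a result counts as a full profile only if it is a numeric ST."""
    return result.isdigit()


most_full = sum(is_full_profile(m) for m, _ in rows)
bigsdb_full = sum(is_full_profile(b) for _, b in rows)

# Samples typed by both methods, and how many of those agree on the ST.
both_full = [(m, b) for m, b in rows if is_full_profile(m) and is_full_profile(b)]
concordant = sum(m == b for m, b in both_full)

print(f"MOST: {most_full}/{len(rows)}, BIGSdb: {bigsdb_full}/{len(rows)}")
print(f"Typed by both: {len(both_full)}, concordant: {concordant}")
```

Under that assumption the tallies are 13 of 14 for the mapping approach and 7 of 14 for the assembly approach, and the six samples typed by both methods received the same ST.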
WGS-based MLST results derived from DNA of different S. pneumoniae types mixed in different ratios.
| Mixing ratio | Max percentage non-consensus base values derived from MOST software | ST derived from WGS-mapping approach (MOST) | ST derived from WGS-assembly based approach (BIGSdb) |
|---|---|---|---|
| 90% ST 4149: 10% ST 5006 | 17.2 | 4149 | Undetermined |
| 80% ST 4149: 20% ST 5006 | 31.0 | 4149 | 4149 |
| 70% ST 4149: 30% ST 5006 | 40.5 | 4149 | Undetermined |
| 60% ST 4149: 40% ST 5006 | 49.4 | 4149 | Undetermined |
| 50% ST 4149: 50% ST 5006 | 50.3 | Novel allele | Undetermined |
| 75% ST 1012: 25% ST 2865 | 37.9 | 1012 | Undetermined |
| 50% ST 1012: 50% ST 2865 | 48.2 | Novel allele | Undetermined |
| 75% ST 7181: 25% ST 7219 | 31.7 | 7181 | 7181 |
| 50% ST 7181: 50% ST 7219 | 47.4 | 7219 | 7219 |
| 50% ST 7219: 25% ST 2865: 25% ST 5316 | 49.6 | Novel allele | Undetermined |
| 50% ST 5316: 50% ST 574 | 49.4 | Novel allele | Undetermined |
| 25% ST 5316: 25% ST 123: 25% ST 7219: 25% ST 574 | 46.7 | *NOVEL ST. (no SLV) | Undetermined |
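To make the "max percentage non-consensus base" column more concrete, the sketch below shows one plausible way such a metric could be computed from read pileups. This is an assumption for illustration only: the record does not describe how MOST calculates the value, and the function name `max_pct_non_consensus` and the toy pileup are hypothetical.

```python
from collections import Counter


def max_pct_non_consensus(pileup):
    """Return the highest per-position percentage of bases that disagree
    with the most common (consensus) base.

    pileup: list of strings, each holding the bases observed at one position.
    Assumed definition, not taken from the MOST source code.
    """
    worst = 0.0
    for column in pileup:
        if not column:  # skip positions with no coverage
            continue
        counts = Counter(column)
        consensus_count = counts.most_common(1)[0][1]
        pct = 100.0 * (len(column) - consensus_count) / len(column)
        worst = max(worst, pct)
    return worst


# Toy pileup: a clean position, one with 10% minority bases, one with 30%.
example = ["AAAAAAAAAA", "CCCCCCCCCT", "GGGGGGGTTT"]
print(max_pct_non_consensus(example))
```

With a definition like this, a two-strain mixture that differs at some locus would be expected to give a value near the minor strain's proportion, which is broadly the pattern in the table (values approaching 50 for the 50:50 mixtures).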
MLST results derived using conventional method and WGS.
| Workflow name | Number of samples | Full MLST results from WGS-mapping approach (MOST) | Full MLST results from WGS-assembly based approach (BIGSdb) | Full MLST results from conventional method |
|---|---|---|---|---|
| | 120 | 119 | 112 | 99 |
| | 98 | 98 | 98 | 96 |
| | 105 | 105 | 105 | 105 |
| Intra species | 12 | 7 | 3 | nt |
| Mixed | 14 | 13 | 7 | nt |
| Low coverage genomic | 29 | 29 | 27 | nt |
Notes.
nt indicates samples not tested.
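The percentages quoted in the abstract can be reproduced from the counts in the table above. The sketch below is a simple arithmetic check; the variable names and the grouping into `typed_all` (rows with a conventional result) and `challenge` (rows marked nt) are introduced here for illustration.

```python
# (samples, MOST, BIGSdb, conventional) for the three workflows tested by all methods.
typed_all = [(120, 119, 112, 99), (98, 98, 98, 96), (105, 105, 105, 105)]
# (samples, MOST, BIGSdb) for intra-species, mixed and low-coverage sets (conventional: nt).
challenge = [(12, 7, 3), (14, 13, 7), (29, 29, 27)]


def pct(n: int, total: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100.0 * n / total, 1)


total_main = sum(r[0] for r in typed_all)
most_main = sum(r[1] for r in typed_all)
bigsdb_main = sum(r[2] for r in typed_all)
conv_main = sum(r[3] for r in typed_all)

total_chal = sum(r[0] for r in challenge)
most_chal = sum(r[1] for r in challenge)
bigsdb_chal = sum(r[2] for r in challenge)

print(total_main, pct(conv_main, total_main),
      pct(bigsdb_main, total_main), pct(most_main, total_main))
print(total_chal, pct(most_chal, total_chal), pct(bigsdb_chal, total_chal))
```

The first line of output corresponds to the 323-sample comparison (conventional, assembly-based, mapping-based) and the second to the 55 mixed and low-coverage data sets (mapping-based, assembly-based).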