Alan K L Tsang, Hwei Huih Lee, Siu-Ming Yiu, Susanna K P Lau, Patrick C Y Woo.
Abstract
Although multilocus sequence typing (MLST) is highly discriminatory and useful for outbreak investigations and epidemiological surveillance, it has long been controversial whether the clustering and phylogeny inferred from MLST gene loci represent the real phylogeny of bacterial strains. In this study, we compare phylogenetic trees constructed using three approaches: (1) concatenated blocks of homologous sequence shared between the bacterial genomes, (2) genome single-nucleotide polymorphism (SNP) profiles and (3) concatenated nucleotide sequences of the gene loci in the corresponding MLST schemes, for 10 bacterial species with >30 complete genome sequences available. Major differences in strain clustering at more than one position were observed between the phylogeny inferred using genome/SNP data and that inferred using MLST for all 10 bacterial species. The Shimodaira-Hasegawa test revealed a significant difference between the topologies of the genome and MLST trees for nine of the 10 bacterial species, and a significant difference between the topologies of the SNP and MLST trees for all 10 bacterial species. The Matching Cluster and R-F Cluster metrics showed that the distances between the genome/SNP and MLST trees were larger than those between the SNP and genome trees. Phylogeny inferred from MLST failed to represent genome phylogeny within the same bacterial species.
Year: 2017 PMID: 28674428 PMCID: PMC5495804 DOI: 10.1038/s41598-017-04707-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Information of outgroups and models used to construct maximum likelihood phylogenetic trees for each bacterial species in this study.
| Bacteria | Outgroup (GenBank accession no.) | Substitution model for MLST tree^a |
|---|---|---|
| | | TIM1 + I + G |
| | | TIM1 + I + G |
| | | TIM3 + I + G |
| | | TIM3 + I + G |
| | | GTR + I + G |
| | | GTR + I + G |
| | | TIM1 + I + G |
| | | TIM3 + I + G |
| | | GTR + I + G |
| | | TrN + I + G |
| | | TrN + I + G |
^a G, gamma-distributed rate heterogeneity; I, proportion of invariant sites.
Figure 1. Comparison of phylogenetic trees constructed using genome data (left), SNP data (middle), and MLST data (right) for Staphylococcus aureus. Clusters that were manually selected based on the genome trees are illustrated in different colors. The unresolved polytomies are shaded in blue. A new sequence type is represented by a dash (“ST-”).
Figure 10. Comparison of phylogenetic trees constructed using genome data (left), SNP data (middle), and MLST data (right) for Helicobacter pylori. Clusters that were manually selected based on the genome trees are illustrated in different colors. A new sequence type is represented by a dash (“ST-”).
Figure 9. Comparison of phylogenetic trees constructed using genome data (left), SNP data (middle), and MLST data (right) for Streptococcus pyogenes. Clusters that were manually selected based on the genome trees are illustrated in different colors. The unresolved polytomies are shaded in blue. A new sequence type is represented by a dash (“ST-”).
Comparison by Shimodaira-Hasegawa test of log-likelihood scores between genome/SNP and MLST trees of the 10 bacterial species.
| Species | MLST tree –ln L | Genome tree –ln L | Diff –ln L (genome vs MLST) | P value (genome vs MLST) | SNP tree –ln L | Diff –ln L (SNP vs MLST) | P value (SNP vs MLST) |
|---|---|---|---|---|---|---|---|
| | 5994.26511 | 6164.70585 | 170.44074 | 0.002* | 6190.62314 | 196.35803 | 0.002* |
| | 7762.65452 | 8088.5789 | 325.40337 | 0.000* | 8112.14513 | 349.49061 | 0.000* |
| | 7810.81726 | 7844.19200 | 33.37474 | 0.065 | 7857.47013 | 46.65286 | 0.032* |
| | 9412.75447 | 9845.31931 | 432.56484 | 0.000* | 9836.13458 | 423.38011 | 0.000* |
| | 12615.10225 | 13363.74929 | 748.64704 | 0.000* | 13428.07040 | 812.96814 | 0.000* |
| | 23840.02481 | 24459.54406 | 619.51925 | 0.000* | 24423.92980 | 583.90499 | 0.000* |
| | 5660.92269 | 5770.77356 | 109.85087 | 0.004* | 5767.36723 | 106.44455 | 0.003* |
| | 8418.29236 | 8563.19518 | 144.90282 | 0.002* | 8541.07333 | 122.78098 | 0.006* |
| | 8146.48997 | 8269.76377 | 123.27381 | 0.000* | 8288.49580 | 142.00584 | 0.000* |
| | 11447.21256 | 11769.73347 | 322.52092 | 0.000* | 11787.43639 | 340.2283 | 0.000* |
| | 7481.09682 | 7682.21034 | 201.11352 | 0.000* | 7622.24838 | 141.15156 | 0.001* |
*P < 0.05.
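As a reading aid for the table above, the following is a minimal Python sketch of how the "Diff –ln L" column relates to the two –ln L columns, and how the asterisk is assigned. It assumes that Diff –ln L is simply the alternative tree's –ln L minus the MLST tree's –ln L, which agrees with the first and third data rows used here. The names `rows`, `diff_neg_lnl` and `is_significant` are illustrative and do not come from the study; the P values are copied from the table, not recomputed.

```python
# Values copied from the first and third data rows of the table above
# (genome vs MLST comparison). Row labels are positional only.
rows = {
    "row 1": {"mlst": 5994.26511, "genome": 6164.70585, "p_genome": 0.002},
    "row 3": {"mlst": 7810.81726, "genome": 7844.19200, "p_genome": 0.065},
}

ALPHA = 0.05  # significance threshold stated in the table footnote


def diff_neg_lnl(alternative: float, mlst: float) -> float:
    """Assumed definition: difference in -ln L, rounded to 5 decimals."""
    return round(alternative - mlst, 5)


def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """True when the reported P value falls below the threshold."""
    return p_value < alpha


for label, r in rows.items():
    diff = diff_neg_lnl(r["genome"], r["mlst"])
    flag = "*" if is_significant(r["p_genome"]) else ""
    print(f"{label}: diff -ln L = {diff:.5f}, P = {r['p_genome']:.3f}{flag}")
```

Under this reading, a larger difference means the genome (or SNP) topology fits the MLST alignment worse than the MLST tree itself does, and the asterisk marks rows where the test rejects the two topologies as equally good explanations of the MLST data.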
Tree distances among phylogenies inferred using different approaches.
| Species | Matching cluster (genome vs MLST) | R-F cluster (genome vs MLST) | Matching cluster (SNP vs MLST) | R-F cluster (SNP vs MLST) | Matching cluster (genome vs SNP) | R-F cluster (genome vs SNP) |
|---|---|---|---|---|---|---|
| | 143 | 28 | 176 | 31 | 55 | 9 |
| | 156 | 25 | 176 | 24.5 | 62 | 7.5 |
| | 436 | 57.5 | 466 | 58.5 | 102 | 18 |
| | 564 | 76.5 | 491 | 72.5 | 307 | 27 |
| | 531 | 74.5 | 564 | 70.5 | 307 | 27 |
| | 257 | 45 | 212 | 42 | 171 | 32 |
| | 147 | 26 | 161 | 27 | 60 | 9 |
| | 112 | 28 | 124 | 28 | 48 | 8 |
| | 444 | 48 | 372 | 46.5 | 114 | 11.5 |
| | 1287 | 98.5 | 1242 | 97 | 214 | 31.5 |
| | 95 | 17 | 127 | 17 | 72 | 7 |
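To illustrate what an R-F cluster distance measures, here is a minimal Python sketch for rooted trees. It assumes the common definition in which the distance is half the number of clusters (leaf sets below an internal node) found in one tree but not the other; this is an assumption, though it is consistent with the half-integer values in the table (e.g. 24.5). The nested-tuple tree encoding, the toy trees and the function names are hypothetical and are not taken from the study or from any particular software.

```python
def clusters(tree):
    """Return the non-trivial clusters of a rooted tree.

    A tree is a nested tuple whose leaves are strings. A cluster is the
    frozenset of leaf names below one internal node.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root cluster is shared by every tree
    return found


def rf_cluster_distance(tree_a, tree_b):
    """Half the size of the symmetric difference of the two cluster sets."""
    return len(clusters(tree_a) ^ clusters(tree_b)) / 2


# Toy trees on five hypothetical strains A-E.
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))  # A pairs with C instead of B
tree_c = (("A", "B", "C"), ("D", "E"))    # unresolved polytomy for A, B, C

print(rf_cluster_distance(tree_a, tree_b))  # one cluster differs on each side
print(rf_cluster_distance(tree_a, tree_c))  # only tree_a has the {A, B} cluster
```

In this sketch, two fully resolved trees always differ by a whole number, whereas a polytomy in one tree can leave an unmatched cluster on only one side and so produce a half-integer distance.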