Guo Liang Gan, Elijah Willie, Cedric Chauve, Leonid Chindelevitch.
Abstract
BACKGROUND: Bacterial pathogens exhibit an impressive amount of genomic diversity. This diversity can be informative of evolutionary adaptations, host-pathogen interactions, and disease transmission patterns. However, capturing this diversity directly from biological samples is challenging.
Keywords: Bacterial diversity; Integer Linear Programming; Multi-Locus Sequence Typing
Year: 2019 PMID: 31842753 PMCID: PMC6915855 DOI: 10.1186/s12859-019-3204-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 A dataset with two samples and an MLST scheme of three loci (genes clpA, clpX, nifS). The strain type distributions require 5 different strains, as the strain (clpA_1, clpX_1, nifS_7) appears in both distributions
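The count in the Fig. 1 caption can be checked with a minimal sketch. Only the shared strain (clpA_1, clpX_1, nifS_7) comes from the caption; the other strain types, the proportions, and the helper name `distinct_strains` are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Set, Tuple

# A strain type is a tuple of allele identifiers, one per locus of the MLST scheme.
Strain = Tuple[str, str, str]
Distribution = Dict[Strain, float]


def distinct_strains(samples: Dict[str, Distribution]) -> Set[Strain]:
    """Return the set of strain types present in at least one sample."""
    seen: Set[Strain] = set()
    for distribution in samples.values():
        seen.update(distribution)  # adds the keys (strain types) of the dict
    return seen


# Hypothetical data: only ("clpA_1", "clpX_1", "nifS_7") is taken from the caption.
samples: Dict[str, Distribution] = {
    "sample_1": {
        ("clpA_1", "clpX_1", "nifS_7"): 0.50,
        ("clpA_2", "clpX_1", "nifS_1"): 0.30,
        ("clpA_1", "clpX_3", "nifS_2"): 0.20,
    },
    "sample_2": {
        ("clpA_1", "clpX_1", "nifS_7"): 0.40,
        ("clpA_3", "clpX_2", "nifS_1"): 0.35,
        ("clpA_2", "clpX_2", "nifS_7"): 0.25,
    },
}

# Three strains per sample, one of them shared: 3 + 3 - 1 = 5 distinct strains.
n_distinct = len(distinct_strains(samples))
```

With three strains in each sample and one strain in common, the union holds five strain types, which matches the count stated in the caption.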
Average and standard deviation of precision, recall and TVD for each gene of the Borrelia MLST scheme (B-MLST) and Kallisto, across all parameter combinations
| Precision | clpA | clpX | nifS | pepX |
| B-MLST | 0.99 ±0.009 | 0.98 ±0.012 | 0.96 ±0.024 | 0.96 ±0.016 |
| Kallisto | 0.97 ±0.014 | 0.94 ±0.014 | 0.89 ±0.027 | 0.93 ±0.03 |
| Recall | | | | |
| B-MLST | 0.95 ±0.022 | 0.94 ±0.027 | 0.90 ±0.05 | 0.94 ±0.034 |
| Kallisto | 0.99 ±0.004 | 0.99 ±0.005 | 0.99 ±0.003 | 0.99 ±0.006 |
| TVD | | | | |
| B-MLST | 0.077 ±0.015 | 0.080 ±0.01 | 0.119 ±0.039 | 0.087 ±0.024 |
| Kallisto | 0.029 ±0.011 | 0.041 ±0.015 | 0.085 ±0.028 | 0.046 ±0.022 |
| Precision | pyrG | recG | rplB | uvrA |
| B-MLST | 0.97 ±0.024 | 0.98 ±0.013 | 0.99 ±0.007 | 0.98 ±0.011 |
| Kallisto | 0.93 ±0.02 | 0.89 ±0.021 | 0.95 ±0.012 | 0.93 ±0.023 |
| Recall | | | | |
| B-MLST | 0.92 ±0.032 | 0.95 ±0.028 | 0.94 ±0.043 | 0.96 ±0.026 |
| Kallisto | 0.98 ±0.006 | 0.99 ±0.011 | 0.99 ±0.006 | 0.99 ±0.005 |
| TVD | | | | |
| B-MLST | 0.110 ±0.019 | 0.082 ±0.028 | 0.089 ±0.03 | 0.069 ±0.02 |
| Kallisto | 0.0047 ±0.018 | 0.068 ±0.018 | 0.032 ±0.011 | 0.05 ±0.022 |
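For readers unfamiliar with the three statistics reported per gene, the sketch below shows one common way to compute them for a single locus. It assumes that precision and recall compare the sets of predicted and true alleles, and that TVD is the standard total variation distance (half the L1 distance) between the predicted and true allele proportions. The exact definitions used in the paper may differ, and the function names and example proportions are hypothetical.

```python
from typing import Dict, Tuple

AlleleDistribution = Dict[str, float]  # allele identifier -> proportion in the sample


def precision_recall(predicted: AlleleDistribution,
                     truth: AlleleDistribution) -> Tuple[float, float]:
    """Set-based precision and recall over alleles with non-zero proportion."""
    pred_alleles = {a for a, p in predicted.items() if p > 0}
    true_alleles = {a for a, p in truth.items() if p > 0}
    true_positives = len(pred_alleles & true_alleles)
    precision = true_positives / len(pred_alleles) if pred_alleles else 0.0
    recall = true_positives / len(true_alleles) if true_alleles else 0.0
    return precision, recall


def total_variation_distance(predicted: AlleleDistribution,
                             truth: AlleleDistribution) -> float:
    """Half the sum of absolute differences in proportion over all alleles."""
    alleles = set(predicted) | set(truth)
    return 0.5 * sum(abs(predicted.get(a, 0.0) - truth.get(a, 0.0)) for a in alleles)


# Hypothetical example for one locus: one spurious allele (clpA_3) is predicted.
truth = {"clpA_1": 0.6, "clpA_2": 0.4}
predicted = {"clpA_1": 0.5, "clpA_2": 0.25, "clpA_3": 0.25}

precision, recall = precision_recall(predicted, truth)
tvd = total_variation_distance(predicted, truth)
```

In this example both true alleles are recovered, so recall is 1, while the spurious allele lowers precision to 2/3; the TVD is 0.5 × (0.1 + 0.15 + 0.25) = 0.25. Under these definitions, higher precision and recall are better, and lower TVD is better.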
Average and standard deviation of different statistics for each evolutionary mechanism
| | Soft-Precision | Soft-Recall | EMD | Precision | Recall | TVD |
| EM1 | 0.98 ±0.11 | 0.96 ±0.13 | 0.64 ±1.7 | 0.85 ±0.28 | 0.86 ±0.23 | 0.15 ±0.29 |
| EM2 | 0.96 ±0.12 | 0.98 ±0.076 | 0.71 ±1.18 | 0.81 ±0.21 | 0.88 ±0.14 | 0.17 ±0.22 |
| EM2e | 0.98 ±0.11 | 0.97 ±0.1 | 0.34 ±0.81 | 0.91 ±0.20 | 0.92 ±0.17 | 0.1 ±0.23 |
| EM2n | 0.96 ±0.13 | 0.95 ±0.12 | 0.6 ±1.35 | 0.86 ±0.23 | 0.88 ±0.16 | 0.14 ±0.25 |
| EM3 | 0.90 ±0.17 | 0.88 ±0.13 | 4.6 ±7.58 | 0.76 ±0.21 | 0.76 ±0.17 | 0.22 ±0.24 |

| | ADP-Precision | ADP-Recall | ADP-TVD |
| EM1 | 0.96 ±0.07 | 0.91 ±0.09 | 0.07 ±0.058 |
| EM2 | 0.93 ±0.07 | 0.91 ±0.07 | 0.26 ±0.16 |
| EM2e | 0.93 ±0.08 | 0.91 ±0.08 | 0.34 ±0.25 |
| EM2n | 0.92 ±0.09 | 0.9 ±0.09 | 0.34 ±0.25 |
| EM3 | 0.94 ±0.07 | 0.92 ±0.08 | 0.29 ±0.15 |

| | Soft-Precision | Soft-Recall | EMD | Precision | Recall | TVD |
| EM1 | 0.96 ±0.14 | 0.99 ±0.079 | 4.1 ±7.0 | 0.44 ±0.34 | 0.58 ±0.40 | 0.62 ±0.37 |
| EM2 | 0.79 ±0.21 | 0.91 ±0.16 | 68.8 ±74.6 | 0.32 ±0.19 | 0.44 ±0.27 | 0.78 ±0.2 |
| EM2e | 0.72 ±0.24 | 0.88 ±0.22 | 98.9 ±89.4 | 0.36 ±0.26 | 0.5 ±0.30 | 0.72 ±0.26 |
| EM2n | 0.76 ±0.23 | 0.9 ±0.19 | 98.6 ±90 | 0.36 ±0.25 | 0.52 ±0.30 | 0.71 ±0.24 |
| EM3 | 0.68 ±0.20 | 0.79 ±0.2 | 83.7 ±64 | 0.29 ±0.2 | 0.35 ±0.22 | 0.83 ±0.16 |
(Top) SDP simulation. (Middle/Bottom) Full pipeline simulation: (Middle) ADP statistics, (Bottom) SDP statistics
Fig. 2 Distribution of the number of existing and novel strains per tick sample
Fig. 3 (Left) Cumulative proportion of the 10 existing strains in all 24 samples (within each bar, different colors represent different samples). (Right) Similar graph for the 60 novel strains