Benjamin Sobkowiak, Judith R Glynn, Rein M G J Houben, Kim Mallard, Jody E Phelan, José Afonso Guerra-Assunção, Louis Banda, Themba Mzembe, Miguel Viveiros, Ruth McNerney, Julian Parkhill, Amelia C Crampin, Taane G Clark.
Abstract
BACKGROUND: Mixed, polyclonal Mycobacterium tuberculosis infection occurs in natural populations. An effective method for detecting such cases is important for measuring treatment success and for reconstructing transmission between patients. Using whole genome sequence (WGS) data, we assess two methods for detecting mixed infection: (i) a combination of the number of heterozygous sites and the proportion of heterozygous sites to total SNPs, and (ii) Bayesian model-based clustering of allele frequencies from sequencing reads at heterozygous sites.
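Method (i) amounts to a two-part threshold rule on per-sample SNP counts. The sketch below is a minimal illustration under stated assumptions: the cut-offs `min_het_sites` and `min_het_fraction` are hypothetical placeholders (the abstract does not give the values used), and `SnpSummary`, `het_fraction` and `is_mixed` are illustrative names, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class SnpSummary:
    sample_id: str
    het_sites: int   # number of heterozygous SNP calls
    total_snps: int  # total SNPs called against the reference


def het_fraction(s: SnpSummary) -> float:
    """Proportion of heterozygous sites among all SNPs (0 to 1)."""
    return s.het_sites / s.total_snps if s.total_snps else 0.0


def is_mixed(s: SnpSummary, min_het_sites: int = 10,
             min_het_fraction: float = 0.015) -> bool:
    """Call a sample mixed only if BOTH criteria are met.

    The default thresholds are hypothetical placeholders, not the
    authors' settings; they would need calibrating on known mixtures.
    """
    return s.het_sites >= min_het_sites and het_fraction(s) >= min_het_fraction


samples = [
    SnpSummary("ERR221645", 6, 241),            # counts from the mixture table
    SnpSummary("ERR221621", 12, 1792),          # counts from the mixture table
    SnpSummary("hypothetical_mixed", 150, 900),  # invented for illustration
]
for s in samples:
    print(s.sample_id, f"{100 * het_fraction(s):.2f}%", is_mixed(s))
```

Requiring both criteria guards against two failure modes: a high absolute count of heterozygous sites in a sample with very many SNPs overall, and a high proportion driven by only a handful of sites.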
Keywords: Bioinformatics; Epidemiology; Genomic analysis; Mixed infection; Mycobacterium tuberculosis; Tuberculosis
Year: 2018 PMID: 30107785 PMCID: PMC6092779 DOI: 10.1186/s12864-018-4988-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Detection of artificially mixed infections using the number of heterozygous SNPs and Bayesian model-based clustering. Strain information, known mixture proportions and average coverage across the genome are also shown. The number of heterozygous SNPs in each sample is shown together with the number of distinct coding and non-coding regions in which those SNPs occur.
| Sample identifier | Major strain proportion | Major/minor strain lineage | Major/minor strain spoligotype | Average coverage | Heterozygous sites: no. heterozygous SNPs (total no. gene regions) | Heterozygous sites: no. total SNPs | Heterozygous sites: heterozygous/total SNPs (%) | Bayesian clustering: no. of strains | Bayesian clustering: major strain proportion |
|---|---|---|---|---|---|---|---|---|---|
| ERR221663 | | | | | | | | | |
| ERR221662 | | | | | | | | | |
| ERR221641 | | | | | | | | | |
| ERR221643 | | | | | | | | | |
| ERR221656 | | | | | | | | | |
| ERR221659 | | | | | | | | | |
| ERR221623 | | | | | | | | | |
| ERR221627 | | | | | | | | | |
| ERR221647 | | | | | | | | | |
| ERR221651 | | | | | | | | | |
| ERR221630 | | | | | | | | | |
| ERR221628 | | | | | | | | | |
| ERR221660 | | | | | | | | | |
| ERR221620 | | | | | | | | | |
| ERR221636 | | | | | | | | | |
| ERR221665 | | | | | | | | | |
| ERR221640 | | | | | | | | | |
| ERR221654 | | | | | | | | | |
| ERR221644 | | | | | | | | | |
| ERR221625 | | | | | | | | | |
| ERR221652 | | | | | | | | | |
| ERR221634 | | | | | | | | | |
| ERR221629 | | | | | | | | | |
| ERR221649 | | | | | | | | | |
| ERR221635 | | | | | | | | | |
| ERR221632 | | | | | | | | | |
| ERR221653 | | | | | | | | | |
| ERR221650 | | | | | | | | | |
| ERR221637 | | | | | | | | | |
| ERR221645 | 0.95 | 4/4 | T/LAM11-ZWE | | 6 (4) | 241 | | 1 | |
| ERR221655 | | | | | | | | | |
| ERR221664 | | | | | | | | | |
| ERR221666 | | | | | | | | | |
| ERR221626 | | | | | | | | | |
| ERR221638 | 0.95 | 3/4 | LAM11-ZWE/CAS1-Delhi | | 6 (6) | 678 | | 1 | |
| ERR221621 | 0.95 | 1/1 | EAI6-BGD1/EAI1-SOM | | 12 (11) | 1792 | | 1 | |
| ERR221646 | 1.00 | 4 | T | | 4 (2) | 242 | | 1 | |
| ERR221642 | 1.00 | 4 | CAS1-Delhi | | 12 (5) | 1144 | | 1 | |
| ERR221624 | 1.00 | 1 | EAI1-SOM | | 18 (7) | 1765 | | 1 | |
| ERR221648 | 1.00 | 4 | LAM11-ZWE | | 7 (6) | 685 | | 1 | |
| ERR221657 | 1.00 | 4 | LAM11-ZWE | | 7 (5) | 687 | | 1 | |
| ERR221661 | 1.00 | 3 | CAS1-Kili | | 11 (7) | 1185 | | 1 | |
| ERR221633 | 1.00 | 2 | Beijing | | 5 (5) | 1110 | | 1 | |
| ERR221658 | 1.00 | 2 | Beijing | | 5 (5) | 1113 | | 1 | |
| ERR221631 | 1.00 | 2 | Beijing | | 5 (5) | 1129 | | 1 | |
| ERR221639 | 1.00 | 3 | LAM11-ZWE | | 2 (2) | 671 | | 1 | |
| ERR221667 | 1.00 | 1 | EAI6-BGD1 | | 4 (3) | 1769 | | 1 | |
| ERR221622 | 1.00 | 1 | EAI6-BGD1 | | 3 (3) | 1789 | | 1 | |
The samples are ordered by known major strain proportion and then by the number of heterozygous sites. Samples identified as mixed infections are shown in bold.
Fig. 1 Heterozygous SNP plots for two clinical Malawi samples, illustrating the difference between clonal heterogeneity (a) and the signal of mixed infection (b). The x-axis represents contiguous SNPs with heterozygous calls across the genome (numbered sequentially), and the y-axis the proportion of non-reference alleles at each SNP. a shows no evidence of mixed infection: read frequencies at heterozygous sites are randomly distributed between 0 and 1. b shows the characteristic pattern of mixed infection with two different strains: read frequencies fall into two distinct clusters with means around 0.90 and 0.10, implying a 0.9/0.1 mixture.
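The contrast in Fig. 1 (two tight clusters versus a random spread) can be made concrete by asking whether one or two groups better describe the allele frequencies. The sketch below swaps in a plain maximum-likelihood technique: a one-dimensional two-component Gaussian mixture fitted by expectation-maximisation (EM), compared with a single Gaussian using the Bayesian information criterion (BIC). It is not the Bayesian model-based clustering used in the paper; the variance floor, iteration count and all function names are assumptions for illustration.

```python
import math
from statistics import fmean, pvariance

VAR_FLOOR = 1e-4  # assumption: keeps a component from collapsing onto one point


def _log_normal(x, mu, var):
    """Log density of a normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def bic_one_component(xs):
    """BIC of a single normal fitted by maximum likelihood (2 parameters)."""
    mu, var = fmean(xs), max(pvariance(xs), VAR_FLOOR)
    loglik = sum(_log_normal(x, mu, var) for x in xs)
    return -2 * loglik + 2 * math.log(len(xs))


def fit_two_components(xs, iters=200):
    """EM for a 1-D two-component normal mixture (5 parameters).

    Returns (sorted component means, BIC).
    """
    mus = [min(xs), max(xs)]                       # spread-out starting means
    variances = [max(pvariance(xs), VAR_FLOOR)] * 2
    weights = [0.5, 0.5]

    def component_logs(x):
        return [math.log(weights[k]) + _log_normal(x, mus[k], variances[k])
                for k in range(2)]

    for _ in range(iters):
        # E-step: responsibility of each component for each frequency
        resp = []
        for x in xs:
            logs = component_logs(x)
            total = _logsumexp(logs)
            resp.append([math.exp(v - total) for v in logs])
        # M-step: re-estimate weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / len(xs), 1e-12)
            if nk < 1e-9:                          # (near-)empty component
                continue
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                VAR_FLOOR)
    loglik = sum(_logsumexp(component_logs(x)) for x in xs)
    return sorted(mus), -2 * loglik + 5 * math.log(len(xs))


def estimated_strains(xs):
    """Return 1 or 2 strains, whichever model has the lower BIC."""
    _, bic2 = fit_two_components(xs)
    return 2 if bic2 < bic_one_component(xs) else 1


# Hypothetical frequencies: a 0.9/0.1-style pattern and an even spread
clustered = [0.08, 0.10, 0.12, 0.09, 0.11, 0.88, 0.90, 0.92, 0.91, 0.89]
spread = [i / 19 for i in range(20)]
print(estimated_strains(clustered), estimated_strains(spread))
```

The BIC penalty (number of parameters times log sample size) is what stops the two-component model from winning on a random spread, where an extra component improves the fit only marginally.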
Fig. 2 Plotted allele frequencies of reads at heterozygous sites in artificial two-strain mixtures that the Bayesian model-based clustering approach misidentified as pure strains. The major/minor strain proportions are 0.90/0.10 in sample ERR221649 and 0.95/0.05 in the remaining samples. The characteristic pattern expected in samples containing more than one non-clonal strain (e.g. Fig. 1b) is not clear.
Fig. 3 Comparison of the major strain proportion estimated by Bayesian model-based clustering (blue) with the known major strain proportion (red) in all in vitro artificial mixture samples (N = 48). Black error bars show the standard deviation of allele frequencies at heterozygous sites around the mean estimated major proportion.
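A simple way to see where a major-proportion estimate and its spread can come from: in a two-strain mixture with major proportion p, each heterozygous site shows a non-reference allele frequency near p (the major strain carries the alternative allele) or near 1 - p (the minor strain carries it), as in Fig. 1b. Folding each frequency to max(f, 1 - f) places both clusters on p, and the mean and standard deviation of the folded values give an estimate and error bar analogous to those in Fig. 3. This is a minimal sketch assuming exactly two strains; it is not the paper's Bayesian estimator, and `estimate_major_proportion` is a hypothetical name.

```python
from statistics import fmean, pstdev


def estimate_major_proportion(alt_freqs):
    """Estimate the major-strain proportion from non-reference allele
    frequencies at heterozygous sites, assuming a two-strain mixture.

    Returns (mean, standard deviation) of the folded frequencies.
    """
    if not alt_freqs:
        raise ValueError("need at least one heterozygous site")
    folded = [max(f, 1.0 - f) for f in alt_freqs]  # map both p and 1-p onto p
    return fmean(folded), pstdev(folded)


# Hypothetical frequencies resembling the 0.9/0.1 pattern in Fig. 1b
freqs = [0.88, 0.11, 0.92, 0.09, 0.90, 0.12, 0.89, 0.10]
mean, sd = estimate_major_proportion(freqs)
print(f"estimated major proportion: {mean:.3f} (SD {sd:.3f})")
```

Folding only makes sense once two strains have been established; with three or more strains the frequencies form more than two clusters and a single folded mean no longer summarises them.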
The sensitivity and specificity of the heterozygous sites and Bayesian model-based clustering approaches for detecting mixed infection in artificial mixture and replicate samples. Calculations assume that the 4 biological replicates of one sample that were classified as mixed by the heterozygous sites method came from a pure sample. True positives were the known artificially mixed Malawi samples (Table 1); true negatives were the known pure Malawi samples (Table 1) and all H37Rv and Portuguese replicate strains (Additional file 1).
| Sample set | No. mixed detected: heterozygous sites method | No. mixed detected: Bayesian model-based clustering |
|---|---|---|
| Artificial mixed Malawi samples | 33/36 | 27/36 |
| Pure Malawi samples | 0/12 | 0/12 |
| Technical replicates | 0/18 | 0/18 |
| Biological replicates | 4/32 | 0/32 |
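The counts in the table translate into sensitivity and specificity with simple arithmetic. This sketch follows the caption's convention: the artificial mixtures are the only true positives, and the pure Malawi samples plus both sets of replicates are pooled as true negatives. The dictionary keys are illustrative labels, and the results are computed from the table rather than quoted from the paper.

```python
from fractions import Fraction

# (detected as mixed, total) per sample set, copied from the table above
COUNTS = {
    "Heterozygous sites method": {
        "artificial_mixed": (33, 36),
        "pure_malawi": (0, 12),
        "technical_replicates": (0, 18),
        "biological_replicates": (4, 32),
    },
    "Bayesian model-based clustering": {
        "artificial_mixed": (27, 36),
        "pure_malawi": (0, 12),
        "technical_replicates": (0, 18),
        "biological_replicates": (0, 32),
    },
}

NEGATIVE_SETS = ("pure_malawi", "technical_replicates", "biological_replicates")


def sensitivity_specificity(counts):
    """Return (sensitivity, specificity) as exact fractions."""
    tp, positives = counts["artificial_mixed"]          # mixtures called mixed
    fp = sum(counts[s][0] for s in NEGATIVE_SETS)       # pure samples called mixed
    negatives = sum(counts[s][1] for s in NEGATIVE_SETS)
    return Fraction(tp, positives), Fraction(negatives - fp, negatives)


for method, counts in COUNTS.items():
    sens, spec = sensitivity_specificity(counts)
    print(f"{method}: sensitivity {float(sens):.1%}, specificity {float(spec):.1%}")
```

With these counts the heterozygous sites method gives 33/36 (about 91.7%) sensitivity and 58/62 (about 93.5%) specificity; the Bayesian approach gives 27/36 (75.0%) and 62/62 (100%).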
Fig. 4 Comparison of the major strain proportion estimated by Bayesian model-based clustering with the known major strain proportion in the in silico two-strain mixture samples (N = 168). Between-lineage samples are shown in red and within-lineage samples in blue. Grey crosses show the standard deviation of allele frequencies at heterozygous sites around the mean estimated major proportion.
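A toy model helps show why estimates like those in Fig. 4 scatter around the known proportion. At each site where the two strains differ, assume read depth `depth` and non-reference reads drawn from Binomial(depth, q), with q = p if the major strain carries the alternative allele and q = 1 - p otherwise. This is an assumption for illustration only, not the procedure used to build the in silico mixtures (the caption does not describe it); `n_sites`, `depth`, the equal chance of either strain carrying the alternative allele, and the function names are all hypothetical.

```python
import random
from statistics import fmean


def simulate_alt_freqs(major_prop, n_sites=500, depth=50, seed=1):
    """Toy two-strain mixture: non-reference allele frequency at each site
    where the strains differ. All parameters are hypothetical."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_sites):
        # Assumption: either strain is equally likely to carry the
        # non-reference allele at a differing site.
        q = major_prop if rng.random() < 0.5 else 1.0 - major_prop
        alt_reads = sum(rng.random() < q for _ in range(depth))  # ~ Binomial(depth, q)
        freqs.append(alt_reads / depth)
    return freqs


def folded_mean(freqs):
    """Mean of max(f, 1 - f): a naive major-proportion estimate."""
    return fmean([max(f, 1.0 - f) for f in freqs])


for p in (0.5, 0.7, 0.9):
    estimate = folded_mean(simulate_alt_freqs(p))
    print(f"true major proportion {p:.2f} -> folded estimate {estimate:.3f}")
```

Under this toy model the folded estimate tracks p for clearly unbalanced mixtures but drifts upward as p approaches 0.5, because sampling noise on either side of 0.5 is folded onto the same side. Lowering `n_sites`, as when two strains from the same lineage differ at fewer positions, makes each estimate noisier.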
Fig. 5 A closer inspection of samples identified as pure by the Bayesian clustering approach but as mixed by the heterozygous sites approach. a Frequency histogram of the number of heterozygous sites in Malawi samples identified as mixed infections or pure strains by the Bayesian clustering approach. Sample ERR323056, classified as a pure strain with 69 heterozygous sites, is highlighted. b Plotted allele frequencies of reads at heterozygous sites for samples identified as mixed by the heterozygous sites approach but as pure by the Bayesian clustering approach, with sample ERR323056 shown first. Although some samples show traces of the characteristic pattern of mixed infection, the signal from heterozygous sites is insufficient to classify them as mixed infections.
Tuberculosis disease characteristics associated with mixed infection. Nine individuals classified as mixed by the heterozygous sites method but not by the Bayesian clustering method were excluded.
| Characteristic | Mixed / Total | % mixed | P value |
|---|---|---|---|
| Year | |||
| 1995–1999 | 46/539 | 8.5 | |
| 2000–2004 | 82/662 | 12.4 | |
| 2005–2009 | 44/506 | 8.7 | |
| 2010–2014 | 14/247 | 5.7 | 0.009 |
| Age group (years) | |||
| < 15 | 2/34 | 5.9 | |
| 15–29 | 61/586 | 10.4 | |
| 30–44 | 87/866 | 10.1 | |
| 45+ | 36/468 | 7.7 | 0.4 |
| Sex | |||
| Female | 94/995 | 9.5 | |
| Male | 92/959 | 9.6 | 0.9 |
| HIV status | |||
| Negative | 52/636 | 8.2 | |
| Positive | 69/754 | 9.2 | |
| Positive on ART | 10/120 | 8.3 | 0.8 |
| Unknown | 55/444 | 12.4 | |
| Previous TB | |||
| No | 173/1830 | 9.5 | |
| Yes | 13/124 | 10.5 | 0.7 |
| Lineage | |||
| 1 | 35/314 | 11.2 | |
| 2 | 14/80 | 17.5 | |
| 3 | 15/212 | 7.1 | |
| 4 | 122/1343 | 9.1 | |
| | 0/5 | 0.0 | 0.08 |
| Type of TB | |||
| Smear + | 125/1462 | 8.6 | |
| Smear - | 53/400 | 13.3 | |
| Extra-pulmonary | 8/92 | 8.7 | 0.02 |
| Outcome | |||
| Completed | 116/1275 | 9.1 | |
| Died | 41/405 | 10.1 | |
| Lost/transferred | 29/274 | 10.6 | 0.7 |
| Isoniazid resistance | |||
| Resistant | 15/130 | 11.5 | |
| Sensitive | 169/1757 | 9.6 | 0.5 |
| Rifampicin resistance | |||
| Resistant | 2/17 | 11.8 | |
| Sensitive | 182/1870 | 9.7 | 0.7 |
P values are from the chi-square test, except for rifampicin resistance, for which Fisher's exact test was used because of small numbers. ART, antiretroviral therapy.
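The P values can be checked from the counts. Below is a dependency-free sketch of a Pearson chi-square test of independence (mixed versus not mixed across the levels of one characteristic) and a two-sided Fisher exact test for a 2×2 table. The chi-square tail probability uses the closed form available for integer degrees of freedom. Whether the authors applied any correction, or excluded categories such as "Unknown" HIV status, is not stated; the Year and rifampicin rows are used here purely as illustrations, and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: start from df = 1 and step up two degrees of freedom at a time
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # df 1 -> 3 increment
    for k in range(3, df + 1, 2):
        p += term
        term *= x / k                                       # next increment
    return p


def chi_square_test(rows):
    """Pearson chi-square test of independence on (mixed, total) rows."""
    table = [(mixed, total - mixed) for mixed, total in rows]
    n = sum(total for _, total in rows)
    col_totals = [sum(row[j] for row in table) for j in range(2)]
    stat = 0.0
    for row in table:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / n
            stat += (row[j] - expected) ** 2 / expected
    df = len(rows) - 1
    return stat, df, chi2_sf(stat, df)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    tail = sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-7))
    return min(1.0, tail)


# Year of diagnosis: (mixed, total) per period
year = [(46, 539), (82, 662), (44, 506), (14, 247)]
stat, df, p_year = chi_square_test(year)
print(f"Year: chi-square = {stat:.2f}, df = {df}, P = {p_year:.3f}")

# Rifampicin: resistant 2 mixed of 17; sensitive 182 mixed of 1870
p_rif = fisher_exact_two_sided(2, 17 - 2, 182, 1870 - 182)
print(f"Rifampicin resistance: Fisher exact P = {p_rif:.2f}")
```

On the Year counts an uncorrected Pearson test gives a chi-square of roughly 11.6 on 3 degrees of freedom (P ≈ 0.009), and the rifampicin Fisher test comes out close to 0.7, consistent with the rounded values in the table.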