Robyn S Lee, Jean-François Proulx, Fiona McIntosh, Marcel A Behr, William P Hanage.
Abstract
Tuberculosis disproportionately affects the Canadian Inuit. To address this, it is imperative we understand transmission dynamics in this population. We investigate whether 'deep' sequencing can provide additional resolution compared to standard sequencing, using a well-characterized outbreak from the Arctic (2011-2012, 50 cases). Samples were sequenced to ~500-1000x and reads were aligned to a novel local reference genome generated with PacBio SMRT sequencing. Consensus and heterogeneous variants were identified and compared across genomes. In contrast with previous genomic analyses using ~50x depth, deep sequencing allowed us to identify a novel super-spreader who likely transmitted to up to 17 other cases during the outbreak (35% of the remaining cases that year). It is increasingly evident that within-host diversity should be incorporated into transmission analyses; deep sequencing may facilitate more accurate detection of super-spreaders and transmission clusters. This has implications not only for TB, but for all genomic studies of transmission, regardless of pathogen.
Keywords: Tuberculosis; epidemiology; genomic epidemiology; global health; outbreaks; transmission; within-host diversity
Year: 2020 PMID: 32014110 PMCID: PMC7012596 DOI: 10.7554/eLife.53245
Source DB: PubMed Journal: Elife ISSN: 2050-084X Impact factor: 8.140
Comparison of alignments to H37Rv and MT-0080_PB.
Based on these filters: Phred < 50, Root Mean Square Mapping Quality (RMS-MQ) ≤ 30, depth (DP) < 20, Fisher Strand Bias (FS) ≥ 60 and read position strand bias (ReadPos) < −8 and an allelic fraction of ≥ 95% for cSNPs, with hSNPs classified when 5% < ALT < 95%. Quality metrics for the individual cSNPs/hSNPs identified in each sample are given in Source data 2.
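The filters and allelic-fraction thresholds in this legend can be sketched as a small classifier. This is an illustration only, not the authors' pipeline: the `Site` record, its field names, and the `classify` function are hypothetical, and it assumes that each listed threshold marks a position as low quality (that is, the thresholds are exclusion criteria).

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-sample record for one variant position."""
    qual: float          # Phred-scaled variant quality
    rms_mq: float        # root mean square mapping quality (RMS-MQ)
    depth: int           # read depth (DP)
    fs: float            # Fisher strand bias (FS)
    read_pos: float      # read position bias score (ReadPos)
    alt_fraction: float  # fraction of reads supporting the ALT allele


def passes_filters(site: Site) -> bool:
    # Assumption: meeting any one of the legend's thresholds fails the site.
    failed = (
        site.qual < 50
        or site.rms_mq <= 30
        or site.depth < 20
        or site.fs >= 60
        or site.read_pos < -8
    )
    return not failed


def classify(site: Site) -> str:
    """Label a site as low_quality, cSNP, hSNP, or reference."""
    if not passes_filters(site):
        return "low_quality"
    if site.alt_fraction >= 0.95:
        return "cSNP"   # consensus SNP: at least 95% of reads support ALT
    if 0.05 < site.alt_fraction < 0.95:
        return "hSNP"   # heterogeneous SNP: 5% < ALT < 95%
    return "reference"  # ALT at or below 5% is not called
```

For example, `classify(Site(200, 60, 700, 1.0, 0.0, 0.19))` returns `"hSNP"` under these assumptions, whereas the same site at a depth of 10 would be labelled `"low_quality"`.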
| | H37Rv (4,411,532 bp) | MT-0080_PB (4,426,525 bp) | P value |
|---|---|---|---|
| Number of positions according to reference genome | |||
| Invariant reference across all samples, n (%) | 4,018,786 (91·10%) | 4,084,195 (92·27%) | <0·00005 a |
| Position was missing/low quality in at least one sample, n (%) | 391,761 (8·88%) | 342,179 (7·73%) | <0·00005 a |
| Position was a cSNP/hSNP in at least one sample, n (%) | 985 (0·02%) | 152 (0·00%) | <0·00005 a |
| Shared cSNPs across all samples, n (%) | 764 (0·02%) | 1 (0·00%) | <0·00005 a |
| Shared hSNPs across all samples, n (%) | 42 (0·00%) | 0 (0%) | <0·00005 a |
| Core pairwise distances | |||
| Core cSNPs vs. reference, median (range) | 791 (790–792) | 3 (1–65) | <0·00005 b |
| Core cSNPs between samples, median (range) | 3 (0–64) | 3 (0–66) | <0·00005 b |
a Two-sample test for difference in proportions.
b Wilcoxon signed-rank test.
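As a worked illustration of footnote a, the sketch below applies a pooled two-sample z-test for a difference in proportions to the first row of the table (invariant reference positions). The exact variant of the test and the software the authors used are not stated, so the pooled standard error and the function name `two_proportion_z_test` are assumptions.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-sided z-test for a difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invariant reference positions: H37Rv versus MT-0080_PB (counts from the table).
z, p = two_proportion_z_test(4_018_786, 4_411_532, 4_084_195, 4_426_525)
print(f"z = {z:.1f}, p < 0.00005: {p < 0.00005}")
```

With genome-scale denominators, even a difference of about one percentage point yields a very large test statistic, which is consistent with the uniformly small p values reported for the proportion rows.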
Figure 1. Pileup of reads showing hSNPs suspected to be due to alignment error, as listed in Source data 3, with MT-4942 used as an example and zoomed in on positions 2,255,171 to 2,280,170 in H37Rv (National Center for Biotechnology Information RefSeq Database Accession NC_000962.3).
Binary Alignment Map (BAM) files were loaded into Tablet (v.1.17.08.17, Milne et al., 2013) to visualize the pileup compared to H37Rv.
Figure 2. Transmission of M. tuberculosis in village K.
Maximum likelihood tree of 62/65 cases diagnosed between 2007 and 2012 in village K, based on consensus single nucleotide polymorphisms (cSNPs). After aligning to a local reference, MT-0080_PB, cSNPs were identified based on a minimum threshold of ≥95% of reads supporting the alternative allele. A core cSNP alignment was then produced with 90 positions, and IQ-Tree (v.1.6.8, Nguyen et al., 2015) was used to generate the tree using a K3P model with correction for ascertainment bias. Model selection was based on the lowest Bayesian Information Criterion. 1000 bootstrap replicates were done; only bootstrap values > 60% are shown. Clusters were identified using hierarchical Bayesian Analysis of Population Structure (Cheng et al., 2013). These clusters were consistent with the sub-lineages previously identified in Lee et al. (2015a) and Lee et al. (2015b), thus only sub-lineage names are indicated (Major sub-lineages [Mj]-IIIA, B, C, and Mj-VA). Only Mj-IIIA/B/C were present in 2011–2012; Mj-IIIA was first seen in village K in 2007, IIIB was first seen in 2009, and IIIC was first seen in 2012. Alleles informative for transmission in Mj-IIIB, identified using deep sequencing, are indicated. Between 2007 and 2012, two individuals had a second episode of TB; stars are used to highlight these samples, with a different colour for each patient. MT-0080 is included in the alignment because the deep sequencing data from a sweep of all colonies identified a cSNP compared to the MT-0080_PB reference, which itself was generated from a single colony pick.
Figure 2—figure supplement 1. Comparison of epidemiologic inferences using 'routine' versus 'deep' sequencing.
(A) Routine sequencing (to ~40–50x) had indicated that three major clusters were present in village K in 2011–2012: Mj-IIIA, B and C. Further analysis of sequencing and epidemiological data indicated that the 2011–2012 cases in Mj-IIIB could be divided into two subgroups of transmission, comprising five and 13 patients, respectively (Lee et al., 2015b). MT-504 was the suspected source case for the subgroup of five, all of whom shared a ‘C’ allele at position 276,685 in H37Rv (position 276,544 in MT-0080_PB). In contrast, all members of the subgroup of 13 shared an ‘A’ at this position. Previously, MT-2474 was the suspected source case for the subgroup of 13; this case was the first person with smear-positive (SS+) cavitary disease diagnosed in that subgroup. The epidemiologic curve for the 2011–2012 Mj-IIIB cases is shown, stratified by subgroup based on the alleles detected with routine sequencing. MT-504 and MT-2474, the putative sources within each subgroup, are indicated. Sputum smear grade is given where cases were positive on microscopy. Dates of diagnosis are classified into biweekly intervals. (B) In contrast to standard sequencing, deep sequencing revealed that MT-504 (the presumed source for the subgroup of five cases and the first highly contagious case diagnosed in Mj-IIIB during the outbreak year) had both ‘C’ and ‘A’ alleles at this position (563 and 133 reads, respectively), suggesting that MT-504 was the most probable source for both subgroups.
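The read counts reported for MT-504 in panel B can be converted to allele fractions and checked against the 5% to 95% window used to define hSNPs. The sketch below is a minimal illustration under that definition; the helper names `allele_fractions` and `is_heterogeneous` are hypothetical and do not come from the study's code.

```python
def allele_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-allele read counts into fractions of total depth."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_heterogeneous(fraction: float, low: float = 0.05, high: float = 0.95) -> bool:
    """True when an allele fraction lies strictly inside the hSNP window."""
    return low < fraction < high


# Read counts for MT-504 at position 276,685 in H37Rv, as given in the legend.
mt504 = allele_fractions({"C": 563, "A": 133})
for allele, fraction in mt504.items():
    print(allele, f"{fraction:.1%}", is_heterogeneous(fraction))
```

At a total depth of 696 reads, ‘C’ accounts for about 81% and ‘A’ for about 19%, so the position falls inside the hSNP window whichever allele is treated as the alternative.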