Literature DB >> 32014110

Previously undetected super-spreading of Mycobacterium tuberculosis revealed by deep sequencing.

Robyn S Lee^1,2,3, Jean-François Proulx⁴, Fiona McIntosh⁵, Marcel A Behr⁵, William P Hanage^2,3.

Abstract

Tuberculosis disproportionately affects the Canadian Inuit. To address this, it is imperative we understand transmission dynamics in this population. We investigate whether 'deep' sequencing can provide additional resolution compared to standard sequencing, using a well-characterized outbreak from the Arctic (2011-2012, 50 cases). Samples were sequenced to ~500-1000x and reads were aligned to a novel local reference genome generated with PacBio SMRT sequencing. Consensus and heterogeneous variants were identified and compared across genomes. In contrast with previous genomic analyses using ~50x depth, deep sequencing allowed us to identify a novel super-spreader who likely transmitted to up to 17 other cases during the outbreak (35% of the remaining cases that year). It is increasingly evident that within-host diversity should be incorporated into transmission analyses; deep sequencing may facilitate more accurate detection of super-spreaders and transmission clusters. This has implications not only for TB, but all genomic studies of transmission - regardless of pathogen.

Entities: Chemical Disease Gene Species

Keywords: Tuberculosis; epidemiology; genomic epidemiology; global health; outbreaks; transmission; within-host diversity

Mesh：

Year: 2020 PMID： 32014110 PMCID： PMC7012596 DOI： 10.7554/eLife.53245

Source DB: PubMed Journal: Elife ISSN： 2050-084X Impact factor: 8.140

Introduction

Tuberculosis (TB) in Canada is highest among the Inuit, an Indigenous population with a rate over 300 times that of the non-Indigenous Canadian-born population in 2016 (Inuit Tapiriit Kanatami, 2018). Canada recently set a goal of TB elimination in the Inuit by 2030, (Inuit Tapiriit Kanatami, 2018) which will not be achieved without halting ongoing transmission. Previous studies have used genomic data either alone or in conjunction with classical epidemiology to investigate TB transmission dynamics in the Canadian North, (Tyler et al., 2017; Lee et al., 2015a; Lee et al., 2015b) with the aim of identifying clusters to help guide public health interventions. Thus far, such studies have relied on identifying consensus single nucleotide polymorphisms (cSNPs), consistent with prevailing methodology in this field. Recent studies suggest that incorporation of within-host diversity into genomic analyses may provide greater resolution of transmission than cSNP-based approaches alone (Worby et al., 2017; Martin et al., 2018; Meehan et al., 2019; Séraphin et al., 2019). This may be particularly important for investigation of outbreaks occurring over short time scales and/or in settings such as the Canadian North, where the genetic diversity of circulating strains is especially low. In both of these circumstances, it is common to find many samples separated by zero cSNPs, hindering accurate source ascertainment. To investigate this hypothesis, we used deep sequencing (i.e., to ~10-20 fold more than standard, or 500-1000x) to re-evaluate transmission in a densely-sampled outbreak in Nunavik, Québec. This outbreak, which has been previously described, (Lee et al., 2015b; Lee et al., 2016) comprised 50 microbiologically-confirmed cases of TB who were diagnosed in a single Inuit community between 2011–2012 - a rate of 5,359/100,000 for that year. Genomic epidemiology analyses using sequencing depths of ~50x that are standard in such work, identified multiple clusters of transmission in this outbreak, (Lee et al., 2015b) however, there was insufficient genetic variation detected to infer precise person-to-person transmission events within these subgroups, given the short time frame and low mutation rate of M. tuberculosis (~0.2–0.3 SNPs/genome/year for Lineage 4; Menardo et al., 2019). In this study, we illustrate how within-host diversity can be incorporated into transmission analyses. In doing so, we find new features of the transmission networks in this community, in particular, identifying a previously unrecognized super-spreading event. We highlight a potential role for deep sequencing in public health investigations, with implications for TB control in Canada’s North as well as other high-transmission environments.

Materials and methods

Study subjects

All 50 samples from the 2011–2012 outbreak (Lee et al., 2015b) were eligible for inclusion, as well as samples from all cases (n = 15) diagnosed in same village in the preceding five years (2007 onwards), 13/15 of which were caused by the same strain of M. tuberculosis (the ‘Major [Mj]-III’ sublineage; Lee et al., 2015a). There were two episodes of recurrent TB (i.e., where an individual had microbiologically-confirmed TB once, was cured, but developed TB again during the study period); otherwise, all samples are from unique individuals. All cases had pulmonary TB that was Lineage 4 (Euro-American; Lee et al., 2015b). Cross-contamination was ruled out as described in Lee et al. (2015b).

DNA extraction and sequencing

Samples were cultured once on Middlebrook 7H10 agar and plate sweeps were collected for DNA extraction using the van Soolingen method (van Soolingen et al., 1991). Genomic DNA was quantified using the Quant-iT PicoGreen dsDNA Assay (ThermoFisher Scientific, Massachusetts, USA). Library preparation and sequencing were done at the McGill University/Genome Québec Innovation Centre. The Illumina HiSeq 4000 was used to produce paired-end 100 bp reads. To obtain the depth of coverage needed for this study (~500–1000x for deep sequencing, compared to ~50–100x as routinely done by public health), pooled libraries were run on four independent lanes.

Bioinformatics

FastQC (v.0.11.5, https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used to assess sequencing data quality and reads were trimmed to remove low-quality bases using Trimmomatic (v.0.36 Bolger et al., 2014). Kraken (v.1.1 Wood and Salzberg, 2014) was then used to identify potential contamination with the miniKraken database (minikraken_20171019). Reads classified as ‘Mycobacterium tuberculosis complex’ (MTBC) were extracted using Seqtk (v.1.2, Li H, available at: https://github.com/lh3/seqtk). Reads were then aligned using the Burrows Wheeler Aligner MEM algorithm (v.0.7.15 Li, 2013) to the H37Rv reference (NC_000962.3 in the National Center for Biotechnology Information [NCBI] RefSeq database) and sorted using Samtools (v.1.5 Li et al., 2009). Analyses were later repeated using a local reference genome (described below). Reads were with ambiguous mappings were excluded, as were reads with excessive soft-clipping (i.e., more than 20% of read length) based on our previous work (Martin et al., 2018). Duplicate reads were marked using Picard MarkDuplicates (v.2.9.0, https://broadinstitute.github.io/picard/) and reads were locally re-aligned around indels using Genome Analysis ToolKit (GATK, v.3.8 McKenna et al., 2010). All sites were called using GATK’s Unified Genotyper algorithm, with the -d 1500 to avoid downsampling to 250 (done by default with this tool during variant calling). Variants (cSNPs and heterogenous SNPs [hSNPs]) versus H37Rv were annotated using snpEff (v.4.3t Cingolani et al., 2012). Variants were filtered for quality using custom Python scripts (v.3.6) with the following thresholds: Phred < 50, Root Mean Squared Mapping Quality (RMS-MQ) ≤ 30, depth (DP) < 20, Fisher Strand Bias (FS) ≥ 60 and read position strand bias (ReadPos) < −8 (Martin et al., 2018). cSNPs were classified as positions where ≥ 95% of reads were the alternative allele (ALT), hSNPs were classified as positions where > 5% and < 95% of reads were ALT, and positions with the ALT present in ≤5% of reads were classified as ‘reference’. We also compared inferences of transmission from this analysis to i) when these thresholds were increased to the minimum values among cSNPs in the initial H37Rv analysis, and ii) when cSNPs were classified using a threshold of ≥ 99%, and hSNPs were classified when 1% < ALT < 99%, in order to assess the robustness of inferences to different filtering protocols. Low-quality variants, variants in proline-proline-glutamic acid (PE) and proline-glutamic-acid/polymorphic-guanine-cytosine-rich sequence (PE_PGRS) genes, transposons, phage and integrase, and positions with missing data, were excluded. All samples were drug-susceptible, except for MT-6429, which was rendered resistant to isoniazid by a frameshift deletion at position 1284 in the catalase-peroxidase gene katG. As such, positions associated with drug resistance were not masked in this analysis. Alignments with informative hSNPs were reviewed using Tablet (v.1.17.08.17, Milne et al., 2013). Concatenated core cSNP alignments were made using snp-sites -c (v.2.4.0 Page et al., 2016), with positions with hSNPs excluded. Pairwise cSNP distances between samples were computed using snp-dists (v.0.6, available at https://github.com/tseemann/snp-dists). The frequency of hSNPs at each position in the genome was tabulated and hSNPs were reviewed to identify variants shared between samples.

Phylogenetics and clustering

Core cSNP alignments were used to generate maximum likelihood trees using IQ-Tree (v.1.6.8 Nguyen et al., 2015). Model selection was based on the lowest Bayesian Information Criterion. Hierarchical Bayesian Analysis of Population Structure (Cheng et al., 2013) was run in R (v.3.5.2) to identify clusters. Phylogenetic trees were visualized using Interactive Tree of Life (Letunic and Bork, 2016).

Single molecule Real-Time (SMRT) sequencing and assembly

To examine the influence of potential alignment errors in identification of hSNPs, we used SMRT sequencing with the PacBio RSII platform to create a local reference genome for the outbreak. Sample MT-0080 was chosen for sequencing because this was previously identified as the probable source for as many as 19 of the 50 cases diagnosed in 2011–2012 (Lee et al., 2015b). Prior to sequencing, the culture was grown on a Middlebrook 7H10 agar plate. A single colony was then selected and grown further in 3 mL of Middlebrook 7H9 Broth to provide sufficient DNA for SMRT sequencing and Illumina MiSeq (for polishing of the long-read assembly). DNA for SMRT sequencing was extracted using the MagAttract High Molecular Weight DNA Kit from Qiagen (Maryland, USA). High molecular weight fragments were verified using gel electrophoresis. Library preparation and sequencing were then done at the McGill University/Genome Québec Innovation Centre. Prior to sequencing, fragment size was evaluated using a BioAnalyzer and the BluePippin system (Sage Science, Massachusetts, USA) was used for size selection. DNA for Illumina MiSeq was extracted using the van Soolingen method, as previous (van Soolingen et al., 1991). Long-reads were assembled and corrected using Canu (v.1.7.1 Koren et al., 2017). Pilon (v.1.23 Walker et al., 2014) was then used to polish the assembly using the Illumina MiSeq reads from the same colony. This was re-run until no further corrections were possible. Quast (v.5.0.2, Gurevich et al., 2013) was used to evaluate assembly quality. RASTtk (v.2.0 Brettin et al., 2015) was used for annotation, to identify regions for masking as previous.

Epidemiological data

Epidemiological and clinical data were collected on all cases and contacts using standardized questionnaires as part of the routine public health response, described previously in Lee et al. (2015b); Lee et al. (2016).

Statistical analyses

A two-sample test of proportions was used to compare overall proportions across references, and the Wilcoxon Signed Rank test was used to compare paired SNP distances. Analyses were done in Stata (v.15, StataCorp, College Station, TX, USA).

Results

62/65 (95·4%) available TB samples from cases diagnosed between 2007–2012 were successfully sequenced and passed quality control. This included 48/49 (98·0%) of the samples with an identical Mycobacterial Interspersed Repetitive Units Variable Number Tandem Repeats (MIRU-VNTR) pattern during the outbreak year. The remaining three samples could not be re-grown. Reads that were non-MTBC were removed (Source data 1) and there was no obvious association between percent contamination and hSNP frequency. Epidemiological and clinical data on all outbreak cases are described in Lee et al. (2015b). Average genome coverage and depth across the H37Rv reference was 98·64% [SD 0·07%] and 714·53 [SD 92·68], respectively. Our primary filtering protocol yielded 51,430 cSNPs and 4,897 hSNPs across all individual samples (Source data 2). Excluding positions that were invariant compared to the reference or where any sample was missing and/or was low-quality resulted in a core alignment of 860 cSNP positions and 136 hSNP positions (note, these are not mutually exclusive, as positions with cSNPs in some samples may have hSNPs in others). 42 positions had hSNPs that were shared across all 62 samples (Table 1, Source data 3). Depth of coverage at these positions was, on average, 39% higher than the average depth across the same sample (SD 36·7%, Source data 4). Along with manual review of alignments (Figure 1), this suggested that many of these were false positives, potentially due to alignment error (e.g., from underlying structural variation in our samples compared to the H37Rv reference).

Table 1.

Comparison of alignments to H37Rv and MT-0080_PB.

	H37Rv (4,411,532 bp)	MT-0080_PB (4,426,525 bp)	P value
Number of positions according to reference genome
Invariant reference across all samples, n (%)	4,018,786 (91·10%)	4,084,195 (92·27%)	<0·00005 ^a
Position was missing/low quality in at least one sample, n (%)	391,761 (8·88%)	342,179 (7·73%)	<0·00005 ^a
Position was an c/hSNP in at least one sample, n (%)	985 (0·22%)	152 (0·00%)	<0·00005 ^a
Shared cSNPs across all samples, n (%)	764 (0·02%)	1 (0·00%)	<0·00005 ^a
Shared hSNPs across all samples, n (%)	42 (0·00%)	0 (0%)	<0·00005 ^a
Core pairwise distances
Core cSNPs vs. reference, median (range)	791 (790–792)	3 (1–65)	<0·00005 ^b
Core cSNPs between samples, median (range)	3 (0–64)	3 (0–66)	<0·00005 ^b

a Two sample test for difference in proportions.

b Wilcoxin Signed Rank test.

Figure 1.

Pileup of reads showing hSNPs suspected to be due to alignment error as listed in Source data 3, with MT-4942 used as an example and zoomed on position 2,255,171 to 2,280,170 in H37Rv (National Center for Biotechnology Information RefSeq Database Accession NC_000962.3).

Binary Alignment Map (BAM) file were loaded into Tablet (v.1.17.08.17, Milne et al., 2013) to visualize the pileup compared to H37Rv.

Pileup of reads showing hSNPs suspected to be due to alignment error as listed in Source data 3, with MT-4942 used as an example and zoomed on position 2,255,171 to 2,280,170 in H37Rv (National Center for Biotechnology Information RefSeq Database Accession NC_000962.3).

Binary Alignment Map (BAM) file were loaded into Tablet (v.1.17.08.17, Milne et al., 2013) to visualize the pileup compared to H37Rv.

Comparison of alignments to H37Rv and MT-0080_PB.

Based on these filters: Phred < 50, Root Mean Square Mapping Quality (RMS-MQ) ≤ 30, depth (DP) < 20, Fisher Strand Bias (FS) ≥ 60 and read position strand bias (ReadPos) < −8 and an allelic fraction of ≥ 95% for cSNPs, with hSNPs classified when 5% < ALT < 95%. Quality metrics for the individual cSNPs/hSNPs identified in each sample are given in Source data 2. a Two sample test for difference in proportions. b Wilcoxin Signed Rank test. To address this, we generated a local reference genome for the outbreak, MT-0080_PB. Quality metrics for the MT-0080_PB assembly are given in Source data 5. Compared to H37Rv, mean genome coverage and depth were higher with MT-0080_PB (at 99·33% [SD 0·09%] and 717·07 [SD 93·01], respectively), fewer positions were missing/low-quality (p < 0·00005, Table 1), and overall, fewer variable positions were detected (p < 0·00005). While core cSNP distances were similar between samples regardless of the reference (Table 1), the number of hSNPs was greatly reduced using MT-0080_PB (Source data 2); while 4,897 hSNPs were identified across all individual samples using H37Rv, only 125 hSNPs were identified using MT-0080_PB. There were also no hSNPs shared across all 62 samples using MT-0080_PB. Together, these findings support our hypothesis that alignment error is responsible for many of the detected variants, and indicate a local reference is important for accurate identification of hSNPs. All further results presented are based on the MT-0080_PB alignment. A maximum likelihood tree was generated from 90 core cSNP positions (excluding sites invariant across all samples and the reference) compared to MT-0080_PB (Figure 2). All 62 samples (historical and outbreak) were included, for comparison with our previous work. (Lee et al., 2015b) Consistent with this, (Lee et al., 2015b) hierBAPS identified two main sub-lineages (‘Mj-V’ and ‘Mj-III’ per Lee et al., 2015a), with three sub-clusters (Mj-IIIA/B/C).

Figure 2.

Transmission of M. tuberculosis in village K.

Maximum likelihood tree of 62/65 cases diagnosed between 2007–2012 in village K based on consensus single nucleotide polymorphisms (cSNPs). After aligning to a local reference, MT-0080_PB, cSNPs were identified based on a minimum threshold of ≥95% of reads supporting the alternative allele. A core cSNP alignment was then produced with 90 positions.and IQ-Tree (v.1.6.8 Nguyen et al., 2015) was used to generate the tree using a KP3 model with correction for ascertainment bias. Model selection was based on the lowest Bayesian Information Criterion. 1000 bootstrap replicates were done; only p values > 60% are shown. Clusters were identified using hierarchical Bayesian Analysis of Population Structure (Cheng et al., 2013). These clusters were consistent with the sub-lineages previously identified in Lee et al. (2015a); Lee et al. (2015b), thus only sub-lineage names are indicated (Major sub-lineages [Mj]-IIIA, B, C, and Mj-VA). Only Mj-IIIA/B/C were present in 2011–2012; Mj-IIIA was first seen in village K in 2007, IIIB was first seen in 2009, and IIIC was first seen in 2012. Alleles informative for transmission in Mj-IIIB, identified using deep sequencing, are indicated. Between 2007–2012, there were two individuals who had a second episode of TB; stars are used to highlight these samples, with a different colour for each patient. MT-0080 is included in the alignment as the deep sequencing data from a sweep of all colonies identified a cSNP compared to the MT-0080_PB reference, which itself was generated from a single colony pick.

(A) Routine sequencing (to ~40–50x) had indicated there were three major clusters present in village K in 2011–2012, Mj-IIIA, B and C. Further analysis of sequencing and epidemiological data indicated that the 2011–2012 cases in Mj-IIIB could be divided into two subgroups of transmission – comprised of five and 13 patients, respectively (Lee et al., 2015b). MT-504 was the suspected source case for the subgroup of five, which all shared a ‘C’ allele at position 276,685 in H37Rv (position 276,544 in MT-0080_PB). In contrast, all members of the subgroup of 13 shared an ‘A’ at this position. Previously, MT-2474 was the suspected source case for this subgroup; this case was the first person with smear-positive (SS+) cavitary disease diagnosed in this subgroup. The epidemiologic curve for the 2011–2012 Mj-IIIB cases is shown, stratified by sub-group based on the alleles detected with routine sequencing. MT-504 and MT-2474, the putative sources within each sub-group are indicated. Sputum smear grade is given where cases were positive on microscopy. Dates of diagnosis are classified into biweekly intervals. (B) In contrast to standard sequencing, deep sequencing data revealed that, in fact, MT-504 – the presumed source for the subgroup of five cases and the first highly contagious case diagnosed in Mj-IIIB during the outbreak year – had both ‘C’ and ‘A’ alleles at this position (563:133 of reads, respectively), suggesting this was in fact the most probable source for both subgroups.

Transmission of M. tuberculosis in village K.

Comparison of epidemiologic inferences using 'routine' versus 'deep' sequencing.

hSNPs identify a novel super-spreading event and more accurately resolve transmission clusters

The core cSNPs and hSNPs between samples are shown in Source data 6, with the sub-groups identified in the original analysis indicated. Overlaying hSNPs with the cSNP-based analysis revealed a novel super-spreader (MT-504) in Cluster Mj-IIIB, undetected by genomic epidemiology analyses relying on lower sequencing depth. (Lee et al., 2015b). In brief, our previous analysis using routine sequencing depth had suggested that Mj-IIIB was comprised of two distinct transmission networks (which we refer to as ‘subgroups’ for consistency with our earlier work); the first subgroup consisted of five cases diagnosed between December 2011 and October 2012, while the second subgroup consisted of 13 cases diagnosed between March and November 2012. Epidemiologic curves for these subgroups are given in Figure 2—figure supplement 1A. These two subgroups were distinguished from one another based on the presence or absence of a shared cSNP (at position 276,685 according to H37Rv/276,544 in the MT-0080_PB alignment, Source datas 6, 7, 8); all samples in the subgroup of five cases shared an alternative ‘C’ allele at this position, while all samples in the subgroup of 13 cases shared the reference ‘A’ allele. Given the short time period, low mutation rate of TB, and overall low diversity of strains circulating in the village, we would expect 0 SNPs to accrue in recent transmission, refuting transmission between these subgroups. In the original analysis, MT-504 was identified as the probable source for the subgroup of five cases. This individual was diagnosed in late 2011 with smear-positive, cavitary disease, and had attended the same local community ‘gathering houses’ (social venues specifically identified by public health during the outbreak) as all four others in this subgroup. For the second subgroup of 13 cases, MT-2474 was identified as the probable source, as this was the first smear-positive case in this subgroup (diagnosed in May 2012, Figure 2—figure supplement 1A).

Figure 2—figure supplement 1.

Comparison of epidemiologic inferences using 'routine' versus 'deep' sequencing.

In contrast to the analysis based on routine sequencing data, our in-depth investigation of within-host diversity using deep sequencing data revealed that MT-504 harboured both the alternative allele (563 reads [80·9%]) shared by all members of the subgroup of five as well as the reference allele (133 reads [19·1%]), shared by all members of the subgroup of 13 (Figure 2—figure supplement 1B). Given that MT-504 was the first contagious case diagnosed in Mj-IIIB (Figure 2—figure supplement 1A), and all 13 cases in the second subgroup had attended or resided in a gathering house (with 9/13 [69·2%] reporting attendance at the same houses as MT-504), this strongly suggests that MT-504 is in fact the most probable source for both subgroups.

hSNP analysis adds support for suspected transmission

Sample 68995 and MT-5543 were from 2007, and were the only strains from the Mj-VA sub-lineage in this village. Previous analysis indicated Mj-VA strains from other villages were distantly related, (Lee et al., 2015b) while these two samples were separated from one another by zero core cSNPs. This suggests direct transmission between these historical cases, a hypothesis strongly supported by hSNP analysis, as the samples share hSNPs that are not found in any other sample in the dataset. These hSNPs were present even when highly conservative filtering thresholds were used (Source data 7), but were not included when using H37Rv as the reference - potentially due to differences in annotation and subsequent filtering.

Potential utility for discriminating TB recurrence

Six individuals had TB recurrence in 2011–2012. Paired samples were available for two of these (Patient 1: samples MT-5195 [Mj-IIIA] in 2007 and MT-1838 [Mj-IIIC] in 2012; Patient 2: samples MT-5543 [Mj-VA] in 2007 and MT-1206 [Mj-IIIB] in 2012, Figure 2). Clinically, both patients had new lesions detected at their second episode, compared to their previous chest x-rays. cSNP-based analyses suggested their second episodes of TB were due to re-infection with a new strain, rather than relapse with the strain causing their original disease. Investigation of within-host diversity strongly supported this conclusion; using deep sequencing, we verified that there was a single, different strain present at both baseline and their second episodes of TB. There was no evidence for mixed infection at either baseline or second episode with these strains, more definitively ruling out relapse in this low diversity setting (Source datas 6, 7, 8).

Impact of altering cSNP and hSNP thresholds

To ensure we were not missing lower frequency variants using the prior cSNP/hSNP thresholds, we re-ran our analysis such that hSNPs were classified when 1% < ALT < 99%. Quality scores for individual cSNPs and hSNPs are given in Source data 9 and the core cSNP/hSNP alignment is shown in Source data 8. While our primary analysis using a threshold of ≥95% for cSNPs identified a single cSNP (A to G) shared across all samples compared to MT-0080_PB, close examination of the MT-0080 deep sequencing data (obtained using DNA from a sweep of the plate) showed that this sample had both alleles at this position, with only the minority ‘A’ allele (33 reads/1189 [2·8%]) isolated for SMRT sequencing. Based on this, we recommend sequencing samples both using a clean sweep (with an alternative sequencing platform) and a single colony pick when generating a reference genome for TB, as using the latter alone may introduce error and affect epidemiological inferences. With this exception, no other informative hSNPs were detected using these thresholds.

Discussion

As the TB epidemic continues among the Canadian Inuit, targeted public health interventions are essential to halt ongoing transmission. In order to do so, it is important that transmission events and associated risk factors are accurately identified. Our previous work suggested that hSNP analysis could enhance resolution of TB transmission (Martin et al., 2018). To investigate how this approach could be applied for TB control, we used deep sequencing to re-examine a major TB outbreak in the Canadian Arctic. Several recent studies, including work by our group (Martin et al., 2018), have shown that M. tuberculosis within-host diversity can be transmitted between individuals (Séraphin et al., 2019; Guthrie et al., 2019). Using deep sequencing data allowed us to better identify this diversity in a Nunavik outbreak compared to previous analyses with standard sequencing depth, (Lee et al., 2015a; Lee et al., 2015b) and facilitated detection of a novel super-spreading event, where one source case may have transmitted to ~1/3 of the other cases diagnosed between 2011 to 2012. This was in addition to a previously identified super-spreader linked to 19 secondary cases - suggesting up to 75% of the outbreak (36/48, excluding the putative super-spreaders) may be attributable to these events. Super-spreading has been described in a number of pathogens, (Stein, 2011) including TB (Kline et al., 1995). Our findings suggest this can play an important role in driving TB outbreaks, and that accurate detection of super-spreading events is important for informing appropriate public health interventions. In the case of MT-504, as nearly all of the secondary cases had attended the same local community gathering houses as the putative source, this strongly suggests these venues play an important role in facilitating transmission in this setting. Several studies have used genomics to investigate TB recurrence, (Witney et al., 2017; Bryant et al., 2013; Guerra-Assunção et al., 2015) however, the methods used to assess for mixed infection at either time point have been inconsistent and may not be sufficient to discriminate recurrence in settings with low strain diversity. In this analysis, we provide proof-of-principle that deep sequencing can potentially help rule out relapse. The distinction between relapse and re-infection is important at individual and population levels; high rates of relapse in a community would indicate a problem with treatment or adherence, potentially warranting changes to clinical management, while re-infection would indicate the need for public health interventions such as active case finding. Also, individuals in Nunavik who have had prior treatment for active TB disease in the past are also not routinely offered prophylaxis on re-exposure, based on historical data suggesting ~80% protection is afforded by prior infection (Menzies, 1997). The degree to which re-infection drives recurrence in Nunavik is currently unknown, but if re-infection is the primary cause, this clinical practice may need to be re-evaluated. A population-level genomic epidemiology study is currently underway to evaluate this. To use deep sequencing to investigate within-host diversity, it is critical we minimize false positive hSNPs. We have shown that using a local strain as a reference can not only reduce error, but improves detection of epidemiologically-informative variants. Genomic differences between outbreak strains and H37Rv have been previously illustrated by Roetzer et al. (2013); O'Toole and Gautam (2017), with O'Toole and Gautam (2017) warning that clinical TB strains may be needed to fully detect virulence genes in reference-based analyses. Our analysis suggests these may also be warranted for hSNP analysis; where possible, we suggest using long-read sequencing to generate complete and local reference genomes. Overall, our study has a number of strengths. Firstly, we had access to a densely-sampled outbreak, which was previously investigated using ‘standard’ sequencing depth and for which detailed epidemiological data was available. This allowed us to readily compare methodological approaches, showing how and when deep sequencing might be beneficial for public health. In this study, we have also identified important methodological considerations for hSNP detection, with implications for transmission analyses, but also potentially for resistance prediction as well (Liu et al., 2015). Finally, the use of long-read data has allowed us to completely assemble a novel TB genome from Nunavik. This will serve as a valuable resource for future studies of transmission in Nunavik (given the low strain diversity in the region Lee et al., 2015a), as well as other Inuit territories. A potential limitation of this work is that, given the historical nature of the outbreak, deep sequencing was done using DNA extracted from culture. Due to methodological challenges of sequencing directly from sputum, (Brown et al., 2015; Votintseva et al., 2017; Doyle et al., 2018) few studies have examined the effect of culture on genome diversity. A recent study by Shockey et al. (2019)., which analyzed allelic variation among reads from individual samples, suggests that some diversity may be lost during the culturing process. However, several studies looking at potential transmission (Votintseva et al., 2017; Doyle et al., 2018; Nimmo et al., 2019) found results were congruent between cSNP analyses from culture versus raw samples. In terms of hSNPs, Votintseva et al. (2017) Doyle et al. (2018); Nimmo et al. (2019) reported detecting fewer hSNPs with sequencing from culture versus from sputum, in Nimmo et al. (2019), the median hSNPs was only 4.5 versus 5 hSNPs, respectively – a difference that may not be clinically significant, regardless of statistical significance. Given the inconsistency of results and paucity of data, further study is needed to understand how hSNP diversity may be affected by the culturing process, and to assess whether this affects inferences of transmission. We note that it is likely that enhanced detection of the hSNPs present in sputum would improve the resolution over that which we present in this work. Another potential limitation is that, while we can compare the epidemiological inferences made between our previous analysis and our deep sequencing analysis, the sequencing data and bioinformatics pipelines themselves are not directly comparable. Methods to accurately identify hSNPs and incorporate them into transmission analyses are currently an area of active research. We illustrated in our recent paper (Martin et al., 2018) that additional steps and strict thresholds must be used to minimize false positive hSNPs, and have conducted the current analysis in consideration of this. However, we note that pre-filtering, our 2015 analysis found that MT-504 had five reference alleles at position 276,685 in the H37Rv alignment (out of 75) and randomly downsampling the current data to simulate ~50 x yielded similar results (5/47 reads at position 276,544 aligned to MT-0080_PB). As most genomic studies of TB employ conservative thresholds of 75–90% allele frequency to classify cSNPs, many bioinformatics pipelines would consider this heterogeneity as potentially suspect at standard sequencing depth. This suggests that greater depth and/or different analytic approaches (e.g., Wyllie et al., 2019) are needed to ensure accurate discrimination of sequencing/analytic error from true variation; ultimately, the optimal approach used to identify variants needs to be carefully considered, and appropriate for the study question and data being analyzed. Finally, while deep sequencing allowed us to detect a novel superspreading event in this context, this approach may not always be necessary; indeed, our previous analysis had identified another super-spreader in the same outbreak using routine sequencing. We acknowledge that this Northern outbreak may not be representative of outbreaks from other settings and/or involving other M. tuberculosis lineages. Further studies are needed to quantify the degree to which super-spreading occurs in TB, and examine how and when deep sequencing should be used to detect this. In summary, we have found evidence of mixed variants with important epidemiologic implications that would not have been detected with standard methods and common filtering criteria. To our knowledge, however, no other studies have been published comparing epidemiological inferences obtained with deep versus routine sequencing for TB outbreak resolution – thus this work represents an important methodological advance in this area. We illustrate that genomic methods, while powerful, still require careful interpretation and can still harbor considerable ambiguity when comparing very close links in a transmission chain, or, as also suggested in Xu et al. (2019), when trying to identify source cases. This finding is likely relevant beyond TB, given the increasing number of pathogens undergoing genomic investigation. Our work also highlights the importance of reproducing previous genomic epidemiology analyses; as the technology and methods used in this field continue to develop, these can lead to improved resolution of transmission and ultimately, challenge previously-held inferences – with critical implications for public health. In terms of TB control, we demonstrate that deep sequencing can aid in transmission analyses, in particular by allowing accurate identification of TB super-spreading events and associated epidemiological characteristics. We propose that deep sequencing is most useful for understanding transmission in settings with low strain diversity, and that these may benefit from routine use of this approach. We hypothesize deep sequencing may also provide additional resolution of transmission events within outbreaks occurring over short, limited timescales – irrespective of local strain diversity, as (by definition) all samples involved in recent transmission would be expected to be closely-related. However, further studies comparing deep versus routine sequencing, ideally from a diversity of clusters and epidemiologic contexts, are needed to fully quantify the added value of this approach for epidemiologic studies of TB. Overall, this work has important implications for the Canadian North, as well as other regions with high TB transmission; as next-generation sequencing becomes a mainstay in public health surveillance, it is critical we recognize the limitations of analyses done using routine sequencing data. Accurate resolution of transmission is essential for TB control programmes, in order to better understand risk factors for such transmission and enable prioritization of public health resources. With respect to Nunavik, our findings were regarded as very valuable by the regional public health unit and local community leaders; as a direct consequence of this work, ongoing and new investigations of TB genomic epidemiology in Nunavik are using deep sequencing to inform transmission analyses. However, while costs continue to decline, we recognize deep sequencing of all samples in an outbreak may not be economically viable in every setting. In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses. Acceptance summary: Tuberculosis infection represents one of the major global health challenges and a better understanding of tuberculosis transmission is critical for designing effective control strategies. This work uses deep sequencing to further investigate a previously studied tuberculosis transmission cluster in Canada. It uses deep sequencing to identify co-infection of a single individual with two closely related strains, and suggests that both strains were subsequently transmitted. This increases the number of individuals thought to have been infected from a single source individual, and thus increases our estimates of the potential frequency and magnitude of 'super-spreading' events. Although this represents analysis of a single infection cluster, such detailed studies also provide a model for future outbreak investigation. Decision letter after peer review: Thank you for submitting your article "Previously undetected super-spreading of Mycobacterium tuberculosis revealed by deep sequencing" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Miles Davenport as Reviewing Editor, and Eduardo Franco as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Conor J Meehan. As is customary in eLife, the reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. Summary: Lee et al. report a re-analysis of a tuberculosis outbreak combining a local reference derived from long-reads and ultra-deep sequencing to reveal minor variants. They use the novel information to identify a previously undetected super-spreader. This is very carefully done piece of work into which the authors have clearly put a lot of thought. Although mixed bacterial populations have been used to identify index cases and so-called super-spreaders from WGS TB data in the past, the authors have given this issue greater focus here and emphasised the point that such mixed populations might not be detected without sequencing to a greater depth. This work is an excellent example of the benefits of this deep sequencing and outlines both the need for local outbreak reference sequences and associated hSNP analysis in low diversity settings. Essential revisions: 1) The crux of this work is about identifying MT-504 as a super spreader. From the core SNP tree in Figure 1A this would not be evident at all based on their location in a subgroup of the sublineage and not in a position closer to the root. The use of hSNPs to uncover this is well done but the paragraph is difficult to read. Indeed only because this is my field do I think I was able to follow. Perhaps authors could construct a phylogenetic network of this subsection to show how the MT-504 resides as a central hub compared to the tree? In any case this section needs to be made easy to read as without it the more un-accustomed reader will get lost. 2) This work is most important for low diversity settings. This should be better highlighted throughout, perhaps with a comment on how this could be applicable to high diversity settings, if at all. Many Mtb researchers think all new WGS work is applicable in all settings and it would be of help if the authors demonstrated where and when people need to make the extra effort and indeed massively extra cost of deep sequencing. 3) The portion on super spreaders being found at diagnosis in the Discussion is interesting but not directly related to the findings here. We are a long way from identifying super spreaders based on some genomic trait (especially as this could be host related) and the WGS work here won't lead to such clinical decision making. I suggest this section be vastly shortened or removed so readers don't think this is a possibility from WGS. 4) A major criticism is that although this is a carefully crafted analysis, it remains essentially anecdotal. To draw conclusions that deep sequencing is now ideally required for all outbreaks is probably a little overenthusiastic. A systematic analysis of far larger number of clusters is required to assess the importance of mixed populations that go undetected at routine sequencing depth before conclusions can be drawn about what should and shouldn't be routine practice. It could easily be that most mixed populations that are helpful to inferences around directionality are detectable at routine sequencing depth in the vast majority of outbreaks. That needs to be established. Thus, although the discussion assumes that superspreaders can be discovered following the approach described, it is not totally clear this is always the case. This manuscript only offers one example, so this should be mentioned as a limitation but also as a call for future work to find out how common is the phenomenon and whether it can be revealed by ultra-deep sequencing. 5) The same criticism could be levelled at the conclusion that a local reference is important. It was in this case, but data has not been presented from which to conclude that it is routine necessity. Essential revisions: 1) The crux of this work is about identifying MT-504 as a super spreader. From the core SNP tree in Figure 1A this would not be evident at all based on their location in a subgroup of the sublineage and not in a position closer to the root. The use of hSNPs to uncover this is well done but the paragraph is difficult to read. Indeed only because this is my field do I think I was able to follow. Perhaps authors could construct a phylogenetic network of this subsection to show how the MT-504 resides as a central hub compared to the tree? In any case this section needs to be made easy to read as without it the more un-accustomed reader will get lost. We apologize for any lack of clarity. It has been a challenge to present a comparison with previous work, wherein we need to show our results clearly but also be mindful of not replicating previously published text or figures. The consensus SNP tree in Figure 1A (Figure 2A in the revised version, as the supplementary figure was moved to the main text) included all available samples from this village from 2007-2012 (i.e., 62/65 outbreak and historical cases, not only from IIIB, but IIIA, IIIC and VA as well). We did this to illustrate that, despite changes in sequencing platforms and analysis, we still recapitulate the original three clusters (IIIA/B/C) from this community. We have added the following clarification to the Results: “All 62 samples (historical and outbreak) were included, for comparison with our previous work.” Based on our previous work, cluster IIIA (which was first seen in the community in 2007, while IIIB and IIIC were not seen until 2009 and 2012, respectively) was ancestral to clusters Mj-IIIB and IIIC. We would therefore not expect MT-504, which is in cluster IIIB, to be at the root of this particular tree. In response to the reviewers’ comment, we have done the following to help make this figure and corresponding section of the text more clear for eLife readers: A) We have extensively revised the section entitled “hSNPs identify super-spreaders and more accurately resolve transmission clusters” to walk the reader through the data in more detail (hSNPs identify super-spreaders and more accurately resolve transmission clusters). B) We have added the informative alleles to this figure (2A in the revised version) C) We have also added a new epidemiologic curve showing the cases stratified by sub-group of transmission according to the original analysis to Figure 1B (2B in the revised version), to assist the reader in understanding the timeline. On reviewing the raw fasta file, we also wanted to report that we realized some missing sites had inadvertently been included despite using a well-known tool (‘snp-sites’) to exclude all non-AGTC characters for this step. This led to minor differences in the maximum likelihood tree structure from the submitted version. This has now been updated (and did not alter our overall epidemiologic inferences). 2) This work is most important for low diversity settings. This should be better highlighted throughout, perhaps with a comment on how this could be applicable to high diversity settings, if at all. Many Mtb researchers think all new WGS work is applicable in all settings and it would be of help if the authors demonstrated where and when people need to make the extra effort and indeed massively extra cost of deep sequencing. We thank the reviewers for this comment. It is true that the current study examines the application of deep sequencing to an outbreak in a low-diversity setting. However, given the low mutation rate of M. tuberculosis, samples from cases involved in recent transmission (e.g., within 2 years), can often have 0 consensus SNPs between them. Indeed, our collaborators at the US Center for Disease Control’s Division of TB Elimination have reported to us difficulties resolving recent transmission using consensus SNPs alone for this reason; we therefore anticipate deep sequencing could help provide additional resolution, by helping resolve person-to-person transmission events within these groups of transmission. In this case, the utility of deep sequencing in this case would be irrespective of the diversity of strains circulating in the setting. We do, however, agree that deep sequencing would likely be more useful in low-diversity settings for surveillance over longer time scales than it would be in high-diversity settings, as more fixed variation would be expected to occur over samples spread over greater time periods. We had previously stated that we believe deep sequencing may be useful for “investigation of outbreaks occurring over short time scales and/or in settings such as the Canadian North, where the genetic diversity of circulating strains is especially low. In both of these circumstances, it is common to find many samples separated by zero cSNPs, hindering accurate source ascertainment.” We have also added the following statement to the Discussion: “We propose that deep sequencing is most useful for understanding transmission in settings with low strain diversity, and that these may benefit from routine use of this approach. […] However, further studies comparing deep versus routine sequencing, ideally from a diversity of clusters and epidemiologic contexts, are needed to fully quantify the added value of this approach for epidemiologic studies of TB.” We also discuss representativeness of this dataset further under point #4 below. 3) The portion on super spreaders being found at diagnosis in the Discussion is interesting but not directly related to the findings here. We are a long way from identifying super spreaders based on some genomic trait (especially as this could be host related) and the WGS work here won't lead to such clinical decision making. I suggest this section be vastly shortened or removed so readers don't think this is a possibility from WGS. Thank you for your feedback on this section. We have cut this text from the manuscript to avoid further confusion, however, to clarify, we did not mean to imply that deep sequencing could be used to identify bacterial characteristics predictive of future super-spreading events. We absolutely agree with the reviewer that such a goal is generally ‘a long way off’, and, with respect to Nunavik, our previous work (Lee et al., 2015) actually suggested that bacterial characteristics are not likely to be driving transmission in this region. Furthermore, even if there were such bacterial characteristics, delays in sequencing (due to shipping samples south, batching of samples, need for culture, etc.) generally mean these findings would not be available in real-time to assist with outbreak response. What we are instead proposing to use deep sequencing to help accurately discriminate super-spreading events, and subsequently, identify epidemiological and clinical characteristics that are associated with these using more ‘classical’ epidemiologic methods (i.e., regression analyses). Dr. Lee is currently leading such a study for this region. These communities have very limited resources in terms of clinical staff, who are typically managing numerous routine health needs in addition to TB; if we can help identify the most likely clinical and epidemiological characteristics of these historical super-spreaders, nursing staff and physicians can potentially triage cases with these characteristics in the future as they are diagnosed, and better prioritize investigation of their contacts. 4) A major criticism is that this is a carefully crafted analysis, it remains essentially anecdotal. To draw conclusions that deep sequencing is now ideally required for all outbreaks is probably a little overenthusiastic. A systematic analysis of far larger number of clusters is required to assess the importance of mixed populations that go undetected at routine sequencing depth before conclusions can be drawn about what should and shouldn't be routine practice. It could easily be that most mixed populations that are helpful to inferences around directionality are detectable at routine sequencing depth in the vast majority of outbreaks. That needs to be established. Thus, although the discussion assumes that superspreaders can be discovered following the approach described, it is not totally clear this is always the case. This manuscript only offers one example, so this should be mentioned as a limitation but also as a call for future work to find out how common is the phenomenon and whether it can be revealed by ultra-deep sequencing. We appreciate the reviewers’ feedback, and we agree that the present work is not sufficient to offer guidelines for the study of all outbreaks, in settings with high and low strain diversity and TB burden. However, while this is indeed one outbreak, it is an essential and necessary step to enable such work in the future. We had in this study a unique opportunity to compare approaches for inferring transmission under essentially ideal conditions, i.e., where (due to the scale of the outbreak, and geography of Nunavik, limiting out-migration), there was all cases were detected, and detailed epidemiological data on each was available. In departing from the standard approaches to genomic epidemiology, this paper is a crucial starting point for future work; further studies are needed – with deep and regular sequencing data from all samples, along with epidemiological data – in order to assess the added value of deep sequencing across settings and contexts. In line with point #2 above, we have modified the Discussion to address this. In addition to this, we have also added the following: “Finally, while deep sequencing allowed us to detect a novel superspreading event in this context, this approach may not always be necessary; indeed, our previous analysis had identified another super-spreader in the same outbreak using routine sequencing. […] Further studies are needed to quantify the degree to which super-spreading occurs in TB, and examine how and when deep sequencing be used to detect this.” We have also removed our suggestion in the Discussion that deep sequencing may be needed for all smear-positive cases, to make the text less prescriptive. 5) The same criticism could be levelled at the conclusion that a local reference is important. It was in this case, but data has not been presented from which to conclude that it is routine necessity. We found a significant reduction in what appear to be false positive hSNPs using a local reference genome, and do think such a reference is useful to help minimize these. We have been careful to indicate that this may not be necessary for cSNP detection for TB (though we would note, in our experience, that a local reference is often preferred for this purpose for other pathogens). Our findings are also consistent with other studies that have looked at virulence factors; the H37Rv strain has been passaged numerous times since its isolation, and gene content is not identical to clinical strains. However, in response to the reviewer’s comment, we have modified our text from “We propose these [i.e., local reference genomes] are also warranted for hSNP analysis” to “Our analysis suggests that these may also be warranted for hSNP analysis”. We have also changed our ‘recommendation’ to a ‘suggestion’ that “where it is possible to do so – it would be best to generate and use a local reference”.

40 in total

1. Progression to tuberculosis disease increases with multiple exposures.

Authors: Robyn S Lee; Jean-François Proulx; Dick Menzies; Marcel A Behr
Journal: Eur Respir J Date: 2016-10-06 Impact factor: 16.671

Review 2. Whole genome sequencing of Mycobacterium tuberculosis: current standards and open issues.

Authors: Conor J Meehan; Galo A Goig; Thomas A Kohl; Lennert Verboven; Anzaan Dippenaar; Matthew Ezewudo; Maha R Farhat; Jennifer L Guthrie; Kris Laukens; Paolo Miotto; Boatema Ofori-Anyinam; Viola Dreyer; Philip Supply; Anita Suresh; Christian Utpatel; Dick van Soolingen; Yang Zhou; Philip M Ashton; Daniela Brites; Andrea M Cabibbe; Bouke C de Jong; Margaretha de Vos; Fabrizio Menardo; Sebastien Gagneux; Qian Gao; Tim H Heupink; Qingyun Liu; Chloé Loiseau; Leen Rigouts; Timothy C Rodwell; Elisa Tagliani; Timothy M Walker; Robin M Warren; Yanlin Zhao; Matteo Zignol; Marco Schito; Jennifer Gardy; Daniela M Cirillo; Stefan Niemann; Inaki Comas; Annelies Van Rie
Journal: Nat Rev Microbiol Date: 2019-09 Impact factor: 60.633

Review 3. Issues in the management of contacts of patients with active pulmonary tuberculosis.

Authors: D Menzies
Journal: Can J Public Health Date: 1997 May-Jun

4. Reemergence and amplification of tuberculosis in the Canadian arctic.

Authors: Robyn S Lee; Nicolas Radomski; Jean-Francois Proulx; Jeremy Manry; Fiona McIntosh; Francine Desjardins; Hafid Soualhine; Pilar Domenech; Michael B Reed; Dick Menzies; Marcel A Behr
Journal: J Infect Dis Date: 2015-01-09 Impact factor: 5.226

5. Outbreak of tuberculosis among regular patrons of a neighborhood bar.

Authors: S E Kline; L L Hedemark; S F Davies
Journal: N Engl J Med Date: 1995-07-27 Impact factor: 91.245

6. Recurrence due to relapse or reinfection with Mycobacterium tuberculosis: a whole-genome sequencing approach in a large, population-based cohort with a high HIV infection prevalence and active follow-up.

Authors: José Afonso Guerra-Assunção; Rein M G J Houben; Amelia C Crampin; Themba Mzembe; Kim Mallard; Francesc Coll; Palwasha Khan; Louis Banda; Arthur Chiwaya; Rui P A Pereira; Ruth McNerney; David Harris; Julian Parkhill; Taane G Clark; Judith R Glynn
Journal: J Infect Dis Date: 2014-10-21 Impact factor: 5.226

7. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments.

Authors: Andrew J Page; Ben Taylor; Aidan J Delaney; Jorge Soares; Torsten Seemann; Jacqueline A Keane; Simon R Harris
Journal: Microb Genom Date: 2016-04-29

8. The molecular clock of Mycobacterium tuberculosis.

Authors: Fabrizio Menardo; Sebastian Duchêne; Daniela Brites; Sebastien Gagneux
Journal: PLoS Pathog Date: 2019-09-12 Impact factor: 6.823

9. Direct transmission of within-host Mycobacterium tuberculosis diversity to secondary cases can lead to variable between-host heterogeneity without de novo mutation: A genomic investigation.

Authors: Marie Nancy Séraphin; Anders Norman; Erik Michael Rasmussen; Alexandra M Gerace; Calin B Chiribau; Marie-Claire Rowlinson; Troels Lillebaek; Michael Lauzardo
Journal: EBioMedicine Date: 2019-08-13 Impact factor: 8.143

10. High-resolution mapping of tuberculosis transmission: Whole genome sequencing and phylogenetic modelling of a cohort from Valencia Region, Spain.

Authors: Yuanwei Xu; Irving Cancino-Muñoz; Manuela Torres-Puente; Luis M Villamayor; Rafael Borrás; María Borrás-Máñez; Montserrat Bosque; Juan J Camarena; Ester Colomer-Roig; Javier Colomina; Isabel Escribano; Oscar Esparcia-Rodríguez; Ana Gil-Brusola; Concepción Gimeno; Adelina Gimeno-Gascón; Bárbara Gomila-Sard; Damiana González-Granda; Nieves Gonzalo-Jiménez; María Remedio Guna-Serrano; José Luis López-Hontangas; Coral Martín-González; Rosario Moreno-Muñoz; David Navarro; María Navarro; Nieves Orta; Elvira Pérez; Josep Prat; Juan Carlos Rodríguez; María Montserrat Ruiz-García; Herme Vanaclocha; Caroline Colijn; Iñaki Comas
Journal: PLoS Med Date: 2019-10-31 Impact factor: 11.069

10 in total

1. What Can Genetic Relatedness Tell Us About Risk Factors for Tuberculosis Transmission?

Authors: Sarah V Leavitt; C Robert Horsburgh; Robyn S Lee; Andrew M Tibbs; Laura F White; Helen E Jenkins
Journal: Epidemiology Date: 2022-01-01 Impact factor: 4.822

2. Reproducibility in science: important or incremental?

Authors: Robyn S Lee; William P Hanage
Journal: Lancet Microbe Date: 2020-05-14

Review 3. The population genomics of within-host Mycobacterium tuberculosis.

Authors: Ana Y Morales-Arce; Susanna J Sabin; Anne C Stone; Jeffrey D Jensen
Journal: Heredity (Edinb) Date: 2020-10-15 Impact factor: 3.821

4. One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads.

Authors: Carlos Valiente-Mullor; Beatriz Beamud; Iván Ansari; Carlos Francés-Cuesta; Neris García-González; Lorena Mejía; Paula Ruiz-Hueso; Fernando González-Candelas
Journal: PLoS Comput Biol Date: 2021-01-27 Impact factor: 4.475

5. Tuberculosis in Australia's tropical north: a population-based genomic epidemiological study.

Authors: Ella M Meumann; Kristy Horan; Anna P Ralph; Belinda Farmer; Maria Globan; Elizabeth Stephenson; Tracy Popple; Rowena Boyd; Mirjam Kaestli; Torsten Seemann; Koen Vandelannoote; Christopher Lowbridge; Robert W Baird; Timothy P Stinear; Deborah A Williamson; Bart J Currie; Vicki L Krause
Journal: Lancet Reg Health West Pac Date: 2021-07-31

Review 6. Application of Next Generation Sequencing for Diagnosis and Clinical Management of Drug-Resistant Tuberculosis: Updates on Recent Developments in the Field.

Authors: Navisha Dookie; Azraa Khan; Nesri Padayatchi; Kogieleum Naidoo
Journal: Front Microbiol Date: 2022-03-24 Impact factor: 5.640

Review 7. Mycobacterium bovis: From Genotyping to Genome Sequencing.

Authors: Ana M S Guimaraes; Cristina K Zimpel
Journal: Microorganisms Date: 2020-05-03

8. Genomic variant-identification methods may alter Mycobacterium tuberculosis transmission inferences.

Authors: Katharine S Walter; Caroline Colijn; Ted Cohen; Barun Mathema; Qingyun Liu; Jolene Bowers; David M Engelthaler; Apurva Narechania; Darrin Lemmer; Julio Croda; Jason R Andrews
Journal: Microb Genom Date: 2020-07-31

9. Widespread sharing of pneumococcal strains in a rural African setting: proximate villages are more likely to share similar strains that are carried at multiple timepoints.

Authors: Madikay Senghore; Chrispin Chaguza; Ebrima Bojang; Peggy-Estelle Tientcheu; Rowan E Bancroft; Stephanie W Lo; Rebecca A Gladstone; Lesley McGee; Archibald Worwui; Ebenezer Foster-Nyarko; Fatima Ceesay; Catherine Bi Okoi; Keith P Klugman; Robert F Breiman; Stephen D Bentley; Richard Adegbola; Martin Antonio; William P Hanage; Brenda A Kwambana-Adams
Journal: Microb Genom Date: 2022-02

10. Mycobacterium tuberculosis complex lineage 5 exhibits high levels of within-lineage genomic diversity and differing gene content compared to the type strain H37Rv.

Authors: C N'Dira Sanoussi; Mireia Coscolla; Boatema Ofori-Anyinam; Isaac Darko Otchere; Martin Antonio; Stefan Niemann; Julian Parkhill; Simon Harris; Dorothy Yeboah-Manu; Sebastien Gagneux; Leen Rigouts; Dissou Affolabi; Bouke C de Jong; Conor J Meehan
Journal: Microb Genom Date: 2021-07

10 in total