Hollie-Ann Hatherell, Caroline Colijn, Helen R Stagg, Charlotte Jackson, Joanne R Winter, Ibrahim Abubakar.
Abstract
BACKGROUND: Whole genome sequencing (WGS) is becoming an important part of epidemiological investigations of infectious diseases due to greater resolution and cost reductions compared to traditional typing approaches. Many public health and clinical teams will increasingly use WGS to investigate clusters of potential pathogen transmission, making it crucial to understand the benefits and assumptions of the analytical methods for investigating the data. We aimed to understand how different approaches affect inferences of transmission dynamics and outline limitations of the methods.
Keywords: Systematic review; Transmission; Tuberculosis; Whole genome sequencing
Year: 2016 PMID: 27005433 PMCID: PMC4804562 DOI: 10.1186/s12916-016-0566-x
Source DB: PubMed Journal: BMC Med ISSN: 1741-7015 Impact factor: 8.775
Fig. 1 Visual representation of the four topics of the review, with colours representing different strains of TB. a Direction of transmission: permissible either way for individuals with the same strain (same colour); excluded for cases with different strains. b Within-host diversity, in the first instance as microevolution of an infecting strain and in the second due to mixed infection. A source case with a diverse burden can transmit different combinations of strains. c Strain diversity over time. d Drug resistance patterns in the form of acquired drug resistance mutations (red line) followed by transmission
Fig. 2 PRISMA flowchart. *Includes one additional study that was found through reference list screening. M.tb, Mycobacterium tuberculosis; TB, tuberculosis; WGS, whole genome sequencing
Studies using SNP thresholds to confirm recent transmission, relapse versus re-infection or microevolution versus mixed infection
| Journal article | How was threshold defined? | Cut-off | Sampling fraction | Lineages |
|---|---|---|---|---|
| Bryant et al. [ | Own data | ≤6 SNPs relapse (same strain); >1,306 re-infection (different) | 47 sequenced out of 50 chosen | Four major lineages |
| Clark et al. [ | Unknown | <50 SNPs defined a cluster | | CAS, LAM, EAI, T1, T2, Beijing, X1 |
| Guerra-Assunção et al. [ | Own data | ≤10 SNPs relapse; >100 re-infection | 60 out of 139 WGS confirmed recurrences | Four major lineages |
| Guerra-Assunção et al. [ | Own data (transmission); Guerra-Assunção et al. [ | ≤10 SNPs confirmed transmission; ≤10 SNPs defined a relapse | 1,687 out of 2,332 had WGS | Four major lineages |
| Kato-Maeda et al. [ | Own data | 0–2 SNPs per transmission event | | |
| Lee et al. [ | Own data | 0–1 SNPs confirmed transmission | 631 ‘improbable’ transmission pairs—between outbreak cases and cases in other villages | Outbreak isolates were Euro-American lineage |
| Luo et al. [ | Walker et al. [ | | | |
| Roetzer et al. [ | Own data | 3 SNPs confirmed transmission | 31 out of 2,301 (for the threshold). Equivalent to eight transmission chains of 2–7 patients | Haarlem lineage |
| Walker et al. [ | Own data | ≤5 SNPs cluster; >12 SNPs no transmission | 303 out of 609 (for the threshold) | All five major lineages |
| Walker et al. [ | Own data | 475, 1,032 and 1,096 SNPs suggested that patients had been secondarily infected with a different strain rather than within-host evolution | Pulmonary vs extra pulmonary pairs from 49 patients and 110 longitudinal isolates from 30 patients | All five major lineages |
| Witney et al. [ | Walker et al. [ | | | |
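
As a rough illustration of how a cut-off from the table above might be applied, the sketch below counts SNP differences between aligned sequences and labels each pair using the Walker et al. values listed in the table (≤5 SNPs cluster; >12 SNPs no transmission). The function names, the toy sequences and the "indeterminate" label for the 6 to 12 SNP range are assumptions made for this example, not part of any reviewed study's pipeline. Real analyses work from filtered variant calls rather than raw strings.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences.

    Sites where either sequence has a non-ACGT character (e.g. N) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def classify_pair(distance: int, linked_max: int = 5, unlinked_min: int = 12) -> str:
    """Label a pair using two cut-offs (defaults: Walker et al. values from the table)."""
    if distance <= linked_max:
        return "possible recent transmission"
    if distance > unlinked_min:
        return "transmission unlikely"
    return "indeterminate"  # hypothetical label for the range between the cut-offs


# Toy aligned sequences (hypothetical data, far shorter than a real genome).
isolates = {
    "case_1": "ACGTACGTACGTACGTACGT",
    "case_2": "ACGTACGTACGTACGTACGA",
    "case_3": "TGCAACGTTGCAACGTTGCA",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(isolates.items(), 2):
    d = snp_distance(seq_a, seq_b)
    print(f"{name_a} vs {name_b}: {d} SNPs -> {classify_pair(d)}")
```

Because the cut-offs are parameters, a study-specific threshold can be substituted. The review's findings suggest that such thresholds may not transfer between settings.
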
Methods used by studies to confirm direction of transmission
| Journal article | How was direction of transmission determined? |
|---|---|
| Didelot et al. [ | Epidemiological data and WGS used in a Bayesian inference framework to construct a transmission tree |
| Gardy et al. [ | Social network analysis and contact tracing suggested putative transmission events; timing of infection and smear status were used to narrow down the possible direction, and WGS was used to exclude transmission events involving cases with different lineages |
| Kato-Maeda et al. [ | Contact tracing and accumulation of SNPs |
| Luo et al. [ | Epidemiological links and timing of infection and symptoms helped propose direction of transmission between isolates in the same WGS-based cluster. Transmission of mutant alleles from case with mixed base calls |
| Mehaffy et al. [ | Genomic and epidemiological information (i.e. SNP pattern, contact information, year of diagnosis and infectiousness based on smear and chest X-ray results) |
| Pérez-Lago et al. [ | In one case direction was proposed by the transmission of mutant alleles from a case with mixed base calls |
| Roetzer et al. [ | Contact tracing revealed transmission chains and accumulation of variation is mentioned, although not clear if this resolved the order of the chain |
| Schürch et al. [ | Accumulation of SNPs |
| Smit et al. [ | Accumulation of SNPs and period of infectiousness |
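
One simplified reading of "accumulation of SNPs" in the table above is that a case whose variants are a strict subset of another case's variants was probably infected earlier in the chain. The sketch below encodes only that reading. The function name, the variant identifiers and the return labels are hypothetical, and the subset rule is an assumption of this example rather than the method of any listed study. As the Fig. 3 caption notes, a source case's sampled strain may itself have evolved after transmission, so many real pairs would come out as "undetermined".

```python
def compatible_direction(variants_a: set[str], variants_b: set[str]) -> str:
    """Suggest an ordering of two cases from their sets of derived variants.

    Returns "A before B" if A's variants are a strict subset of B's,
    "B before A" for the reverse, and "undetermined" otherwise
    (identical sets, or each case carries private variants).
    """
    if variants_a < variants_b:
        return "A before B"
    if variants_b < variants_a:
        return "B before A"
    return "undetermined"


# Hypothetical variant identifiers (position plus derived base).
case_a = {"1203T", "45110G"}
case_b = {"1203T", "45110G", "90877A"}
case_c = {"1203T", "77001C"}

print(compatible_direction(case_a, case_b))
print(compatible_direction(case_b, case_c))
```

An ordering suggested this way is only compatible with a direction of transmission. The studies in the table combined it with contact tracing, timing or infectiousness data.
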
Definitions of heterozygous base calls used to classify mixed infection
| Journal article | Mixed infections or microevolution | Definition of heterozygous base call |
|---|---|---|
| Bryant et al. [ | Mixed infection | Mixed base positions were identified at sites where more than one base had been identified in a single sample, where each allele was supported by at least 5 % of reads (minimum read depth of four). Included only positions without strand bias ( |
| Guerra-Assunção et al. [ | Mixed infection | Sample genotypes were called using the majority allele (minimum frequency 75 %) in positions supported by at least 20-fold coverage; otherwise they were classified as missing (thus ignoring heterozygous calls). We excluded samples with >15 % missing genotype calls, to remove possible contaminated or mixed samples or technical errors |
| Guerra-Assunção et al. [ | Mixed infection | A position was classified as heterozygous if >1 allele accounted for ≥30 % of the reads (and there were >30 reads). More than 140 heterozygous positions in one sample classified as mixed infection |
| Kato-Maeda et al. [ | Mixed infection | Mixed infection was identified when there was a heterozygous base call: 38 % of reads supported the variant; the rest supported reference |
| Luo et al. [ | Microevolution | Kept only the calls in which the coverage was ten and the less frequent allele was supported by at least five high-quality reads, as reliable calls. Presence of mixed base calls could indicate microevolution in that patient |
| Pérez-Lago et al. [ | Mixed infection | Less frequent nucleotide was supported by five reads |
| Walker et al. [ | Microevolution | Suggestive of ‘sub-populations’; i.e. microevolution |
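
To make one of the definitions above concrete, the sketch below applies the Guerra-Assunção et al. rule from the table: a position is heterozygous if more than one allele accounts for ≥30 % of the reads and there are more than 30 reads, and a sample with more than 140 heterozygous positions is classified as a mixed infection. The function names and the per-site read-count dictionaries are assumptions for illustration. The other studies in the table use different cut-offs, which is why these are exposed as parameters.

```python
from collections.abc import Iterable, Mapping


def is_heterozygous(
    allele_counts: Mapping[str, int],
    min_fraction: float = 0.30,
    min_reads_exclusive: int = 30,
) -> bool:
    """True if >1 allele has >= min_fraction of reads and depth > min_reads_exclusive."""
    depth = sum(allele_counts.values())
    if depth <= min_reads_exclusive:
        return False
    supported = [a for a, n in allele_counts.items() if n / depth >= min_fraction]
    return len(supported) > 1


def is_mixed_infection(
    sites: Iterable[Mapping[str, int]], max_het_sites: int = 140
) -> bool:
    """True if the number of heterozygous positions exceeds max_het_sites."""
    return sum(is_heterozygous(site) for site in sites) > max_het_sites


# Hypothetical read counts at three positions of one sample.
sample_sites = [
    {"A": 20, "G": 15},  # both alleles above 30 % at depth 35
    {"C": 30, "T": 2},   # minor allele far below 30 %
    {"A": 15, "G": 15},  # depth of exactly 30 does not exceed the read minimum
]
print([is_heterozygous(site) for site in sample_sites])
print(is_mixed_infection(sample_sites))
```
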
Fig. 3 Effect of sampling on the phylogenetic tree. a Representation of a transmission tree, where nodes represent individuals, numbers represent the chronological order of infection and the arrows show the direction of transmission. b Phylogenetic tree when all individuals in the outbreak are sampled. Transmission pairs are not necessarily paired on the tree as they may not be the most similar within the context of the outbreak. For example, if we assume that 1 had a long, chronic TB infection then because of the amount of diversity that can accumulate over time it is possible for the genomes from 2 and 3 to be more closely related to each other than to the genome from 1, even though 1 infected them both. This is because the strain that was sampled from 1 has evolved since 1 infected 2 and 3. Rejecting pairs that are not adjacent on the phylogenetic tree seems sound when sampling is sparse (as transmission pairs would then be relatively rare in the dataset and closer in phylogenetic distance than typical pairs of tips). When sampling is dense (as is desirable in epidemiological investigations), however, this approach risks rejecting true transmission pairs. c Individuals 2, 3, 4 and 8 have not been sampled for the reconstruction of this tree. This makes the distances between the average pair of tips in the tree larger, highlights the close phylogenetic distance between 6 and 7 and (correctly) suggests transmission occurred between these individuals
Findings and recommendations
| Over-arching findings from included papers | Recommendations |
|---|---|
| Suggested SNP thresholds for evidence of transmission are heterogeneous and sensitive to the finding of epidemiological links, SNP calling protocols and culturing/sampling, and thus may not be transferable between settings and/or studies | When setting study-specific SNP thresholds, consider the time between samples, mutation rate, evolutionary pressure the strain may have been subjected to, and the endemicity of strains. Consider alternative approaches for determining transmission, including Bayesian approaches |
| The distinction between relapse and re-infection for repeated instances of TB disease has been made empirically (by examining the distribution of SNP distances between the initially infecting and subsequently infecting strains) | While existing thresholds appear adequate for clinical trials, consideration of epidemiological and clinical data is important, as well as a better idea of the within-host mutation rate when more accurate classification is required |
| The lack of diversity within | Deep sequencing, multiple samples and looking at shared minor variants (mutations present at low frequencies) will enhance detection of diversity. Epidemiological data, and consideration of associated uncertainty due to missing contact information, will also be necessary |
| Examining resistance-conferring mutations shared by phylogenetic clusters is a common method for identifying transmission of drug-resistant strains. However, phylogenetic clusters do not necessarily correspond to transmission clusters | Reconstruction of the transmission tree followed by an examination of the drug resistance patterns between linked individuals may be more appropriate |