Hélène Duault, Benoit Durand, Laetitia Canini.
Abstract
To better understand transmission dynamics and appropriately target control and preventive measures, studies have aimed to identify who-infected-whom in actual outbreaks. Numerous reconstruction methods exist, each with its own assumptions, types of data, and inference strategy. Thus, selecting a method can be difficult. Following PRISMA guidelines, we systematically reviewed the literature for methods combining epidemiological and genomic data in transmission tree reconstruction. We identified 22 methods from the 41 selected articles. We defined three families according to how genomic data was handled: a non-phylogenetic family, a sequential phylogenetic family, and a simultaneous phylogenetic family. We discussed methods according to the data needed as well as the underlying models of sequence mutation, within-host evolution, transmission, and case observation. In the non-phylogenetic family, consisting of eight methods, pairwise genetic distances were estimated. In the phylogenetic families, transmission trees were inferred from phylogenetic trees either simultaneously (nine methods) or sequentially (five methods). While a majority of methods (17/22) modeled the transmission process, few (8/22) took into account imperfect case detection. Within-host evolution was generally (7/8) modeled as a coalescent process. These practical and theoretical considerations were highlighted in order to help select the appropriate method for an outbreak.
Keywords: genomic epidemiology; transmission tree; who-infected-whom
Year: 2022 PMID: 35215195 PMCID: PMC8875843 DOI: 10.3390/pathogens11020252
Source DB: PubMed Journal: Pathogens ISSN: 2076-0817
Figure 1. PRISMA flow diagram representing the article selection process (from [27]).
Figure 2. A simple transmission scenario (A), the reconstructed phylogenetic tree (B), and the transmission tree (C). Rectangles represent hosts, and black lines within a rectangle represent within-host evolution of the pathogen. Black circles correspond to sampled strains, red circles to transmitted strains, and red dotted lines to a transmission event. The length of each host rectangle represents the time from infection (t) to removal (R). The phylogenetic tree is reconstructed from sequences (a, b, c, and d) sampled at time T. The transmission tree considers the unobserved host U.
Epidemiological and genomic data necessary for each method. S stands for sequences, and P for phylogenetic trees. Packages are available for methods in bold. Removal time corresponds to time at which an individual becomes non-infectious, generally the culling time or end of hospitalization, and intrinsic characteristics are either number of individuals present on site or predominant animal species. Didelot et al.’s (2014) [17] method, while not based on a spatial kernel, penalized transmission trees after reconstruction if they did not respect geographical data, hence the parentheses surrounding the geographical data. Hall et al.’s (2015) [18] method could include contact data, but geographical data was used instead.
| Family | Method (Name) [Reference] | Start of Exposure | Onset of Infectiousness | Sampling Time | Removal Time | Contact Data | Geographical Data | Intrinsic Characteristics | Phylogenetic Tree or Sequences | |
|---|---|---|---|---|---|---|---|---|---|---|
| Non-phylogenetic | Aldrin et al., 2011 [ | X | X | X | X | S | ||||
| Jombart et al., 2011 (Seqtrack) [ | X | S | ||||||||
| Ypma et al., 2012 [ | X | X | X | S | ||||||
| Jombart et al., 2014 | X | S | ||||||||
| Worby et al., 2014 [ | X | S | ||||||||
| Famulare et al. 2015 [ | X | S | ||||||||
| Worby et al., 2016 (bitrugs) [ | X | X | X | S | ||||||
| Campbell et al., 2019 | X | X | S | |||||||
| Sequential phylogenetic | Cottam et al., 2008 [ | X | X | X | P | |||||
| Didelot et al., 2014 [ | X | (X) | P | |||||||
| Eldholm et al., 2016 [ | X | P | ||||||||
| Didelot et al., 2017 | X | P | ||||||||
| Sashittal et al., 2020 (TiTUS) [ | X | X | X | X | P | |||||
| Simultaneous phylogenetic | Explicitly phylogenetic | Ypma et al., 2013 [ | X | X | X | X | S | |||
| Hall et al., 2015 (beastlier) [ | X | X | (X) | X | S | |||||
| De Maio et al., 2016 (SCOTTI) [ | X | X | X | S | ||||||
| Klinkenberg | X | S | ||||||||
| Implicitly phylogenetic | Morelli et al., 2012 [ | X | X | X | X | S | ||||
| Mollentze et al., 2014 [ | X | X | S | |||||||
| Lau et al., 2015 [ | X | X | X | X | S | |||||
| Firestone et al., 2020 (BORIS) [ | X | X | X | X | X | S | ||||
| Montazeri et al., 2020 [ | X | S | ||||||||
Modeling of unobserved processes in the non-phylogenetic family. Within-host evolution (modeled or not) includes whether the transmission bottleneck is complete or weak. When transmission is modeled, we mention the states hosts can find themselves in (S: susceptible, E: latent, I: infectious, R: removed). In addition, either geographical distance (spatial kernel), contact data, or random mixing are considered. Finally, the transmission model mentions whether there is only one index case possible (single introduction) or multiple.
| Method (Name) [Reference] | Sequence Mutation | Within-Host Evolution | Transmission | Case Observation | Inference Method |
|---|---|---|---|---|---|
| Aldrin et al., 2011 [ | Kimura model | No explicit model; complete bottleneck | SIR (infectious period); distance kernel; multiple introductions | All cases are observed but not always sampled | Partial Maximum Likelihood |
| Jombart et al., 2011 (Seqtrack) [ | User’s choice | No explicit model; complete bottleneck | No explicit model | All cases are observed and sampled | Edmonds’ algorithm |
| Ypma et al., 2012 [ | Deletion + Transition + Transversion | No explicit model; complete bottleneck | SEIR (latency/infectious period); spatial kernel; single introduction | All cases are observed but not always sampled | Bayesian |
| Jombart et al., 2014 | Mutation rate | No explicit model; complete bottleneck | SI (generation times); random mixing; multiple introductions | Proportion of sampled cases | Bayesian |
| Worby et al., 2014 [ | Mutation rate | Pathogen population size; weak bottleneck | No explicit model | All cases are observed and sampled | Observed genetic distance vs. theoretical distribution |
| Famulare et al., 2015 [ | Mutation rate | No explicit model | No explicit model | No assumption | Likelihood ratio test + Pruning algorithm |
| Worby et al., 2016 (bitrugs) [ | No explicit model | No explicit model; no assumption on the bottleneck | SEIR (latency/infectious period); random mixing; multiple introductions | Test sensitivity < 1 | Bayesian |
| Campbell et al., 2019 | Mutation rate | No explicit model; complete bottleneck | SI (generation times); contact data; multiple introductions | Proportion of sampled cases | Bayesian |
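
The non-phylogenetic family works from pairwise genetic distances between sampled cases. As a rough illustration only, the sketch below links each case to the strictly earlier-sampled case at the smallest Hamming distance. It is a simplified stand-in, not the Seqtrack implementation: when every candidate ancestor must be sampled strictly earlier, the candidate graph has no cycles, so choosing the cheapest incoming edge per case yields the same minimum-weight branching that Edmonds' algorithm would return. The function names, the tie-breaking rule (earliest sampling date, then case identifier), and the toy data are all assumptions made for this example.

```python
from datetime import date


def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_earlier_ancestor(cases):
    """Pick, for each case, the strictly earlier-sampled case at minimum distance.

    `cases` maps a case id to (sampling_date, aligned_sequence).
    Returns {case_id: ancestor_id}, with None for cases that have no
    earlier-sampled candidate (possible index cases).
    """
    ancestors = {}
    for cid, (sampled, seq) in cases.items():
        candidates = [
            (hamming(seq, other_seq), other_date, other_id)
            for other_id, (other_date, other_seq) in cases.items()
            if other_date < sampled
        ]
        # Ties on distance are broken by earliest date, then by case id.
        ancestors[cid] = min(candidates)[2] if candidates else None
    return ancestors


# Hypothetical toy outbreak: four cases with sampling dates and sequences.
cases = {
    "A": (date(2020, 1, 1), "AAAAAAAA"),
    "B": (date(2020, 1, 5), "AAAAAAAT"),
    "C": (date(2020, 1, 9), "AAAAAATT"),
    "D": (date(2020, 1, 7), "CAAAAAAA"),
}
tree = closest_earlier_ancestor(cases)
print(tree)
```

In this toy data set, case A has no earlier-sampled candidate and is treated as a possible index case. Methods in the table that model transmission or case observation go further than this, for example by weighting links with generation times, spatial kernels, or contact data.
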
Modeling of unobserved processes in the sequential phylogenetic family. For the sequence mutation process, NA stands for not applicable. Within-host evolution (modeled or not) includes whether the transmission bottleneck is complete or weak. When transmission is modeled, we mention the states hosts can find themselves in (S: susceptible, E: latent, I: infectious, R: removed). In addition, either geographical distance (spatial kernel), contact data, or random mixing are considered. Finally, the transmission model mentions whether there is only one index case possible (single introduction) or multiple. In the inference method, we mention how phylogenetic trees are used to infer transmission trees (either internal nodes or branches are labelled with the host or phylogenetic trees are used as a source of information). * means multiple sequences can be considered per epidemiological unit.
| Method (Name) [Reference] | Sequence Mutation | Within-Host Evolution | Transmission | Case Observation | Inference Method |
|---|---|---|---|---|---|
| Cottam et al., 2008 [ | NA | No explicit model; complete bottleneck | SEIR (latency/infectious period); random mixing; single introduction | All cases are observed and sampled | Label internal nodes; Maximum |
| Didelot et al., 2014 [ | NA | Coalescent process; complete bottleneck | SIR (infectious period); random mixing; single introduction | All cases are observed and sampled | Label branches; Bayesian |
| Eldholm et al., 2016 [ | NA | Coalescent process; complete bottleneck | SEIR (latency/infectious period); random mixing; single introduction | Probability threshold | Information source; Edmonds’ algorithm |
| Didelot et al., 2017 | NA | Coalescent process; complete bottleneck | SI (generation times); random mixing; single introduction | Proportion of sampled cases | Label branches; Bayesian |
| Sashittal et al., 2020 (TiTUS) [ | NA | No explicit model; weak bottleneck * | No explicit model | All cases are observed and sampled | Label internal nodes; Logical problem |
Modeling of unobserved processes in the simultaneous phylogenetic family. For the sequence mutation process, either a single substitution model is imposed or the user chooses one. Within-host evolution (modeled or not) includes whether the transmission bottleneck is complete or weak. When transmission is modeled, we mention the states hosts can find themselves in (S: susceptible, E: latent, I: infectious, R: removed). In addition, either geographical distance (spatial kernel), contact data, or random mixing are considered. Finally, the transmission model mentions whether there is only one index case possible (single introduction) or multiple. * means multiple sequences can be considered per epidemiological unit.
| Method (Name) [Reference] | Sequence Mutation | Within-Host Evolution | Transmission | Case Observation | Inference Method |
|---|---|---|---|---|---|
| Ypma et al., 2013 [ | Mutation rate | Coalescent process; complete bottleneck | SEIR (latency/infectious period); spatial kernel; single introduction | All cases are observed and sampled | Bayesian |
| Hall et al., 2015 (beastlier) [ | User’s choice | Coalescent process; complete bottleneck * | SEIR (latency/infectious period); spatial kernel; single introduction | All cases are observed but not always sampled | Bayesian |
| De Maio et al., 2016 (SCOTTI) [ | User’s choice | Coalescent process; weak bottleneck * | Migration model | Maximum number of hosts | Bayesian |
| Klinkenberg | Mutation rate | Coalescent process; complete bottleneck | SI (generation times); random mixing; single introduction | All cases are observed but not always sampled | Bayesian |
| Morelli et al., 2012 [ | Jukes Cantor model | No explicit model; complete bottleneck | SEIR (latency/infectious period); spatial kernel; single introduction | All cases are observed and sampled | Bayesian |
| Mollentze et al., 2014 [ | Kimura model | No explicit model; complete bottleneck | SEIR (latency/infectious period); spatial kernel; multiple introductions | Observed cases contribute to transmission after removal time | Bayesian |
| Lau et al., 2015 [ | Kimura model | No explicit model; complete bottleneck | SEIR (latency/infectious period); spatial kernel; multiple introductions | All cases are observed but not always sampled | Bayesian |
| Firestone et al., 2020 (BORIS) [ | Kimura model | No explicit model; complete bottleneck | SEIR (latency/infectious period); spatial kernel; multiple introductions | All cases are observed but not always sampled | Bayesian |
| Montazeri et al., 2020 [ | Jukes Cantor model | No explicit model; complete bottleneck | No explicit model | All cases are observed and sampled | Bayesian |
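
Several methods in these tables rely on the Jukes Cantor or Kimura substitution models. To make the difference concrete, the sketch below computes the standard distance corrections associated with each model from a pair of aligned sequences. It assumes that "Kimura model" refers to the two-parameter model (K80), which treats transitions and transversions separately; the reviewed methods embed these models in their likelihoods rather than necessarily computing pairwise distances this way. All function names are illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_differences(a: str, b: str):
    """Return (transitions, transversions, compared_sites) for aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = n = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    return ts, tv, n


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3), p = proportion of differing sites."""
    ts, tv, n = site_differences(a, b)
    p = (ts + tv) / n
    return -0.75 * math.log(1 - 4 * p / 3)


def k80_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    with P and Q the proportions of transitions and transversions."""
    ts, tv, n = site_differences(a, b)
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


# Ten sites, one A<->G transition.
d_jc = jc69_distance("AAAAAAAAAA", "AAAAAAAAAG")
d_k80 = k80_distance("AAAAAAAAAA", "AAAAAAAAAG")
print(d_jc, d_k80)
```

Both corrections are undefined for highly divergent pairs (for example, when the proportion of differing sites reaches 0.75 under Jukes Cantor), in which case `math.log` raises a `ValueError`. Outbreak sequences are usually far from that regime.
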
Figure 3. Links between methods of the non-phylogenetic family. Rectangles represent criteria on which to choose a method, and the grey circles represent either the name of the method’s package or the first author and article date [28,32,37,38].
Figure 4. Three links between phylogenetic trees (on the left) and transmission trees (on the right). Annotating the nodes with observed hosts (A) leads to the identification of transmission links. Annotating the branches (B) adds the time of transmission t. Annotating the branches with both observed and unobserved hosts (C) makes it possible to identify the unobserved host U.
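
The node annotation described in panel (A) can be sketched in a few lines: once every node of the phylogenetic tree carries a host label, each edge whose two ends carry different hosts corresponds to a transmission link. The example below is a minimal illustration under stated assumptions. The tree shape, the internal node names (`r`, `n1`, `n2`), and the host labels are hypothetical and do not reproduce the scenario drawn in the figure, and the function name is invented for this example.

```python
def transmission_links(parent, host):
    """Derive (infector, infectee) pairs from a host-annotated phylogenetic tree.

    `parent` maps each non-root node to its parent node.
    `host` maps every node (tips and internal nodes) to a host label.
    A change of host label along an edge is read as a transmission event.
    """
    links = set()
    for child, par in parent.items():
        if host[par] != host[child]:
            links.add((host[par], host[child]))
    return links


# Hypothetical phylogeny ((a, b), (c, d)) with internal nodes n1, n2 and root r.
parent = {"n1": "r", "n2": "r", "a": "n1", "b": "n1", "c": "n2", "d": "n2"}

# Hypothetical annotation: tips carry the sampled host, internal nodes the
# host in which the corresponding lineage split is assumed to have occurred.
host = {"a": "A", "b": "B", "c": "C", "d": "D", "n1": "A", "n2": "C", "r": "A"}

links = transmission_links(parent, host)
print(sorted(links))
```

Node annotation alone gives who-infected-whom but no transmission times. Recovering the time t of panel (B) requires placing the host change at a point along the branch, and panel (C) additionally allows labels for hosts that were never sampled.
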
Figure 5. Links between methods of the sequential phylogenetic family. Rectangles represent criteria on which to choose a method, and the grey circles represent either the name of the method’s package or the first author and article date [2,17,31,39].
Figure 6. Links between methods of the simultaneous phylogenetic family. Rectangles represent criteria on which to choose a method, and the grey circles represent either the name of the method’s package or the first author and article date [1,5,23,42,43].