Lauren Mak1, Deshan Perera1, Raynell Lang2, Pathum Kossinna1, Jingni He1, M John Gill2, Quan Long1,3,4, Guido van Marle5.
Abstract
Keywords: HIV; Canada; molecular phylogenetics; viral evolution; person-to-person transmission inference; transmission network; summary statistics.
Year: 2020 PMID: 32023939 PMCID: PMC7074708 DOI: 10.3390/microorganisms8020196
Source DB: PubMed Journal: Microorganisms ISSN: 2076-2607
Figure 1. Workflow of data analysis procedures. (a) Graphic representation of the data input/output. (b) Pipeline tools and how they were used. (c) Parameters tested for each tool.
Breakdown of the number of patients based on their clinical data completeness.
| | TC: Recipient | TC: Donor | TC: Control | Total |
|---|---|---|---|---|
| | 28 | 14 | 45 | 87 |
| | 8 | 14 | 26 | 48 |
| | 0 | 1 | 3 | 4 |
| Total | 36 | 29 | 74 | |

TC: transmission category.
Figure 2. The Southern Alberta HIV Clinic (SAC) patient sequence and transmission data. Patient (a) has HIV, and their estimated infection date is the red star on their branch. They have a positive (First Positive, FP) HIV test result after their infection. At some time-point, represented by a red star and the red arrow, (a) infects (b). Previously, (b) had a negative HIV test. Afterwards, (a) has a positive HIV test result. The HIV sequences of samples from (a) and (b) are phylogenetically related, as represented by the tree. (b) is in transmission category 1 and infection category 1. (a) is in transmission category 2 and infection category 2. Samples for (a) and (b) were collected at the end of their respective branches.
Comparison of default and adjusted BEAST2 and TransPhylo parameter combinations, and rationale for the choice of adjusted parameter values.
| Analysis Tool | Parameter Description | Default Model(s) | Adjusted Model(s) | Rationale |
|---|---|---|---|---|
| BEAST2 | Molecular clock model | Strict clock model | Uncorrelated relaxed clock, rates drawn from log-normal distribution | Relaxing the clock and allowing non-zero clock variance lets each branch have its own mutation rate. This better models variable within-patient selective pressure, especially across multiple eras of combined antiretroviral therapy from the 1980s to the 2010s. |
| BEAST2 | Substitution model | Hasegawa, Kishino and Yano (HKY) model | General time-reversible (GTR) model | By allowing the rate of each base-to-base substitution to be estimated independently, as opposed to just transitions and transversions, the overall mutational process can be estimated more accurately (especially for a highly mutable virus like HIV). |
| BEAST2 | Population growth model | Uniform size for coalescent-only populations | Birth-death skyline serial model | The generation, sampling, and removal time distributions are estimable. By allowing for arbitrary changes in the effective population size of HIV across patients, the dynamics of non-coalescent transmission histories can be modelled more accurately. |
| TransPhylo | Likelihood that a source is sampled | 0.5, unfixed | 0.99, fixed | If the likelihood is left unfixed, TransPhylo predicts nearly 3.6× as many patients. The SAC serves a population that is (i) relatively small and (ii) geographically isolated, and (iii) it is the sole HIV care provider, so having that many HIV-positive patients unknown to the clinic is unlikely. |
| TransPhylo | Generation time distribution | Shape = 2 | Shape = 2 | A slight reduction in the initiating scale parameter, increasing the amount of time between new infections, appears to improve the inference of transmission relationships. |
| TransPhylo | Sampling time distribution | Shape = 2 | Shape = 2 | A slight reduction in the initiating scale parameter, increasing the amount of time between infection and sampling, appears to improve the inference of transmission relationships. |
Summary statistics describing transmission trees generated by each combination of parameters for each program.
| Aligner | MUSCLE | MUSCLE | MUSCLE | MUSCLE | HIVAligner | HIVAligner | HIVAligner | HIVAligner |
|---|---|---|---|---|---|---|---|---|
| BEAST | Default | Default | Adjusted | Adjusted | Default | Default | Adjusted | Adjusted |
| TransPhylo | Default | Adjusted | Default | Adjusted | Default | Adjusted | Default | Adjusted |
| (A) | 813.25 | 228.25 | 640.75 | 178.25 | 810.5 | 222.75 | 620.75 | 173.25 |
| (B) | 139.5 | 109.25 | 140 | 137 | 139.75 | 110.5 | 140 | 137.5 |
| (C) | 8.75 | 55 | 12 | 85 | 8 | 54.5 | 16.25 | 84.75 |
| (D) | 318.25 | 17.75 | 459.75 | 27.25 | 342.25 | 17.5 | 446.5 | 25.75 |
| (E) | 0.063 | 0.503 | 0.086 | 0.620 | 0.057 | 0.493 | 0.116 | 0.616 |
| (F) | 0.695 | 0.140 | 0.767 | 0.166 | 0.710 | 0.137 | 0.761 | 0.158 |
| (G) | 0.113 | 0.154 | 0.228 | 0.332 | 0.119 | 0.162 | 0.227 | 0.411 |
(A) The total number of patients represented in the tree. (B) The number of sampled patients predicted to be infected after 1989. (C) The number of (B) that was predicted to be infected by another (B). (D) The number of unsampled patients predicted to be infected after 1989. (E) Proportion of sampled patients that were infected by other sampled patients. (F) Proportion of patients predicted to be infected after 1989 that are unknown to the SAC. (G) Ratio of patients who were predicted to have infected multiple people to those who were predicted to have infected a single person.
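The proportions (E) and (F) appear to follow from the counts (B), (C) and (D). The sketch below assumes E = C / B and F = D / (B + D). These formulas are inferred from the legend rather than stated in the source, and the function names are illustrative only. The inputs are the tabulated values for the first column (MUSCLE, default BEAST, default TransPhylo).

```python
def proportion_sampled_to_sampled(b_sampled, c_sampled_pairs):
    """(E): share of sampled patients infected by another sampled patient.

    Assumed formula: C / B.
    """
    return c_sampled_pairs / b_sampled


def proportion_unsampled(b_sampled, d_unsampled):
    """(F): share of post-1989 infections that are unknown to the clinic.

    Assumed formula: D / (B + D).
    """
    return d_unsampled / (b_sampled + d_unsampled)


# Tabulated values (B), (C), (D) for the first data column.
b, c, d = 139.5, 8.75, 318.25

e = round(proportion_sampled_to_sampled(b, c), 3)
f = round(proportion_unsampled(b, d), 3)
print(e, f)  # 0.063 0.695, matching rows (E) and (F) of that column
```

Because the tabulated counts are themselves averages, a ratio of averages need not equal an average of per-tree ratios in general; here the two agree to three decimal places.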
The average number of accurately inferred person-to-person (P2P) transmission relationships and infection dates in selected transmission trees generated by each combination of parameters for each program over 1000 replications, with the 2.5th and 97.5th percentile bounds of the 95% interval.
| Align | BEAST | TransPhylo | Relationships: Avg | Relationships: Low 2.5 | Relationships: Up 97.5 | Relationships: Diff | Relationships: Total | Infection Dates: Avg | Infection Dates: Low 2.5 | Infection Dates: Up 97.5 | Infection Dates: Diff | Infection Dates: Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | 12 | 23 | 11 | 36 | 22.456 | 17 | 27 | 10 | 87 |
| | | | 6.445 | 3 | 11 | 8 | | | 31 | 43 | 12 | |
| | | | 15.855 | 10 | 21 | 11 | | 18.45 | 14 | 23 | 9 | |
| | | | 3.679 | 1 | 8 | 7 | | 30.696 | 25 | 36 | 11 | |
| | | | 17.461 | 12 | 23 | 11 | | 22.346 | 18 | 27 | 9 | |
| | | | 6.16 | 2 | 10 | 8 | | 37.089 | 31 | 44 | 13 | |
| | | | 15.914 | 11 | 22 | 11 | | 18.411 | 14 | 23 | 9 | |
| | | | 3.714 | 1 | 7 | 6 | | 30.867 | 25 | 36 | 11 | |
* The largest number of correctly inferred transmission relationships and viable infection dates are in bold and underlined.
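The Avg, Low 2.5, Up 97.5 and Diff columns can be read as a percentile summary over replicate counts. The source does not describe how the 1000 replications were drawn or which percentile method was used, so the following is only a minimal sketch: it assumes a nearest-rank percentile, and the function names and the input data are hypothetical.

```python
import math
from statistics import mean


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct between 0 and 100)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarise_replicates(counts):
    """Return the mean, the 2.5th and 97.5th percentiles, and their difference."""
    ordered = sorted(counts)
    low = nearest_rank(ordered, 2.5)
    up = nearest_rank(ordered, 97.5)
    return {"avg": mean(ordered), "low": low, "up": up, "diff": up - low}


# Hypothetical replicate counts 1..1000, used only to show the arithmetic;
# they are not data from the study.
summary = summarise_replicates(range(1, 1001))
print(summary)
```

In this reading, Diff is simply the width of the interval (Up 97.5 minus Low 2.5), which agrees with the tabulated rows, for example 23 − 12 = 11.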
Figure 3. The transmission tree generated by MUSCLE, adjusted BEAST2 parameters, and adjusted TransPhylo parameters. Sampled patients are represented by black branches, and unsampled patients by light grey branches. The symbols represent person-to-person transmission events. Green and red triangles represent correctly and incorrectly inferred relationships, respectively. Grey squares represent novel inferences of transmission relationships that have no precedent in the SAC dataset. The branches involved in the Figure 4 clusters are highlighted in the correspondingly coloured boxes.
Figure 4. Novel transmission clusters (A, B, C and D) identified from the transmission tree in Figure 3. Each node represents a patient, and each arrow a directed transmission relationship. Coloured nodes represent sampled patients from the SAC database and correspond to the coloured boxes in Figure 3. White nodes represent predicted unsampled patients. Solid arrows represent transmission relationships already known to the SAC, whereas dashed arrows represent novel inferences of transmission.