Christian Julian Villabona-Arenas1,2, Stéphane Hué1,2, James A C Baxter3, Matthew Hall4, Katrina A Lythgoe4,5, John Bradley1, Katherine E Atkins1,2,3.
Abstract
Inferring the transmission direction between linked individuals living with HIV provides unparalleled power to understand the epidemiology that determines transmission. Phylogenetic ancestral-state reconstruction approaches infer the transmission direction by identifying the individual in whom the most recent common ancestor of the virus populations originated. While these methods vary in accuracy, it is unclear why. To evaluate the performance of phylogenetic ancestral-state reconstruction to determine the transmission direction of HIV-1 infection, we inferred the transmission direction for 112 transmission pairs where transmission direction and detailed additional information were available. We then fit a statistical model to evaluate the extent to which epidemiological, sampling, genetic, and phylogenetic factors influenced the outcome of the inference. Finally, we repeated the analysis under real-life conditions with only routinely available data. We found that whether ancestral-state reconstruction correctly infers the transmission direction depends principally on the phylogeny's topology. For example, under real-life conditions, the probability of identifying the correct transmission direction increases from 32% (when a monophyletic-monophyletic or paraphyletic-polyphyletic tree topology is observed and the tip closest to the root does not agree with the state at the root) to 93% (when a paraphyletic-monophyletic topology is observed and the tip closest to the root agrees with the root state). Our results suggest that documenting larger differences in relative intrahost diversity increases our confidence in the transmission direction inference of linked pairs for population-level studies of HIV. These findings provide a practical starting point to determine our confidence in transmission direction inference from ancestral-state reconstruction.
Keywords: HIV-1 epidemiology; Lasso regression; ancestral-state reconstruction; phylogenetic tree topology; who acquires infection from whom
Year: 2022 PMID: 36103580 PMCID: PMC9499565 DOI: 10.1073/pnas.2210604119
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
Fig. 1. Phylogenetic covariates. Illustration of the different metrics used to define the covariates in the phylogenetic information class. The topology classes are paraphyletic-polyphyletic (PP), paraphyletic-monophyletic (PM), and monophyletic-monophyletic (MM). The identity of the most basal tip is the individual with the tip that minimizes the number of internal nodes along the paths between the root and the tips (the alternative definition, shown inside the square, is whether the individual with the most basal tip agrees with the individual with the higher probability at the root). The minimum root to tip distance is the shortest path from the root to the tips of an individual (calculated for each partner). Phylogenetic diversity is measured as unique evolutionary history: the sum of the branch lengths that are not shared across the subtree of an individual and that give rise to every single tip of the individual (calculated for each partner), as described in the documentation of the R package Caper (31). The shortest patristic distance is the shortest path connecting a tip from one individual to a tip from the other.
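The tip-based metrics in this caption can be illustrated on a toy tree. The following is a minimal sketch, not the authors' implementation: the tree, the `T_`/`R_` tip prefixes (transmitter and recipient), and all function names are hypothetical, and the tree is stored as a plain child-to-parent dictionary rather than read from a phylogenetics library.

```python
from itertools import product

# Hypothetical toy tree: child -> (parent, branch length in substitutions/site).
# Tip names carry an assumed partner label: "T_" (transmitter) or "R_" (recipient).
TREE = {
    "n1": ("root", 0.01),
    "T_1": ("root", 0.02),
    "n2": ("n1", 0.01),
    "T_2": ("n1", 0.03),
    "R_1": ("n2", 0.02),
    "R_2": ("n2", 0.01),
}

def tips(tree):
    """Tips are nodes that never appear as a parent."""
    parents = {parent for parent, _ in tree.values()}
    return [node for node in tree if node not in parents]

def owner(tip):
    """Partner label encoded in the tip name (an assumption of this sketch)."""
    return tip.split("_")[0]

def branches_to_root(node, tree):
    """Branch lengths on the path from a node up to the root."""
    lengths = []
    while node in tree:
        node, length = tree[node]
        lengths.append(length)
    return lengths

def min_root_to_tip(partner, tree):
    """Shortest root-to-tip distance among one partner's tips."""
    return min(sum(branches_to_root(t, tree)) for t in tips(tree) if owner(t) == partner)

def most_basal_tip_identity(tree):
    """Partner whose tip has the fewest internal nodes between itself and the root."""
    depth = {t: len(branches_to_root(t, tree)) - 1 for t in tips(tree)}
    shallowest = min(depth.values())
    owners = {owner(t) for t, d in depth.items() if d == shallowest}
    return "both" if len(owners) > 1 else owners.pop()

def patristic(a, b, tree):
    """Path length between two tips through their most recent common ancestor."""
    dist_from_a, node, total = {a: 0.0}, a, 0.0
    while node in tree:
        node, length = tree[node]
        total += length
        dist_from_a[node] = total
    node, total = b, 0.0
    while node not in dist_from_a:
        node, length = tree[node]
        total += length
    return total + dist_from_a[node]

def shortest_interhost_patristic(tree):
    """Shortest patristic distance between a transmitter tip and a recipient tip."""
    t_tips = [t for t in tips(tree) if owner(t) == "T"]
    r_tips = [t for t in tips(tree) if owner(t) == "R"]
    return min(patristic(a, b, tree) for a, b in product(t_tips, r_tips))

# Root to tip difference, transmitter minus recipient.
root_to_tip_difference = min_root_to_tip("T", TREE) - min_root_to_tip("R", TREE)
```

Taking the absolute value of `root_to_tip_difference` gives the direction-agnostic form of the covariate, which does not require knowing who the transmitter is.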
Covariates used in the two models
| Information class and covariate | Model with all data: values (units where applicable) | Model with routinely available data: values (units where applicable) |
|---|---|---|
| Epidemiological (E) | | |
| Sexual risk exposure group | Men who have sex with men or heterosexual | Same as model with all data |
| Recency of the transmitter’s infection | Acute (transmission up to 90 d after infection) or chronic (otherwise) | Excluded |
| Sampling (S) | | |
| Sample size | Low (no. of unique sampled sequences in either partner <10) or high (otherwise) | Same as model with all data |
| Sample size difference | Difference* | Absolute difference in the no. of unique sampled sequences between partners |
| Time from transmission | Sum of the absolute time to sampling of both partners relative to transmission time (d) | Excluded |
| Genetic (G) | | |
| Sequence alignment length | No. of base pairs | Same as model with all data |
| Intrahost nucleotide diversity difference | Difference* | Absolute difference between within-partner mean pairwise sequence diversity (substitutions/site) |
| Multiplicity of infection | Single (probability of one founder unique sequence in the recipient is greater than or equal to 0.75) or multiple (otherwise) | Excluded |
| Phylogenetic (P) | | |
| Topology class | PP, PM, or MM | Same as model with all data |
| Phylogenetic diversity difference | Difference* | Absolute difference between the sum of branch lengths of each partner subtree |
| Root to tip difference | Difference* | Absolute difference between the minimum root to tip distances of the partners’ sequences (substitutions/site) |
| Most basal tip identity | Transmitter, recipient, or both; the identity of the tip(s) that minimizes the no. of internal nodes along the paths between itself and the root | Agree, disagree, or ambiguous; whether the identity of the tip(s) that minimizes the no. of internal nodes between itself and the root matches the identity with the higher ancestral-state probability at the root |
| Interhost patristic distance | The shortest patristic distance between tips from the partners (substitutions/site) | Same as model with all data |
*Subtraction of the recipient’s value from the transmitter’s value.
†As in ref. 8.
‡Illustrated in Fig. 1. To build these covariates when using a posterior distribution of trees, we selected either the most frequent observation (in the case of qualitative covariates) or the mean shift mode (in the case of the quantitative covariates).
§As in ref. 14.
¶We used the sum of the edge lengths that give rise to only one tip in the subtree as in ref. 27.
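As a rough illustration of how the sampling and genetic covariates in the table could be derived from aligned sequences, here is a minimal sketch. It makes several assumptions that the source does not confirm: diversity is computed as an uncorrected p-distance (proportion of differing sites), duplicates are collapsed before computing diversity, and the function names and example sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites that differ (assumed, uncorrected distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_pairwise_diversity(seqs):
    """Mean pairwise distance within one partner (substitutions/site)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # a single sequence carries no measurable diversity
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def pair_covariates(transmitter_seqs, recipient_seqs, low_threshold=10):
    """Sampling and genetic covariates for one transmission pair."""
    # Assumption: unique sequences are used for both sample size and diversity.
    t_unique = sorted(set(transmitter_seqs))
    r_unique = sorted(set(recipient_seqs))
    size_diff = len(t_unique) - len(r_unique)  # transmitter minus recipient
    div_diff = mean_pairwise_diversity(t_unique) - mean_pairwise_diversity(r_unique)
    fewest = min(len(t_unique), len(r_unique))
    return {
        "alignment_length": len(t_unique[0]),
        "sample_size": "low" if fewest < low_threshold else "high",
        "sample_size_difference": size_diff,
        "abs_sample_size_difference": abs(size_diff),
        "diversity_difference": div_diff,
        "abs_diversity_difference": abs(div_diff),
    }

# Hypothetical aligned sequences, for illustration only.
transmitter = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
recipient = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA"]
covariates = pair_covariates(transmitter, recipient)
```

The signed differences correspond to the model with all data, where the transmitter is known; the `abs_` entries correspond to the model with routinely available data, where it is not.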
Fig. 2. Ancestral-state reconstruction. The probability for each transmission pair, i, that the transmitting partner is correctly identified using ML ancestral-state reconstruction. Observations are colored by the topology class. Observations with p > 0.5, p > 0.6, and p > 0.95 indicate that the inferred transmission direction was consistent with the known transmission history for the binary model, the ordinal model with relaxed threshold, and the ordinal model with conservative threshold, respectively. For the ordinal models, the outcome can be equivocal (0.4 < p < 0.6 for the relaxed threshold, 0.05 < p < 0.95 for the conservative threshold). The outcome is inconsistent if not consistent or equivocal.
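The thresholding described in this caption can be sketched as a small classifier. The function names are hypothetical, and the handling of values that fall exactly on a threshold is an assumption: the caption uses strict inequalities throughout, so this sketch treats boundary values as equivocal in the ordinal case and as inconsistent in the binary case.

```python
def binary_outcome(p):
    """Binary model: consistent if the known transmitter has p > 0.5."""
    return "consistent" if p > 0.5 else "inconsistent"

def ordinal_outcome(p, lower, upper):
    """Ordinal model: consistent above `upper`, inconsistent below `lower`.

    Assumption: values exactly on a threshold are treated as equivocal.
    """
    if p > upper:
        return "consistent"
    if p < lower:
        return "inconsistent"
    return "equivocal"

def relaxed_outcome(p):
    """Ordinal model with the relaxed threshold (0.4 and 0.6)."""
    return ordinal_outcome(p, 0.4, 0.6)

def conservative_outcome(p):
    """Ordinal model with the conservative threshold (0.05 and 0.95)."""
    return ordinal_outcome(p, 0.05, 0.95)

# Hypothetical root-state probabilities for the known transmitter.
example_probabilities = [0.97, 0.70, 0.55, 0.30, 0.02]
example_outcomes = [
    (p, binary_outcome(p), relaxed_outcome(p), conservative_outcome(p))
    for p in example_probabilities
]
```

The same probability can land in different categories under the two ordinal thresholds: a value of 0.70 counts as consistent under the relaxed threshold but equivocal under the conservative one.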
Fig. 3. Model results. (A) AUC and 95% CIs of the models. The model name indicates the information classes included in the model (i.e., epidemiological, genetic, sampling, or phylogenetic). The size of each circle shows the number of covariates in the model after Lasso regression. The green color underscores the high-ranked models with equivalent discriminatory power. (B) The subset of covariates included in each model after Lasso regression colored by information class. The number of covariates in boxes from B corresponds to the size of the model in A. The green-colored boxes underscore high-ranked models with equivalent discriminatory power. The thick green box indicates the best-fit model. Gray-colored boxes emphasize models for which variable selection returned either a null model or a model without covariates from all the classes. (C and D) The same as in A and B but using only covariates that are routinely available and where the definition of the covariates did not consider the known direction of transmission. *Three covariates excluded in C and D.
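For readers unfamiliar with AUC as a measure of discriminatory power, the following minimal sketch computes it from its pairwise definition: the probability that a randomly chosen correct inference receives a higher predicted score than a randomly chosen incorrect one, with ties counted as one half. The function name and the example scores are hypothetical; the source does not say how the AUC or its 95% CIs were computed, and CIs are not covered here.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    `labels` uses 1 for a correctly inferred direction and 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities and outcomes, for illustration only.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
example_auc = auc(example_scores, example_labels)
```

An AUC of 0.5 corresponds to a model that ranks pairs no better than chance, and 1.0 to perfect separation of correct and incorrect inferences.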
Fig. 4. The probability that the inferred transmission direction is correct. (A) One-way sensitivity analysis for the best-fit binary model (consistent or inconsistent), model P, where a single covariate is fixed and all other covariates are varied over their ranges as observed in the data. (B) Multiway analysis with the same model as in A, but each covariate value combination is plotted separately. (C and D) The same as A and B, respectively, but corresponding to the best-fit ordinal model (consistent, inconsistent, or equivocal with relaxed threshold), model SP.