Samuel L Hong, Simon Dellicour, Bram Vrancken, Marc A Suchard, Michael T Pyne, David R Hillyard, Philippe Lemey, Guy Baele.
Abstract
Infections with HIV-1 group M subtype B viruses account for the majority of the HIV epidemic in the Western world. Phylogeographic studies have placed the introduction of subtype B in the United States in New York around 1970, where it grew into a major source of spread. Currently, it is estimated that over one million people are living with HIV in the US and that most are infected with subtype B variants. Here, we aim to identify the drivers of HIV-1 subtype B dispersal in the United States by analyzing a collection of 23,588 pol sequences, collected for drug resistance testing from 45 states during 2004-2011. To this end, we introduce a workflow to reduce this large collection of data to more computationally-manageable sample sizes and apply the BEAST framework to test which covariates associate with the spread of HIV-1 across state borders. Our results show that we are able to consistently identify certain predictors of spread under reasonable run times across datasets of up to 10,000 sequences. However, the general lack of phylogenetic structure and the high uncertainty associated with HIV trees make it difficult to interpret the epidemiological relevance of the drivers of spread we are able to identify. While the workflow we present here could be applied to other virus datasets of a similar scale, the characteristic star-like shape of HIV-1 phylogenies poses a serious obstacle to reconstructing a detailed evolutionary and spatial history for HIV-1 subtype B in the US.
Keywords: BEAGLE; BEAST; HIV; covariates; phylodynamics; phylogeography; predictors; spread
Year: 2020 PMID: 32033422 PMCID: PMC7077180 DOI: 10.3390/v12020182
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. (A) Ratios between the mean terminal branch lengths and the mean internal branch lengths. The higher this ratio, the more star-like a phylogeny. Violin plots are shown for sample sizes in which tree distributions are available. (B) Log-maximum clade credibility (MCC) scores of each empirical tree distribution. Tree distributions sampled using the Phylogenetic Diversity Analyzer (PDA) method have higher MCC scores, implying somewhat less variance in the tree topologies reconstructed.
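The ratio described for panel A can be illustrated with a minimal Python sketch. The nested-tuple tree encoding and the function names below are assumptions made for illustration only; they are not the authors' code, and a real analysis would read trees from a BEAST posterior sample with a phylogenetics library.

```python
from statistics import mean

# Hypothetical tree encoding (an assumption for this sketch):
# a node is (branch_length, children), where children is a list of nodes.
# A node without children is a tip. The root's own branch length is ignored.

def collect_branch_lengths(node, is_root=True):
    """Return (terminal_lengths, internal_lengths) for the subtree at node."""
    length, children = node
    terminal, internal = [], []
    if not children:
        terminal.append(length)      # branch leading to a tip
    elif not is_root:
        internal.append(length)      # branch leading to an internal node
    for child in children:
        t, i = collect_branch_lengths(child, is_root=False)
        terminal.extend(t)
        internal.extend(i)
    return terminal, internal

def star_likeness(tree):
    """Mean terminal branch length divided by mean internal branch length."""
    terminal, internal = collect_branch_lengths(tree)
    # A perfect star tree has no internal branches; mean() then raises
    # statistics.StatisticsError, which callers may want to handle.
    return mean(terminal) / mean(internal)

# Toy tree: two short internal branches, four long terminal branches.
example_tree = (0.0, [
    (1.0, [(4.0, []), (6.0, [])]),
    (1.0, [(5.0, []), (5.0, [])]),
])
ratio = star_likeness(example_tree)
```

Applying `star_likeness` to every tree in a posterior sample would give the kind of distribution that a violin plot summarizes; long tips relative to short internal branches push the ratio up.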
Figure 2. Connectivity plots for the randomly subsampled datasets. The geographical reconstructions from each MCC tree are projected into geographical space by drawing a line for each transition event, connecting origin (circle) and destination. The connectivity plot for the 250 taxa tree is not included in this plot because it was found to be similar to those of the 500, 750, and 1000 taxa datasets. For the connectivity plots for the datasets subsampled using the Phylogenetic Diversity Analyzer, we refer to the Supplementary Materials (Figure S3).
Predictors included in over 50% of subsamples
| Predictor | Times included (BF > 5) | Bayes factor ¹ | Coefficient ² |
|---|---|---|---|
| Population size at origin | 10/14 | 265.4 (5.2 - | 2.8 (1.4 - 3.5) |
| Population size at destination | 10/14 | | 1.0 (0.8 - 1.6) |
| Gini index ³ at origin | 10/14 | 5165.4 (36.0 - | 2.7 (1.0 - 4.2) |
¹ Median, minimum, and maximum Bayes factors across all subsamples in which the predictor was included. ² Median, minimum, and maximum of the mean predictor coefficient (conditioned on inclusion). ³ Gini coefficient of income inequality.
Figure 3. Predictors of HIV-1B spatial spread within the United States by subsample size and sampling scheme (PDA/random). Posterior inclusion probabilities for each predictor are represented in terms of indicator expectations. Only predictors with a Bayes factor greater than 5 in at least one analysis are displayed. Dotted vertical lines denote inclusion probabilities corresponding to Bayes factors of 5, 25, and 100, respectively. For all inclusion probabilities and a description of each predictor, we refer to Supplementary Materials Figures S5 and S6.
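The correspondence between inclusion probabilities and Bayes factors mentioned in this caption follows from the standard definition of a Bayes factor as posterior odds divided by prior odds. The sketch below shows that conversion in Python. The prior inclusion probability `PRIOR` is a hypothetical value chosen for illustration; the value used in the study is not stated in this record, and the function names are likewise assumptions.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Bayes factor for inclusion: posterior odds divided by prior odds."""
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds

def inclusion_prob_for_bf(bf, prior_prob):
    """Posterior inclusion probability that yields the given Bayes factor."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = bf * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical prior inclusion probability (an assumption, not from the paper).
PRIOR = 0.05

# Inclusion probabilities at which the Bayes factor reaches 5, 25, and 100,
# i.e. where the dotted vertical lines would fall under this assumed prior.
thresholds = {bf: inclusion_prob_for_bf(bf, PRIOR) for bf in (5, 25, 100)}
```

Because the mapping depends on the prior odds, the same posterior inclusion probability translates into a larger Bayes factor when the prior inclusion probability is small.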