Vlad Novitsky1, Jon A Steingrimsson2, Mark Howison3, Fizza S Gillani2, Yuanning Li4, Akarsh Manne2, John Fulton2, Matthew Spence5, Zoanne Parillo5, Theodore Marak5, Philip A Chan2,5, Thomas Bertrand5, Utpala Bandy5, Nicole Alexander-Scott5, Casey W Dunn4, Joseph Hogan2, Rami Kantor6.
Abstract
Public health interventions guided by clustering of HIV-1 molecular sequences may be impacted by choices of analytical approaches. We identified commonly-used clustering analytical approaches, applied them to 1886 HIV-1 Rhode Island sequences from 2004-2018, and compared concordance in identifying molecular HIV-1 clusters within and between approaches. We used strict (topological support ≥ 0.95; distance 0.015 substitutions/site) and relaxed (topological support 0.80-0.95; distance 0.030-0.045 substitutions/site) thresholds to reflect different epidemiological scenarios. We found that clustering differed by method and threshold and depended more on distance than topological support thresholds. Clustering concordance analyses demonstrated some differences across analytical approaches, with RAxML having the highest (91%) mean summary percent concordance when strict thresholds were applied, and three (RAxML-, FastTree regular bootstrap- and IQ-Tree regular bootstrap-based) analytical approaches having the highest (86%) mean summary percent concordance when relaxed thresholds were applied. We conclude that different analytical approaches can yield diverse HIV-1 clustering outcomes and may need to be differentially used in diverse public health scenarios. Recognizing the variability and limitations of commonly-used methods in cluster identification is important for guiding clustering-triggered interventions to disrupt new transmissions and end the HIV epidemic.
Year: 2020 PMID: 33122765 PMCID: PMC7596705 DOI: 10.1038/s41598-020-75560-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Comparison of proportion of HIV-1 sequences in clusters within commonly-used analytical approaches. Graphs A-M each represent the 12 model-based methods/variations examined. Solid lines in each graph represent the range of proportions of clustered sequences (Y axis) according to topological support (X axis) and distance thresholds (colored squares matching the line colors; see the legend at the top of the Figure). Color-matching dashed lines in each graph represent the range of proportions of clustered sequences identified by HIV-TRACE according to five distance thresholds (see text for details).
Figure 2. Comparison of proportion of HIV-1 sequences in clusters between commonly-used analytical approaches. Each of the 49 panels demonstrates proportions of HIV sequences in clusters (Y axis) identified by the 12 selected methods (X axis; also represented by colors and outlined in the legend above the panels), representing a distinct combination of topological support (outlined in the gray line above the panels) and distance thresholds (outlined in the gray line to the right of the panels); see text for more details.
Table 1. HIV-1 clusters identified by seven commonly-used analytical approaches according to strict and relaxed sets of topological support and distance thresholds. HIV-TRACE at 0.015 TN93 distance threshold identified 172 clusters (671 sequences in clusters; 36%).
| Methods | Strict: topological support | Strict: mean TN93 pairwise distance (substitutions/site) | Strict: # of clusters (# of sequences in clusters; %) | Relaxed: topological support | Relaxed: mean TN93 pairwise distance (substitutions/site) | Relaxed: # of clusters (# of sequences in clusters; %) |
|---|---|---|---|---|---|---|
| RAxML | 0.95; rapid bootstrap; 1000 replicates | 0.015 | 167 (500; 27%) | 0.80; rapid bootstrap; 1000 replicates | 0.045 | 220 (847; 45%) |
| FastTree aLRT | 0.95; aLRT | 0.015 | 163 (500; 27%) | 0.90; aLRT | 0.030 | 212 (856; 45%) |
| FastTree bootstrap | 0.95; regular bootstrap; 1000 replicates | 0.015 | 146 (451; 24%) | 0.80; rapid bootstrap; 1000 replicates | 0.045 | 201 (772; 41%) |
| PhyML aLRT | 0.95; aLRT | 0.015 | 162 (496; 26%) | 0.90; aLRT | 0.045 | 234 (1019; 54%) |
| MEGA | MCL; 0.95; regular bootstrap; 1000 replicates | 0.015 | 156 (411; 22%) | MCL; 0.80; regular bootstrap; 1000 replicates | 0.045 | 223 (712; 38%) |
| IQ-Tree ufast | 1.0; ultrafast bootstrap; 1000 replicates | 0.015 | 187 (573; 30%) | 0.95; ultrafast bootstrap; 1000 replicates | 0.030 | 231 (913; 48%) |
| IQ-Tree regular | 0.95; regular bootstrap; 100 replicates | 0.015 | 146 (439; 23%) | 0.80; regular bootstrap; 100 replicates | 0.045 | 198 (758; 40%) |
aLRT approximate likelihood ratio test, MCL maximum composite likelihood, ufast ultrafast bootstrap, # number.
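The percentages in the table can be read as the number of sequences in clusters divided by the 1886 sequences analysed. The short Python check below is only a sketch under that reading (the paper excerpt does not spell out the denominator); the variable and function names are illustrative, and the counts are copied from the table and its caption.

```python
# Total number of sequences analysed (stated in the abstract).
TOTAL_SEQUENCES = 1886

# method -> (sequences in clusters, reported %), copied from the table.
# HIV-TRACE comes from the table caption (0.015 TN93 distance threshold).
strict = {
    "HIV-TRACE": (671, 36),
    "RAxML": (500, 27),
    "FastTree aLRT": (500, 27),
    "FastTree bootstrap": (451, 24),
    "PhyML aLRT": (496, 26),
    "MEGA": (411, 22),
    "IQ-Tree ufast": (573, 30),
    "IQ-Tree regular": (439, 23),
}
relaxed = {
    "RAxML": (847, 45),
    "FastTree aLRT": (856, 45),
    "FastTree bootstrap": (772, 41),
    "PhyML aLRT": (1019, 54),
    "MEGA": (712, 38),
    "IQ-Tree ufast": (913, 48),
    "IQ-Tree regular": (758, 40),
}


def percent_clustered(n_clustered, total=TOTAL_SEQUENCES):
    """Share of sequences in clusters, rounded to a whole percent."""
    return round(100 * n_clustered / total)


# Collect any method whose recomputed percentage differs from the reported one.
mismatches = [
    (label, method)
    for label, table in (("strict", strict), ("relaxed", relaxed))
    for method, (n, reported) in table.items()
    if percent_clustered(n) != reported
]
print(mismatches)  # an empty list means every reported % is reproduced
```

Under this assumption every reported percentage is reproduced from its count, which supports reading the column as a share of all 1886 sequences.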
Figure 3. Differences of proportions of clustered HIV-1 sequences between method-pairs. The graph represents differences in proportions of clustered HIV-1 sequences (Y axis; shown with 95% CI) that were identified by pairs of the seven methods (X axis). Differences are ranked from left to right in descending order of absolute values, according to relaxed (red squares) and strict (green squares) thresholds. The red dashed line outlines a proportion difference of zero. Positive or negative differences in proportions depend on the directionality of the comparison between each method-pair; see text for more details.
Figure 4. Concordance of HIV-1 clustering: proportion of sequence pairs clustered by method-pairs. In these asymmetric heatmaps, each of the 64 small squares in each panel represents the proportion of sequence pairs that were clustered together in one of the eight methods examined (listed at the bottom of the heatmap), and also in the second paired method (listed on the left of the heatmap). For example, the 3rd square from the left in the top row shows the proportion of sequence pairs that clustered together by IQ-Tree ultra-fast bootstrap that also clustered together by RAxML (with the denominator being the proportion of clustered sequence pairs in IQ-Tree ultra-fast bootstrap analysis). The squares on the diagonal line from bottom left to upper right of each panel show concordance between the same methods, which is always 100%. Panel A demonstrates analyses according to strict thresholds and panel B according to relaxed thresholds (for more methods and thresholds details see text and Table 1). The scale of proportions for both panels is also shown.
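The pair-based concordance described in this caption can be sketched in Python as follows. This is a minimal illustration, not the authors' code: clusters are assumed to be given as sets of sequence identifiers, and the function names and toy data are hypothetical. The denominator is the set of co-clustered pairs in the reference method, which is why the measure is asymmetric.

```python
from itertools import combinations


def clustered_pairs(clusters):
    """Return the set of unordered sequence pairs that share a cluster."""
    pairs = set()
    for members in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(members), 2))
    return pairs


def pair_concordance(reference, other):
    """Fraction of pairs co-clustered by `reference` that `other` also co-clusters."""
    ref_pairs = clustered_pairs(reference)
    if not ref_pairs:
        return float("nan")  # undefined when the reference has no clustered pairs
    return len(ref_pairs & clustered_pairs(other)) / len(ref_pairs)


# Toy cluster assignments (hypothetical sequence IDs).
method_a = [{"s1", "s2", "s3"}, {"s4", "s5"}]
method_b = [{"s1", "s2"}, {"s4", "s5"}]

a_vs_b = pair_concordance(method_a, method_b)  # 2 of A's 4 pairs are also in B
b_vs_a = pair_concordance(method_b, method_a)  # both of B's pairs are also in A
print(a_vs_b, b_vs_a)  # 0.5 1.0
```

Swapping the reference method changes the result, matching the asymmetric layout of the heatmaps, and comparing a method with itself gives 1.0, matching the diagonal.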
Figure 5. Concordance of HIV-1 clustering: proportion of identical clusters in method-pairs. In these asymmetric heatmaps, each of the 64 small squares in each panel represents the proportion of identical clusters that were identified in one of the eight methods examined (listed at the bottom of the heatmap), and also in the second paired method (listed on the left of the heatmap). The squares on the diagonal line from bottom left to upper right of each panel show concordance between the same methods, which is always 100%. Panel A demonstrates analyses according to strict thresholds and panel B according to relaxed thresholds (for more methods and thresholds details see text and Table 1). The scale of proportions for both panels is also shown.
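A cluster-level version of the same comparison might look like the Python sketch below. It assumes that two clusters are "identical" when they contain exactly the same sequences and that the denominator is the number of clusters in the reference method; the caption does not state the denominator explicitly, so both points are assumptions, and all names and data are hypothetical.

```python
def identical_cluster_proportion(reference, other):
    """Fraction of clusters in `reference` that appear with identical membership in `other`."""
    ref = {frozenset(c) for c in reference}
    oth = {frozenset(c) for c in other}
    if not ref:
        return float("nan")  # undefined when the reference found no clusters
    return len(ref & oth) / len(ref)


# Toy cluster assignments (hypothetical sequence IDs).
method_a = [{"s1", "s2", "s3"}, {"s4", "s5"}, {"s6", "s7"}]
method_b = [{"s1", "s2"}, {"s4", "s5"}]

a_vs_b = identical_cluster_proportion(method_a, method_b)  # 1 of A's 3 clusters
b_vs_a = identical_cluster_proportion(method_b, method_a)  # 1 of B's 2 clusters
print(a_vs_b, b_vs_a)
```

This measure is stricter than the pair-based one: the cluster {s1, s2, s3} counts as a complete miss here even though one of its pairs is recovered by the other method.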
Figure 6. Concordance of HIV-1 clustering: proportion of sequences not clustered by method-pairs. In these asymmetric heatmaps, each of the 64 small squares in each panel represents the proportion of non-clustered sequences that were identified in one of the eight methods examined (listed at the bottom of the heatmap), and also in the second paired method (listed on the left of the heatmap). The squares on the diagonal line from bottom left to upper right of each panel show concordance between the same methods, which is always 100%. Panel A demonstrates analyses according to strict thresholds and panel B according to relaxed thresholds (for more methods and thresholds details see text and Table 1). The scale of proportions for both panels is also shown.
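The non-clustered concordance can be sketched along the same lines. The Python example below assumes the denominator is the set of sequences left unclustered by the reference method, which the caption does not state explicitly; the function names and toy data are hypothetical.

```python
def unclustered(all_sequences, clusters):
    """Sequences that belong to no cluster."""
    clustered = set().union(*clusters)
    return set(all_sequences) - clustered


def unclustered_concordance(all_sequences, reference, other):
    """Fraction of sequences unclustered by `reference` that are also unclustered by `other`."""
    ref = unclustered(all_sequences, reference)
    if not ref:
        return float("nan")  # undefined when the reference clusters every sequence
    return len(ref & unclustered(all_sequences, other)) / len(ref)


# Toy data set of six sequences (hypothetical IDs).
sequences = ["s1", "s2", "s3", "s4", "s5", "s6"]
method_a = [{"s1", "s2"}]                # leaves s3, s4, s5, s6 unclustered
method_b = [{"s1", "s2"}, {"s3", "s4"}]  # leaves s5, s6 unclustered

a_vs_b = unclustered_concordance(sequences, method_a, method_b)
b_vs_a = unclustered_concordance(sequences, method_b, method_a)
print(a_vs_b, b_vs_a)  # 0.5 1.0
```

A method that clusters more sequences leaves a smaller unclustered set, so its concordance with a more conservative method is high in one direction and lower in the other.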