Chris A Kieslich, Phanourios Tamamis, Yannis A Guzman, Melis Onel, Christodoulos A Floudas.
Abstract
HIV-1 entry into host cells is mediated by interactions between the V3-loop of viral glycoprotein gp120 and chemokine receptor CCR5 or CXCR4, collectively known as HIV-1 coreceptors. Accurate genotypic prediction of coreceptor usage is of significant clinical interest, and determination of the factors driving tropism has been the focus of extensive study. We have developed a method based on nonlinear support vector machines to elucidate the interacting residue pairs driving coreceptor usage and to provide highly accurate coreceptor usage predictions. Our models utilize centroid-centroid interaction energies from computationally derived structures of the V3-loop:coreceptor complexes as primary features, while additional features based on established rules regarding V3-loop sequences are also investigated. We tested our method on 2455 V3-loop sequences of various lengths and subtypes and obtained a median area under the receiver operating characteristic curve (AUC) of 0.977 based on 500 runs of 10-fold cross validation. Our study is the first to elucidate a small set of specific interacting residue pairs between the V3-loop and coreceptors capable of predicting coreceptor usage with high accuracy across major HIV-1 subtypes. The developed method has been implemented as a web tool named CRUSH, CoReceptor USage prediction for HIV-1, which is available at http://ares.tamu.edu/CRUSH/.
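As a reading aid for the accuracy figure quoted in the abstract, the following is a minimal sketch of how an AUC can be computed from classifier scores and how a median over repeated evaluation runs might be taken. It is not the CRUSH implementation: the function names `roc_auc` and `median_auc` are hypothetical, and the convention that label 1 marks the positive class is an assumption made only for illustration.

```python
from statistics import median


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_auc(runs):
    """Median AUC over several (labels, scores) evaluation runs
    (hypothetical helper; one tuple per cross-validation run)."""
    return median(roc_auc(labels, scores) for labels, scores in runs)
```

The pairwise formulation is quadratic in the number of sequences, which is acceptable for a sketch; a rank-based computation would be the usual choice for large datasets.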
Year: 2016 PMID: 26859389 PMCID: PMC4747591 DOI: 10.1371/journal.pone.0148974
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Flowchart of hybrid modeling approach.
Comparison of HIV-1 subtype accuracies for unique patient test set.
| Subtype | CRUSH AUC | CRUSH Sensitivity | PhenoSeq AUC | PhenoSeq Sensitivity | PhenoSeq Specificity |
|---|---|---|---|---|---|
| A/AG | 0.758 | 0.750 | 0.784 | 0.750 | 0.818 |
| AE | 1.000 | 1.000 | 0.980 | 1.000 | 0.960 |
| B | 0.889 | 0.750 | 0.754 | 0.750 | 0.758 |
| C | 1.000 | 1.000 | 0.915 | 1.000 | 0.831 |
| Total | 0.922 | 0.889 | 0.852 | 0.889 | 0.815 |
* CRUSH sensitivity is based on a probability threshold that produces the corresponding PhenoSeq specificity.
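The footnote above describes comparing sensitivities at a matched specificity. One way this could be done is sketched below, assuming per-sequence probability scores are available, that label 1 marks the positive class, and that a sequence is called positive when its score is at or above the threshold. The function name `sensitivity_at_specificity` is hypothetical, and the sketch picks the lowest threshold whose specificity is at least the target, which may differ from the exact matching procedure used by the authors.

```python
def sensitivity_at_specificity(labels, scores, target_specificity):
    """Return (threshold, sensitivity) for the lowest threshold whose
    specificity is at least target_specificity.

    A sample is predicted positive when score >= threshold.
    """
    if not 0.0 <= target_specificity <= 1.0:
        raise ValueError("target_specificity must lie in [0, 1]")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Candidate thresholds in ascending order; the final infinite
    # threshold predicts nothing as positive (specificity 1.0).
    candidates = sorted(set(scores)) + [float("inf")]
    for threshold in candidates:
        # Specificity: fraction of negatives scored below the threshold.
        specificity = sum(s < threshold for s in neg) / len(neg)
        if specificity >= target_specificity:
            # Sensitivity: fraction of positives at or above the threshold.
            sensitivity = sum(s >= threshold for s in pos) / len(pos)
            return threshold, sensitivity
```

Because specificity can only increase as the threshold rises, the first candidate that meets the target gives the highest sensitivity available at that specificity.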
Fig 2. Effect of the rules and V3-loop:coreceptor interactions included in SVM models on prediction accuracy.
(A) The effect of the number of V3-loop:coreceptor interactions on accuracy. Accuracy is represented by the median AUC for 500 runs of 10-fold cross validation for both panels A and B. The accuracy at zero interactions is the accuracy based on only the rules. (B) Contribution of rules to accuracy when used in addition to the top 11 interactions (Fig 3A). The naming scheme is as follows: Int Only: interactions only; Q: net charge; R: 11/24/25 rule; M: glycosylation motif; L: length. Dashed red line illustrates the accuracy when using all four rules and the top 11 interactions (QLMR, 0.977).
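To make the four rule labels (Q, R, M, L) concrete, here is a rough sketch of how such sequence-based features might be derived from a V3-loop sequence. The definitions are assumptions based on commonly cited forms of these rules, not the definitions confirmed by this record: net charge as (K + R) minus (D + E), the 11/24/25 rule as a positively charged residue at position 11, 24, or 25, and the glycosylation motif as an N-X-S/T sequon (X not proline) at positions 6 to 8. Positions are 1-based and assume an unaligned 35-residue loop; the function name `v3_rule_features` is hypothetical.

```python
def v3_rule_features(seq):
    """Compute four illustrative rule-based features for a V3-loop sequence.

    All rule definitions here are assumptions for illustration only.
    """
    seq = seq.upper()
    positive = set("KR")  # assumed positively charged residues
    negative = set("DE")  # assumed negatively charged residues

    def residue(pos):
        """1-based residue lookup; None when the sequence is too short."""
        return seq[pos - 1] if 1 <= pos <= len(seq) else None

    # Q: net charge as positive minus negative residue counts.
    net_charge = sum(aa in positive for aa in seq) - sum(aa in negative for aa in seq)

    # R: positively charged residue at position 11, 24, or 25.
    rule_11_24_25 = any(residue(p) in positive for p in (11, 24, 25))

    # M: N-X-S/T sequon at positions 6-8, with X not proline.
    glycosylation_motif = (
        residue(6) == "N"
        and residue(7) not in (None, "P")
        and residue(8) in ("S", "T")
    )

    # L: sequence length.
    return {
        "net_charge": net_charge,
        "rule_11_24_25": rule_11_24_25,
        "glycosylation_motif": glycosylation_motif,
        "length": len(seq),
    }
```

For sequences whose length differs from 35, the fixed positions used here would need an alignment step first; that step is deliberately left out of the sketch.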
Fig 3. Diagrams of the top selected interactions for the cases of all rules + interactions and interactions only.
(A) Interaction map for the 11 interactions selected in combination with all rules. V3-loop is shown as an idealized loop with 35 amino acids where grey circles indicate positions for which no interactions were selected (inactive), while green circles indicate V3-loop positions with interactions selected (active). Red triangles represent residues of CCR5 and blue squares represent residues of CXCR4, with dashed lines representing interactions with V3-loop residues. Ordered lists of observed amino acids (based on occurrence with a minimum of 5%) in one-letter code for each active V3-loop residue are provided. Observed amino acids for CCR5 tropic sequences are in red and those observed for CXCR4 tropic sequences in blue. Bolded letters in the ordered list of observed amino acids indicate an amino acid that is observed in at least 50% of sequences at a given position. (B) Interaction map for the 18 interactions selected without rules. Color scheme and layout are the same as in (A). Faded triangles/squares indicate interactions that were also selected when including all rules. The crossed-out interaction was selected when including rules, but not when using interactions only.
Summary of best coreceptor usage model.
| Property | Value |
|---|---|
| Number of interacting residue pairs | 11 |
| Additional rules | Net charge, 11/24/25 rule, Motif, Length |
| Median AUC | 0.977 |
| Median sensitivity at 0.05 FPR | 0.917 |
a Results based on 500 runs of 10-fold cross-validation.
Comparison of methods on test set (1161 sequences).
| Method | Sensitivity at FPR 0.02 | Sensitivity at FPR 0.05 | Number of features | Total test CPU time (sec) |
|---|---|---|---|---|
| CRUSH | 0.797 | 0.892 | 15 | 0.59 |
| Probit | 0.723 | 0.811 | 3 | 0.13 |
| T-CUP2 | 0.635 | 0.770 | 70 | 222.46 |
| g2p | 0.486 | 0.676 | 1140 | - |
* The g2p method as described by Bozek et al. [24] requires that a sequence alignment be performed on the entire dataset prior to training/testing preventing an equal comparison of CPU time.
Method accuracy comparison on unique patient test set (373 sequences).
| Method | Sensitivity at FPR 0.02 | Sensitivity at FPR 0.05 |
|---|---|---|
| CRUSH | 0.677 | 0.839 |
| Probit | 0.645 | 0.742 |
| T-CUP2 | 0.581 | 0.774 |
| g2p | 0.161 | 0.258 |
Comparison of HIV-1 subtype accuracies for test set.
| Subtype | CRUSH AUC | CRUSH Sensitivity | PhenoSeq AUC | PhenoSeq Sensitivity | PhenoSeq Specificity |
|---|---|---|---|---|---|
| A/AG | 0.758 | 0.750 | 0.721 | 0.750 | 0.692 |
| AE | 0.994 | 1.000 | 0.981 | 1.000 | 0.962 |
| B | 0.957 | 0.931 | 0.861 | 0.966 | 0.757 |
| C | 0.987 | 1.000 | 0.899 | 1.000 | 0.797 |
| Total | 0.953 | 0.942 | 0.869 | 0.962 | 0.777 |
* CRUSH sensitivity is based on a probability threshold that produces the corresponding PhenoSeq specificity.