Olga Tarasova, Nadezhda Biziukova, Dmitry Filimonov, Vladimir Poroikov.
Abstract
The high variability of the human immunodeficiency virus (HIV) is an important cause of HIV resistance to reverse transcriptase and protease inhibitors. There are many variants of HIV type 1 (HIV-1) that can be used to model sequence-resistance relationships. Machine learning methods are widely and successfully used in new drug discovery. An emerging body of data regarding the interactions of small drug-like molecules with their protein targets makes it possible to build models of "structure-property" relationships and to analyze the performance of various machine-learning techniques. In our research, we analyze several different types of descriptors in order to predict the resistance of HIV reverse transcriptase and protease to the marketed antiretroviral drugs using the Random Forest approach. First, we represented amino acid sequences as a set of short peptide fragments, each of which included several amino acid residues. Second, we represented nucleotide sequences as a set of fragments, each of which included several nucleotides. We compared these two approaches using open data from the Stanford HIV Drug Resistance Database. We have determined the factors that modulate the performance of prediction: in particular, we observed that the prediction performance depended more on the particular drug than on the type of descriptor used.
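The fragment-based representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' descriptor generator: the fragment length of 3, the use of overlapping windows, and the count-based encoding are assumptions made for illustration, and the function names `fragments`, `build_vocabulary`, and `encode` are hypothetical. The same functions apply to nucleotide strings.

```python
from collections import Counter
from typing import Iterable, List


def fragments(sequence: str, length: int = 3) -> List[str]:
    """Return all overlapping fragments of the given length (assumed scheme)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def build_vocabulary(sequences: Iterable[str], length: int = 3) -> List[str]:
    """Collect every distinct fragment seen, sorted for a stable column order."""
    seen = set()
    for seq in sequences:
        seen.update(fragments(seq, length))
    return sorted(seen)


def encode(sequence: str, vocabulary: List[str], length: int = 3) -> List[int]:
    """Count how often each vocabulary fragment occurs in one sequence."""
    counts = Counter(fragments(sequence, length))
    return [counts[f] for f in vocabulary]


# Toy inputs for illustration only, not real HIV-1 sequences.
peptides = ["PQITLW", "PQVTLW"]
vocab = build_vocabulary(peptides, length=3)
matrix = [encode(p, vocab, length=3) for p in peptides]
```

Each row of `matrix` is a fixed-length feature vector that could be passed, together with a resistant/susceptible label, to any Random Forest implementation.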
Keywords: HIV-1; computational prediction; protease; random forest; resistance; reverse transcriptase
Year: 2018 PMID: 30355996 PMCID: PMC6278491 DOI: 10.3390/molecules23112751
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
The performance of the prediction of human immunodeficiency virus type 1 (HIV-1) resistance to reverse transcriptase (RT) and protease (PR) inhibitors.
| Drug | Peptide Sns | Peptide Spc | Peptide PPV | Peptide MCC | Peptide AUC | Nucleotide Sns | Nucleotide Spc | Nucleotide PPV | Nucleotide MCC | Nucleotide AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| Reverse transcriptase inhibitors | | | | | | | | | | |
| 3TC | 0.98 | 0.68 | 0.95 | 0.74 | 0.96 | 0.99 | 0.63 | 0.93 | 0.75 | 0.97 |
| ABC | 0.98 | 0.74 | 0.94 | 0.70 | 0.91 | 0.98 | 0.72 | 0.89 | 0.72 | 0.92 |
| AZT | 0.91 | 0.76 | 0.89 | 0.70 | 0.93 | 0.93 | 0.78 | 0.85 | 0.72 | 0.94 |
| D4T | 0.93 | 0.80 | 0.84 | 0.70 | 0.91 | 0.94 | 0.79 | 0.85 | 0.70 | 0.91 |
| DDI | 0.90 | 0.74 | 0.92 | 0.72 | 0.94 | 0.98 | 0.65 | 0.89 | 0.69 | 0.91 |
| EFV | 0.88 | 0.76 | 0.91 | 0.70 | 0.91 | 0.87 | 0.89 | 0.81 | 0.69 | 0.91 |
| ETR | 0.92 | 0.74 | 0.94 | 0.70 | 0.92 | 0.87 | 0.99 | 0.88 | 0.78 | 0.93 |
| NVP | 0.96 | 0.80 | 0.90 | 0.72 | 0.94 | 0.92 | 0.89 | 0.87 | 0.77 | 0.96 |
| TDF | 0.88 | 0.69 | 0.86 | 0.70 | 0.91 | 0.60 | 0.95 | 0.91 | 0.69 | 0.97 |
| Avg * | 0.93 | 0.75 | 0.91 | 0.71 | 0.93 | 0.90 | 0.81 | 0.88 | 0.72 | 0.94 |
| Protease inhibitors | | | | | | | | | | |
| FPV | 0.96 | 0.68 | 0.89 | 0.69 | 0.91 | 0.94 | 0.61 | 0.88 | 0.69 | 0.91 |
| ATV | 0.98 | 0.68 | 0.90 | 0.70 | 0.92 | 0.97 | 0.61 | 0.90 | 0.70 | 0.91 |
| IDV | 0.97 | 0.74 | 0.89 | 0.72 | 0.93 | 0.98 | 0.86 | 0.92 | 0.77 | 0.96 |
| LPV | 0.96 | 0.76 | 0.86 | 0.70 | 0.92 | 0.92 | 0.71 | 0.87 | 0.74 | 0.93 |
| NFV | 0.96 | 0.89 | 0.88 | 0.77 | 0.96 | 0.95 | 0.84 | 0.91 | 0.77 | 0.94 |
| SQV | 0.94 | 0.79 | 0.89 | 0.75 | 0.93 | 0.96 | 0.83 | 0.91 | 0.77 | 0.94 |
| TPV | 0.96 | 0.74 | 0.86 | 0.72 | 0.94 | 0.92 | 0.64 | 0.89 | 0.70 | 0.92 |
| DRV | 0.96 | 0.78 | 0.88 | 0.72 | 0.91 | 0.98 | 0.82 | 0.84 | 0.72 | 0.92 |
| Avg | 0.96 | 0.76 | 0.88 | 0.72 | 0.93 | 0.95 | 0.74 | 0.89 | 0.73 | 0.93 |
* Average value for the set of drugs. Sns: sensitivity; Spc: specificity; PPV: positive predictive value; MCC: Matthews correlation coefficient; AUC: area under the ROC curve. Drug abbreviations are as follows: lamivudine (3TC), abacavir (ABC), zidovudine (AZT), stavudine (D4T), didanosine (DDI), efavirenz (EFV), etravirine (ETR), nevirapine (NVP), rilpivirine (RPV), and tenofovir (TDF). The data on the resistance to protease inhibitors are available for the following eight drugs: fosamprenavir (FPV), atazanavir (ATV), indinavir (IDV), lopinavir (LPV), nelfinavir (NFV), saquinavir (SQV), tipranavir (TPV), and darunavir (DRV).
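For readers who want to relate the table columns to raw prediction counts, the following sketch computes sensitivity, specificity, positive predictive value, and the Matthews correlation coefficient from a confusion matrix using their standard definitions. Treating "resistant" as the positive class is an assumption, and the function name `binary_metrics` and the example counts are hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes 'resistant' is the positive class:
    tp = resistant predicted resistant, fn = resistant predicted susceptible,
    tn = susceptible predicted susceptible, fp = susceptible predicted resistant.
    """
    sns = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    spc = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    ppv = tp / (tp + fp) if (tp + fp) else 0.0   # positive predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sns": sns, "Spc": spc, "PPV": ppv, "MCC": mcc}


# Hypothetical counts for illustration only.
m = binary_metrics(tp=90, fp=30, tn=70, fn=10)
```

Because MCC uses all four cells of the confusion matrix, it stays informative when the resistant and susceptible classes are strongly imbalanced, which is the case for several drugs in the dataset tables.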
Prediction performance for the protease sequences with wild-type residues in the major drug resistance positions (HiglyResPR dataset).
| Drug | Nr * | Ns | Sns | Spc | PPV | MCC | AUC |
|---|---|---|---|---|---|---|---|
| FPV | 65 | 340 | 0.43 | 0.94 | 0.51 | 0.40 | 0.91 |
| ATV | 96 | 271 | 0.84 | 0.96 | 0.82 | 0.81 | 0.98 |
| IDV | 214 | 184 | 0.78 | 0.76 | 0.69 | 0.90 | 0.92 |
| LPV | 145 | 142 | 0.79 | 0.94 | 0.77 | 0.70 | 0.93 |
| NFV | 248 | 168 | 0.94 | 0.98 | 0.88 | 0.96 | 0.97 |
| SQV | 192 | 223 | 0.83 | 0.80 | 0.80 | 0.71 | 0.94 |
| TPV | 78 | 118 | 0.52 | 0.96 | 0.50 | 0.50 | 0.96 |
| DRV | 64 | 124 | 0.65 | 0.94 | 0.59 | 0.76 | 0.96 |
| Avg | | | 0.725 | 0.91 | 0.70 | 0.72 | 0.95 |
* Nr is the number of resistant instances in the dataset; Ns is the number of sensitive instances in the dataset.
The comparison of the prediction performance of our approach and some earlier developed approaches.
| Drug | BA (Our) | BA [ | BA [ | AUC (Our) | AUC [ | MCR * (Our) | MCR [ |
|---|---|---|---|---|---|---|---|
| 3TC | 0.81 | 0.89 | 0.9 | 0.97 | 0.94 | 7.29 | 3.87 |
| ABC | 0.85 | 0.85 | 0.69 | 0.92 | 0.92 | 6.8 | 6.53 |
| AZT | 0.86 | 0.89 | 0.70 | 0.94 | 0.91 | 13.96 | 36.19 |
| D4T | 0.87 | 0.75 | 0.76 | 0.94 | 0.90 | 10.01 | 7.31 |
| DDI | 0.82 | 0.68 | 0.75 | 0.91 | 0.85 | 10.90 | 8.05 |
| EFV | 0.88 | 0.902 | 0.84 | 0.96 | 0.93 | 18.08 | 16.08 |
| ETR | 0.93 | N/D | N/D | 0.93 | N/D | 10.01 | 6.58 |
| NVP | 0.91 | 0.91 | 0.91 | 0.94 | 0.92 | 12.7 | 24.87 |
| RPV | N/D | 0.89 | N/D | N/D | N/D | N/D | 1.55 |
| TDF | 0.78 | N/D | N/D | 0.92 | 0.83 | 12.3 | 5.39 |
| FPV | 0.78 | N/D | N/D | 0.92 | N/D | 15.8 | 16.08 |
| ATV | 0.79 | 0.87 | 0.71 | 0.93 | 0.93 | 26.2 | 26.69 |
| IDV | 0.92 | 0.89 | 0.75 | 0.98 | 0.97 | 8.2 | 34.29 |
| LPV | 0.82 | N/D | 0.77 | 0.94 | 0.96 | 23.8 | 9.79 |
| NFV | 0.90 | 0.89 | 0.76 | 0.96 | 0.94 | 7.15 | 25.23 |
| SQV | 0.90 | 0.88 | 0.75 | 0.96 | 0.96 | 11.15 | 30.37 |
| TPV | 0.78 | N/D | N/D | 0.87 | N/D | 4.77 | 9.07 |
| DRV | 0.79 | N/D | N/D | 0.92 | N/D | 2.38 | 2.98 |
| Avg | 0.854 | 0.857 | 0.78 | 0.94 | 0.92 | 11.85 | 15.05 |
* Misclassification Rate: calculated as non-concordant pairs between resistant/susceptible classes, obtained experimentally (Phenosense test system) and classes by prediction (the percentage); N/D: no data available.
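The misclassification rate defined in the footnote, and the BA column, can be sketched as follows. Reading BA as balanced accuracy, the mean of sensitivity and specificity, is an assumption, since the table does not expand the abbreviation. The function names and the example labels are hypothetical.

```python
from typing import Sequence


def misclassification_rate(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Percentage of instances whose predicted class differs from the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one instance is required")
    mismatches = sum(1 for o, p in zip(observed, predicted) if o != p)
    return 100.0 * mismatches / len(observed)


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity (assumed meaning of BA)."""
    return (sensitivity + specificity) / 2.0


# Hypothetical labels: "R" = resistant, "S" = susceptible.
observed = ["R", "S", "R", "S"]
predicted = ["R", "R", "R", "S"]
mcr = misclassification_rate(observed, predicted)
ba = balanced_accuracy(0.9, 0.7)
```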
The comparison of the prediction performance of our approach and the earlier developed approach.
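| Drug | Random Forest (Our) Sns | Random Forest (Our) Spc | Random Forest (Our) PPV | Random Forest (Our) BA | Random Forest (Our) AUC | Decision Trees Sns | Decision Trees Spc | Decision Trees PPV | Decision Trees BA | R * |
|---|---|---|---|---|---|---|---|---|---|---|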
| FPV | 0.41 | 0.97 | 0.89 | 0.69 | 0.94 | 0.99 | 0.34 | 0.52 | 0.675 | 32 |
| ATV | 0.69 | 0.99 | 0.99 | 0.84 | 0.97 | 0.86 | 0.72 | 0.70 | 0.91 | 89 |
| IDV | 0.99 | 0.96 | 0.92 | 0.98 | 0.98 | 0.91 | 0.66 | 0.89 | 0.785 | 190 |
| LPV | 0.99 | 0.83 | 0.92 | 0.92 | 0.92 | 0.90 | 0.90 | 0.99 | 0.90 | 96 |
| NFV | 0.97 | 0.97 | 0.95 | 0.99 | 0.97 | 0.86 | 0.50 | 0.86 | 0.68 | 215 |
| SQV | 0.91 | 0.82 | 0.91 | 0.92 | 0.96 | 0.83 | 0.49 | 0.82 | 0.66 | 162 |
| TPV | 0.10 | 0.99 | 0.09 | 0.76 | 0.78 | 0.54 | 0.89 | 0.53 | 0.715 | 16 |
| DRV | 0.20 | 0.99 | 0.16 | 0.76 | 0.80 | 0.75 | 0.88 | 0.75 | 0.815 | 24 |
* R: number of sequences of resistant variants.
Figure 1. The receiver operating characteristic (ROC) curves obtained for the prediction of human immunodeficiency virus type 1 (HIV-1) resistance to eight protease inhibitors: (a) fosamprenavir (FPV), (b) atazanavir (ATV), (c) indinavir (IDV), (d) lopinavir (LPV), (e) nelfinavir (NFV), (f) saquinavir (SQV), (g) tipranavir (TPV), (h) darunavir (DRV). X axis: false positive rate (FPR); Y axis: true positive rate (TPR). Color represents the Weka threshold value used to obtain the FPR/TPR values for each point: blue indicates a threshold close to zero (“0”), and orange indicates a threshold close to one (“1”).
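The construction of such a curve can be sketched in a few lines: each threshold on the classifier score yields one (FPR, TPR) point, and the area under the resulting curve is the AUC. This is a generic illustration and not Weka's implementation; the names `roc_points` and `auc`, the trapezoidal integration, and the example scores are assumptions.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, one per distinct score threshold.

    labels: 1 = resistant (positive class), 0 = susceptible.
    An instance is predicted positive when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

In this toy example, eight of the nine resistant/susceptible pairs are ranked correctly, so the area equals 8/9.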
The number of amino acid sequences considered to belong to the resistant and susceptible variants.
| Drug | FR * | Total | Susceptible | Resistant |
|---|---|---|---|---|
| 3TC | 1.5 | 1727 | 635 | 1092 |
| ABC | 4.5 | 1655 | 1494 | 161 |
| AZT | 2.2 | 1747 | 1002 | 745 |
| D4T | 1.7 | 1755 | 1632 | 123 |
| DDI | 1.7 | 1756 | 1034 | 722 |
| EFV | 2.5 | 1378 | 1278 | 100 |
| ETR | 2.9 | 1836 | 1754 | 82 |
| NVP | 2.5 | 1844 | 962 | 882 |
| RPV ** | N/D | N/D | N/D | N/D |
| TDF | 1.5 | 1378 | 1218 | 160 |
| FPV | 20 | 1965 | 1614 | 351 |
| ATV | 2.2 | 1309 | 714 | 595 |
| IDV | 2.4 | 2007 | 1036 | 971 |
| LPV | 6.7 | 1693 | 917 | 717 |
| NFV | 3.6 | 2102 | 954 | 1148 |
| SQV | 2.07 | 2012 | 925 | 1087 |
| TPV | 1.2 | 1060 | 477 | 583 |
| DRV | 5.5 | 734 | 147 | 582 |
* FR corresponding to the clinical cut-off. ** N/D: no data available.
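One way to read the FR column is as a per-drug threshold that splits sequences into the two classes. The sketch below assumes that FR is a fold-change value measured for each sequence and that a value at or above the cut-off means "resistant"; the direction and inclusiveness of the comparison are assumptions, and the function name `label` is hypothetical. The cut-off values are copied from the table above.

```python
from typing import Dict

# Clinical cut-offs (FR) per drug, copied from the table above.
CUTOFFS: Dict[str, float] = {
    "3TC": 1.5, "ABC": 4.5, "AZT": 2.2, "D4T": 1.7, "DDI": 1.7,
    "EFV": 2.5, "ETR": 2.9, "NVP": 2.5, "TDF": 1.5,
    "FPV": 20.0, "ATV": 2.2, "IDV": 2.4, "LPV": 6.7,
    "NFV": 3.6, "SQV": 2.07, "TPV": 1.2, "DRV": 5.5,
}


def label(drug: str, fold_ratio: float) -> str:
    """Assign a class label by comparing a measured value with the drug's cut-off.

    Assumption: values at or above the cut-off are labelled resistant.
    """
    if drug not in CUTOFFS:
        raise KeyError(f"no cut-off available for {drug}")
    return "resistant" if fold_ratio >= CUTOFFS[drug] else "susceptible"


# Hypothetical measurements for illustration only.
examples = [label("FPV", 25.0), label("FPV", 3.0), label("3TC", 1.5)]
```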
The number of nucleotide sequences considered to belong to the resistant and susceptible variants.
| Drug | FR * | Total | Susceptible | Resistant |
|---|---|---|---|---|
| 3TC | 1.5 | 720 | 74 | 646 |
| ABC | 4.5 | 740 | 181 | 563 |
| AZT | 2.2 | 718 | 272 | 446 |
| D4T | 1.7 | 723 | 258 | 465 |
| DDI | 1.7 | 720 | 123 | 597 |
| EFV | 2.5 | 744 | 353 | 391 |
| ETR | 2.9 | 193 | 57 | 136 |
| NVP | 2.5 | 756 | 316 | 440 |
| RPV ** | N/D | N/D | N/D | N/D |
| TDF | 1.5 | 423 | 234 | 189 |
| FPV | 20 | 774 | 666 | 108 |
| ATV | 2.2 | 352 | 150 | 202 |
| IDV | 2.4 | 795 | 367 | 428 |
| LPV | 6.7 | 614 | 332 | 282 |
| NFV | 3.6 | 833 | 342 | 491 |
| SQV | 2.07 | 827 | 445 | 382 |
| TPV | 1.2 | 196 | 101 | 96 |
| DRV | 5.5 | 165 | 139 | 26 |
* FR corresponding to the clinical cut-off. ** N/D: no data available.