J Nikolaj Dybowski, Mona Riemenschneider, Sascha Hauke, Martin Pyka, Jens Verheyen, Daniel Hoffmann, Dominik Heider.
Abstract
BACKGROUND: Maturation inhibitors such as Bevirimat are a new class of antiretroviral drugs that hamper the cleavage of HIV-1 proteins into their functionally active forms. They bind to these preproteins and inhibit their cleavage by the HIV-1 protease, resulting in non-functional virus particles. Nevertheless, there exist mutations in this region that lead to resistance against Bevirimat. Highly specific and accurate tools to predict resistance to maturation inhibitors can help to identify patients who might benefit from the use of these new drugs.
Year: 2011 PMID: 22082002 PMCID: PMC3248369 DOI: 10.1186/1756-0381-4-26
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Area under the curve
| method | mean AUC | 95% CI |
|---|---|---|
| RF [ | 0.927 | 0.002 |
| RF.293 | 0.944 | 0.003 |
| RF.ESP | 0.898 | 0.006 |
| GAD | 0.947 | 0.004 |
| CE1.max | 0.946 | 0.004 |
| CE1.min | 0.943 | 0.006 |
| CE1.product | 0.947 | 0.004 |
| CE1.mean | 0.947 | 0.004 |
| CE2.max | 0.956 | 0.004 |
| CE2.min | 0.954 | 0.004 |
| CE2.product | 0.956 | 0.004 |
| CE2.mean | 0.955 | 0.004 |
| CE2.stacking | 0.933 | 0.005 |
| RF.ESP+293.max | 0.946 | 0.001 |
| RF.ESP+293.min | 0.945 | 0.001 |
| RF.ESP+293.product | 0.956 | 0.001 |
| RF.ESP+293.mean | 0.958 | 0.001 |
| RF.ESP+293.stacking | 0.930 | 0.006 |
| CE.GA | 0.954 | 0.006 |
| CE.GA.RF | 0.949 | 0.005 |
Results of the 10-fold leave-one-out cross validation. 95% CI: 95% confidence interval.
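The suffixes .max, .min, .product and .mean in the table name the rule used to fuse the class probabilities of several single-descriptor classifiers. The following is a minimal sketch of these four rules. It assumes that each base classifier outputs the probability that a sequence is resistant, and that each rule is applied to the resistant and the susceptible support separately before the two results are normalized to sum to one. That normalization is a common convention and an assumption here, not necessarily the study's exact formulation. The function name `fuse` is hypothetical.

```python
from math import prod


def fuse(probs, rule):
    """Fuse per-classifier P(resistant) values into one score in [0, 1].

    Hypothetical helper: the rule is applied separately to the resistant
    support (p) and the susceptible support (1 - p), and the two results are
    normalized to sum to one (an assumed convention).
    """
    res = list(probs)
    sus = [1.0 - p for p in probs]
    rules = {
        "max": max,
        "min": min,
        "product": prod,
        "mean": lambda xs: sum(xs) / len(xs),
    }
    combine = rules[rule]  # KeyError for an unknown rule name
    r, s = combine(res), combine(sus)
    total = r + s
    # Guard against 0/0, e.g. a product rule with p = 0 and p = 1 present.
    return 0.5 if total == 0 else r / total


if __name__ == "__main__":
    votes = [0.9, 0.6, 0.7]  # made-up outputs of three base classifiers
    for name in ("max", "min", "product", "mean"):
        print(name, round(fuse(votes, name), 4))
```

For the mean rule, the normalization has no effect because the two means already sum to one. For the product rule, agreement among the base classifiers is rewarded strongly, which sharpens the fused probability.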
Figure 1. ROC curves. Performance comparison of single descriptor classifiers (dashed lines) and fused classifiers (solid lines). The hydropathy scale classifier was used in the original study [10]. The combined classifiers CE2 and ESP+293 achieve an AUC higher than any of the single descriptor classifiers (compare Table 1). Despite the inferiority of ESP as a single classifier, fusion with 293 results in the best overall performance.
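The AUC values in the table above can be computed without drawing the ROC curve. The AUC equals the probability that a randomly chosen resistant sequence receives a higher score than a randomly chosen susceptible one, with ties counting one half (the Mann-Whitney formulation). The following is a minimal sketch of that computation. The function name and the toy scores are illustrative and are not taken from the study.

```python
def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 = resistant (positive), 0 = susceptible (negative).
    Ties count 0.5. O(n_pos * n_neg), which is fine for small datasets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: one susceptible sequence outscores one resistant sequence.
    print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 3 of 4 pairs ordered correctly
```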
GAD values
| amino acid | value | amino acid | value |
|---|---|---|---|
| A | 0.0956 | L | 0.8577 |
| R | 0.8571 | K | 0.2695 |
| N | 0.9697 | M | 0.6212 |
| D | 0.1930 | F | 0.7062 |
| C | 0.2472 | P | 0.7154 |
| Q | 0.7865 | S | 0.5001 |
| E | 0.5843 | T | 0.2675 |
| G | 0.8036 | W | 0.4322 |
| H | 0.9811 | Y | 0.3246 |
| I | 0.3265 | V | 0.4513 |
Descriptor values for the 20 amino acids, as optimized by the genetic algorithm.
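A descriptor such as GAD turns an aligned p2 sequence into a numeric vector by replacing each residue with its descriptor value. This assumes one feature per sequence position, which is consistent with the per-position importance shown in Figure 2. The following is a minimal sketch of that encoding, using the values from the table above. The function name and the example peptide are illustrative. This summary does not describe how gaps or ambiguous residues are handled, so the sketch simply rejects them.

```python
# GAD values copied from the table above (one value per amino acid).
GAD = {
    "A": 0.0956, "R": 0.8571, "N": 0.9697, "D": 0.1930, "C": 0.2472,
    "Q": 0.7865, "E": 0.5843, "G": 0.8036, "H": 0.9811, "I": 0.3265,
    "L": 0.8577, "K": 0.2695, "M": 0.6212, "F": 0.7062, "P": 0.7154,
    "S": 0.5001, "T": 0.2675, "W": 0.4322, "Y": 0.3246, "V": 0.4513,
}


def encode(seq):
    """Map each residue of an aligned sequence to its GAD value.

    Hypothetical helper: unknown symbols (gaps, X, ...) raise ValueError
    because the handling used in the study is not stated here.
    """
    try:
        return [GAD[aa] for aa in seq.upper()]
    except KeyError as err:
        raise ValueError(f"no GAD value for residue {err.args[0]!r}") from None


if __name__ == "__main__":
    print(encode("ARND"))  # one feature per position
```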
Correlation analyses
| descriptor | cor.res | cor.des |
|---|---|---|
| 42 | 0.835 | 0.497 |
| 124 | 0.945 | 0.122 |
| 134 | 0.893 | 0.010 |
| 136 | 0.891 | 0.018 |
| 137 | 0.883 | 0.019 |
| 164 | 0.833 | 0.085 |
| 225 | 0.817 | 0.068 |
| 293 | 1.000 | 1.000 |
| 368 | 0.855 | 0.010 |
| 424 | 0.863 | 0.500 |
| 478 | 0.822 | 0.009 |
Correlations between the best descriptor (293) and the other 10 descriptors in CE2. cor.res: correlation R based on the votes for each protein sequence in the dataset; cor.des: correlation R based on the descriptor values for the 20 amino acids.
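Both cor.res and cor.des are correlation coefficients R between two vectors. For cor.res, these are the per-sequence votes of two classifiers. For cor.des, they are the 20 amino-acid values of two descriptor scales. The following is a minimal sketch of that computation. It assumes Pearson's R, because the caption does not name the coefficient. The function name is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of the same length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Made-up vote vectors of two classifiers over four sequences.
    print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```

A high cor.res together with a low cor.des, as seen for several descriptors in the table, means that two scales assign quite different values to the amino acids but still lead to similar votes.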
Figure 2. Importance of sequence positions. Importance of sequence positions in p2 for the prediction of Bevirimat resistance. The y-axis denotes the "sum of all decreases in Gini impurity" [11]. The upper horizontal axis indicates the wild-type sequence. A: importance analysis over all descriptors; B: importance analysis of the CE2 descriptors. The red lines mark the importance analysis for RF.293.
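The importance measure on the y-axis accumulates, for each sequence position, the decrease in Gini impurity achieved by every split on that position across all trees of the forest. The following is a minimal sketch of the decrease contributed by a single split. The function names are illustrative. Weighting the child impurities by their share of the samples follows the usual CART convention. The exact weighting used by a given random forest implementation may differ.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`,
    with child impurities weighted by their share of the samples."""
    n = len(parent)
    if len(left) + len(right) != n:
        raise ValueError("children must partition the parent node")
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)


if __name__ == "__main__":
    node = ["R", "R", "S", "S"]  # R = resistant, S = susceptible
    print(gini_decrease(node, ["R", "R"], ["S", "S"]))  # perfect split
    print(gini_decrease(node, ["R", "R", "S"], ["S"]))  # partial split
```

Summing these decreases over all splits on one position, and then over all trees, yields a per-position score of the kind plotted in Figure 2.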
Figure 3. Electrostatic Hull Importance. Importance analysis and implications of the electrostatic potential classifier. (A) NMR structure of p2 [35]. (B) Electrostatic hull used as features for model training. Each sphere represents a feature on the hull, colored according to its importance for model performance (white to red). (C) Electrostatic potential at hull averaged over susceptible p2 models. (D) Electrostatic potential at hull averaged over resistant p2 models.
Figure 4. Class Probabilities. The vertical and horizontal axes give the class probabilities for the p2 sequences from the RF.293 and RF.ESP models, respectively. The dotted lines enclose the regions in which the votes for a given sequence fall with 90% confidence. Confidence regions are shown for sequences whose standard deviation was greater than 0.01 for both models.
Figure 5. Prediction Landscape. The vertical and horizontal axes give all potential class probabilities from the RF.293 and RF.ESP models, respectively. The color marks the output of the second-level learner (stacking).
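In stacking, the class probabilities of the first-level models (here RF.293 and RF.ESP) become the input features of a second-level learner. A landscape like the one in Figure 5 can then be obtained by evaluating that learner over a grid of probability pairs. This summary does not say which second-level learner was used, so the following sketch substitutes a plain logistic regression fitted by batch gradient descent. The function names, learning rate, epoch count, and grid resolution are illustrative assumptions.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_stacker(pairs, labels, lr=0.5, epochs=2000):
    """Fit w0 + w1*p_293 + w2*p_esp under logistic loss (batch gradient descent).

    Stand-in second-level learner (assumption, not the study's choice).
    pairs: (p_rf293, p_rfesp) first-level probabilities; labels: 1 = resistant.
    """
    w = [0.0, 0.0, 0.0]
    n = len(pairs)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (a, b), y in zip(pairs, labels):
            err = sigmoid(w[0] + w[1] * a + w[2] * b) - y
            grad[0] += err
            grad[1] += err * a
            grad[2] += err * b
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def landscape(w, steps=11):
    """Evaluate the stacker on a grid over [0, 1] x [0, 1].
    Rows: RF.293 probability; columns: RF.ESP probability."""
    grid = [i / (steps - 1) for i in range(steps)]
    return [[sigmoid(w[0] + w[1] * a + w[2] * b) for b in grid] for a in grid]


if __name__ == "__main__":
    # Made-up first-level outputs for six sequences.
    pairs = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
    labels = [1, 1, 1, 0, 0, 0]
    w = fit_stacker(pairs, labels)
    for row in landscape(w, steps=5):
        print(" ".join(f"{v:.2f}" for v in row))
```

A linear meta-learner can only produce a landscape with straight probability contours. A nonlinear second-level learner would allow curved decision regions like those a plot such as Figure 5 can reveal.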