Michal Rosen-Zvi, Andre Altmann, Mattia Prosperi, Ehud Aharoni, Hani Neuvirth, Anders Sönnerborg, Eugen Schülter, Daniel Struck, Yardena Peres, Francesca Incardona, Rolf Kaiser, Maurizio Zazzi, Thomas Lengauer.
Abstract
MOTIVATION: Optimizing HIV therapies is crucial since the virus rapidly develops mutations to evade drug pressure. Recent studies have shown that genotypic information might not be sufficient for the design of therapies and that other clinical and demographical factors may play a role in therapy failure. This study is designed to assess the improvement in prediction achieved when such information is taken into account. We use these factors to generate a prediction engine using a variety of machine learning methods and to determine which clinical conditions are most misleading in terms of predicting the outcome of a therapy.
Year: 2008 PMID: 18586740 PMCID: PMC2718619 DOI: 10.1093/bioinformatics/btn141
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Number of IAS mutations in HIV genotypes submitted to drug therapies.
IDB data fields
| Field | [min max] | Corr (training data) | Corr (full labeled set) |
|---|---|---|---|
| NUMBER OF PAST TREATMENT LINES | [0 28] | –0.26 (8.0 × 10^−42, 1.1 × 10^−27) | –0.15 (0, 0) |
| RISKID | [1 6] | –0.04 (–, 4.3 × 10^−4) | –0.03 (1.3 × 10^−3, 0) |
| GENDERID | [1 4] | –0.02 (–, –) | 0.00 (–, –) |
| AGE | [1 102] | 0.01 (–, –) | 0.04 (1.0 × 10^−7, 5.6 × 10^−15) |
| BASELINE VL | [34 7656900] | 0.01 (–, 6.8 × 10^−11) | 0.01 (–, 0) |
| BASELINE CD4 | [0 4411] | 0.06 (–, –) | 0.15 (0, 0) |
| BASELINE CD4 PERCENT | [0 70] | 0.06 (–, 9.0 × 10^−3) | 0.17 (0, 0) |
| ALL TREATMENTS RECORDED | [0 1] | 0.10 (9.3 × 10^−4, 5.4 × 10^−3) | 0.12 (7.2 × 10^−43, 0) |
| DATABASEID | [1 4] | 0.12 (7.4 × 10^−9, 1.5 × 10^−10) | 0.17 (0, 0) |
P-values are provided in parentheses only if they are below 0.05; – stands for higher values.
a Categorical data; the χ²-test P-value, given second in parentheses, is more informative.
Drugs and the associated success rate
| Drug | Class | Count | SR with | SR without | χ²-test |
|---|---|---|---|---|---|
| DDC | NRTI | 28 (343) | 0.36 (0.43) | 0.67 (0.69) | – (0) |
| APV | PI | 77 (344) | 0.36 (0.40) | 0.68 (0.69) | 2.8 × 10^−6 (0) |
| SQV | PI | 195 (1560) | 0.50 (0.49) | 0.68 (0.70) | 4.4 × 10^−5 (0) |
| RTV | PI | 130 (1035) | 0.56 (0.57) | 0.67 (0.69) | – (1.1 × 10^−12) |
| D4T | NRTI | 715 (5949) | 0.57 (0.60) | 0.70 (0.72) | 4.6 × 10^−9 (0) |
| NFV | PI | 235 (2329) | 0.57 (0.65) | 0.68 (0.69) | – (–) |
| IDV | PI | 193 (2322) | 0.58 (0.67) | 0.68 (0.69) | – (–) |
| DDI | NRTI | 864 (4943) | 0.59 (0.60) | 0.70 (0.71) | 1.2 × 10^−5 (0) |
| NVP | NNRTI | 308 (2221) | 0.63 (0.68) | 0.67 (0.69) | – (–) |
| ABC | NRTI | 484 (3755) | 0.67 (0.75) | 0.67 (0.67) | – (0) |
| TDF | NRTI | 1053 (4711) | 0.70 (0.73) | 0.65 (0.67) | – (9.8 × 10^−10) |
| TC3 | NRTI | 1676 (13168) | 0.70 (0.71) | 0.63 (0.64) | 2.8 × 10^−3 (0) |
| RTVB | PI | 1600 (6868) | 0.72 (0.74) | 0.61 (0.66) | 6.6 × 10^−7 (0) |
| LPV | PI | 1108 (4156) | 0.73 (0.73) | 0.64 (0.67) | 1.3 × 10^−4 (2.9 × 10^−10) |
| ATV | PI | 271 (1643) | 0.73 (0.81) | 0.66 (0.68) | – (0) |
| EFV | NNRTI | 526 (3524) | 0.73 (0.78) | 0.66 (0.67) | – (0) |
| FPV | PI | 118 (449) | 0.74 (0.73) | 0.67 (0.69) | – (–) |
| AZT | NRTI | 970 (7536) | 0.76 (0.74) | 0.63 (0.65) | 1.0 × 10^−11 (0) |
| FTC | NRTI | 242 (1194) | 0.80 (0.88) | 0.66 (0.67) | 1.6 × 10^−3 (0) |
– stands for a P-value higher than 0.05, 0 stands for a P-value lower than 10^−20, and SR stands for the success rate with and without the drug.
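
The success rates and χ2-test reported per drug can be illustrated with a minimal sketch. It assumes each therapy is stored as a (set of drug codes, success flag) pair; the function names and the toy records are hypothetical and are not taken from the study data. The source does not say whether a continuity correction was applied, so none is used here.

```python
import math

def success_rates(therapies, drug):
    """Return (SR with drug, SR without drug, 2x2 contingency table).

    table[i][j]: i = 0 if the drug is in the regimen, 1 otherwise;
                 j = 0 for success, 1 for failure.
    Assumes both groups are non-empty.
    """
    table = [[0, 0], [0, 0]]
    for drugs, success in therapies:
        row = 0 if drug in drugs else 1
        col = 0 if success else 1
        table[row][col] += 1
    sr_with = table[0][0] / sum(table[0])
    sr_without = table[1][0] / sum(table[1])
    return sr_with, sr_without, table

def chi2_2x2(table):
    """Pearson chi-square statistic and P-value for a 2x2 table (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Toy records, invented purely for illustration.
therapies = (
    [({"D4T", "DDI"}, True)] * 1
    + [({"D4T", "NFV"}, False)] * 3
    + [({"AZT", "TC3", "LPV"}, True)] * 5
    + [({"AZT", "SQV"}, False)] * 1
)

sr_with, sr_without, table = success_rates(therapies, "D4T")
stat, p_value = chi2_2x2(table)
print(sr_with, sr_without, stat, p_value)
```

With real data, a drug would be flagged in the table only when the resulting P-value falls below 0.05.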
Fig. 2. Bayesian network used in the GD engine.
Log VL prediction results
| Model | Correlation (train) | Correlation (test) | Mean squared error (train) | Mean squared error (test) |
|---|---|---|---|---|
| GD | 0.658 (0.023) | 0.657 | 0.586 | 2.519 |
| EV | 0.679 (0.020) | 0.678 | 0.602 | 2.745 |
| ME | 0.664 (0.023) | 0.642 | 0.863 | 1.723 |
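
The two metrics in the table above, correlation and mean squared error between predicted and observed log VL, can be computed as in the following sketch. The function names and the toy values are hypothetical; the source does not specify the exact correlation estimator, so the Pearson coefficient is assumed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mean_squared_error(predicted, observed):
    """Average squared difference between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)

# Toy log10 viral-load values, invented purely for illustration.
observed = [2.0, 3.5, 1.7, 4.2, 5.0]
predicted = [2.4, 3.0, 2.1, 4.0, 4.1]

print(pearson(predicted, observed))
print(mean_squared_error(predicted, observed))
```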
Binary prediction results
| Model | AUC (train) | AUC (test) | Accuracy (train) | Accuracy (test) |
|---|---|---|---|---|
| Minimal feature set | ||||
| GD | 0.747 (0.027) | 0.744 | 0.745 (0.024) | 0.724 |
| EV | 0.766 (0.030) | 0.768 | 0.754 (0.031) | 0.748 |
| ME | 0.758 (0.019) | 0.745 | 0.748 (0.031) | 0.757 |
| Combined minimal | ||||
| Min | 0.771 (0.020) | 0.765 | 0.746 (0.027) | 0.761 |
| Max | 0.760 (0.023) | 0.765 | 0.742 (0.030) | 0.731 |
| Median | 0.773 (0.020) | 0.766 | 0.759 (0.027) | 0.766 |
| Mean | 0.777 (0.020) | 0.772 | 0.760 (0.024) | 0.744 |
| Majority | 0.683 (0.023) | 0.660 | 0.759 (0.027) | 0.738 |
| Product | 0.776 (0.020) | 0.772 | 0.759 (0.025) | 0.744 |
| Oracle* | 0.914 (0.015) | 0.911 | 0.842 (0.025) | 0.844 |
| Maximal feature set | ||||
| GD | 0.768 (0.025) | 0.760 | 0.752 (0.028) | 0.757 |
| EV | 0.789 (0.023) | 0.804 | 0.780 (0.032) | 0.751 |
| ME | 0.762 (0.021) | 0.742 | 0.754 (0.030) | 0.757 |
| Combined maximal | ||||
| Min | 0.792 (0.021) | 0.793 | 0.760 (0.030) | 0.764 |
| Max | 0.779 (0.021) | 0.779 | 0.757 (0.030) | 0.741 |
| Median | 0.789 (0.029) | 0.786 | 0.768 (0.029) | 0.761 |
| Mean | 0.794 (0.019) | 0.793 | 0.780 (0.028) | 0.781 |
| Majority | 0.697 (0.027) | 0.683 | 0.768 (0.029) | 0.761 |
| Product | 0.794 (0.019) | 0.795 | 0.780 (0.027) | 0.771 |
| Oracle* | 0.917 (0.013) | 0.920 | 0.850 (0.022) | 0.860 |
*Assuming that the combined engine miraculously knows to pick the engine with the best result.
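
The combination rules named in the table (Min, Max, Median, Mean, Majority, Product, Oracle) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: each engine is assumed to output a success probability in [0, 1], the majority vote is assumed to use a 0.5 threshold, the product is left unnormalized, and the function names and toy values are hypothetical.

```python
import math
import statistics

def combine(probs, rule, threshold=0.5):
    """Combine per-engine success probabilities into a single score."""
    if rule == "min":
        return min(probs)
    if rule == "max":
        return max(probs)
    if rule == "median":
        return statistics.median(probs)
    if rule == "mean":
        return statistics.fmean(probs)
    if rule == "product":
        return math.prod(probs)
    if rule == "majority":
        # Each engine votes success if its probability reaches the threshold;
        # the output is a hard 0/1 label rather than a graded score.
        votes = sum(p >= threshold for p in probs)
        return 1.0 if votes > len(probs) / 2 else 0.0
    raise ValueError(f"unknown rule: {rule}")

def oracle(probs, label):
    """Pick the engine output closest to the true label (1 = success, 0 = failure).

    This needs the true outcome, so it is only an upper bound, not a usable predictor.
    """
    return min(probs, key=lambda p: abs(p - label))

# Toy outputs of three engines for one therapy, invented for illustration.
engine_probs = [0.8, 0.6, 0.3]
for rule in ("min", "max", "median", "mean", "product", "majority"):
    print(rule, combine(engine_probs, rule))
print("oracle", oracle(engine_probs, label=1))
```

Because the majority rule returns only 0 or 1, it yields few distinct ranking levels, which is consistent with its AUC being lower than that of the graded rules while its accuracy stays comparable.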
Fig. 3. Comparison of success/failure of the three engines.
Fig. 4. ROC curves for prediction engines on test data.
Drugs and mutations most related to inconsistency among engines
| Feature | Agreement/disagreement | Success/failure |
|---|---|---|
| D4T | –0.14 (3 × 10^−9, 4 × 10^−8) | –0.12 (9 × 10^−7, 1 × 10^−2) |
| DDI | –0.13 (1 × 10^−7, 1 × 10^−6) | –0.10 (3 × 10^−3, –) |
| SQV | –0.12 (2 × 10^−6, 2 × 10^−5) | –0.10 (1 × 10^−2, –) |
| NFV | –0.11 (6 × 10^−4, 5 × 10^−3) | –0.06 (–, –) |
| # IAS mutations | –0.24 (0, 0) | –0.32 (0, 0) |
| PRO L90M | –0.13 (4 × 10^−7, –) | –0.22 (0, 3 × 10^−12) |
| PRO L10I | –0.09 (2 × 10^−2, –) | –0.21 (0, 3 × 10^−13) |
| PRO M46I | –0.08 (–, –) | –0.19 (0, 3 × 10^−3) |
| RT T215Y | –0.18 (0, 3.4 × 10^−11) | –0.23 (0, 0) |
| RT M41L | –0.15 (5 × 10^−11, 1 × 10^−6) | –0.20 (0, 0) |
| RT M184V | –0.14 (2 × 10^−10, 7 × 10^−8) | –0.13 (3 × 10^−7, 4 × 10^−5) |
| RT D67N | –0.14 (2 × 10^−9, 6 × 10^−4) | –0.16 (7 × 10^−14, 8 × 10^−7) |
| RT K219Q | –0.12 (2 × 10^−6, –) | –0.08 (–, –) |
| RT K70R | –0.11 (6 × 10^−5, –) | –0.08 (–, –) |
| RT L210W | –0.10 (1 × 10^−3, –) | –0.20 (0, 9.4 × 10^−11) |
| RT T215F | –0.10 (2 × 10^−3, –) | –0.10 (3 × 10^−3, –) |
| RT V118I | –0.09 (4 × 10^−2, –) | –0.17 (2 × 10^−15, 4 × 10^−4) |