Alejandro Pironti, Nico Pfeifer, Hauke Walter, Björn-Erik O Jensen, Maurizio Zazzi, Perpétua Gomes, Rolf Kaiser, Thomas Lengauer.
Abstract
Antiretroviral treatment history and past HIV-1 genotypes have been shown to be useful predictors for the success of antiretroviral therapy. However, this information may be unavailable or inaccurate, particularly for patients with multiple treatment lines, who often attend different clinics. We trained statistical models for predicting drug exposure from current HIV-1 genotype. These models were trained on 63,742 HIV-1 nucleotide sequences derived from patients with known therapeutic history, and on 6,836 genotype-phenotype pairs (GPPs). The mean performance regarding prediction of drug exposure on two test sets was 0.78 and 0.76 (ROC-AUC), respectively. The mean correlation to phenotypic resistance in GPPs was 0.51 (PhenoSense) and 0.46 (Antivirogram). Performance on prediction of therapy success on two test sets based on genetic susceptibility scores was 0.71 and 0.63 (ROC-AUC), respectively. Compared to geno2pheno[resistance], our novel models display similar or superior performance. Our models are freely available on the internet via www.geno2pheno.org. They can be used for inferring which drug compounds have previously been used by an HIV-1-infected patient, for predicting drug resistance, and for selecting an optimal antiretroviral therapy. Our data-driven models can be periodically retrained without expert intervention as clinical HIV-1 databases are updated and therefore reduce our dependency on hard-to-obtain GPPs.
Year: 2017 PMID: 28394945 PMCID: PMC5386274 DOI: 10.1371/journal.pone.0174992
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Relationship between drug exposure, drug resistance, and therapeutic success.
a) Prior to drug exposure, the virus typically does not carry drug-resistance mutations. In the absence of drug pressure, drug-susceptible virus can replicate at high titers (dark-green viral particles). b) If drug susceptibility is given, antiretroviral therapy frequently leads to the suppression of viral replication, which is a prerequisite for therapeutic success. While antiretroviral therapy is administered, however, drug concentrations fluctuate over the dosing interval and may vary within the different body compartments (orange-yellow gradient). This can give rise to sub-inhibitory concentrations in some compartments (light-yellow area in gradient), resulting in the selection of mutations that confer to the virus a selective advantage in the presence of the drug (light-green viral particles). These mutations need not result in virological therapy failure, since they may not enable the virus to replicate at high drug concentrations. c) Recurrence of sub-inhibitory drug concentrations can ultimately select for mutations that enable the virus to replicate even at the highest drug concentrations (red viral particles). d) The selection of drug-resistant virus leads to virological therapy failure: the virus replicates at high titers in spite of antiretroviral therapy.
Dataset cheat sheet.
| Dataset | Description | Input Variables | Target Variables |
|---|---|---|---|
| PRRT | Protease and reverse-transcriptase sequences from the EIDB and the LANLSD, along with the drug compounds previously used by the patient at the time of sequencing. | Sequence of protease and reverse transcriptase | Binary drug-exposure label for each protease inhibitor or reverse-transcriptase inhibitor |
| IN | Integrase sequences from the EIDB and the LANLSD, along with the drug compounds previously used by the patient at the time of sequencing. | Sequence of integrase | Binary drug-exposure label for each integrase inhibitor |
| TP | Past drug compounds and sequences in PRRT and in IN that were obtained during therapy pause. | Sequence of protease and reverse-transcriptase or integrase | Binary drug-exposure label for each drug |
| TPRRT | Test set of protease and reverse-transcriptase sequences and drug-exposure information. | Sequence of protease and reverse transcriptase | Binary drug-exposure label for each protease inhibitor or reverse-transcriptase inhibitor |
| TIN | Test set of integrase sequences and drug-exposure information. | Sequence of integrase | Binary drug-exposure label for each integrase inhibitor |
| DPRRT | Development set of protease and reverse-transcriptase sequences and drug-exposure information. | Sequence of protease and reverse transcriptase | Binary drug-exposure label for each protease inhibitor or reverse-transcriptase inhibitor |
| DIN | Development set of integrase sequences and drug-exposure information. | Sequence of integrase | Binary drug-exposure label for each integrase inhibitor |
| EuResistTCE | Test set of TCEs. Each TCE contains a protease and reverse-transcriptase baseline sequence, the drug compounds that were used in the therapy, and a label indicating therapeutic success or failure. | Baseline protease and reverse-transcriptase sequence for therapy | Binary therapy-success label |
| EuResistTCETP | Test set of TCEs whose baseline sequences were obtained during a therapy pause. Each TCE contains a protease and reverse-transcriptase baseline sequence, the drug compounds that were used in the therapy, and a label indicating therapeutic success or failure. | Baseline protease and reverse-transcriptase sequence for therapy | Binary therapy-success label |
| HIVdbExposure | Test set of protease and reverse-transcriptase sequences and drug-exposure information. | Sequence of protease and reverse transcriptase | Binary drug-exposure label for each protease inhibitor or reverse-transcriptase inhibitor |
| HIVdbTCE | Test set of TCEs. Each TCE contains a protease and reverse-transcriptase baseline sequence, the drug compounds that were used in the therapy, and a label indicating therapeutic success or failure. | Baseline protease and reverse-transcriptase sequence for therapy | Binary therapy-success label |
| Pheno | Dataset of GPPs. | Protease, reverse-transcriptase or integrase sequence | Resistance factors for different drugs |
| TPheno | Test set of GPPs. | Protease, reverse-transcriptase or integrase sequence | Resistance factors for different drugs or resistance categories |
| DPheno | Development set of genotype-phenotype pairs. | Protease, reverse-transcriptase or integrase sequence | Resistance factors for different drugs or resistance categories |
| NaïvePRRT | Dataset of protease and reverse-transcriptase sequences from treatment-naïve patients without TDR mutations. | Sequence of protease and reverse transcriptase | None |
| NaïveIN | Dataset of integrase sequences from treatment-naïve patients without TDR mutations. | Sequence of integrase | None |
| Exposure | Cross-validation / development set for the compound | Protease, reverse-transcriptase or integrase sequence | Binary drug-exposure label for |
| ExposurenaïvePRRT | Cross-validation / development set for models discriminating sequences from treatment-exposed and treatment-naïve patients. | Protease and reverse-transcriptase sequence | Binary label indicating whether sequence was obtained from therapy-naïve patient |
| ExposurePheno | Cross-validation / development set for the compound | Protease, reverse-transcriptase or integrase sequence | Binary label indicating exposure or resistance to |
In the table above, the names of the datasets used in this study are tabulated along with a short description of their contents. The datasets are shown in order of appearance in Methods. Above, the term sequences refers to HIV-1 nucleotide sequences.
EIDB: EuResist Integrated Database; GPP: genotype-phenotype pair; LANLSD: Los Alamos National Laboratory Sequence Database; TCE: Therapy-Change Episode; TDR: transmitted drug resistance.
Number of nucleotide sequences by subtype and dataset.
| Subtype | PRRT | IN | HIVdbExposure |
|---|---|---|---|
| B | 42,634 (61%) | 2,721 (49%) | 1,377 (1%) |
| C | 6,243 (9%) | 1,293 (23%) | 1 (< 1%) |
| A1 | 3,704 (5%) | 166 (3%) | 1 (< 1%) |
| G | 3,223 (5%) | 270 (5%) | 0 (0%) |
| 02_AG | 3,010 (4%) | 66 (1%) | 1 (< 1%) |
| 01_AE | 4,275 (6%) | 596 (11%) | 1 (< 1%) |
| D | 1,169 (2%) | 53 (1%) | 1 (< 1%) |
| F1 | 971 (1%) | 69 (1%) | 0 (0%) |
| 06_cpx | 312 (< 1%) | 89 (2%) | 0 (0%) |
| 07_BC | 651 (1%) | 4 (< 1%) | 0 (0%) |
| Other | 4,112 (6%) | 196 (4%) | 2 (< 1%) |
| Total | 70,304 | 5,523 | 1,381 |
Nucleotide sequences in the PRRT, IN, and HIVdbExposure datasets were subtyped with the Comet subtyping tool. Sequence counts for the ten most frequent subtypes are tabulated above. For each dataset, the percentage of nucleotide sequences with a particular subtype is stated in parentheses.
Number of sequences by dataset and drug exposure.
| DPRRT | DPRRT Comp. | DIN | DIN Comp. | TPRRT | TPRRT Comp. | TIN | TIN Comp. | TP | TP Comp. | HIVdbExposure | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7,482 | 4,560 | 295 | 229 | 1,839 | 1,028 | 164 | 103 | 163 | 30 | 301 | |
| 18,542 | 12,184 | 441 | 336 | 3,895 | 2,405 | 222 | 135 | 372 | 68 | 1,075 | |
| 13,335 | 8,079 | 259 | 197 | 2,956 | 1,764 | 141 | 80 | 250 | 31 | 998 | |
| 4,007 | 2,341 | 57 | 45 | 1,114 | 750 | 52 | 40 | 71 | 6 | 297 | |
| 12,113 | 7,398 | 227 | 173 | 2,725 | 1,657 | 123 | 73 | 197 | 23 | 722 | |
| 4,580 | 3,258 | 359 | 266 | 900 | 595 | 162 | 112 | 52 | 8 | 59 | |
| 20,730 | 13,416 | 525 | 394 | 4,191 | 2,543 | 262 | 151 | 390 | 70 | 0 | |
| 9,546 | 6,058 | 479 | 356 | 1,933 | 1,192 | 211 | 130 | 119 | 13 | 219 | |
| 118 | 56 | 5 | 2 | 96 | 58 | 26 | 18 | 6 | 1 | 73 | |
| 9,673 | 6,228 | 301 | 238 | 2,168 | 1,310 | 168 | 110 | 194 | 35 | 400 | |
| 255 | 169 | 62 | 51 | 145 | 94 | 70 | 48 | 4 | 3 | 1 | |
| 8,405 | 5,044 | 232 | 178 | 1,836 | 1,054 | 123 | 74 | 179 | 31 | 508 | |
| 5 | 4 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 1,240 | 615 | 48 | 37 | 463 | 216 | 41 | 31 | 38 | 3 | 192 | |
| 3,444 | 2,293 | 230 | 166 | 833 | 510 | 131 | 79 | 39 | 6 | 52 | |
| 916 | 587 | 152 | 95 | 328 | 200 | 111 | 68 | 16 | 6 | 4 | |
| 1,028 | 621 | 79 | 58 | 381 | 211 | 52 | 26 | 28 | 5 | 20 | |
| 9,466 | 5,965 | 184 | 150 | 2,134 | 1,433 | 112 | 84 | 144 | 20 | 737 | |
| 8,516 | 5,293 | 332 | 244 | 2,156 | 1,315 | 180 | 104 | 142 | 22 | 147 | |
| 7,540 | 4,669 | 137 | 104 | 1,698 | 1,018 | 101 | 68 | 113 | 17 | 706 | |
| 6,187 | 3,646 | 166 | 125 | 1,638 | 951 | 91 | 41 | 136 | 21 | 428 | |
| 643 | 345 | 71 | 58 | 246 | 153 | 62 | 39 | 9 | 0 | 5 | |
| 10 | 1 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 650 | 448 | 223 | 171 | 251 | 156 | 116 | 80 | 7 | 3 | 0 | |
| 37,577 | 37,577 | 1,917 | 1,917 | 2,056 | 2,056 | 154 | 154 | 3 | 3 | 0 | |
| 61,163 | 53,098 | 2,579 | 2,408 | 6,641 | 4,862 | 444 | 326 | 441 | 84 | 1,384 | |
The numbers of sequences by drug exposure for the development and test datasets are tabulated above. Columns including the abbreviation Comp. in their headers indicate the numbers of sequences from a certain dataset and with a certain drug exposure whose complete drug exposure history is known. The complete drug exposure history for all sequences from the HIVdbExposure dataset is available.
3TC: lamivudine, ABC: abacavir, AZT: zidovudine, d4T: stavudine, ddC: zalcitabine, ddI: didanosine, FTC: emtricitabine, TDF: tenofovir, DLV: delavirdine, EFV: efavirenz, ETR: etravirine, NVP: nevirapine, RPV: rilpivirine, APV: amprenavir, ATV: atazanavir, DRV: darunavir, FPV: fosamprenavir, IDV: indinavir, LPV: lopinavir, NFV: nelfinavir, SQV: saquinavir, TPV: tipranavir, EVG: elvitegravir, RAL: raltegravir
Fig 2. Drug-combination counts for therapies in EuResistTCE and HIVdbTCE.
The frequencies of the 20 most-frequent drug combinations in EuResistTCE (a) and HIVdbTCE (b) datasets are displayed above. 3TC: lamivudine, ABC: abacavir, AZT: zidovudine, d4T: stavudine, ddI: didanosine, FTC: emtricitabine, TDF: tenofovir, EFV: efavirenz, NVP: nevirapine, APV: amprenavir, ATV: atazanavir, DRV: darunavir, IDV: indinavir, LPV: lopinavir, NFV: nelfinavir, SQV: saquinavir.
Number of phenotypes by drug in the pheno datasets.
| Antivirogram | PhenoSense | Susceptible | Resistant | Total | |
|---|---|---|---|---|---|
| 905 | 1546 | 346 | 1362 | 2451 | |
| 840 | 1473 | 531 | 186 | 2313 | |
| 855 | 1567 | 801 | 773 | 2422 | |
| 889 | 1573 | 1031 | 60 | 2462 | |
| 821 | 451 | 371 | 47 | 1272 | |
| 891 | 1575 | 654 | 59 | 2466 | |
| 633 | 1234 | 850 | 33 | 1867 | |
| 1016 | 1638 | 794 | 1091 | 2654 | |
| 1106 | 1652 | 924 | 1127 | 2758 | |
| 363 | 476 | 304 | 156 | 839 | |
| 1170 | 1653 | 772 | 1447 | 2823 | |
| 91 | 176 | 62 | 75 | 267 | |
| 774 | 1134 | 401 | 978 | 1908 | |
| 282 | 629 | 400 | 178 | 911 | |
| 1088 | 1695 | 917 | 859 | 2783 | |
| 1151 | 1734 | 782 | 1229 | 2885 | |
| 1040 | 1468 | 665 | 1279 | 2508 | |
| 1185 | 1780 | 483 | 1584 | 2965 | |
| 1181 | 1741 | 985 | 1039 | 2922 | |
| 742 | 854 | 584 | 191 | 1596 | |
| 97 | 598 | 112 | 137 | 695 | |
| 97 | 630 | 336 | 148 | 727 | |
| 115 | 166 | 37 | 158 | 281 | |
| 107 | 166 | 60 | 25 | 273 | |
| 107 | 165 | 92 | 88 | 272 | |
| 110 | 168 | 122 | 6 | 278 | |
| 105 | 46 | 38 | 5 | 151 | |
| 111 | 168 | 72 | 7 | 279 | |
| 87 | 132 | 87 | 4 | 219 | |
| 126 | 169 | 81 | 125 | 295 | |
| 141 | 171 | 105 | 136 | 312 | |
| 43 | 52 | 36 | 15 | 95 | |
| 146 | 175 | 82 | 170 | 321 | |
| 14 | 21 | 13 | 10 | 35 | |
| 85 | 131 | 42 | 115 | 216 | |
| 22 | 79 | 50 | 20 | 101 | |
| 110 | 193 | 88 | 105 | 303 | |
| 125 | 194 | 76 | 142 | 319 | |
| 113 | 172 | 76 | 151 | 285 | |
| 127 | 199 | 48 | 189 | 326 | |
| 129 | 195 | 105 | 119 | 324 | |
| 80 | 106 | 56 | 22 | 186 | |
| 17 | 61 | 9 | 11 | 78 | |
| 17 | 65 | 36 | 21 | 82 | |
The numbers of phenotypes by drug in the DPheno and TPheno datasets are tabulated above. Phenotypes were measured with the Antivirogram™ or PhenoSense™ assays. Resistance-factor cutoffs of one and ten were used to dichotomize phenotypes into susceptible and resistant.
3TC: lamivudine, ABC: abacavir, AZT: zidovudine, d4T: stavudine, ddC: zalcitabine, ddI: didanosine, FTC: emtricitabine, TDF: tenofovir, DLV: delavirdine, EFV: efavirenz, ETR: etravirine, NVP: nevirapine, RPV: rilpivirine, APV: amprenavir, ATV: atazanavir, DRV: darunavir, FPV: fosamprenavir, IDV: indinavir, LPV: lopinavir, NFV: nelfinavir, SQV: saquinavir, TPV: tipranavir, EVG: elvitegravir, RAL: raltegravir
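The dichotomization with cutoffs of one and ten can be illustrated with a minimal sketch. It assumes that a resistance factor at or below one counts as susceptible, a factor at or above ten counts as resistant, and values in between receive no label; the treatment of the boundaries and the function names are assumptions, not taken from the source.

```python
from typing import Dict, Iterable, Optional

# Cutoffs named in the table caption; inclusive boundaries are an assumption.
SUSCEPTIBLE_CUTOFF = 1.0
RESISTANT_CUTOFF = 10.0


def dichotomize(resistance_factor: float) -> Optional[str]:
    """Map a resistance factor to 'susceptible', 'resistant', or None (intermediate)."""
    if resistance_factor <= SUSCEPTIBLE_CUTOFF:
        return "susceptible"
    if resistance_factor >= RESISTANT_CUTOFF:
        return "resistant"
    return None  # intermediate phenotype: left unlabeled in this sketch


def count_labels(resistance_factors: Iterable[float]) -> Dict[str, int]:
    """Count susceptible, resistant, and unlabeled phenotypes for one drug."""
    counts = {"susceptible": 0, "resistant": 0, "unlabeled": 0}
    for rf in resistance_factors:
        label = dichotomize(rf)
        counts[label if label is not None else "unlabeled"] += 1
    return counts
```

Under this reading, phenotypes between the two cutoffs would account for rows in which the Susceptible and Resistant counts sum to less than the Total.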
Fig 3. Performance of drug-exposure prediction.
Performance of drug-exposure prediction was assessed with 10-fold cross validation on the development set and four test sets. Test sets TPRRT and TIN were obtained from the EuResist database and contain protease and reverse-transcriptase and integrase sequences, respectively. TP is a subset of TPRRT ∪ TIN and contains nucleotide sequences that were measured during therapy pauses. HIVdbExposure was obtained from the HIVdb TCE repository and contains protease and reverse-transcriptase sequences. Performance on the test sets was compared to that of geno2pheno[resistance]. Bars depicting mean performances were calculated only using drugs that are common to Exposure and ExposurePheno models, as well as to geno2pheno[resistance]. Error bars indicate the standard deviation. 3TC: lamivudine, ABC: abacavir, AZT: zidovudine, d4T: stavudine, ddC: zalcitabine, ddI: didanosine, FTC: emtricitabine, TDF: tenofovir, DLV: delavirdine, EFV: efavirenz, ETR: etravirine, NVP: nevirapine, RPV: rilpivirine, APV: amprenavir, ATV: atazanavir, DRV: darunavir, FPV: fosamprenavir, IDV: indinavir, LPV: lopinavir, NFV: nelfinavir, SQV: saquinavir, TPV: tipranavir, EVG: elvitegravir, RAL: raltegravir.
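The per-drug performances and the mean bars with standard-deviation error bars described in this caption can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: `roc_auc` and `summarize` are hypothetical names, the AUC is computed with the pairwise rank definition, and the use of the sample standard deviation is an assumption because the caption does not say which variant was used.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one exposed and one unexposed sequence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(
    per_drug: Dict[str, Tuple[List[int], List[float]]]
) -> Tuple[Dict[str, float], float, float]:
    """Return per-drug AUCs plus their mean and sample standard deviation."""
    aucs = {drug: roc_auc(labels, scores) for drug, (labels, scores) in per_drug.items()}
    values = list(aucs.values())
    spread = stdev(values) if len(values) > 1 else 0.0
    return aucs, mean(values), spread
```

Here `labels` holds the binary drug-exposure label of each sequence (1 for exposed) and `scores` the corresponding model output for one drug.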
Fig 4. Correlation of drug-exposure scores with logarithmized resistance factors.
Genotypes in TPheno were interpreted with drug-exposure models. The correlation of the resulting drug-exposure scores with the corresponding logarithmized resistance factors is displayed above. Note that drug-resistance assays (either Antivirogram™ or PhenoSense™) are denoted by the colors of the bars, while the drug-exposure model types (Exposure or ExposurePheno) are denoted by the shading of the bars. Bars depicting the mean performances were calculated with the drugs for which Exposure and ExposurePheno models are available. Error bars indicate the standard deviation. 3TC: lamivudine, ABC: abacavir, AZT: zidovudine, d4T: stavudine, ddC: zalcitabine, ddI: didanosine, FTC: emtricitabine, TDF: tenofovir, DLV: delavirdine, EFV: efavirenz, ETR: etravirine, NVP: nevirapine, RPV: rilpivirine, APV: amprenavir, ATV: atazanavir, DRV: darunavir, FPV: fosamprenavir, IDV: indinavir, LPV: lopinavir, NFV: nelfinavir, SQV: saquinavir, TPV: tipranavir, EVG: elvitegravir, RAL: raltegravir.
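The comparison in this figure, drug-exposure scores against logarithmized resistance factors, can be sketched as below. The caption does not name the correlation coefficient or the base of the logarithm, so the Pearson coefficient and base 10 are assumptions, and `score_vs_log_rf` is a hypothetical helper name.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length with at least two values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def score_vs_log_rf(scores: Sequence[float], resistance_factors: Sequence[float]) -> float:
    """Correlate exposure scores with log10 resistance factors (base 10 is an assumption)."""
    log_rf = [math.log10(rf) for rf in resistance_factors]
    return pearson(scores, log_rf)
```

One such coefficient would be computed per drug and per assay, which matches the one-bar-per-drug layout the caption describes.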
Performance of prediction of therapy-success for therapies in EuResistTCE, EuResistTCETP, and HIVdbTCE.
| Dataset | Exposure | ExposurePheno | geno2pheno[resistance] |
|---|---|---|---|
| EuResistTCE | 0.71 | 0.71 | 0.68 |
| EuResistTCETP | 0.72 | 0.73 | 0.66 |
| HIVdbTCE | 0.62 | 0.63 | 0.64 |
Therapy success was predicted for therapies in the EuResistTCE, EuResistTCETP, and HIVdbTCE test sets using three different genetic susceptibility scores (GSS) for each therapy. The first GSS was obtained with Exposure models, the second with ExposurePheno models, and the third with geno2pheno[resistance]. Above, the performances of the three GSS are tabulated for each dataset. Performances were quantified with the area under the receiver operating characteristic curve (ROC-AUC).
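How the per-drug model outputs are combined into a GSS is not stated here. The following minimal sketch assumes the common convention of summing one susceptibility value per drug in the regimen; the function name, the 0-to-1 susceptibility scale, and the example values are all hypothetical.

```python
from typing import Dict, Iterable


def genetic_susceptibility_score(
    regimen: Iterable[str], susceptibility: Dict[str, float]
) -> float:
    """Sum the per-drug susceptibility values of the drugs in a regimen.

    `susceptibility` maps drug abbreviations to a value where higher means
    more susceptible (an assumed scale, e.g. 0 = resistant, 1 = susceptible).
    """
    drugs = list(regimen)
    missing = [d for d in drugs if d not in susceptibility]
    if missing:
        raise KeyError(f"no susceptibility value for: {', '.join(missing)}")
    return sum(susceptibility[d] for d in drugs)


# Hypothetical per-drug values for one baseline sequence.
example_susceptibility = {"3TC": 0.2, "AZT": 0.9, "EFV": 1.0}
example_gss = genetic_susceptibility_score(["3TC", "AZT", "EFV"], example_susceptibility)
```

Under this assumption, the GSS of each therapy would serve as the ranking score, and the binary therapy-success labels as the ground truth, when computing the ROC-AUC values in the table.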