A Rough Set-Based Model of HIV-1 Reverse Transcriptase Resistome.

Marcin Kierczak, Krzysztof Ginalski, Michał Dramiński, Jacek Koronacki, Witold Rudnicki, Jan Komorowski.

Abstract

Reverse transcriptase (RT) is a viral enzyme crucial for HIV-1 replication. Currently, 12 drugs are targeted against the RT. The low fidelity of RT-mediated transcription leads to the quick accumulation of drug-resistance mutations. The sequence-resistance relationship remains only partially understood. Using publicly available data collected from over 15 years of HIV proteome research, we have created a general and predictive rule-based model of HIV-1 resistance to eight RT inhibitors. Our rough set-based model considers changes in the physicochemical properties of a mutated sequence as compared to the wild-type strain. Thanks to the application of the Monte Carlo feature selection method, the model takes into account only the properties that significantly contribute to the resistance phenomenon. The obtained results show that drug resistance is determined in a more complex way than previously believed. We confirmed the importance of many resistance-associated sites, found some sites to be less relevant than formerly postulated and, more importantly, identified several previously neglected sites as potentially relevant. By mapping some of the newly discovered sites onto the 3D structure of the RT, we were able to suggest possible molecular mechanisms of drug resistance. Importantly, our model has the ability to generalize predictions to previously unseen cases. The study is an example of how computational biology methods can increase our understanding of the HIV-1 resistome.

Keywords:  HIV-1 drug-resistance; bioinformatics; resistance model; viral complexity; viral proteomics

Year:  2009        PMID: 20140064      PMCID: PMC2808174          DOI: 10.4137/bbi.s3382

Source DB:  PubMed          Journal:  Bioinform Biol Insights        ISSN: 1177-9322


Introduction

More than two decades have passed since the discovery of HIV, the causative agent of AIDS. Numerous groups have focused their research on understanding the details of the HIV life cycle and on developing efficient antiviral therapies. Unfortunately, the high rate of replication combined with the high mutability of the virus leads to the rapid emergence of drug-resistant strains, efficiently undermining the efforts to stop the AIDS pandemic. Currently, some 7,000 new HIV infections are reported worldwide every day. In total, more than 30 million people in both the developed and the developing countries are HIV-positive.1 About 10^9 virions are produced in an infected individual every day and it has been estimated that each possible single-point mutation arises 10^4–10^5 times in this population.2 While some mutations result in the production of functionally impaired viruses, others lead to the emergence of drug-resistant forms. Reverse transcriptase (RT) is one of the viral enzymes required for successful replication. The RT catalyzes reverse transcription, the process of converting single-stranded viral RNA into double-stranded viral DNA. The viral DNA is later incorporated into the host genome and re-programs the host cell to produce new viral particles that undergo maturation, bud off and infect new cells, thus completing the viral life cycle. In peripheral blood lymphocytes the maturation occurs after viral release, while in macrophages it takes place prior to the release, within the cell, in the multivesicular bodies. Like the other enzymes in the family of reverse transcriptases, the HIV-1 RT lacks proof-reading activity which, combined with the high replication rate of the virus and the RT-mediated recombination, leads to the rapid emergence of HIV mutants. Many of these mutants are drug-resistant. The first antiviral therapies were targeted against the RT, and this enzyme remains one of the most common targets for anti-HIV drugs.
The initial hope that followed the introduction of AZT (Zidovudine), the first antiviral agent targeting HIV, was quickly shattered by the rapid emergence of drug-resistant viruses. Among the 25 drugs currently used in HIV therapy, 12 aim to inhibit the RT enzyme. There are two groups of RT inhibitors, namely the nucleoside/nucleotide RT inhibitors (NRTI) and the non-nucleoside RT inhibitors (NNRTI). The former mimic dNTPs, the ordinary RT substrates, but because they lack the 3’-OH group in the ribose ring they inhibit DNA chain elongation immediately after being incorporated. The mode of action of the NNRTI drugs is different: they bind in the so-called NNRTI-binding pocket of the RT and induce conformational changes that terminate the synthesis of the viral DNA. Various attempts have been made to associate particular mutations in the RT sequence with the drug resistance level. Often, however, it is not a single mutation but rather a non-linear combination of different mutations that leads to drug resistance. This increases the complexity of the problem, and various machine learning techniques have been used to predict resistance from the RT sequence. Drăghici and Potter3 have used neural networks to build a predictive model of HIV drug resistance to RT inhibitors. The commonly used Geno2Pheno tool4 relates sequence to resistance by using regression models. An international panel of experts semiannually releases a set of rules for predicting resistance.5 A similar approach has been used by Johnson et al.6 Garriga and Menéndez-Arias7 released a tool that uses the available sets of expert-derived rules to predict resistance. In their studies, Rhee et al8 use five different statistical learning methods (decision trees, neural networks, support vector regression, least-squares regression and least-angle regression) to model the sequence-resistance relationship in HIV-1.
A fresh and stimulating approach to the problem is presented in Kjaer et al,9 where the authors propose to represent protein sequences in terms of physicochemical properties of amino acids. Recently, Prosperi et al10 published a comparison of linear and non-linear machine learning techniques used in HIV resistome research. They conclude that fully data-driven models derived from large-scale data are promising as antiretroviral treatment decision support tools and postulate complementing sequence data sets with patient-derived data such as treatment history. Although the existing models were able to predict HIV-1 resistance to RT inhibitors, none of them provided any deeper insight into the underlying mechanisms in a physicochemical sense. There was also no method able to predict resistance caused by a previously unseen mutation. In this paper we attempt to fill this gap by developing a computational model of HIV-1 resistance to several RT inhibitors. Rather than looking at mutating amino acids, we based our model on local physicochemical properties of a protein sequence. This approach, combined with Monte Carlo feature selection and rough set theory, resulted in an interpretable, high-quality model of the RT resistome. The model consists of a number of general IF-THEN rules associating changes in the physicochemical properties of the RT sequence with the drug resistance level, e.g.:

IF (polarity at site 101 = (−∞, 2.100)) AND (normalized freq. of turn at site 190 = [0.045, ∞)) THEN resistant to Nevirapine

This makes the model easy to interpret and generative, and leads us to believe that the presented approach will contribute to the development of new, more potent antiretroviral drugs.

Materials and Methods

Data

We used publicly available data obtained from the Stanford HIV Drug Resistance Database.8 For each of the examined drugs we extracted a number of amino acid sequences of the HIV-1 RT p66 subunit. Each sequence in the database is annotated with a resistance value relative to the HXB2 wild-type strain. Since Zhang et al11 have demonstrated that the Monogram PhenoSense assay is more reliable than other drug-resistance-testing assays and that it produces highly reproducible results, we used only the sequences with the resistance value determined using this method. In total, there were 781 sequences of the p66 subunit (91% of them complete within the first 240 aa sites, 31% of them complete within all the 560 aa sites) that we could use for constructing data sets. Following established clinical practice, we labeled each sequence as “susceptible”, “moderately resistant” or “resistant”. We used the cut-off values for the discretization described in Rhee et al.8 The detailed distributions of the resistance classes per drug are presented in Table 1.
Table 1.

Number of resistance-annotated sequence examples per class.

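| Class | Drug | Training set: Susceptible | Training set: Moderately resistant | Training set: Resistant | Training set: Total | Test set: Susceptible | Test set: Moderately resistant | Test set: Resistant | Test set: Total | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| NRTI | Abacavir | 159 | 257 | 150 | 566 | 39 | 64 | 37 | 140 | 706 |
| NRTI | Didanosine | 271 | 256 | 40 | 567 | 67 | 63 | 9 | 139 | 706 |
| NRTI | Lamivudine | 172 | 95 | 307 | 574 | 42 | 23 | 76 | 141 | 715 |
| NRTI | Stavudine | 295 | 182 | 89 | 566 | 73 | 45 | 22 | 140 | 706 |
| NRTI | Tenofovir | 183 | 61 | 31 | 276 | 46 | 15 | 7 | 68 | 344 |
| NRTI | Zidovudine | 274 | 143 | 147 | 564 | 68 | 35 | 36 | 139 | 703 |
| NNRTI | Delavirdine | 352 | 95 | 132 | 579 | 87 | 23 | 33 | 143 | 722 |
| NNRTI | Nevirapine | 316 | 43 | 240 | 599 | 79 | 10 | 59 | 148 | 747 |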

Description of sequences

Kjaer et al9 have used 544 different physicochemical properties of amino acids obtained from the aaIndex database12 to describe HIV-1 protein sequences. Although we used descriptors from the same database, our approach is different. Rather than constructing a large number of data sets, each based on a single physicochemical property, we constructed one data set per antiviral drug and described each amino acid in a sequence by a vector of biologically relevant and interpretable properties. Following the procedure described by Rudnicki and Komorowski,13 we extracted a number of biologically meaningful descriptors from the aaIndex database. First, we selected descriptors that are representative of three broad biophysical categories: Transfer free energy from octanol to water14 for hydrophobicity; Normalized van der Waals volume15 for size; Isoelectric point16 for charge. These properties were fixed during the simulated annealing run. Then we randomly added four different properties and computed the sum of the r-square values for all pairs in this set, which was used as a pseudo-energy measure. A single move in the simulation consisted of replacing one of the four random properties. Moves leading to a decrease of pseudo-energy were always accepted, and moves leading to an increase of pseudo-energy were accepted with the probability:

$$P = \exp\left(-\frac{\Delta E}{kT}\right)$$

where ΔE is the increase of pseudo-energy, T is a pseudo-temperature and k is a scaling constant. The pseudo-temperature was slowly decreased during the simulation, from 1000 to 1, and the scaling constant was selected by trial and error. Ultimately, we selected seven relatively low-correlated (cf. Fig. 1) physicochemical descriptors that are presented in Table 2.
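The annealing procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the geometric cooling schedule, the number of steps, the default value of `k`, and the `r2` lookup (a nested dictionary of pairwise r-square values) are all assumptions introduced for the example.

```python
import math
import random


def pseudo_energy(selected, r2):
    """Sum of pairwise r-square values over all pairs of selected descriptors."""
    return sum(r2[a][b] for i, a in enumerate(selected) for b in selected[i + 1:])


def accept(delta_e, temperature, k, rng):
    """Metropolis rule: always accept a decrease, else accept with exp(-dE / (k*T))."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / (k * temperature))


def anneal(fixed, pool, r2, n_random=4, t_start=1000.0, t_end=1.0,
           steps=2000, k=0.001, seed=0):
    """Keep the `fixed` descriptors and swap the free ones to lower the pseudo-energy."""
    rng = random.Random(seed)
    free = rng.sample(pool, n_random)
    energy = pseudo_energy(fixed + free, r2)
    for step in range(steps):
        # Assumed schedule: geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = list(free)
        unused = [p for p in pool if p not in free]
        # One move = replace one of the free descriptors with an unused one.
        candidate[rng.randrange(n_random)] = rng.choice(unused)
        new_energy = pseudo_energy(fixed + candidate, r2)
        if accept(new_energy - energy, temperature, k, rng):
            free, energy = candidate, new_energy
    return fixed + free, energy
```

Here `fixed` would hold the three pre-selected descriptors and `pool` the remaining candidate descriptors; both are plain lists of descriptor names.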
Figure 1.

Correlation matrix of physicochemical descriptors. The lower triangle contains bivariate scatter plots with a fitted line. The actual absolute values of the correlation are provided. The significance levels of the correlation are encoded in the following way: p <= 0.001(***); p <= 0.01(**); p <= 0.05(*); p <= 0.1(.).

Table 2.

Physicochemical descriptors of amino acids used in this study.

| No. | aaIndex11 code | Abbreviation | Descriptor |
|---|---|---|---|
| 1 | RADA880102 | E oct-wat. | Transfer free energy from octanol to water14 |
| 2 | FAUJ880103 | vdW vol. | Normalized van der Waals volume15 |
| 3 | ZIMJ680104 | isoel. P | Isoelectric point16 |
| 4 | GRAR740102 | polarity | Polarity17 |
| 5 | CRAJ730103 | freq. turn | Normalized frequency of turn18 |
| 6 | BURA740101 | freq. helix | Normalized frequency of alpha-helix19 |
| 7 | CHAM820102 | E sol. wat. | Free energy of solution in water20 |

The selected properties let us represent each naturally occurring amino acid as a unique point in the coordinate system spanned by them. With this description, each amino acid sequence in the data set was represented by 3,920 features (560 aa × 7 properties). We described each site in an aa sequence as the difference between the vector representing the wild-type amino acid and the vector representing the observed amino acid. Therefore, if no mutation was observed at a site, the site was described by a vector of seven zeroes. The final data sets consisted of the described sequences annotated with the drug resistance values.
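The difference encoding can be illustrated with a short sketch. The property table below is a toy with two made-up descriptors per residue; the numbers are placeholders chosen for easy arithmetic, not aaIndex values, and the function name is hypothetical.

```python
# Toy property table: two placeholder descriptors per amino acid.
# These numbers are illustrative only and are NOT taken from aaIndex.
TOY_PROPERTIES = {
    "K": (1.0, 4.0),
    "N": (0.5, 2.5),
    "M": (2.0, 3.5),
    "V": (3.0, 3.0),
}


def encode_sequence(wild_type, observed, properties):
    """Describe each site by (wild-type vector - observed vector), flattened."""
    if len(wild_type) != len(observed):
        raise ValueError("sequences must be aligned to the same length")
    features = []
    for wt_residue, obs_residue in zip(wild_type, observed):
        wt_vec = properties[wt_residue]
        obs_vec = properties[obs_residue]
        features.extend(w - o for w, o in zip(wt_vec, obs_vec))
    return features


# A single K -> N substitution at the first site; unmutated sites give zeros.
print(encode_sequence("KMV", "NMV", TOY_PROPERTIES))
# [0.5, 1.5, 0.0, 0.0, 0.0, 0.0]
```

With seven descriptors and 560 sites, the same function would return the 3,920 features per sequence mentioned in the text.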

Monte Carlo feature selection

In order to select only the attributes (here the properties of 560 amino acids) that significantly contributed to drug resistance, we applied the Monte Carlo Feature Selection (MCFS) method as described in Dramiński et al.21 In short, MCFS relies on the construction of a large number of decision trees. Trees are trained on different random subsets of attributes and different random subsets of objects. More precisely, out of all d features, we select s random subsets of m features, s and m being fixed, s being large and m ≪ d, and for each subset of features, t trees are constructed and their performance is assessed. Each of the t trees in the inner loop is trained and evaluated on a different, randomly selected pair of training and test data sets. The evaluation results obtained from all s·t trees let one build a ranking of features reflecting their importance or, in other words, their discriminative power. The most informative features are then selected with the help of a Student’s t-test. In this way, all non-informative features were removed from the initial data set. The results of the feature selection are presented in Table 3–Table 10.
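The s-by-t sampling loop can be sketched as below. This is a heavily simplified stand-in and not the MCFS algorithm of Dramiński et al: it uses depth-1 trees (decision stumps) instead of full decision trees, credits only the single feature each stump splits on, and uses plain test-set accuracy as the importance signal. All function names and default parameter values are assumptions made for the example.

```python
import random
from collections import defaultdict


def fit_stump(rows, labels, features):
    """Best single-feature threshold split: (feature, threshold, left, right)."""
    best_acc, best = -1.0, None
    for f in features:
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            left_label = max(sorted(set(left)), key=left.count)
            right_label = max(sorted(set(right)), key=right.count)
            acc = (left.count(left_label) + right.count(right_label)) / len(labels)
            if acc > best_acc:
                best_acc, best = acc, (f, thr, left_label, right_label)
    return best


def mcfs_ranking(rows, labels, s=30, m=3, t=3, train_fraction=0.66, seed=0):
    """Rank features by the summed test accuracy of the stumps that used them."""
    rng = random.Random(seed)
    d = len(rows[0])
    scores = defaultdict(float)
    indices = list(range(len(rows)))
    for _ in range(s):                       # s random feature subsets
        subset = rng.sample(range(d), m)     # m features out of d
        for _ in range(t):                   # t classifiers per subset
            rng.shuffle(indices)             # random split, no replacement
            cut = int(len(indices) * train_fraction)
            train, test = indices[:cut], indices[cut:]
            stump = fit_stump([rows[i] for i in train],
                              [labels[i] for i in train], subset)
            if stump is None:
                continue
            f, thr, left_label, right_label = stump
            hits = sum((left_label if rows[i][f] <= thr else right_label) == labels[i]
                       for i in test)
            scores[f] += hits / len(test)    # credit the feature that was used
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

On synthetic data in which a single feature determines the class, that feature should come out at the top of the returned ranking; the published method additionally weights features by their contribution inside each tree and applies a statistical cut-off.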
Table 3.

Sites selected by the MCFS as significant for resistance to Abacavir (NRTI). Only the top-scoring property is presented per site. Prevalence of mutations in the data and MCFS score are reported.

| Rank | Site | Property | Score | Prevalence | Status |
|---|---|---|---|---|---|
| 1 | P184 | E sol. wat. | 104.39 | 0.57 | Known for NRTIs (abacavir, didanosine, lamivudine) * |
| 8 | P210 | freq. helix | 66.11 | 0.26 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine) * |
| 12 | P41 | isoel. point | 41.61 | 0.4 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine) * |
| 16 | P215 | E oct-wat. | 34.39 | 0.54 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine) * |
| 27 | P67 | vdW vol. | 18.34 | 0.11 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine) * |
| 32 | P151 | freq. turn | 14.55 | 0.04 | Known for NRTIs (abacavir, didanosine, lamivudine, stavudine, zidovudine) * |
| 33 | P75 | vdW vol. | 14.12 | 0.09 | Known for other NRTIs (stavudine) + |
| 36 | P74 | polarity | 13 | 0.11 | Known for NRTIs (abacavir, didanosine, tenofovir) * |
| 37 | P219 | freq. helix | 12.79 | 0.27 | Known for other NRTIs (didanosine, stavudine, zidovudine) + |
| 39 | P118 | E oct-wat. | 12.48 | 0.17 | Known but considered unimportant * |
| 41 | P44 | vdW vol. | 12.18 | 0.1 | Known for other NRTIs (tenofovir) + |
| 49 | P43 | freq. helix | 10.61 | 0.14 | Unknown +++ |
| 54 | P116 | freq. helix | 9.77 | 0.03 | Unknown +++ |
| 59 | P115 | isoel. point | 9.36 | 0.03 | Known for NRTIs (abacavir) * |
| 78 | P228 | E oct-wat. | 7.99 | 0.14 | Unknown +++ |
| 79 | P65 | vdW vol. | 7.97 | 0.04 | Known for NRTIs (abacavir, didanosine, lamivudine, tenofovir) * |
| 83 | P70 | freq. turn | 7.6 | 0.28 | Known for NRTIs (didanosine, stavudine, tenofovir, zidovudine) + |
| 86 | P135 | freq. turn | 7.42 | 0.42 | Unknown +++ |
| 87 | P181 | freq. helix | 7.39 | 0.15 | Known for NNRTIs (efavirenz, etravirine, nevirapine) ++ |
| 91 | P122 | isoel. point | 7.28 | 0.49 | Unknown +++ |

Symbols represent the status of a site:

(*) Sites known to contribute to resistance to the particular drug;

(+) Sites where mutations are associated with resistance to some NRTI drugs but not to Abacavir;

(++) Sites where mutations contribute to resistance to NNRTI drugs;

(+++) Sites that are not included in the literature.5,6,30

Table 10.

Sites selected by the MCFS as significant for resistance to Nevirapine (NNRTI). Only the top-scoring property is presented per site. Prevalence of mutations in the data and MCFS score are reported.

| Rank | Site | Property | Score | Prevalence | Status |
|---|---|---|---|---|---|
| 1 | P103 | vdW vol. | 77.84 | 0.08 | Known for NNRTIs (efavirenz, nevirapine) * |
| 4 | P181 | freq. turn | 57.5 | 0.15 | Known for NNRTIs (efavirenz, etravirine, nevirapine) * |
| 9 | P190 | freq. turn | 43.42 | 0.11 | Known for NNRTIs (efavirenz, etravirine, nevirapine) * |
| 22 | P100 | E sol. wat. | 9.99 | 0.04 | Known for NNRTIs (efavirenz, etravirine, nevirapine) * |
| 23 | P101 | freq. helix | 9.33 | 0.09 | Known for NNRTIs (efavirenz, etravirine, nevirapine) * |
| 29 | P188 | vdW vol. | 7.95 | 0.03 | Known for NNRTIs (efavirenz, nevirapine) * |
| 34 | P211 | isoel. point | 6.43 | 0.52 | Unknown +++ |
| 35 | P379 | E oct-wat. | 6.33 | 0.02 | Unknown +++ |
| 36 | P98 | freq. helix | 6.21 | 0.13 | Known for NNRTIs (etravirine, nevirapine) * |
| 38 | P102 | E oct-wat. | 6 | 0.15 | Unknown +++ |
| 39 | P184 | E oct-wat. | 5.97 | 0.53 | Known for NRTIs (abacavir, didanosine, lamivudine) ++ |
| 44 | P179 | freq. turn | 5.66 | 0.14 | Known for NNRTIs (etravirine) + |
| 46 | P74 | polarity | 5.51 | 0.11 | Known for NRTIs (abacavir, didanosine, tenofovir) ++ |
| 51 | P106 | freq. turn | 5.39 | 0.04 | Known for NNRTIs (efavirenz, etravirine, nevirapine) * |
| 56 | P468 | E oct-wat. | 5.28 | 0.03 | Unknown +++ |
| 63 | P357 | polarity | 4.9 | 0.06 | Unknown +++ |

Symbols represent the status of a site:

(*) Sites known to contribute to resistance to Nevirapine;

(+) Sites where mutations are associated with resistance to some NNRTI drugs but not to Nevirapine;

(++) Sites where mutations contribute to resistance to NRTI drugs;

(+++) Sites that are not included in the literature.5,6,30

For comparison, note that the attribute-ranking process differs between Breiman’s random forests (RF)22 and MCFS. In RF, the ranking is obtained by reshuffling the values of an attribute and observing the change in the quality of classification. In MCFS, the randomization test is done in the standard way, by reshuffling decision labels. The importance of an attribute is determined by comparing the weighted accuracy with the background derived from the randomization test. Another important difference is that in MCFS individual trees are built on training samples drawn without replacement from the original set of samples (and are evaluated on the remaining samples), whereas RF uses bootstrap techniques, which rely on sampling with replacement. We performed feature selection on the entire data sets prior to splitting them into the training set and the test set. In our previous work,21 we argue in detail and show by examples that the MCFS provides an objective ranking of features, independent of the classifier to be used later and pertaining only to the classification problem per se. In particular, using the MCFS does not lead to overfitting when proper classification is performed. At the same time, to benefit the most from the MCFS, it should be performed on the largest available set of examples.

Rough sets

Rough set theory, described in Pawlak,23 was introduced in the early 1980s. It constitutes a mathematical framework particularly suitable for dealing with imprecise and incomplete data. In rough set-based machine learning, a set of minimal IF-THEN decision rules is inferred from a number of labelled examples. These rules constitute a model that can be used for assigning class labels to previously unseen objects. The IF part of a rule is a conjunction of feature values and the THEN part is a disjunction of class labels. We used the ROSETTA24 implementation of rough set theory to learn a number of IF-THEN rules that associate the MCFS-selected physicochemical properties of the amino acids of the HIV-1 RT with the resistance level. Since the rough set approach requires all features to take discrete values, we first applied the entropy scaler and the equal frequency binning discretization algorithms. The process of inferring minimal sets of features (reducts) is computationally expensive. We therefore used a genetic algorithm, a heuristic approach to finding approximate reducts. The obtained reducts let us infer a number of IF-THEN rules that link minimal combinations of amino acid properties with a resistance level. In order to make the model even more general, we applied a rule-generalization algorithm as described by Mąkosa.25 In short, a general rule is obtained by merging similar or partially redundant rules and relaxing the constraints imposed by them. For instance, the following three rules (abbreviations explained in Table 2):

IF P101 polarity((−∞, 2.100)) AND P190 freq. turn([0.045, ∞)) AND P179 freq. turn((−∞, 0.70)) THEN resistant to Nevirapine

IF P101 polarity([−1.800, 2.100)) AND P190 freq. turn([1.40, ∞)) THEN resistant to Nevirapine

IF P101 polarity((−∞, −0.500)) AND P190 freq. turn([0.045, 1.50)) AND P179 freq. turn((−∞, 0.70)) THEN resistant to Nevirapine

are partially redundant and can be merged into one rule:

IF P101 polarity((−∞, 2.100)) AND P190 freq. turn([0.045, ∞)) AND P179 freq. turn((−∞, 0.70)) THEN resistant to Nevirapine

Since the removal of the P179 freq. turn((−∞, 0.70)) part has very little effect on the accuracy of the rule, further simplification can be applied, which results in the final rule:

IF P101 polarity((−∞, 2.100)) AND P190 freq. turn([0.045, ∞)) THEN resistant to Nevirapine

By using general rules we minimized the risk of overfitting our model to the training data. The ensemble of these rules constitutes a model that can be used to predict the resistance of new HIV-1 strains. Typically, all the rules that constitute the model vote for the final decision. A threshold defining the minimal number of votes necessary to label an object with a decision may result in multiple decisions for the same object. We would like to emphasize that the rules used by the model are inherently descriptive and can easily be analyzed by a domain expert. The description of the data is presented in Table 1. Table 12 provides the detailed description of the models.
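The way interval rules are matched and vote for a decision can be sketched as follows. This is an illustrative data structure, not ROSETTA's internal representation: treating every interval as half-open `[low, high)` is an assumption, and the second rule in `RULES` is hypothetical, added only so that the vote has more than one possible outcome.

```python
import math
from collections import Counter


class Rule:
    """IF every listed feature lies in its interval [low, high) THEN decision."""

    def __init__(self, conditions, decision):
        self.conditions = conditions  # {feature name: (low, high)}
        self.decision = decision

    def matches(self, example):
        return all(low <= example[name] < high
                   for name, (low, high) in self.conditions.items())


def classify(rules, example, min_votes=1):
    """Return every decision that collects at least `min_votes` matching rules."""
    votes = Counter(rule.decision for rule in rules if rule.matches(example))
    return sorted(label for label, n in votes.items() if n >= min_votes)


RULES = [
    # The final generalized rule quoted in the text.
    Rule({"P101 polarity": (-math.inf, 2.100),
          "P190 freq. turn": (0.045, math.inf)}, "resistant"),
    # Hypothetical rule, made up for this example only.
    Rule({"P190 freq. turn": (-math.inf, 0.045)}, "susceptible"),
]

print(classify(RULES, {"P101 polarity": 0.3, "P190 freq. turn": 0.5}))
# ['resistant']
```

Because `classify` returns a list, an object can receive several decisions or none, which mirrors the remark about the vote threshold.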
Table 12.

Results of the 10-fold cross-validation and the external test obtained by using the set of standard and the set of generalized rules. The underlined value indicates the use of a negated classifier. SD stands for standard deviation and RMSE for root mean squared error (WEKA provides RMSE instead of SD). The highest accuracy and AUC values are in bold.

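Cross-validation (CV) accuracy, accuracy on the external test set and number of rules:

| Group | Drug | CV accuracy, Standard (mean) | CV accuracy, Standard (SD) | CV accuracy, General (mean) | CV accuracy, General (SD) | CV accuracy, J48 (mean) | CV accuracy, J48 (RMSE) | External test accuracy, Standard | External test accuracy, General | External test accuracy, J48 | Rules, Standard (number) | Rules, General (number) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NRTI | Abacavir | 0.71 | 0.04 | 0.7 | 0.05 | 0.72 | 0.39 | 0.61 | 0.58 | 0.65 | 18136 | 611 |
| NRTI | Didanosine | 0.75 | 0.07 | 0.74 | 0.06 | 0.72 | 0.38 | 0.78 | 0.73 | 0.77 | 19873 | 444 |
| NRTI | Lamivudine | 0.89 | 0.03 | 0.88 | 0.04 | 0.9 | 0.24 | 0.91 | 0.93 | 0.9 | 23994 | 312 |
| NRTI | Stavudine | 0.72 | 0.04 | 0.74 | 0.05 | 0.74 | 0.37 | 0.81 | 0.8 | 0.71 | 20031 | 541 |
| NRTI | Tenofovir | 0.78 | 0.06 | 0.76 | 0.07 | 0.69 | 0.41 | 0.73 | 0.56 | 0.72 | 10078 | 256 |
| NRTI | Zidovudine | 0.75 | 0.04 | 0.75 | 0.05 | 0.66 | 0.42 | 0.77 | 0.71 | 0.71 | 24975 | 531 |
| NNRTI | Delavirdine | 0.69 | 0.06 | 0.69 | 0.06 | 0.69 | 0.41 | 0.71 | 0.67 | 0.69 | 27143 | 716 |
| NNRTI | Nevirapine | 0.77 | 0.03 | 0.78 | 0.02 | 0.83 | 0.31 | 0.76 | 0.76 | 0.77 | 13970 | 240 |

CV AUC per resistance class:

| Group | Drug | Model resistance class | Standard (mean) | Standard (SD) | General (mean) | General (SD) | J48 (mean) |
|---|---|---|---|---|---|---|---|
| NRTI | Abacavir | Susceptible | 0.92 | 0.05 | 0.89 | 0.07 | 0.89 |
| NRTI | Abacavir | Intermediate | 0.76 | 0.06 | 0.75 | 0.07 | 0.74 |
| NRTI | Abacavir | Resistant | 0.84 | 0.05 | 0.82 | 0.06 | 0.79 |
| NRTI | Didanosine | Susceptible | 0.83 | 0.06 | 0.82 | 0.06 | 0.8 |
| NRTI | Didanosine | Intermediate | 0.8 | 0.09 | 0.79 | 0.1 | 0.75 |
| NRTI | Didanosine | Resistant | 0.91 | 0.11 | 0.9 | 0.11 | 0.85 |
| NRTI | Lamivudine | Susceptible | 0.95 | 0.02 | 0.94 | 0.03 | 0.94 |
| NRTI | Lamivudine | Intermediate | 0.86 | 0.1 | 0.83 | 0.11 | 0.83 |
| NRTI | Lamivudine | Resistant | 0.98 | 0.02 | 0.97 | 0.02 | 0.97 |
| NRTI | Stavudine | Susceptible | 0.86 | 0.07 | 0.87 | 0.06 | 0.86 |
| NRTI | Stavudine | Intermediate | 0.78 | 0.06 | 0.79 | 0.05 | 0.75 |
| NRTI | Stavudine | Resistant | 0.93 | 0.04 | 0.89 | 0.05 | 0.85 |
| NRTI | Tenofovir | Susceptible | 0.87 | 0.05 | 0.86 | 0.07 | 0.65 |
| NRTI | Tenofovir | Intermediate | 0.78 | 0.05 | 0.75 | 0.07 | 0.6 |
| NRTI | Tenofovir | Resistant | 0.88 | 0.09 | 0.82 | 0.19 | 0.74 |
| NRTI | Zidovudine | Susceptible | 0.95 | 0.03 | 0.94 | 0.04 | 0.89 |
| NRTI | Zidovudine | Intermediate | 0.78 | 0.06 | 0.76 | 0.07 | 0.63 |
| NRTI | Zidovudine | Resistant | 0.89 | 0.06 | 0.89 | 0.05 | 0.75 |
| NNRTI | Delavirdine | Susceptible | 0.78 | 0.05 | 0.78 | 0.04 | 0.79 |
| NNRTI | Delavirdine | Intermediate | 0.61 | 0.07 | 0.6 | 0.1 | 0.58 |
| NNRTI | Delavirdine | Resistant | 0.81 | 0.08 | 0.8 | 0.08 | 0.71 |
| NNRTI | Nevirapine | Susceptible | 0.87 | 0.04 | 0.87 | 0.04 | 0.86 |
| NNRTI | Nevirapine | Intermediate | 0.6 | 0.16 | 0.48 | 0.14 | 0.61 |
| NNRTI | Nevirapine | Resistant | 0.87 | 0.05 | 0.85 | 0.05 | 0.86 |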

Validation

The validity of each model was determined in 10-fold cross-validation and in a randomization test. In addition, the predictive quality of each general model was verified using an external test set. First, we randomly divided each data set into a training set and an external test set. Each training set contained 80% of the sequences from the original data set and the remaining 20% of the sequences constituted the external test set. Both the training and the test set had the same distribution of the decision class (resistance) as the original data. Subsequently, we performed 10-fold cross-validation on the training set. The training data were randomly divided into ten subsets of equal size, Di, i = 1, 2, …, 10. We then generated ten new training sets of sequences (Ni) by sequentially removing one of the Di subsets from the original training set. Thus, the N1 data set contained all the data but the D1 subset, the N2 data set contained all the data but the D2 subset, and so forth. We then used each of the Ni training sets to build a rough set-based classifier, which was used to classify the objects from the held-out Di subset. Therefore, each sequence from the training data was present once in a test set and nine times in a training set. In order to assess the probability that the obtained results could have been generated by random data, we constructed an additional 1000 data sets per model by randomly permuting the decision in the original data set, thus breaking the correspondence between the sequence and the resistance value. Each of the 1000 randomized data sets was evaluated using 10-fold cross-validation. Finally, we used all the sequences from the training set to train a rough set-based classifier and validated its predictions on the external test set. The performance of the models was measured using prediction accuracy and the area under the receiver operating characteristic (ROC) curve (AUC).
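The splitting scheme above can be sketched as follows. The sketch is an assumption-laden illustration: function names, the per-class rounding and the interleaved fold assignment are choices made here, not details taken from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split example indices 80/20 while keeping the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def cross_validation_folds(indices, k=10, seed=0):
    """Yield (N_i, D_i): the training set without fold i, and the held-out fold i."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


def permute_decisions(labels, seed):
    """Randomization test: shuffle labels to break the sequence-resistance link."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled
```

Each index of the training set appears in exactly one held-out fold and in the training part of the other nine, matching the description of the Ni and Di sets.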
The accuracy, equal to the fraction of correctly classified sequences, was reported as its mean value over the cross-validation experiments and as a single value measured on the external test set. The AUC was reported as its mean over the cross-validation experiments. For a two-class classification task, the ROC curve accounts for an uneven distribution of the decision classes in the original data set and visualizes the behavior of the classifier at different sensitivity-to-specificity ratios. Sensitivity is defined as the ratio between true positive predictions and the total number of positive examples. Specificity is the ratio between true negative predictions and the total number of negative examples. The ROC curve is constructed by plotting sensitivity vs. 1 − specificity. The AUC value is the integral of the ROC curve. For a perfect binary classifier AUC = 1.0, whereas for a random classifier AUC = 0.5. Since in our case the decision takes three distinct resistance values, “susceptible”, “moderately resistant” and “resistant”, we provide a separate AUC value for each class by treating the two remaining classes as one. For instance, to calculate the AUC value for the class “susceptible”, we consider both “moderately resistant” and “resistant” as a new “non-susceptible” class. Finally, we used the results of the randomization tests to compute p-values, i.e. the probability that the relationships found in the original data arose by pure chance. Our computations were based on the assumption that the AUCs obtained in the randomization test are normally distributed. Normality was assessed by examining Q-Q plots and applying the Shapiro-Wilk test for normality. Subsequently, we used Student’s t-test to obtain the p-values. In addition, we compared the performance of our models with the performance of their standard decision tree-based counterparts with mutations represented by one-letter aa codes. We used the J48 algorithm as provided in the WEKA26 suite to derive the decision tree models.
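The one-class-versus-rest AUC can be computed as sketched below. The sketch uses the rank formulation of the AUC (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, counting ties as one half), which equals the area under the ROC curve. The function names and the per-class `scores` input are assumptions for illustration.

```python
def auc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, flag in zip(scores, is_positive) if flag]
    negatives = [s for s, flag in zip(scores, is_positive) if not flag]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(scores, labels, target):
    """Treat `target` as the positive class and merge all other classes."""
    return auc(scores, [label == target for label in labels])


labels = ["susceptible", "moderately resistant", "resistant", "susceptible"]
# Hypothetical classifier scores for the class "susceptible".
scores = [0.9, 0.4, 0.1, 0.7]
print(one_vs_rest_auc(scores, labels, "susceptible"))  # 1.0
```

In the example both "susceptible" sequences score higher than every "non-susceptible" one, so the AUC for that class is 1.0.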

Results and Discussion

Application of the Monte Carlo feature selection method combined with a rough set-based approach resulted in statistically sound, interpretable and generative rule-based models of the RT sequence-resistance relationship. The models can be used to predict HIV-1 resistance to six different NRTI drugs and two NNRTIs. By representing mutating amino acids in terms of physicochemical changes, the models gained generality and can be used to predict resistance for previously unseen mutants. Let us assume that only the following amino acids have been observed at site 101: A, E, H, K, P, Q, R, S, insertion, and that this observation led to the following rule:

IF (polarity at site 101 = (−∞, 2.100)) THEN resistant to Nevirapine

Now, if the model is asked to predict whether a newly observed mutation to asparagine at site 101 will result in drug resistance, the polarity value for asparagine (polarity of N = 11.60) will be substituted into the rule and the prediction will be “Resistant to NVP”. In the first step, each RT sequence was represented by 3,920 properties. Application of the MCFS led to a significant reduction of this number (see Table 3–Table 10). Already at this point we discovered that mutations at several previously unnoticed sites contribute to drug resistance. There are 5 such sites for Abacavir, 5 for Didanosine, 4 for Lamivudine, 8 for Stavudine, 6 for Tenofovir, 6 for Zidovudine, 10 for Delavirdine and 5 for Nevirapine. Apart from these, there are several sites where mutations were previously associated with resistance to some drugs, but our results suggest that they may also induce resistance to other drugs. We speculate that mutations at the newly discovered sites may either be directly responsible for drug resistance or play a compensatory role by accompanying other drug-resistance mutations and diminishing their negative effects, e.g. the decreased replication rate.
Table 11 presents sites that are included in various sets of rules for predicting drug resistance5,6 but were not selected as significant by the MCFS method. The missed sites are either underrepresented in the data sets or their influence on drug-resistance is much weaker than previously assumed. This issue has to be investigated further.
Table 11.

Sites mentioned in5,6 but not selected as significant by the MCFS method are marked with “X”.

| Site | Drug (columns: ABC, ddI, 3TC, d4T, TDF, AZT, NVP) | Description |
|---|---|---|
| P62 | XXXXX | Part of the 69 multi-resistance complex and of the 151 multi-resistance complex. Included in6 only. |
| P69 | XXX | Part of the 69 multi-resistance complex. |
| P70 | X | Part of the 69 multi-resistance complex and the TAM complex. |
| P77 | XXXXX | Part of the 151 multi-resistance complex. |
| P108 | X | Included in6 only. |
| P116 | X | Part of the 151 multi-resistance complex. |
| P151 | X | Part of the 151 multi-resistance complex. |
| P219 | X | Part of the 69 multi-resistance complex and the TAM complex. |
| P230 | X | Included in5 only. |

Abbreviations: ABC, abacavir; ddI, didanosine; 3TC, lamivudine; d4T, stavudine; AZT, zidovudine; NVP, nevirapine. Delavirdine is not included in the articles.

Following the feature-selection step, we applied the rough set approach to build rule-based models of HIV-1 resistance to drugs. We used two different sets of parameters, leading either to very specific or to more general rules that underlie a model. Prior to model building, we excluded 20% of the available examples from each data set in order to use them for independent validation. We used the remaining data for model construction. We validated our models in 10-fold cross-validation and used the area under the ROC curve to measure their performance. All the models showed good results, with accuracy varying from 69% for Delavirdine to 89% for Lamivudine when using specific sets of rules and from 69% for Delavirdine to 88% for Lamivudine when using generalized rules. Similarly, the corresponding AUC values were high in the majority of the models (cf. Table 12). In some cases, e.g. the resistance-to-Nevirapine model based on general rules, we observed low AUC values for the “moderately resistant” class. This may be because the artificially set threshold values and the arbitrary split into three resistance classes are not completely reflected in real mutation patterns. Generalization of the rules did not lead to any significant deterioration of the classification quality.25 At the same time, it reduced the number of rules by an order of magnitude. Models built on general rules are smaller, less sensitive to overtraining and easier to analyze. Finally, we validated each model on an external test set (20% of the available examples). In addition, we compared the performance of our models to the standard decision tree-based models. The decision trees performed similarly to their rough set-based counterparts, but at the same time they were less stable. The decision tree-based models derived with no feature selection step lose generality and an important interpretational layer. The results are summarized in Table 12.
We also compared our model based on generalized rules with the model described by the domain expert rules5 (cf. Table 13). For both sets of rules, we computed coverage and accuracy. In the case of the domain expert rules, we could use the entire data sets for the computation, while in the case of the rough set model we used only the test sets, to avoid the possible bias caused by the fact that the rules were derived from the training data. Therefore, for our model, we provide only pessimistic estimates of accuracy and coverage. While accurate, the expert rules are applicable to only a very limited fraction of examples. The generalized rules that underlie our model have significantly higher coverage.
Table 13.

The coverage and the accuracy of the rules. For the expert rules, we compute accuracy and coverage using all the available examples. The “moderately resistant” cases are treated as “resistant”. For the rule-based model, we compute accuracy and coverage using only the test set examples. This gives a pessimistic assessment of both measures but avoids the possible bias arising from the fact that the rules were derived from the training set. An underlined value indicates that the classifier was negated.

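| Drug | Expert rules5: coverage | Expert rules5: accuracy | Rough set rule-based model: coverage | Rough set rule-based model: accuracy |
|---|---|---|---|---|
| Abacavir | 0.29 | 0.95 | 0.85 | 0.58 |
| Delavirdine | No rules | No rules | 0.99 | 0.67 |
| Didanosine | 0.32 | 0.78 | 0.99 | 0.73 |
| Lamivudine | 0.58 | 0.98 | 1 | 0.67 |
| Nevirapine | 0.4 | 0.99 | 0.99 | 0.76 |
| Stavudine | 0.59 | 0.78 | 0.98 | 0.8 |
| Tenofovir | 0.44 | 0.57 | 0.73 | 0.56 |
| Zidovudine | 0.58 | 0.83 | 0.98 | 0.71 |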
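The two measures reported in Table 13 can be illustrated with a small Python sketch. It assumes that coverage is the fraction of examples matched by at least one rule and that accuracy is the fraction of covered examples whose label agrees with the first matching rule; the source does not spell out how conflicts between rules are resolved, so the first-match policy is an assumption. The rule and the four examples are toy data, not taken from the study.

```python
def matches(rule, example):
    """A rule fires when every listed site carries one of the allowed amino acids."""
    return all(example["sequence"].get(site) in allowed
               for site, allowed in rule["if"].items())

def coverage_and_accuracy(rules, examples):
    """Return (coverage, accuracy) of a rule set on labelled examples.

    Assumption: when several rules fire, the first one decides.
    """
    covered = correct = 0
    for example in examples:
        fired = [rule for rule in rules if matches(rule, example)]
        if not fired:
            continue  # the example is not covered by any rule
        covered += 1
        correct += fired[0]["then"] == example["label"]
    coverage = covered / len(examples)
    accuracy = correct / covered if covered else float("nan")
    return coverage, accuracy

# Toy rule set and examples (hypothetical, for illustration only).
toy_rules = [{"if": {184: {"V", "I"}}, "then": "resistant"}]
toy_examples = [
    {"sequence": {184: "V"}, "label": "resistant"},    # covered, correct
    {"sequence": {184: "M"}, "label": "susceptible"},  # not covered
    {"sequence": {184: "V"}, "label": "susceptible"},  # covered, wrong
    {"sequence": {184: "I"}, "label": "resistant"},    # covered, correct
]
coverage, accuracy = coverage_and_accuracy(toy_rules, toy_examples)
```

In this toy case three of the four examples are covered and two of those three are classified correctly, which shows how a rule set can trade accuracy against coverage.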
Importantly, our generalized rules are conjunctions of the values (intervals of values) of physicochemical properties of amino acids. This makes it possible to see which amino acids fulfill the criteria imposed by a given rule, even when such amino acids were not represented in the training set. Given the following rule:

IF P101 polarity ((−∞, 2.100)) AND P190 freq. turn ([0.045, ∞)) THEN resistant to Nevirapine

we can easily find which amino acids satisfy the conditions and substitute them into the rule:

IF P101(any of: D, E, H, K, N, Q, R) AND P190(any but: A, G, N, P, Y) THEN resistant to Nevirapine

Even though asparagine (N) was not observed at site 101 in the available data, our general model is able to foresee that an occurrence of such a mutation may result in the acquisition of resistance. Such an approach has already proved successful in revealing mechanisms underlying resistance to protease inhibitors.27 Figure 2 and Figure 3 present an instance of analysis of the strongest rules determining resistance to Abacavir and Nevirapine, respectively. For more details, see Supplementary Material, Figures S1–S7.
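The substitution step described above, going from property intervals to sets of admissible amino acids, can be sketched as follows. The property tables below are made-up numbers for five residues, chosen only to exercise the interval logic; they are not the property values used in the study, and the function names are hypothetical.

```python
import math

# Hypothetical property tables: made-up numbers for a handful of residues,
# NOT the physicochemical property values used in the study.
TOY_TABLES = {
    "polarity": {"A": 0.5, "K": 1.2, "N": 2.0, "W": 2.1, "R": 3.4},
    "freq. turn": {"A": 0.01, "K": 0.045, "N": 0.09, "W": 0.02, "R": 0.06},
}

def in_interval(x, low, high, closed_low=False, closed_high=False):
    """Test membership in an interval whose ends may be open or closed."""
    above = x >= low if closed_low else x > low
    below = x <= high if closed_high else x < high
    return above and below

def expand_rule(conditions, tables):
    """Map each site of a rule to the sorted residues satisfying its interval."""
    expanded = {}
    for site, prop, low, high, closed_low, closed_high in conditions:
        expanded[site] = sorted(
            aa for aa, value in tables[prop].items()
            if in_interval(value, low, high, closed_low, closed_high)
        )
    return expanded

# Same shape as the rule quoted above: an open interval and a left-closed one.
rule = [
    ("P101", "polarity", -math.inf, 2.1, False, False),
    ("P190", "freq. turn", 0.045, math.inf, True, False),
]
allowed = expand_rule(rule, TOY_TABLES)
```

Because the lookup runs over the whole property table rather than over the residues seen in training, a residue that never occurred at a site in the data can still satisfy a condition, which is the generalization property the text describes.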
Figure 2.

The strongest rules determining resistance to Abacavir. Amino acids are encoded using standard one-letter abbreviations. # indicates an insertion of any type; “AA” is an amino acid observed in the data in the given resistance class; “[AA]” represents an amino acid observed in the data, but in the other resistance class; and “aa” denotes an amino acid not observed in the data. “LHS support” is the number of examples satisfying the left-hand side of the rule.

Figure 3.

The strongest rules determining resistance to Nevirapine. Amino acids are encoded using standard one-letter abbreviations. # indicates an insertion of any type; “AA” is an amino acid observed in the data in the given resistance class; “[AA]” represents an amino acid observed in the data, but in the other resistance class; and “aa” denotes an amino acid not observed in the data. “LHS support” is the number of examples satisfying the left-hand side of the rule.

All the remaining sets of rules were included in the online supplementary material. Detailed analysis indicates that although amino acids at these newly discovered positions interact directly with neither the nucleic acid nor the ABC triphosphate (ABCTP), the detected mutations may disturb the complex network of hydrophobic and polar interactions responsible for the stability of the tertiary structure. This may lead to subtle structural changes in the relative orientation of the domains and in the active site architecture, preventing ABCTP from binding in a catalytically competent configuration. However, it seems that these small structural changes do not impair the ability of a drug-resistant enzyme to incorporate normal nucleotides in the catalyzed reaction.

There are 10 sites (98, 100, 101, 103, 106, 108, 181, 188, 190 and 230) that experts have associated with resistance to Nevirapine. The model finds all of these important, except sites 108 and 230, and pinpoints six other sites as significant (102, 211, 357, 379, 401 and 468). None of these was previously associated with resistance. Additionally, sites 74 and 184, so far associated only with resistance to NRTI drugs, and site 179, previously connected to resistance to other NNRTI drugs, turned out to play a significant role in acquiring resistance to Nevirapine.

Since the training data do not contain any information on the history of treatment, some of the newly discovered sites might have emerged as a result of past therapies. For instance, sites 74 and 184, known to contribute to resistance to NRTI drugs, were selected as important for resistance to Nevirapine, which is an NNRTI drug. Therefore, their role in the resistance to Nevirapine should be further investigated. Similarly, sites that are often mutated in other HIV subtypes28–30 (e.g. 35, 43, 122, 123, 135, 200, 211) should be treated with caution.
While Kearney et al28 consider sites 35, 83, 122, 123, 135, 200 and 211 as “non-resistance polymorphic”, Kantor and Katzenstein29 suggest that mutations at these sites (in particular 43 and 211) may play a significant role in drug resistance evolution and increase viral fitness. Site 118, which our method selected as important for resistance to some NRTI drugs, was previously considered important but was removed from the list of resistance-inducing mutations in 2005.31 The remaining sites discovered by our method yet not included in the expert rules5,6,30 deserve further attention. Indeed, mutations at sites 208, 218 and 228 have previously been suspected32 to contribute to resistance.

The presented predictive models are derived from a large, although limited, number of training examples. Even a very large number of examples would not guarantee that they cover all possible sorts of mutations. A particular advantage of rough sets is the ability to deal with contradictions. A rule that classifies an object to e.g. the “susceptible OR resistant” class is actually very useful, since it indicates that, with the present knowledge, the object can belong to any of these classes. If such a rule has significant coverage, it suggests directions for further research. This ability is especially important in the context of medical applications, where it is more desirable to perform an additional examination than to misclassify a case. While statistically sound, our findings should be subjected to further experimental validation, and we see them as a navigational aid for clinicians and molecular biologists.
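The way rough sets handle contradictory examples, mentioned above, can be illustrated with the standard notions of indiscernibility classes, generalized decisions, and lower and upper approximations. The sketch below uses a toy decision table with two hypothetical discretized attributes ("a" and "b"); it shows the general technique rather than the specific software or parameters used in the study.

```python
from collections import defaultdict

def generalized_decisions(objects, attributes):
    """Group objects that look identical on the attributes and collect their decisions."""
    classes = defaultdict(set)
    for obj in objects:
        key = tuple(obj[a] for a in attributes)
        classes[key].add(obj["decision"])
    return dict(classes)

def lower_upper(objects, attributes, decision):
    """Lower approximation: classes that certainly belong to the decision.
    Upper approximation: classes that possibly belong to it."""
    gen = generalized_decisions(objects, attributes)
    lower = {key for key, decisions in gen.items() if decisions == {decision}}
    upper = {key for key, decisions in gen.items() if decision in decisions}
    return lower, upper

# Toy decision table (hypothetical): the first two objects are indiscernible
# on (a, b) but carry different decisions, i.e. they contradict each other.
toy_objects = [
    {"a": "low", "b": "high", "decision": "resistant"},
    {"a": "low", "b": "high", "decision": "susceptible"},
    {"a": "high", "b": "high", "decision": "resistant"},
    {"a": "low", "b": "low", "decision": "susceptible"},
]
gen = generalized_decisions(toy_objects, ["a", "b"])
lower, upper = lower_upper(toy_objects, ["a", "b"], "resistant")
```

The class `("low", "high")` ends up with the generalized decision "susceptible OR resistant": it lies in the upper but not the lower approximation of "resistant", which flags it as a case needing further examination rather than forcing a possibly wrong label.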

Conclusion

The presented approach led us to the in silico discovery of several previously unknown mutations that contribute to resistance to RT inhibitors. Moreover, we discovered the exact values of the biochemical properties that will lead to resistance. This extends the applicability of our model to previously unseen cases. Last but not least, this approach can be applied to a wide class of similar problems, such as the analysis of drug-resistant influenza neuraminidase mutants, protein engineering or efficient drug design.
Table 4.

Sites selected by the MCFS as significant for resistance to Delavirdine (NNRTI). Only the top-scoring property is presented per site. Prevalence of mutations in the data and MCFS score are reported.

| Rank | Site | Property | Score | Prevalence | Status |
|---|---|---|---|---|---|
| 1 | P151 | E oct-wat. | 42.48 | 0.04 | Known for NRTIs (abacavir, didanosine, lamivudine, stavudine, zidovudine)* |
| 4 | P184 | vdW vol. | 36.06 | 0.56 | Known for NRTIs (abacavir, didanosine, lamivudine)* |
| 7 | P41 | isoel. point | 35.55 | 0.4 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)* |
| 10 | P210 | E sol. wat. | 27.36 | 0.26 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)+ |
| 21 | P75 | freq. helix | 22.54 | 0.09 | Known for NRTIs (stavudine)+ |
| 26 | P215 | E oct-wat. | 20.15 | 0.54 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)* |
| 28 | P118 | E oct-wat. | 18.88 | 0.17 | Known but considered unimportant* |
| 29 | P116 | freq. helix | 18.38 | 0.03 | Unknown+++ |
| 32 | P74 | polarity | 17.36 | 0.11 | Known for NRTIs (abacavir, didanosine, tenofovir)* |
| 58 | P65 | isoel. point | 9.51 | 0.04 | Known for NRTIs (abacavir, didanosine, lamivudine, tenofovir)* |
| 60 | P44 | E oct-wat. | 9.37 | 0.1 | Known for NRTIs (tenofovir)+ |
| 64 | P67 | vdW vol. | 7.94 | 0.11 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)+ |
| 68 | P43 | freq. helix | 7.27 | 0.14 | Unknown+++ |
| 76 | P218 | polarity | 6.61 | 0.08 | Unknown+++ |
| 83 | P228 | freq. turn | 6.23 | 0.14 | Unknown+++ |
| 85 | P219 | E sol. wat. | 5.95 | 0.27 | Known for NRTIs (didanosine, stavudine, zidovudine)* |
| 86 | P211 | freq. turn | 5.68 | 0.54 | Unknown+++ |

Symbols represent the status of a site:

*: Sites known to contribute to resistance to Delavirdine;
+: Sites where mutations are associated with resistance to some NNRTI drugs but not to Delavirdine;
++: Sites where mutations contribute to resistance to NRTI drugs;
+++: Sites that are not included in the literature.5,6,30

Table 5.

Sites selected by the MCFS as significant for resistance to Lamivudine (NRTI). Only the top-scoring property is presented per site. Prevalence of mutations in the data and MCFS score are reported.

| Rank | Site | Property | Score | Prevalence | Status |
|---|---|---|---|---|---|
| 1 | P184 | vdW vol. | 407.38 | 0.56 | Known for NRTIs (abacavir, didanosine, lamivudine)* |
| 8 | P67 | E oct-wat. | 25.35 | 0.11 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)+ |
| 12 | P41 | isoel. point | 21.62 | 0.4 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)+ |
| 13 | P215 | vdW vol.a | 21.29 | 0.54 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)+ |
| 18 | P75 | freq. turn | 18.47 | 0.09 | Known for NRTIs (stavudine)+ |
| 21 | P210 | E oct-wat. | 16.23 | 0.26 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)+ |
| 25 | P65 | E oct-wat. | 14.52 | 0.04 | Known for NRTIs (abacavir, didanosine, lamivudine, tenofovir)* |
| 44 | P44 | E oct-wat. | 9.4 | 0.1 | Known for NRTIs (tenofovir)+ |
| 50 | P118 | E oct-wat. | 7.07 | 0.17 | Known but considered unimportant* |
| 51 | P228 | E oct-wat. | 7 | 0.14 | Unknown+++ |
| 57 | P83 | E sol. wat. | 6.03 | 0.15 | Unknown+++ |
| 62 | P211 | vdW vol. | 5.66 | 0.54 | Unknown+++ |
| 64 | P70 | isoel. point | 5.6 | 0.28 | Known for NRTIs (didanosine, stavudine, tenofovir, zidovudine)+ |
| 65 | P122 | vdW vol. | 5.58 | 0.48 | Unknown+++ |
| 66 | P181a | isoel. point | 5.57 | 0.15 | Known for NNRTIs (efavirenz, etravirine, nevirapine)++ |

Symbols represent the status of a site:

*: Sites known to contribute to resistance to Lamivudine;
+: Sites where mutations are associated with resistance to some NRTI drugs but not to Lamivudine;
++: Sites where mutations contribute to resistance to NNRTI drugs;
+++: Sites that are not included in the literature.5,6,30

Table 6.

Sites selected by the MCFS as significant for resistance to Stavudine (NRTI). Only the top-scoring property is presented per site. Prevalence of mutations in the data and MCFS score are reported.

| Rank | Site | Property | Score | Prevalence | Status |
|---|---|---|---|---|---|
| 1 | P215 | E oct-wat. | 101.72 | 0.54 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)* |
| 3 | P210 | isoel. point | 82.64 | 0.26 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)* |
| 11 | P67 | vdW vol. | 59.61 | 0.11 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)* |
| 14 | P41 | isoel. point | 50.64 | 0.4 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)* |
| 27 | P151 | vdW vol. | 26.31A | 0.04 | Known for NRTIs (abacavir, didanosine, lamivudine, stavudine, zidovudine)* |
| 29 | P75 | polarity | 22.94 | 0.09 | Known for NRTIs (stavudine)* |
| 30 | P208 | isoel. point | 22.44 | 0.1 | Unknown+++ |
| 31 | P118 | freq. helix | 22.24 | 0.17 | Known but considered unimportant* |
| 33 | P44 | E oct-wat. | 21.59 | 0.1 | Known for NRTIs (tenofovir)+ |
| 35 | P69 | E oct-wat. | 21.02 | 0.15 | Known for NRTIs (abacavir, didanosine, lamivudine, stavudine, tenofovir, zidovudine)* |
| 47 | P219 | freq. turn | 18.15 | 0.27 | Known for NRTIs (didanosine, stavudine, zidovudine)* |
| 48 | P70 | vdW vol. | 17.83 | 0.28 | Known for NRTIs (didanosine, stavudine, tenofovir, zidovudine)* |
| 52 | P116 | polarity | 16.93 | 0.03 | Unknown+++ |
| 60 | P43 | freq. helix | 15.78 | 0.14 | Unknown+++ |
| 94 | P218 | freq. helix | 8.31 | 0.08 | Unknown+++ |
| 96 | P228 | isoel. point | 7.83 | 0.14 | Unknown+++ |
| 100 | P203 | freq. turn | 7.45 | 0.12 | Unknown+++ |
| 101 | P122 | vdW vol. | 7.17 | 0.48 | Unknown+++ |
| 103 | P184 | E sol. wat. | 6.63 | 0.56 | Known for NRTIs (abacavir, didanosine, lamivudine)+ |
| 110 | P211 | polarity | 5.9 | 0.54 | Unknown+++ |
| 111 | P62 | E sol. wat. | 5.85 | 0.05 | Part of the multi-NRTI resistance complex. Affects all NRTIs except Tenofovir* |

Symbols represent the status of a site:

*: Sites known to contribute to resistance to Stavudine;
+: Sites where mutations are associated with resistance to some NRTI drugs but not to Stavudine;
++: Sites where mutations contribute to resistance to NNRTI drugs;
+++: Sites that are not included in the literature.5,6,30

Table 7.

Sites selected by the MCFS as significant for resistance to Tenofovir (NRTI). Only the top-scoring property is presented per site. Prevalence of mutations in the data and MCFS score are reported.

| Rank | Site | Property | Score | Prevalence | Status |
|---|---|---|---|---|---|
| 1 | P215 | E oct-wat. | 37.66 | 0.53 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)* |
| 3 | P184 | E oct-wat. | 26.41 | 0.49 | Known for NRTIs (abacavir, didanosine, lamivudine)+ |
| 10 | P67 | vdW vol. | 17.81 | 0.12 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)* |
| 13 | P210 | isoel. point | 15.27 | 0.29 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)* |
| 19 | P41 | polarity | 13.95 | 0.37 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)* |
| 27 | P75 | E oct-wat. | 12.04 | 0.09 | Known for NRTIs (stavudine)+ |
| 30 | P203 | freq. turn | 10.62 | 0.15 | Unknown+++ |
| 31 | P65 | isoel. point | 10.46 | 0.06 | Known for NRTIs (abacavir, didanosine, lamivudine, tenofovir)* |
| 38 | P219 | E sol. wat. | 9.28 | 0.31 | Known for NRTIs (didanosine, stavudine, zidovudine)+ |
| 47 | P43 | freq. helix | 8.09 | 0.14 | Unknown+++ |
| 48 | P44 | E oct-wat. | 8.07 | 0.11 | Known for NRTIs (tenofovir)* |
| 56 | P35 | polarity | 6.93 | 0.28 | Unknown+++ |
| 57 | P69 | vdW vol. | 6.89 | 0.16 | Known for NRTIs (abacavir, didanosine, lamivudine, stavudine, tenofovir, zidovudine)* |
| 58 | P101 | freq. helix | 6.74 | 0.12 | Known for NNRTIs (efavirenz, etravirine, nevirapine)++ |
| 65 | P74 | E oct-wat. | 6.26 | 0.16 | Known for NRTIs (abacavir, didanosine, tenofovir)* |
| 74 | P70 | isoel. point | 5.85 | 0.28 | Known for NRTIs (didanosine, stavudine, tenofovir, zidovudine)* |
| 77 | P200 | polarity | 5.38 | 0.31 | Unknown+++ |
| 91 | P135 | polarity | 4.71 | 0.38 | Unknown+++ |
| 94 | P208 | isoel. point | 4.64 | 0.11 | Unknown+++ |

Symbols represent the status of a site:

*: Sites known to contribute to resistance to Tenofovir;
+: Sites where mutations are associated with resistance to some NRTI drugs but not to Tenofovir;
++: Sites where mutations contribute to resistance to NNRTI drugs;
+++: Sites that are not included in the literature.5,6,30

Table 8.

Sites selected by the MCFS as significant for resistance to Zidovudine (NRTI). Only the top-scoring property is presented per site. Prevalence of mutations in the data and MCFS score are reported.

| Rank | Site | Property | Score | Prevalence | Status |
|---|---|---|---|---|---|
| 1 | P215 | polarity | 173.43 | 0.54 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)* |
| 6 | P67 | isoel. point | 78.24 | 0.11 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)* |
| 11 | P41 | isoel. point | 58.56 | 0.4 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)* |
| 19 | P210 | isoel. point | 41.71 | 0.26 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)* |
| 25 | P70 | isoel. point | 28.08 | 0.28 | Known for NRTIs (didanosine, stavudine, tenofovir, zidovudine)* |
| 28 | P219 | isoel. point | 23.91 | 0.27 | Known for NRTIs (didanosine, stavudine, zidovudine)* |
| 37 | P75 | polarity | 18.15 | 0.09 | Known for NRTIs (stavudine)+ |
| 46 | P184 | E oct-wat. | 13.69 | 0.56 | Known for NRTIs (abacavir, didanosine, lamivudine)+ |
| 48 | P69 | E oct-wat. | 11.88 | 0.15 | Known for NRTIs (abacavir, didanosine, lamivudine, stavudine, tenofovir, zidovudine)* |
| 56 | P151 | polarity | 9.56 | 0.04 | Known for NRTIs (abacavir, didanosine, lamivudine, stavudine, zidovudine)* |
| 57 | P228 | vdW vol. | 9.55 | 0.14 | Unknown+++ |
| 62 | P43 | freq. helix | 8.64 | 0.14 | Unknown+++ |
| 63 | P203 | freq. turn | 8.63 | 0.12 | Unknown+++ |
| 64 | P116 | vdW vol. | 8.2 | 0.03 | Unknown+++ |
| 71 | P74 | isoel. point | 7.29 | 0.11 | Known for NRTIs (abacavir, didanosine, tenofovir)+ |
| 72 | P44 | vdW vol. | 7.27 | 0.1 | Known for NRTIs (tenofovir)+ |
| 74 | P208 | isoel. point | 7.21 | 0.1 | Unknown+++ |
| 76 | P35 | freq. turn | 7.05 | 0.28 | Unknown+++ |

Symbols represent the status of a site:

*: Sites known to contribute to resistance to Zidovudine;
+: Sites where mutations are associated with resistance to some NRTI drugs but not to Zidovudine;
++: Sites where mutations contribute to resistance to NNRTI drugs;
+++: Sites that are not included in the literature.5,6,30

Table 9.

Sites selected by the MCFS as significant for resistance to Didanosine (NRTI). Only the top-scoring property is presented per site. Prevalence of mutations in the data and MCFS score are reported.

| Rank | Site | Property | Score | Prevalence | Status |
|---|---|---|---|---|---|
| 1 | P103 | vdW vol. | 134.05 | 0.07 | Known for NNRTIs (efavirenz, nevirapine)+ |
| 8 | P181 | freq. turn | 50.79 | 0.15 | Known for NNRTIs (efavirenz, etravirine, nevirapine)+ |
| 15 | P100 | E sol. wat. | 12.7 | 0.04 | Known for NNRTIs (efavirenz, etravirine, nevirapine)+ |
| 21 | P211 | isoel. point | 7.45 | 0.51 | Unknown+++ |
| 22 | P101 | vdW vol. | 6.71 | 0.09 | Known for NNRTIs (efavirenz, etravirine, nevirapine)+ |
| 23 | P190 | polarity | 6.62 | 0.11 | Known for NNRTIs (efavirenz, etravirine, nevirapine)+ |
| 26 | P74 | polarity | 5.61 | 0.11 | Known for NRTIs (abacavir, didanosine, tenofovir)++ |
| 27 | P122 | polarity | 5.27 | 0.44 | Unknown++ |
| 28 | P219 | E oct-wat. | 5.26 | 0.25 | Known for NRTIs (didanosine, stavudine, zidovudine)++ |
| 29 | P210 | freq. turn | 5.09 | 0.25 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)++ |
| 35 | P41 | vdW vol. | 4.85 | 0.37 | Known for NRTIs (abacavir, didanosine, stavudine, tenofovir, zidovudine)++ |
| 41 | P135 | E oct-wat. | 4.5 | 0.39 | Unknown+++ |
| 49 | P184 | freq. helix | 4.32 | 0.53 | Known for NRTIs (abacavir, didanosine, lamivudine)++ |
| 59 | P179 | polarity | 4 | 0.13 | Known for NNRTIs (etravirine)+ |
| 63 | P43 | E oct-wat. | 3.97 | 0.13 | Unknown+++ |
| 64 | P221 | polarity | 3.97 | 0.04 | Unknown+++ |
| 68 | P188 | freq. turn | 3.77 | 0.03 | Known for NNRTIs (efavirenz, nevirapine)+ |
| 70 | P245 | freq. turn | 3.73 | 0.32 | Unknown+++ |
| 76 | P123 | E oct-wat. | 3.63 | 0.22 | Unknown+++ |
| 87 | P67 | freq. turn | 3.51 | 0.11 | Known for NRTIs (abacavir, stavudine, tenofovir, zidovudine)++ |
| 90 | P207 | polarity | 3.45 | 0.24 | Unknown+++ |
| 96 | P200 | freq. helix | 3.35 | 0.29 | Unknown+++ |
| 100 | P35 | polarity | 3.33 | 0.26 | Unknown+++ |
| 105 | P228 | vdW vol. | 3.3 | 0.13 | Unknown+++ |

Symbols represent the status of a site:

*: Sites known to contribute to resistance to Didanosine;
+: Sites where mutations are associated with resistance to some NRTI drugs but not to Didanosine;
++: Sites where mutations contribute to resistance to NNRTI drugs;
+++: Sites that are not included in the literature.5,6,30
