Literature DB >> 33085662

Predominance of positive epistasis among drug resistance-associated mutations in HIV-1 protease.

Tian-Hao Zhang¹, Lei Dai², John P Barton³, Yushen Du^4,5, Yuxiang Tan², Wenwen Pang⁶, Arup K Chakraborty^7,8, James O Lloyd-Smith⁹, Ren Sun⁵.

Abstract

Drug-resistant mutations often have deleterious impacts on replication fitness, posing a fitness cost that can only be overcome by compensatory mutations. However, the role of fitness cost in the evolution of drug resistance has often been overlooked in clinical studies or in vitro selection experiments, as these observations only capture the outcome of drug selection. In this study, we systematically profile the fitness landscape of resistance-associated sites in HIV-1 protease using deep mutational scanning. We construct a mutant library covering combinations of mutations at 11 sites in HIV-1 protease, all of which are associated with resistance to protease inhibitors in clinic. Using deep sequencing, we quantify the fitness of thousands of HIV-1 protease mutants after multiple cycles of replication in human T cells. Although the majority of resistance-associated mutations have deleterious effects on viral replication, we find that epistasis among resistance-associated mutations is predominantly positive. Furthermore, our fitness data are consistent with genetic interactions inferred directly from HIV sequence data of patients. Fitness valleys formed by strong positive epistasis reduce the likelihood of reversal of drug resistance mutations. Overall, our results support the view that strong compensatory effects are involved in the emergence of clinically observed resistance mutations and provide insights to understanding fitness barriers in the evolution and reversion of drug resistance.

Entities: CellLine Chemical Disease Gene Mutation Species

Mesh：

Substances：

Year: 2020 PMID： 33085662 PMCID： PMC7605711 DOI： 10.1371/journal.pgen.1009009

Source DB: PubMed Journal: PLoS Genet ISSN： 1553-7390 Impact factor: 5.917

Introduction

Antibiotics and antiviral drugs have achieved great success in recent history [1]. However, therapeutic failure may occur due to low adherence and the emergence of drug resistance [2, 3]. The increasing amount of drug resistant pathogens is a global threat to public health [4-11]. The genetic barrier to drug resistance, defined as the number of mutations needed to acquire resistance, is a major determining factor of treatment outcomes [12-14]. Another important but often overlooked aspect of drug resistance is the fitness barrier [15-17]. Resistance associated mutations (RAMs) in pathogen proteins may decrease enzymatic activities, interfere with molecular interactions, or destabilize the protein structure [18-22]. Because of the impaired replication capacity without drug selection, drug-resistant mutants cannot normally outcompete wild-type or establish in the population [23-25]. However, drug-resistant mutants can sometimes reach substantial frequency in the population. Fluctuating drug concentrations may create time windows when drug-resistant mutants replicate better than wild-type virus [26]. Moreover, compensatory mutations can rescue the impaired replication capacity of mutants and stabilize drug resistance [22, 27, 27–29]. Thus, comprehensive quantification of the fitness landscape is needed to predict the evolution of drug resistance [30, 31]. Epistasis, i.e. genetic interactions between mutations, is prevalent in molecular evolution [30-34]. Negative epistasis decreases fitness of the double mutant, posing constraints on gaining multiple mutations [35, 36]. It plays an important role in shaping the local fitness landscape [37]. Positive epistasis increases replication capacity of the double mutant, facilitating pathogens to acquire and maintain drug resistance [38-40]. Positive epistasis may create a fitness valley that prevents drug resistant mutations from reversal [41]. Collectively, positive and negative epistasis determine the topography of the fitness landscape [42] and the course of drug resistance evolution [32]. Empirical studies on the genetic interactions between RAMs, especially in high-order mutants, are still rare [43, 44]. HIV-1 protease inhibitors are important components of combination antiretroviral therapy [45] that target HIV-1 protease enzymatic activity [46, 47]. Second-generation protease inhibitors have extremely high binding affinity to viral protein [48]. Resistance to them typically requires more mutations than resistance to first-generation protease inhibitors and other antiretroviral drugs [49, 50]. For example, mutation K103N on reverse transcriptase is sufficient to confer HIV-1 nevirapine (NVP) resistance [51], while more than 4 de novo mutations are needed for protease inhibitor Darunavir (DRV) resistance [52]. Protease inhibitor-resistant viruses with multiple RAMs also have significantly reduced fitness [53, 54]. HIV-1 gained RAMs on protease during sub-optimal protease inhibitor therapy [55]. Most resistance mutations directly affect the binding affinity between HIV-1 protease and the inhibitor, but they are likely to be deleterious because they also reduce binding to the native substrate of HIV-1 protease. To compensate the deleterious effect, some other RAMs stabilize HIV-1 protease, allowing drug-resistant virus to replicate as efficiently as its parental wild-type virus [27, 56]. The compensatory effects between pairs of RAMs have been studied in several studies and are available on the Stanford HIV drug resistance database [22, 57–61]. Meanwhile, reversals of protease inhibitor resistance-associated mutations were rarely seen clinically, even when therapy was interrupted [62] or when mutant virus infected drug-naïve patients [63, 64]. These observations indicate that epistasis may be important for the evolution of protease inhibitor resistance. Recent analyses of sequence co-variation in drug-targeted HIV Pol proteins (protease, reverse transcriptase and integrase) and co-evolutionary Potts model provide evidence that epistasis plays an important role in drug resistance. Despite being disfavored in the wild-type background, primary resistance mutations can become entrenched by the complex mutation patterns which arise in response to drug therapy [65, 66]. Here, we present a quantitative high-throughput genetics approach [67, 68] to study the fitness distribution and epistasis of HIV-1 protease inhibitor RAMs. Combining these data with clinical data and fitness models, we found that positive epistasis was predominant and especially enriched among RAMs, and prevalent along drug resistance evolutionary paths. Our results suggest that fitness hills created by epistasis result in barriers that entrench RAMs, and thus drug-resistant viruses are unlikely to revert after transmission to drug-naïve patients or discontinuation of anti-retroviral drug treatment.

Results

Fitness profiling of RAMs in HIV protease

To study the interactions among RAMs in HIV protease, we constructed a library of virus mutants that covers combinations of amino acid substitutions at 11 resistance-associated sites in HIV protease (Fig 1A, Table 1, 29 × 32 = 4608 genotypes). To ensure sufficient coverage, we harvested more than 30000 colonies after transforming E. coli. These sites have been annotated as major drug resistance sites in Stanford Drug Resistance Database [61, 69], and all have been shown to be strongly associated with drug resistance [3]. In our mutant library, 9 sites have one amino acid substitution and the other 2 sites have 2 amino acid substitutions (Fig 1A, Table 1). 2736 out of 4608 possible genotypes (59.38%) were covered in the plasmid library.

Fig 1

High-throughput fitness profiling of combinatorial HIV-1 protease mutant library.

(A) The structure of protease dimer (PDB: 4LL3). The side chains of selected resistance associated residues are shown. (B) Workflow of the fitness profiling. Protease mutations were introduced into NL4-3 background. T cells were infected by the mutant virus library. The frequency of mutants before (input library) and after (output library) selection were deep sequenced. (C) The correlation of relative fitness between two biological replicates. Pearson correlation coefficient (R) is 0.80. (D) Two independent validation experiments were performed. We constructed 7 protease single mutant plasmids and recovered viruses independently. We mixed each mutant virus with wild-type virus (validation 1, black dots) and passaged in T cells for 6 days. We also mixed all 7 mutant viruses together with wild-type (validation 2, red dots) and infected T cells for 6 days. The relative fitness of each mutant was quantified by the same means as that in the library. Pearson correlation coefficients (R) for validation 1 and validation 2 are both 0.84. Error bar is standard deviation (n = 3). (E) The correlation of relative fitness in this study with the experimental selection coefficients in [71]. Pearson correlation coefficients (R) is 0.79.

Table 1

List of protease inhibitor resistance associated mutations covered in the library.

a From 148840 subtype B protease sequences in Los Alamos Database [70]. b From 1951 isolates tested in PhenoSense assay [61].

Residue number	Consensus	Mutation	Prevalence in clinical dataset^a	Occurrence in in vitro dataset^b
10	L	F	1.54%	10.20%
32	V	I	1.37%	7.53%
46	M	I	4.32%	22.19%
47	I	V	0.88%	4.36%
50	I	V	0.30%	1.85%
54	I	L	0.68%	4.92%
54	I	M	0.48%	3.02%
74	T	P	0.37%	2.15%
76	L	V	0.46%	2.92%
82	V	T	0.64%	4.05%
82	V	F	0.33%	1.54%
84	I	V	3.00%	17.12%
90	L	M	7.71%	31.78%

High-throughput fitness profiling of combinatorial HIV-1 protease mutant library.

List of protease inhibitor resistance associated mutations covered in the library.

a From 148840 subtype B protease sequences in Los Alamos Database [70]. b From 1951 isolates tested in PhenoSense assay [61]. We quantified the relative fitness of mutants using high-throughput fitness profiling (Fig 1B, See Material and methods for details). We performed 3 independent transfection experiments to validate the reproducibility of fitness profiling. 20 million 293T cells were transfected and 50 million T cells were infected in each experiment. For each biological replicate, relative fitness was calculated independently. The Pearson’s correlation coefficients of single, double and triple mutations between replicates range from 0.80 to 0.82 (Fig 1C and S1 Fig). After filtering out mutants with low frequency or low reproducibility among replicates of input virus libraries (see Material and methods for details), we were able to estimate the relative fitness of 1219 genotypes. The fitnesses of all single mutants, and more than 70% of double and triple mutants, were quantified (S2 Fig). To validate the quantification of relative fitness, we conducted competition experiments with individually constructed protease mutants. We performed two sets of validation experiments. For the first set, we packaged the mutant virus and wild-type virus independently and mixed them in pairs for head-to-head competition. The frequency of the mutant virus and wild-type virus were quantified by deep sequencing and the relative fitness was calculated in the same way as we did in library screening. A total of 7 mutants were constructed and validated. For the second set of experiments, we mixed all 7 single mutants with wild-type virus in competition experiments. The relative fitness was defined in the same way. The fitness measured in validation experiments was highly correlated with the fitness in library screening (Fig 1D, R = 0.84 for each independent validation, Pearson’s correlation test). In addition, we compared the selection coefficients of HIV-1 protease mutants measured in an independent study by Boucher et al. [71] and the relative fitness values in our experiment (Fig 1E, S2 Table). The experimental results from two studies show a good correlation (Pearson’s correlation coefficient is 0.79), supporting the reliability of our experimental methods.

Positive epistasis rescues the mutational load of RAMs

We first looked at fitness effect of RAMs. In our definition, a mutant virus of relative fitness −1 means that the relative frequency of this mutant drops 10 fold after infection in cell culture. All single mutations were deleterious to virus replication (Fig 2A). The relative fitness of single mutants ranged from -2.33 (V82F) to -0.19 (L90M). This is consistent with previous reports that randomly introduced mutations were mostly deleterious to protease enzymatic activity or HIV-1 replication capacity [34, 72–74]. Random mutagenesis in other viruses also revealed a lack of beneficial mutations in well-adapted systems [73, 75–77]. RAMs in particular were also reported to be deleterious to virus replication [31, 44]. They may destabilize viral protein, affect enzymatic activities or impact other protein-protein interactions [21, 78].

Fig 2

Positive epistasis is enriched among RAMs.

Positive epistasis is enriched among RAMs.

(A) Relative fitness of single mutants. Error bar is standard deviation (n = 3). (B) The predicted relative fitness and observed relative fitness of double mutants. The predicted relative fitness was the sum of that of the two single mutants. Inset, the distribution of epistasis between double mutants. Error bar is standard deviation (n = 3). (C)The predicted and observed fraction of viable mutants. A mutant was defined as viable if its relative fitness is higher than −4(dashed line) or −2(solid line). We then analyzed epistasis between all pairs of RAMs. Previous studies have shown the prevalence of epistasis among pairs of random mutations [34, 37, 75] or spontaneously accumulated mutations [79]. However, studies focused on the epistasis among drug resistance mutations are still limited [30, 39, 72, 75, 80]. Based on the fitness effect of single RAMs, we predicted the relative fitness of double mutants with the assumption that no epistasis existed among any two single mutations (i.e., the predicted relative fitness of a double mutant was the sum of those of two single mutants)(Fig 2B). Surprisingly, the observed relative fitness of most double mutants were significantly higher than the predicted values (p = 2.2 × 10−6, two-sided Wilcoxon rank sum test), suggesting that positive epistasis is prevalent among RAMs (Fig 2B inset). Pairwise epistasis between two RAMs is quantified as ε = f − f − f, f represents the relative fitness of mutants i. The distribution of epistasis ranged from -0.69 (M46I and L90M) to 2.34 (L76V and V82F) and 86.6% of pairwise interactions between RAMs are positive. We also analyzed the extent of epistasis among high-order mutants. We observed a trend that relative fitness decreased as the order of mutants increased (S3 Fig). This is consistent with previous reports that mutational load restricted virus replication capacity [30, 81, 82]. To better quantify the fitness cost of multiple mutations, we calculated the frequency of viable mutants by different thresholds, f > −2 or f > −4. The frequency of viable mutant virus decreased as the number of mutations increased (Fig 2C), consistent with previous observations in HIV-1 and other RNA viruses [83-86]. We then predicted the relative fitness of high-order mutants by summing the relative fitness of corresponding single mutants. We observed more viable mutants than would be predicted without epistasis (Fig 2C). This indicated pervasive positive epistasis rescued high-order mutants from lethal relative fitness, which is consistent with other clinical observations in protease inhibitor resistant virus [30, 44]. As a result, positive epistasis partially relieved HIV-1 mutational load and allowed viruses to explore more sequence space.

Enrichment of positive epistasis among RAMs

There are two possible explanations for the observed positive epistasis among RAMs of HIV protease. The first hypothesis is that all mutations in HIV protease tend to interact positively. The second hypothesis is that epistasis among random mutations in HIV protease is on average zero, but positive epistasis is enriched among RAMs. We introduced the Potts model to test our hypotheses, while simultaneously testing whether our finding of prevalent positive epistasis among RAMs carries over to the clinical setting. Potts models, originally developed in statistical physics, have been employed previously to use the population-level frequencies and correlations between different mutations to estimate their fitness effects [87-90]. In the Potts model, the probability of observing a genotype is given by equations in Fig 3A. Here the A, i ∈ {1, 2, …, 99} are variables that represent the amino acid at site i on each of the 99 sites of protease. Two sets of Potts parameters, fields h(A) and couplings J(A, A), give the statistical energy , which is negatively correlated with fitness. These parameters are estimated in order to reproduce the frequencies and correlations between mutations that are observed in the data. The fields h(A) represent the fitness effect of amino acids A at sites i alone, while the couplings J(A, A) describe epistatic interactions between amino acids A at site i and A at site j. For both the couplings and the fields, positive parameter values correspond to beneficial effects on fitness, while negative values correspond to deleterious fitness effects. We applied a maximum entropy method [91] to an alignment of 20911 HIV-1 clade B protease sequences from drug naïve patients, obtained from the Los Alamos National Laboratory HIV sequence database (hiv.lanl.gov, accessed 24 March 2017) to calculate these two sets of Potts parameters.

Fig 3

Positive epistasis rescues the mutational load of RAMs.

Positive epistasis rescues the mutational load of RAMs.

(A) The conceptual graph of Potts model. Potts model uses the probability of mutations occurring with other mutations to estimate the statistical energy. h is the field parameter while J is the coupling parameter. (B) The correlation of Potts energy(ΔE = E − E) and relative fitness of mutants with lower than 4 RAMs. Spearman correlation coefficient (ρ) is −0.46. (C)The cumulative density function of coupling parameters of RAMs and all other mutations. Coupling parameters between RAMs are more positive positive than those between RAMs and others (D = 0.22, p = 2.1 × 10−7, two-sided K-S test) and those between other residues(D = 0.22, p = 5.1 × 10−7, two-sided K-S test). (D) The cumulative density function of field parameters of RAMs and all other mutations. Field parameters of RAMs and other residues are not significantly different(D = 0.25, p = 0.20, two-sided K-S test). Then we calculated for all mutants in our protease library. We found that the Potts energy for single, double or triple mutants (ΔE = E − E) is significantly correlated with the relative fitness we measured in our screening (ρ = −0.46, p = 1.2 × 10−14, Spearman’s correlation test, Fig 3B). The correlation was lower than previous analysis in HIV-1 Gag and Env region [88, 90]. This may be due in part to strong phylogenetic bias on the inferred Potts parameters, because protease is highly conserved. It is also possible that epistatic interactions with cleavage sites on other parts of the HIV-1 genome and complicated anti-innate immunity functions of protease [57, 59, 92] obscure the effects of individual mutations on replicative fitness in vitro. The Potts couplings J(A, A) give the contribution of pairwise epistatic interactions between amino acids A and A at sites i and j, respectively. We compared the couplings among RAMs and among all other possible mutations on protease (Fig 3C). Couplings of other protease mutations clustered near 0, while those of RAMs are significantly more positive than that of other mutations (D = 0.22, p = 2.1 × 10−07, two-sided K-S test). Moreover, J(A, A) among RAMs were also more positive than those between RAMs and other residues (D = 0.22, p = 5.1 × 10−07, two-sided K-S test). Although the fields h(A) of RAMs are more negative than other mutations, the difference is not significant (Fig 3D, D = 0.25, p = 0.20, two-sided K-S test). We note that the magnitude and the variation of field parameters is much larger than that of coupling parameters (Fig 3C and 3D). The Interquartile Range (IQR, i.e. the middle 50%) of field parameters is 3.55, while the IQR of coupling parameters is 0.15. The standard deviation of field parameters is 2.29, while the standard deviation of coupling parameters is 0.37. Overall, analysis based on the Potts model is consistent with our experimental results that positive epistasis is enriched among RAMs, and lends support to our second hypothesis that epistasis among random mutations in HIV protease is on average zero.

Implications of positive epistasis in evolution

To study the role of epistasis in evolution, we analyzed the evolutionary pathways covering all genotypes with up to 4 amino acid substitutions from the wild-type virus (13 single mutants, 67 double mutants, 176 triple mutants and 290 quadruple mutants) (Fig 4A). Mutants are linked if they differ by one amino acid substitution.

Fig 4

Ruggedness in fitness landscapes prevents RAMs from reversion to wild-type.

Ruggedness in fitness landscapes prevents RAMs from reversion to wild-type.

(A) Fitness with possible evolutionary trajectories. Mutants are linked if they only have one residue difference. Red line represents an accessible path that a quadruple mutant can take and reverse to wild-type. Blue line represents an inaccessible reversal path to wild-type for that mutant. (B) Trajectory-based epistasis is calculated for each amino acid substitution and averaged over genetic backgrounds with a certain Hamming distance to the wild-type. The fitness effect of a single mutation becomes less deleterious on genetic backgrounds where other RAMs have been fixed. (C) The distribution of accessible paths for all genotypes with a certain hamming distance to wild type. We have found that all 13 RAMs are deleterious on the wild-type background (Fig 2A). However, the fitness effect of a single RAM becomes less deleterious on genetic backgrounds where other RAMs have been fixed (S4 Fig). Following the generalized definition of epistasis proposed by Shah et al. [93], we define trajectory-based epistasis ε that measures the deviation of the fitness effect if the order of mutations were reversed. ε = f − f − f, where f and f represent the relative fitness of background M and single mutant j [94]. For example, mutation j can be deleterious on the wild-type background but beneficial on another genetic background that mutation i has been fixed. Trajectory-based epistasis is calculated for each amino acid substitution and averaged over genetic backgrounds with a certain Hamming distance to the wild-type (Fig 4B). For all RAMs profiled in this study, we find that trajectory-based epistasis is overall positive and increases steadily with the number of substitutions, i.e. the fitness contribution of a specific amino acid substitution becomes more positive if more RAMs have been fixed. Our results are consistent with previous analyses of sequence co-variation in HIV-1 protease [65, 66], where inferred epistastic interactions among mutations at PI resistance associated sites lead to entrenchment of primary drug resistance mutations. In this study, we combine the analyses of co-variation (Potts model) with comprehensive experimental fitness data of HIV-1 protease mutants (including a large number of higher-order mutants) to provide direct evidence of positive epistasis among RAMs of second-generation PIs. We tested the hypothesis that positive epistasis prevented resistance associated genotypes from reverting to wild-type [41, 95, 96]. Although RAMs incurred significant fitness cost, some drug resistant mutants would not revert to wild-type after transmitting to a drug naïve patient. We quantified the frequency of accessible evolutionary pathways between mutants and wild-type in our experimentally measured fitness landscape of HIV protease RAMs. A reversal path is defined to be accessible if and only if the virus fitness increases monotonically along the path. For example, quadruple mutant V32I_M46I_I54L_V82F has many paths to revert to wild-type (Fig 4A). Among them, reversing V32I, I54L, V82F and M46I in order is an accessible path (Fig 4A, red line). On the contrary, reversing I54L, V82F, M46I and V32I is not an accessible path because there are 2 steps with decreasing fitness (Fig 4A, blue line). We found that among double mutants, 44 have two accessible reversal paths to the wild type, 20 have only one accessible reversal path, and interestingly 3 of them have none. These 3 mutants (I50V_T74P, M46I_I54M and L76V_V82F) represent local fitness peaks and the reversal to wild-type is blocked by a fitness valley. We found that the number of accessible reversal paths decreased with the accumulation of RAMs (Fig 4C). This indicates that protease mutants become less likely to revert to wild-type as the number of RAMs increases. Our results are consistent with clinical observations that protease inhibitor resistance associated mutations seldom reverted even when therapies were interrupted [25, 62] or drug-naïve patients were infected [63, 64]. The difficulty of reversal also explains the rising frequency of drug resistant HIV-1 viruses in acute phase patients [41, 96].

Discussion

In this study, we systematically quantified the fitness effect of RAMs of HIV-1 protease. While all RAMs reduced the virus replication fitness, pervasive positive epistasis among RAMs alleviated the fitness cost substantially. Moreover, we analyzed the HIV sequence data from patients by the Potts model. We found the statistical energy inferred from HIV sequences in vivo correlated well with the replication fitness measured in vitro. Based on our fitness data and the mutational couplings inferred by the Potts model, we showed that positive epistasis is enriched among RAMs of HIV-1 protease, in both local fitness landscape and evolutionary paths. Finally, we studied the role of epistasis in evolutionary pathways. We found that positive epistasis among RAMs entrenches drug resistance and blocks the reversal paths to wild-type virus, which has important implications for the design of anti-retroviral therapies. Through this project, we also established a high-throughput platform to quantify the genetic interactions among a group of mutations. Another independent study profiled the fitness effect of all single amino acid change on HIV protease [71]. The data showed significant correlation with our study (Fig 1E, Pearson’s correlation coefficient (R) is 0.79). There are a few limitations of this study. Firstly, we only measured the fitness effect of RAMs in the absence of protease inhibitors. We are not able to quantify drug resistance of RAMs because protease inhibitors block multiple rounds of virus infection and prevent us from accurate examination of mutant frequency under drug selection. Also, we did not sequence other genes of HIV-1. HIV-1 mutates rapidly due to low fidelity of reverse transcriptase [97, 98]. There might be compensatory mutations occurring on other proteins that rescued the protease RAMs. Secondly, the correlation between our validation experiments and high-throughput screening experiments was less than the correlation observed in similar experiments in bacteria and yeast [99, 100]. The correlation between Potts energy and experimental fitness is also lower than previous reports on Gag and Env regions [88, 90]. Mechanistic difference between logistic growth and viral growth may complicate the quantification of viral fitness [101]. Direct measurement of viral frequency may not linearly correlate to the probability of replication [102]. Moreover, we tested a large number of higher-order mutants (i.e. multiple mutations from the wild-type virus). Our experimental dataset not only contains clinically observed genotypes but also combinations of mutations that was not observed in patients, which are highly deleterious and may suffer from higher experimental errors. If we exclude higher-order mutants and very deleterious genotypes (S5 Fig), the Spearman’s correlation between fitness and Potts energy is higher (ρ = −0.54, compared to ρ = −0.46 in Fig 3B). Thirdly, we did not cover all clinically observed polymorphism, given the bottlenecks in virus library screening. We chose to prioritize for RAMs of second-generation protease inhibitors Darunavir (DRV) and Tipranavir (TPV), which are considered to have high genetic barriers (i.e. multiple RAMs are involved in the emergence and reversal of drug resistance) [52]. According to Stanford Drug Resistance Database [61, 69], the RAMs that we chose contribute to the resistance to DRV and TPV (S2 Table). The only exception is L90M, which is frequently found in drug resistant viruses. The RAMs and the combinatorial genotypes in our library are prevalent in patients and documented in Stanford Drug Resistance Databases (Table 1). Future work could be extended to cover more clinically observed polymorphism in HIV-1 protease and other drug-targeted proteins. Finally, the correlation between Potts energy and experimental fitness is confounded by many factors, like different selection pressures in vivo and in vitro, or phylogenetic bias. Nonetheless, we observe moderate but statistically significant correlation between the coupling parameters in the Potts model and the experimental epistasis (S6 Fig, Spearman’s correlation test, p = 6.8 × 10−3). We note that the coupling parameters in the Potts model and the experimental measure of epistasis (calculated for WT genetic background) are conceptually different, representing Fourier coefficients and Taylor coefficients of the fitness landscape [103]. Our findings are consistent with the literature that Potts model couplings are strongly associated with contact residues in the three-dimensional structure of protein families [104, 105]. We tested a series of different statistical models, including the binary (Ising) model inferred via ACE, the Potts model inferred via pseudo-likelihood maximization (a popular approach to analyzing sequence data from protein families), and the Potts model inferred via ACE, to examine the epistatic effects among drug resistance mutations (S7 Fig). We found that the Potts model inferred via ACE is the best choice to analyze epistasis in our study. Statistical models suggest a pervasive negative distribution of fitness effect for single mutations on HIV-1 [31, 88, 106]. Previous models also predicted the entrenchment of deleterious RAMs by positive epistasis [65, 66]. This dataset provides a unique chance to experimentally test these statistical hypotheses. The predominance of positive epistasis is also observed in HIV-1 [30] and in other organisms [39, 107]. However, they either relied on naturally-occurring resistant clones or indirectly activating gene functions. This report is the first dataset to systematically quantify the epistasis among functional residues in HIV-1 drug resistance evolution, without the bias of drug selection and in vivo evolution. Overall, our results are important for understanding drug resistance evolution. We found positive epistasis plays a critical role in HIV-1 gaining and maintaining drug resistance. Epistasis makes the fitness landscape rugged, preventing RAMs from reversion to wild-type, even when antiviral therapy is interrupted or virus transmits to a healthy individual [95, 108]. Positive epistasis involves many kinds of molecular mechanisms. We find that the relative fitness of single mutants is not a significant factor of positive epistasis. We compared h in the Potts model for all RAMs and other single mutants. They were not significantly different (p = 0.20, K-S test). Physical distance between residues is a significant factor contributing to positive epistasis. The physical distances between these residues were significantly less than those between any two random residues on HIV-1 protease (D = 0.32, p = 3.9 × 10−10, two-sided K-S test, S8 Fig), suggesting that physical contact among RAMs might contribute to the observed positive epistasis. Notably, their average distance was more than 10 Å, indicating most of them did not have direct contact. Some mutations may have structurally stabilizing effect to other residues. We used FoldX and Rosetta to predict the folding free energy (ΔΔG) as a quantification of protein stability [109, 110] for all mutants in our library (S8 Fig). We notice that mutation V82F contributed to the positive epistasis on many genetic backgrounds (Fig 4B), but it did not contribute much to the stabilizing effect. Thus, structurally stabilizing effects cannot fully explain the predominance of positive epistasis observed in this study. Future studies on the structure and function of HIV-1 protease mutants will help elucidate the molecular mechanisms underlying the interactions among RAMs.

Material and methods

Plasmid library construction

HIV-1 RAMs were picked according to their prevalence in protease inhibitor treated patients [3]. We chose 11 residues with 13 mutations to construct a combination of HIV-1 protease mutant library (Table 1). We used a ligation-PCR method to construct the library on NL4-3 backbone, which is an infectious subtype B strain. All possible combinations of these 13 mutations are 29 × 32 = 4608 genotypes. The mutagenesis region spanned 243 nucleotides on HIV-1 genome. We split the region into 5 oligonucleotides and ligate them in order by T4 ligase (from New England BioLabs). The sequence of oligonucleotides are shown in S3 Table. After each ligation, we recovered the product by PCR and used restriction enzyme BsaI-HF (from New England BioLabs) to generate a sticky end for the next step ligation. After making the 243-nucleotide mutagenesis fragment, we PCR amplified the upstream and downstream regions near this fragment and used overlap extension PCR to ligate them together. We then cloned it into full length HIV-1 NL4-3 background. We harvested more than 30,000 E. coli colonies to ensure sufficient coverage of the library complexity.

Virus production

The plasmid DNA was purified by HiPure Plasmid Midi Prep Kit (from Thermo Fisher Scientific). To produce virus, we used 16 μg plasmid DNA and 40 μL lipofectamine 2000 (from Thermo Fisher Scientific) to transfect 2 × 107 293T cells, in 3 independent biological replicates. We changed media 12 hours post transfection. The supernatant was harvested 48 hours post transfection, labeled as input virus and frozen at -80°C. We harvested 40mL viruses from each transfection. Virus was quantified by p24 antigen ELISA kit (from PerkinElmer).

Library screening

CEM cells were cultured in RMPI 1640 (from Corning) with 10% FBS (from Corning). To passage library in T cells, we added 25 mL viruses and 120 μg polybrene to 50 million CEM cells. We achieved 10 ng p24 (108 physical viral particles) for every million CEM cells during infection. We washed cells and completely changed media 6 hours post infection. We supplemented the cells with fresh media 3 days post infection and harvested supernatant 6 days post infection. We centrifuged supernatant at 500 × g for 3 minutes to remove the cells and cell debris. The rest of supernatant was frozen at -80°C. In summary, we carefully controlled the experiment scales to ensure the library complexity was maintained in every step. Briefly, we harvested >3 × 104 E. coli colonies during bacteria transformation, which ensured ∼6-fold coverage of the expected complexity (4608 genotypes). We then transfected 2 × 107 HEK 293T cells with 16 μg plasmid library to package infectious viruses. We used 25 mL viruses (500 ng p24, ∼ 5 × 109 viral particles) to infect 2 × 107 million CEM cells for each biological replicate.

Sequencing library preparation

We used QIAamp viral RNA mini kit (from QIAGEN) to extract virus RNA from supernatant. We then used DNase I (from Thermo Fisher Scientific) to remove the residual DNA. We used random hexamer and SuperScript III (from Thermo Fisher Scientific) to synthesize cDNA. The virus genome copy number was quantified by qPCR. The qPCR primers are 5’ -CCTTGTTGGTCCAAAATGCGAAC-3’ and 5’ -ATGGCCGGGTCCCCCCACTCCCT-3’. At least 2 × 105 copies of viral genome were used to make sequencing libraries. We PCR amplified the mutagenesis regions using the following primers: 5’ -CTAATCCTGGAGTCTTTGGCAGCGACCC-3’ and 5’ -GAAGACCTGGAGTGCAGCCAATCTGAGT-3’. We then used BpmI (from New England BioLabs) to cleave the primers and ligate the sequencing adapter to the amplicon. We used PE250 program on Illumina MiSeq platform to sequence the amplicon.

Calculation of fitness and epistasis

We used custom python codes to map the sequencing reads to reference NL4-3 genome. Mutations were called if both forward and reverse reads have the same mutation and phred quality scores are both above 30. All codes are available on https://github.com/Tian-hao/protease-inhibitor. All data were deposited in SRA (short read archive) database under accession PRJNA546460. For each replicate of the virus library from the transfected 293T cells, we reached 4.45 × 105 to 6.05 × 105 sequencing depth. We filtered out the genotypes with frequency fewer than 5 × 10−5 in any biological replicate and the genotypes whose frequency differ more than 10 folds between any two biological replicates. Relative fitness f of mutant m in experiment r (biological replicates) was defined as Eq 1. F is the frequency of mutant m before screening. F is the frequency of mutant m after passaging. F is the frequency of wild-type virus before screening. F is the frequency of wild-type virus after passaging. The relative fitness f was defined as the average of 3 biological replicates (Eq 2). However, if relative fitness was missing in one replicate, we only average the other two replicates. The relative fitness value of all mutants was shown in S1 Table. where R is the number of biological replicates. Pairwise epistasis ε between mutant i and mutant j was defined as: where f refers to the relative fitness of double mutant i and j. Trajectory-based epistasis ε between a multi-mutation genotype M and another genotype differ by one mutation j was defined as:

Potts model

Data used to infer parameters for the Potts model were downloaded from the Los Alamos National Laboratory HIV sequence database, as described in the main text. Sequences were processed as previously described [111]. Briefly, we first removed insertions relative to the HXB2 reference sequence. We also excluded sequences labeled as “problematic” in the database, and sequences with gaps or ambiguous amino acids present at >5% of residues were removed. Remaining ambiguous amino acids were imputed using simple mean imputation. Each sequence in the multiple sequence alignment (MSA) is represented as a vector of variables , where N = 99 is the length of the sequence. Each of the A represents a (set of) amino acid(s) present at residue i in the protein sequence. To choose the amino acids at each site that would be explicitly represented in the model, we first computed the frequency of each amino acid A at each site i in the MSA. To compute these frequencies, we weighted the sequences such that the weight of all sequences from each unique patient was equal to one, thereby avoiding overcounting in cases where many sequences were isolated from a single individual. We then explicitly modeled the q most frequently observed amino acids at each site that collectively capture at least 90% of the Shannon entropy of the distribution of amino acids at that site [111]. All remaining, rarely observed amino acids were grouped together into a single aggregate state. For these data, this choice resulted in an average of three explicitly modeled states at each site (minimum of 2, maximum of 6). The Potts model is a probabilistic model for the ‘compressed’ sequences , where the probability of observing a sequence is Here the normalizing factor ensures that the probability distribution is normalized. We used ACE [91] to infer the set of Potts fields h(A) and couplings J(A, A) that result in average frequencies and correlations between amino acids in the model (5) that match the frequencies and correlations observed in the data. We used a regularization strength of γ = 7 × 10−5 in the inference, which is roughly equal to one divided by the number of unique patients from which the sequence data were obtained. We used “consensus gauge,” where the fields and couplings for the most frequent residue at each site in the protein are set to zero. We confirmed that the parameters inferred by ACE resulted in a Potts model that accurately recovered the correlations present in the data.

Validation experiments

We constructed 7 single mutants by site-directed mutagenesis. The primers used this experiment are listed in S3 Table. We used overlap-extension PCR to amplify the fragment with mutated nucleotides. We ligated the fragment with NL4-3 backbone using ApaI and SbfI. We transformed competent E.coli and picked single colonies. We sequenced the protease region of plasmids to make sure there is only desired mutant in this region. 7 mutants were L10F, I47V, T74P, L76V, V82F, V82T, L90M. We produced mutant viruses in 293T cells, mixed them with wild-type and infected CEM cells. The frequencies of mutant virus before and after infection were quantified by deep sequencing. We did 2 biological replicates with each validation method. For validation 1, we pairwisely mixed the mutant and wild-type virus oor competition. For validation 2, we mixed all 7 mutants and wild-type virus.

Protein stability prediction

Mutants stability was predicted using either FoldX or Rosetta. For FoldX, we used the protease structure (PDB: 3S85) as reference and repaired the structure using the RepairPDB function. The free energy of the mutants was computed by using the BuildModel function under default parameters. For Rosetta analysis, we used the protease crystal structure (PDB: 6DGX) as reference and score function ddg_monomer to evaluate the effect of mutations. Each mutants were evaluated 10 times and the average score was used as ΔΔG.

Ethics statement

Reagents were acquired from the NIH AIDS Reagent program. The work is approved by UCLA IRB.

The correlation of relative fitness among biological replicates.

All single mutants, double mutants and triple mutants are shown. R stands for Pearson correlation coefficient. (TIFF) Click here for additional data file.

Coverage of protease mutant library.

(A) Fraction of expected protease mutants in each transfection virus library. (B) Number of mutant in each transfection virus library. Dashed line represents the number of all possible combinations of mutations. (TIFF) Click here for additional data file.

Relative fitness of different order of mutations.

(TIFF) Click here for additional data file.

Relative fitness of single RAMs on different genetic backgrounds.

(TIFF) Click here for additional data file.

Correlation between Potts energy and relative fitness for low order mutants.

Mutants with relative fitness higher than −2.5 and numbers of mutations lower than 4 is shown. The Pearson’s correlation coefficient is −0.57. The Spearman’s correlation coefficient is −0.54. (TIFF) Click here for additional data file.

The correlation between Potts’ coupling parameters with experimental epistasis.

The pairwise epistasis between all RAMs in our library was compared with Potts’ coupling parameters. The Spearman’s correlation coefficient is −0.33. The p value for the Spearman’s correlation coefficient is 6.8 × 10−3. (TIFF) Click here for additional data file.

Correlation between relative fitness and different statistical models.

(A, B & C)The correlation between relative fitness with (A, bin) binary (Ising) model inferred via ACE, (B, plm) the Potts model inferred via pseudo-likelihood maximization, or (C, potts) the Potts model inferred via ACE. (D) Spearman’s correlation coefficients for different models. Mutants were classified according to their HD to wild-type. HD, hamming distance. (TIFF) Click here for additional data file.

Structure insights on resistance associated mutations.

(A) Distribution of pairwise distance among resistance associated residues and other residues. The distance between the C-α of two residues was shown. (B & C) Correlation between mutants’ relative fitness and protein stability (ΔΔG). ΔΔG is predicted by FoldX (B) or Rosetta (C). The correlation coefficients were calculated for mutants with lower than 5 mutations. ρ stands for Spearman’s correlation coefficient. (TIFF) Click here for additional data file.

Relative fitness of all mutants in this research.

(PDF) Click here for additional data file.

Information of protease inhibitor resistance associated mutations covered in the library.

(PDF) Click here for additional data file.

Sequence of oligonucleotides used in this research.

(PDF) Click here for additional data file.

Protein stability simulated by Rosetta or FoldX.

(TSV) Click here for additional data file.

106 in total

Review 1. Antifungal drug resistance of pathogenic fungi.

Authors: Dimitrios P Kontoyiannis; Russell E Lewis
Journal: Lancet Date: 2002-03-30 Impact factor: 79.321

2. DNA deamination mediates innate immunity to retroviral infection.

Authors: Reuben S Harris; Kate N Bishop; Ann M Sheehy; Heather M Craig; Svend K Petersen-Mahrt; Ian N Watt; Michael S Neuberger; Michael H Malim
Journal: Cell Date: 2003-06-13 Impact factor: 41.582

3. Predictors of HIV drug-resistance mutations in a large antiretroviral-naive cohort initiating triple antiretroviral therapy.

Authors: P Richard Harrigan; Robert S Hogg; Winnie W Y Dong; Benita Yip; Brian Wynhoven; Justin Woodward; Chanson J Brumme; Zabrina L Brumme; Theresa Mo; Chris S Alexander; Julio S G Montaner
Journal: J Infect Dis Date: 2004-12-22 Impact factor: 5.226

4. The fitness cost of mutations associated with human immunodeficiency virus type 1 drug resistance is modulated by mutational interactions.

Authors: Mian-er Cong; Walid Heneine; J Gerardo García-Lerma
Journal: J Virol Date: 2006-12-27 Impact factor: 5.103

5. Large-scale identification of coevolution signals across homo-oligomeric protein interfaces by direct coupling analysis.

Authors: Guido Uguzzoni; Shalini John Lovis; Francesco Oteri; Alexander Schug; Hendrik Szurmant; Martin Weigt
Journal: Proc Natl Acad Sci U S A Date: 2017-03-13 Impact factor: 11.205

6. A systematic survey of an intragenic epistatic landscape.

Authors: Claudia Bank; Ryan T Hietpas; Jeffrey D Jensen; Daniel N A Bolon
Journal: Mol Biol Evol Date: 2014-11-03 Impact factor: 16.240

7. Prevalence of drug-resistant HIV-1 variants in untreated individuals in Europe: implications for clinical management.

Authors: Annemarie M J Wensing; David A van de Vijver; Gioacchino Angarano; Birgitta Asjö; Claudia Balotta; Enzo Boeri; Ricardo Camacho; Maire-Laure Chaix; Dominique Costagliola; Andrea De Luca; Inge Derdelinckx; Zehava Grossman; Osamah Hamouda; Angelos Hatzakis; Robert Hemmer; Andy Hoepelman; Andrzej Horban; Klaus Korn; Claudia Kücherer; Thomas Leitner; Clive Loveday; Eilidh MacRae; Irina Maljkovic; Carmen de Mendoza; Laurence Meyer; Claus Nielsen; Eline L Op de Coul; Vidar Ormaasen; Dimitris Paraskevis; Luc Perrin; Elisabeth Puchhammer-Stöckl; Lidia Ruiz; Mika Salminen; Jean-Claude Schmit; Francois Schneider; Rob Schuurman; Vincent Soriano; Grzegorz Stanczak; Maja Stanojevic; Anne-Mieke Vandamme; Kristel Van Laethem; Michela Violin; Karin Wilbe; Sabine Yerly; Maurizio Zazzi; Charles A Boucher
Journal: J Infect Dis Date: 2005-08-15 Impact factor: 5.226

8. Global fitness profiling of fission yeast deletion strains by barcode sequencing.

Authors: Tian Xu Han; Xing-Ya Xu; Mei-Jun Zhang; Xu Peng; Li-Lin Du
Journal: Genome Biol Date: 2010-06-10 Impact factor: 13.583

9. Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts.

Authors: Bastien Mangeat; Priscilla Turelli; Gersende Caron; Marc Friedli; Luc Perrin; Didier Trono
Journal: Nature Date: 2003-05-28 Impact factor: 49.962

10. A quantitative high-resolution genetic profile rapidly identifies sequence determinants of hepatitis C viral fitness and drug sensitivity.

Authors: Hangfei Qi; C Anders Olson; Nicholas C Wu; Ruian Ke; Claude Loverdo; Virginia Chu; Shawna Truong; Roland Remenyi; Zugen Chen; Yushen Du; Sheng-Yao Su; Laith Q Al-Mawsawi; Ting-Ting Wu; Shu-Hua Chen; Chung-Yen Lin; Weidong Zhong; James O Lloyd-Smith; Ren Sun
Journal: PLoS Pathog Date: 2014-04-10 Impact factor: 6.823

7 in total

Introduction

Results

Fitness profiling of RAMs in HIV protease

High-throughput fitness profiling of combinatorial HIV-1 protease mutant library.

List of protease inhibitor resistance associated mutations covered in the library.

Positive epistasis rescues the mutational load of RAMs

Positive epistasis is enriched among RAMs.

Enrichment of positive epistasis among RAMs

Positive epistasis rescues the mutational load of RAMs.

Implications of positive epistasis in evolution

Ruggedness in fitness landscapes prevents RAMs from reversion to wild-type.

Discussion

Material and methods

Plasmid library construction

Virus production

Library screening

Sequencing library preparation

Calculation of fitness and epistasis

Potts model

Validation experiments

Protein stability prediction

Ethics statement

The correlation of relative fitness among biological replicates.

Coverage of protease mutant library.

Relative fitness of different order of mutations.

Relative fitness of single RAMs on different genetic backgrounds.

Correlation between Potts energy and relative fitness for low order mutants.

The correlation between Potts’ coupling parameters with experimental epistasis.

Correlation between relative fitness and different statistical models.

Structure insights on resistance associated mutations.

Relative fitness of all mutants in this research.

Information of protease inhibitor resistance associated mutations covered in the library.

Sequence of oligonucleotides used in this research.

Protein stability simulated by Rosetta or FoldX.

Review 1. Antifungal drug resistance of pathogenic fungi.

Review 6. HIV Protease: Historical Perspective and Current Research.