Deep mutagenesis in the study of COVID-19: a technical overview for the proteomics community.

Abstract

INTRODUCTION: The spike (S) of SARS coronavirus 2 (SARS-CoV-2) engages angiotensin-converting enzyme 2 (ACE2) on a host cell to trigger viral-cell membrane fusion and infection. The extracellular region of ACE2 can be administered as a soluble decoy to compete for binding sites on the receptor-binding domain (RBD) of S, but it has only moderate affinity and efficacy. The RBD, which is targeted by neutralizing antibodies, may also change and adapt through mutation as SARS-CoV-2 becomes endemic, posing challenges for therapeutic and vaccine development. AREAS COVERED: Deep mutagenesis is a Big Data approach to characterizing sequence variants. A deep mutational scan of ACE2 expressed on human cells identified mutations that increase S affinity and guided the engineering of a potent and broad soluble receptor decoy. A deep mutational scan of the RBD displayed on the surface of yeast has revealed residues tolerant of mutational changes that may act as a source of drug resistance and antigenic drift. EXPERT OPINION: Deep mutagenesis requires a selection of diverse sequence variants, that is, an in vitro evolution experiment tracked by next-generation sequencing. The choice of expression system, the diversity of the variant library and the selection strategy have important consequences for data quality and interpretation.

Keywords: ACE2; Deep mutational scan; SARS coronavirus 2; decoy receptor; mutational landscape

Year: 2020 PMID: 33084449 PMCID: PMC7594187 DOI: 10.1080/14789450.2020.1833721

Source DB: PubMed Journal: Expert Rev Proteomics ISSN: 1478-9450 Impact factor: 3.940

INTRODUCTION

Investigations of protein mutations have classically been approached by precision targeting, in which a small number of mutations are deliberately introduced and tested individually. This requires preconceived ideas or hypotheses about which residues, and which changes to those residues, might be relevant. When the important residues in a protein sequence are unknown, screens and selections can be used instead, in which a library of random mutations is sorted to enrich for a small number of mutants with the intended phenotype. Both approaches are limited in the amount of information they provide. Deep mutagenesis, or deep mutational scanning, takes advantage of next-generation sequencing to bring experimental protein mutagenesis into the realm of Big Data [1]. A screen or selection of a diverse library of variants is tracked by next-generation sequencing to observe how the population’s genetic makeup changes. Mutations with enhanced function are enriched, while deleterious mutations are depleted; the enrichment ratio, which compares a mutation's frequency in the selected population with its frequency in the naive library, thus acts as a proxy for relative phenotype. The relative effects of thousands of mutations can now be assessed simultaneously in a single experiment, and a comprehensive mutational landscape can be calculated from the experimental data. Deep mutagenesis has been developed by multiple groups over the past decade [2-13] and has proven especially valuable for three goals: assisting protein engineering, understanding mutational tolerance within a protein sequence, and predicting which mutations might be associated with adverse disease outcomes, especially in the context of cancer or drug resistance. Two recent and prominent studies of SARS coronavirus 2 (SARS-CoV-2) have used deep mutagenesis to address these problems [14,15]. 
This Special Report summarizes the two studies with a focus on experimental details and caveats that will be unfamiliar to those outside the deep mutational scanning community.
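The enrichment ratio described in this introduction can be written explicitly. A common formulation, assuming that read counts n for each mutation i stand in for its frequency and that a small pseudocount α (an assumption, not specified in this Report) is added so that unobserved mutations stay finite:

```latex
f_i^{\mathrm{sel}} = \frac{n_i^{\mathrm{sel}} + \alpha}{\sum_j \left(n_j^{\mathrm{sel}} + \alpha\right)}, \qquad
f_i^{\mathrm{naive}} = \frac{n_i^{\mathrm{naive}} + \alpha}{\sum_j \left(n_j^{\mathrm{naive}} + \alpha\right)}, \qquad
\log_2 \mathrm{ER}_i = \log_2 \frac{f_i^{\mathrm{sel}}}{f_i^{\mathrm{naive}}}
```

A positive log2 enrichment ratio marks a mutation that is enriched by the selection and a negative value one that is depleted; a mutation whose frequency doubles after selection has a log2 enrichment ratio of 1.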

CONCLUSION

Two deep mutagenesis studies have determined how thousands of mutations within the SARS-CoV-2 spike or its human receptor affect their binding. The data have proven invaluable for engineering high-affinity decoy receptors that are in preclinical development as a COVID-19 therapy, and have revealed the scope of mutational tolerance within the spike, which may bear on genetic drift as the virus becomes endemic and changes over time. While these two studies focused on expression and on binding between the viral spike and its receptor, the selection strategies used in deep mutational scans are increasingly tied to more complex phenotypes, such as selections for structural stability based on protease sensitivity [16], selections with competing ligands to engineer specificity into proteins, including viral receptors [17-19], and selections based on catalytic or biological activity [20-23]. Undoubtedly, further questions about SARS-CoV-2 biology and the biochemistry of its encoded proteins will be answered using deep mutagenesis as the scientific community rises to this historic moment.

EXPERT OPINION

Engineered, high affinity decoy receptors for SARS-CoV-2

While much attention has been given to isolating monoclonal antibodies with tight affinity for the SARS-CoV-2 spike (S) glycoprotein [24-30], an alternative is to use the entry receptor as a soluble decoy to neutralize infection. S is a class I viral fusion protein that is proteolytically processed into two non-covalently associated subunits, S1 and S2, which decorate the coronavirus envelope [31-33]. S recognizes angiotensin-converting enzyme 2 (ACE2) on host cells to initiate attachment and fusion of the viral and plasma membranes [33-38]. Soluble ACE2 (sACE2) blocks receptor-binding sites on S [15,37,39-42]. Escape mutations in S rapidly emerge in tissue culture in the presence of monoclonal antibodies [43], but in principle the virus has limited means to escape a soluble decoy receptor without simultaneously losing affinity for the natural receptor. The decoy receptor might also have a virucidal effect by inducing conformational changes and S1 shedding, such that virus particles are inactivated even if sACE2 dissociates. However, monoclonal antibodies have superior affinity and neutralization efficacy. To improve the therapeutic potential of decoy receptors, my group used deep mutagenesis to find mutations in ACE2 that enhance affinity [15]. A library of over 2,000 single amino acid substitutions in ACE2 was constructed, focused on diversifying residues at the structurally defined interface with the receptor-binding domain (RBD) of S [44,45] and within the ACE2 catalytic cleft. The library was expressed in a human cell line, with a c-myc epitope tag fused to the extracellular N-terminus of ACE2 to detect surface-expressed protein. Apart from the epitope tag, ACE2 expressed in this selection system matches native ACE2 in the human body. 
The culture expressing the ACE2 library was then selected by fluorescence-activated cell sorting (FACS) to collect cells expressing ACE2 variants with tight affinity for fluorescently labeled RBD from S of SARS-CoV-2 (Figure 1A).
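To make the library size concrete, the sketch below enumerates every single amino acid substitution at a chosen set of positions, the site-saturation design described here for ACE2. It is a minimal illustration only: the function name and the toy sequence are hypothetical, and it says nothing about how the physical library was cloned.

```python
# Hypothetical sketch: enumerate all single amino acid substitutions
# at chosen positions of a protein (site-saturation mutagenesis design).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitutions(wt_seq: str, positions: list[int]) -> list[str]:
    """Return mutation names such as 'M1A' (wild type, 1-based position,
    new residue) for every non-synonymous substitution at each position."""
    variants = []
    for pos in positions:
        wt_aa = wt_seq[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != wt_aa:  # skip the wild-type residue itself
                variants.append(f"{wt_aa}{pos}{aa}")
    return variants


# Toy example on a made-up 3-residue sequence: 2 positions x 19 = 38 variants.
toy = single_substitutions("MKT", [1, 3])
print(len(toy), toy[:3])
```

Each diversified position contributes 19 substitutions, so the 117 ACE2 positions mentioned in this Report correspond to 117 × 19 = 2,223 single amino acid substitutions (stop codons excluded), consistent with a library of over 2,000 variants.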

Figure 1. Selection strategies for deep mutational scans of the RBD•ACE2 complex

For the artificial selection to succeed, each cell must express a single protein variant from a single sequence variant, providing a tight physical link between the phenotype of ACE2 expressed at the plasma membrane and a single sequence within the cell. Getting human cells in culture to acquire and express a single coding variant is no trivial feat, as transfection methods typically introduce many plasmid copies. Solutions to this technical challenge have included episomal plasmids that randomly partition to daughter cells during division until, over time, progeny harbor a single coding variant [4]; engineered integration sites in the genome [9,46,47]; and viral vectors at low multiplicities of infection [48,49]. My group used carrier DNA to dilute the ACE2 plasmid library sufficiently that each cell typically acquired no more than a single coding variant [11]. An episomal plasmid is used for the library so that extrachromosomal replication within the cell enhances expression of the protein under investigation. (The carrier DNA, itself a modified episomal plasmid, further assists in this process [50].) The disadvantage of this simple solution for linking genotype to phenotype is that the coding sequence is so diluted with carrier DNA that most cells in the culture do not express ACE2, and FACS time is wasted sorting a large number of negative cells. This has important consequences for the data: time spent sampling negative cells is time not spent sampling cells that express the protein under investigation, so variants in the library may be under-sampled, reducing data accuracy. Under-sampling becomes increasingly concerning as library size grows, and for this reason the library was limited to single amino acid substitutions at just 117 positions in ACE2. 
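The trade-off created by diluting the library with carrier DNA can be illustrated with a simple model. The sketch below assumes, as a simplification not stated in this Report, that the number of coding plasmids taken up per cell is Poisson-distributed with mean λ; real transfections are more heterogeneous. The uptake value and the 100-fold coverage target in the example are hypothetical.

```python
import math


def poisson_uptake(lam: float) -> dict[str, float]:
    """Fractions of cells by number of library (coding) plasmids acquired,
    assuming uptake is Poisson-distributed with mean `lam` per cell."""
    p0 = math.exp(-lam)          # no coding plasmid: ACE2-negative cell
    p1 = lam * math.exp(-lam)    # exactly one coding variant
    p_expr = 1.0 - p0            # at least one coding variant
    return {
        "no_variant": p0,
        "single_variant": p1,
        "multiple_variants": p_expr - p1,
        "single_given_expressing": p1 / p_expr,
    }


def cells_to_sort(library_size: int, coverage: float, lam: float) -> int:
    """Cells that must pass the sorter so that, on average, each variant is
    seen `coverage` times among expressing cells (ignores sorting losses)."""
    p_expr = 1.0 - math.exp(-lam)
    return math.ceil(library_size * coverage / p_expr)


fractions = poisson_uptake(0.1)
print({k: round(v, 3) for k, v in fractions.items()})
print(cells_to_sort(2223, 100, 0.1))
```

Under these assumptions, λ = 0.1 leaves about 95% of ACE2-positive cells with a single variant, but roughly 90% of all cells are negative, so sampling a 2,223-member library at 100-fold coverage requires on the order of 2.3 million sorted cells. The required number scales linearly with library size, which is why under-sampling worsens as the library grows.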
Following FACS selection of the human culture to enrich a cell population with high binding activity for SARS-CoV-2 protein S, RNA transcripts were isolated and sequenced on an Illumina platform. An enrichment ratio is calculated for each mutation by dividing its frequency in the sorted cell transcripts by its frequency in the naive plasmid library [51]. Illumina reads did not cover the full length of ACE2; instead, the cDNA was sequenced as a series of fragments that together provided full coverage of the diversified regions. During analysis, one assumes that there are no additional mutations outside a sequenced fragment. This is a reasonable assumption when a mutation is found, because the library was constructed to have only one amino acid substitution per plasmid. However, the assumption breaks down when no mutation is observed in the sequenced fragment, as one cannot know whether there was a mutation outside the sequenced region. As a consequence, the wild-type sequence is not directly observed and is only estimated. Strategies based on introducing and analyzing silent mutations can resolve this issue [52]. Overall, mutation enrichment ratios agreed closely between two independent replicates of the FACS experiments, indicating that the ACE2 library was well sampled and that the data could be held with high confidence [15]. The enrichment ratios calculated for each variant in the sorted ACE2 library provide a mutational landscape that defines the relative phenotypes of thousands of ACE2 mutations for binding to SARS-CoV-2 S [15]. The data in this experiment are qualitative, and it is unclear how a log2 enrichment ratio of, say, −2 or +3 translates to an exact change in a biophysical parameter such as KD. Furthermore, mutations can affect not only binding affinity for the RBD of S but also ACE2 surface expression. To filter out the contribution of mutations to expression, two populations of cells were collected by FACS. 
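The enrichment-ratio calculation and the replicate comparison described for the ACE2 scan can be sketched in a few lines. This assumes per-mutation read counts have already been extracted from the fragment reads, and adds a pseudocount, a common but here assumed choice. The mutation names and counts are invented, and Pearson correlation is only one way to quantify replicate agreement; this Report does not state which metric was used.

```python
import math


def log2_enrichment(sorted_counts: dict[str, int],
                    naive_counts: dict[str, int],
                    pseudocount: float = 0.5) -> dict[str, float]:
    """log2(frequency in sorted cells / frequency in naive library) for each
    mutation present in the naive library. The pseudocount (an assumption)
    keeps mutations with zero sorted reads finite."""
    muts = list(naive_counts)
    total_sorted = sum(sorted_counts.get(m, 0) + pseudocount for m in muts)
    total_naive = sum(naive_counts[m] + pseudocount for m in muts)
    ratios = {}
    for m in muts:
        f_sorted = (sorted_counts.get(m, 0) + pseudocount) / total_sorted
        f_naive = (naive_counts[m] + pseudocount) / total_naive
        ratios[m] = math.log2(f_sorted / f_naive)
    return ratios


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical read counts for three mutations in two replicate sorts.
naive = {"mut1": 100, "mut2": 100, "mut3": 100}
rep1 = log2_enrichment({"mut1": 400, "mut2": 100, "mut3": 25}, naive)
rep2 = log2_enrichment({"mut1": 350, "mut2": 110, "mut3": 30}, naive)
muts = sorted(naive)
print(round(pearson_r([rep1[m] for m in muts], [rep2[m] for m in muts]), 3))
```

Because frequencies are normalized within each sample, the ratios are relative: they rank mutations against one another, which is why they do not translate directly into a KD.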
In addition to collecting cells that express ACE2 and tightly bind RBD, cells that express ACE2 but bind RBD weakly were collected simultaneously in the same experiment. ACE2 mutants that were not expressed at the cell surface would be depleted from both sorted populations, as was apparent from tracking the depletion of nonsense mutations. In this way, information on how ACE2 mutations affect both expression and RBD binding was collected from a single FACS experiment. The deep mutational scan of ACE2 revealed that mutations can indeed be found that enhance binding to the SARS-CoV-2 RBD (Figure 2), making them suitable for engineering high-affinity soluble decoy receptors [15]. Such mutations were found at the binding interface, where they enhance specific atomic contacts, and also distally in the second shell and beyond, where they may affect ACE2 conformation, folding and dynamics. A soluble ACE2 variant that combines three mutations, called sACE2₂.v2.4, is highly expressed, is a stable monodisperse dimer, binds SARS-CoV-2 S with picomolar affinity, and potently neutralizes infection of a susceptible cell line by authentic virus. Its properties rival those of affinity-matured monoclonal antibodies under commercial development for therapy and prophylaxis. Although only affinity toward SARS-CoV-2 was considered during engineering, sACE2₂.v2.4 also potently neutralizes authentic SARS-CoV-1, and we speculate that it will have broad activity against betacoronaviruses that use ACE2 as an entry receptor. In unpublished work that has yet to be peer reviewed, we have found that sACE2₂.v2.4 binds broadly and tightly to bat coronaviruses that may be a source of future pandemics, supporting the concept of receptor-based decoys as antiviral biologics with exceptional breadth.
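The two-gate design for the ACE2 library separates expression effects from binding effects, with nonsense mutations serving as the reference for "not expressed". The sketch below is a hypothetical decision rule illustrating that logic; the margin, the trailing-'*' naming convention for nonsense mutations and the labels are illustrative assumptions, not the analysis used in [15].

```python
from statistics import mean


def nonsense_baseline(ratios: dict[str, float]) -> float:
    """Mean log2 enrichment of nonsense mutations (named with a trailing '*'),
    taken as the signature of a variant that is not surface-expressed."""
    return mean(v for k, v in ratios.items() if k.endswith("*"))


def classify(er_high: float, er_low: float,
             base_high: float, base_low: float, margin: float = 1.0) -> str:
    """Crude label for one mutation from its log2 enrichment in the
    high-RBD-binding gate (er_high) and the low-RBD-binding gate (er_low)."""
    # Depleted from both ACE2-positive gates to near the nonsense level.
    if er_high <= base_high + margin and er_low <= base_low + margin:
        return "poor surface expression"
    diff = er_high - er_low
    if diff >= margin:
        return "enhanced binding"
    if diff <= -margin:
        return "reduced binding"
    return "neutral or ambiguous"


# Hypothetical log2 enrichment ratios from the two gates.
high = {"mut1": 2.0, "mut2": -1.5, "mut3": -4.0, "stop1*": -4.2, "stop2*": -3.8}
low = {"mut1": -1.0, "mut2": 1.5, "mut3": -3.9, "stop1*": -4.0, "stop2*": -4.0}
bh, bl = nonsense_baseline(high), nonsense_baseline(low)
for m in ("mut1", "mut2", "mut3"):
    print(m, classify(high[m], low[m], bh, bl))
```

Comparing the two gates, rather than reading the high-binding gate alone, prevents a poorly expressed mutation from being mistaken for a binding-impaired one: both are depleted from the high-binding gate, but only the former is also depleted from the low-binding gate.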

Figure 2. Substitutions at the RBD•ACE2 interface have different outcomes on binding

Sequence constraints on the RBD of SARS-CoV-2 S for binding ACE2

In Starr et al., deep mutagenesis was applied to the SARS-CoV-2 spike to assess mutational tolerance for expression and ACE2 interactions [14]. Instead of investigating the entire trimeric S protein expressed on a cellular or viral membrane, the isolated RBD was fused to the yeast a-agglutinin mating protein Aga2p and displayed on the yeast surface [53] (Figure 1B). This artificial display platform removes the RBD from its native context. In addition, yeast N-glycans are of the high-mannose type and lack the complex, terminally sialylated glycans produced by human cells [54], which can matter when binding interactions are glycan-dependent, as seen for some antibodies targeting viral spikes [55]. However, this display platform harnesses yeast genetics to confer tremendous advantages for in vitro selection and evolution: large, diverse libraries can be readily sorted by FACS to provide high-quality data. Separate selections were performed across a range of sACE2 concentrations to simulate a titration experiment, from which the data could be converted to quantitative changes in apparent KD on the yeast surface (Figure 2). As a surrogate for how RBD mutations may affect expression of the viral spike, the effects of mutations on RBD surface display were also assessed in a standalone FACS selection. Quality-control pathways for protein secretion in yeast can be forgiving of misfolded protein sequences [16], and some residues of the RBD would ordinarily be buried in the context of the full S protein; it therefore remains to be seen how closely the yeast display data will correlate with equivalent experiments in more physiologically relevant expression systems. Nonetheless, the effects of some mutations predicted by yeast display were validated using full-length S expressed in human cells and packaged in pseudovirus [14]. 
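Starr et al. converted their multi-concentration sorts into apparent KD values. One way to perform such a conversion, sketched here under the assumption that each variant's binding signal at each concentration has already been summarized as a single number (for example a mean fluorescence or mean sort bin), is to fit a 1:1 binding isotherm with a baseline. The grid-search fit, its range and the synthetic data are illustrative choices, not the procedure reported in [14].

```python
import numpy as np


def fit_apparent_kd(conc_m, signal, log10_kd_grid=None):
    """Fit signal = amplitude * c / (c + KD) + baseline by grid search over
    log10(KD); for each trial KD the model is linear in amplitude and
    baseline, which are solved by ordinary least squares."""
    conc = np.asarray(conc_m, dtype=float)
    y = np.asarray(signal, dtype=float)
    if log10_kd_grid is None:
        log10_kd_grid = np.linspace(-13.0, -5.0, 801)  # 0.01 log10 steps
    best = None
    for log_kd in log10_kd_grid:
        occupancy = conc / (conc + 10.0 ** log_kd)  # 1:1 binding isotherm
        design = np.column_stack([occupancy, np.ones_like(occupancy)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, log_kd, coef)
    _, log_kd, (amplitude, baseline) = best
    return 10.0 ** log_kd, amplitude, baseline


# Noise-free synthetic titration with a hypothetical KD of 1 nM.
conc = np.logspace(-13, -6, 15)
signal = 2.0 * conc / (conc + 1e-9) + 0.5
kd, amp, base = fit_apparent_kd(conc, signal)
print(f"apparent KD ~ {kd:.2e} M")
```

Subtracting the wild-type log10 KD from each mutant's fitted value then gives a quantitative change in apparent affinity, in contrast to the qualitative enrichment ratios of a single-concentration sort.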
The library, encoding nearly 4,000 single amino acid substitutions in the SARS-CoV-2 RBD, was sequenced with PacBio long reads that link untranslated nucleotide barcodes to specific protein variants. Following FACS-based selection, only the barcodes are read to determine how favorable sequence variants are enriched and deleterious sequence variants are depleted. This avoids the problem of short Illumina reads failing to cover the full cDNA length, and because multiple barcodes are associated with any given protein variant, there are additional internal checks for data quality and consistency. Despite the limitations of a yeast display platform, the deep mutational scan of the isolated RBD provides a high-quality, useful data set from which several important conclusions were drawn. First, the ACE2-binding surface of the SARS-CoV-2 RBD tolerates surprisingly high sequence diversity, even though it is a critical site for function [14]. High diversity is also seen in the ACE2-binding sites of S proteins from SARS-related bat coronaviruses, but this matches corresponding diversity in ACE2 from ecologically diverse bat species [56] and does not necessarily mean that the RBD tolerates mutations for binding ACE2 from a single species. The deep mutational scan addresses this uncertainty, and its conclusion is further supported by evidence that diverse RBD sequences from bat coronaviruses are all competent for binding human ACE2, with varying affinities [38]. Second, mutations were found in the RBD that enhance binding to ACE2, yet there does not appear to be positive selective pressure for these variants in the human population [14]. SARS-CoV-2 affinity for ACE2 is therefore ‘good enough,’ with no additional fitness benefit from higher affinity. It is worth noting that classical SARS-CoV-1 is also a highly infectious and virulent pathogen, despite having weaker ACE2 affinity [36,57]. 
The rapid spread of SARS-CoV-2 probably has more to do with asymptomatic and presymptomatic transmission than with enhanced receptor binding. Third, some mutations within the epitopes of monoclonal antibodies were found to maintain high ACE2 binding, and it is likely that SARS-CoV-2 can easily mutate to escape neutralization without losing infectivity [14]. This agrees with selection experiments using pseudovirus expressing SARS-CoV-2 S variants, in which spike escape mutants resistant to neutralizing antibodies rapidly emerge within a single passage [43]. This has profound implications for antibody therapy, for which the standard has become cocktails of non-competing monoclonal antibodies to prevent rapid resistance. It is currently unknown whether an engineered soluble decoy receptor such as sACE2₂.v2.4 will similarly be susceptible to the emergence of spike variants that discriminate between the engineered decoy and the native receptor. We hypothesize that engineered decoys will be broadly active against SARS-CoV-2 variants, and this remains an active area of investigation.
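The barcode strategy used for the RBD library can be sketched as follows: a lookup table built once from long reads maps each barcode to its protein variant, and after selection only barcode counts are needed. Scores from the several barcodes of one variant are averaged, and their spread provides the kind of internal consistency check described for the RBD scan. The data structures, pseudocount, scoring and barcode sequences are assumptions for illustration, not the pipeline of [14].

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def variant_scores(barcode_to_variant: dict[str, str],
                   sorted_counts: dict[str, int],
                   naive_counts: dict[str, int],
                   pseudocount: float = 0.5) -> dict[str, tuple[float, float, int]]:
    """Score each protein variant from its barcodes. Returns, per variant,
    (mean log2 enrichment across barcodes, standard deviation, barcode count)."""
    barcodes = list(barcode_to_variant)
    total_sorted = sum(sorted_counts.get(bc, 0) + pseudocount for bc in barcodes)
    total_naive = sum(naive_counts.get(bc, 0) + pseudocount for bc in barcodes)
    per_variant = defaultdict(list)
    for bc in barcodes:
        f_sorted = (sorted_counts.get(bc, 0) + pseudocount) / total_sorted
        f_naive = (naive_counts.get(bc, 0) + pseudocount) / total_naive
        per_variant[barcode_to_variant[bc]].append(math.log2(f_sorted / f_naive))
    return {
        v: (mean(s), stdev(s) if len(s) > 1 else float("nan"), len(s))
        for v, s in per_variant.items()
    }


# Hypothetical long-read lookup table (barcode -> variant) and short-read counts.
lookup = {"AACG": "varX", "TTGC": "varX", "GGAT": "varY", "CCTA": "varY"}
naive = {bc: 100 for bc in lookup}
sorted_reads = {"AACG": 300, "TTGC": 320, "GGAT": 20, "CCTA": 25}
for variant, (score, spread, n) in variant_scores(lookup, sorted_reads, naive).items():
    print(variant, round(score, 2), round(spread, 2), n)
```

Because the barcode sits outside the coding sequence, a short read of the barcode alone identifies the full-length variant, so no assumption about unsequenced regions is needed.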

References

1. Glycan-Dependent Neutralizing Antibodies Are Frequently Elicited in Individuals Chronically Infected with HIV-1 Clade B or C.

Authors: Yehuda Z Cohen; Christy L Lavine; Caroline A Miller; Jetta Garrity; Brittany R Carey; Michael S Seaman
Journal: AIDS Res Hum Retroviruses Date: 2015-08-05 Impact factor: 2.205

2. Affinity and cross-reactivity engineering of CTLA4-Ig to modulate T cell costimulation.

Authors: Zhenghai Xu; Veronica Juan; Alexander Ivanov; Zhiyuan Ma; Dixie Polakoff; David B Powers; Robert B Dubridge; Keith Wilson; Yoshiko Akamatsu
Journal: J Immunol Date: 2012-09-26 Impact factor: 5.422

3. Comprehensive Sequence-Flux Mapping of a Levoglucosan Utilization Pathway in E. coli.

Authors: Justin R Klesmith; John-Paul Bacik; Ryszard Michalczyk; Timothy A Whitehead
Journal: ACS Synth Biol Date: 2015-09-22 Impact factor: 5.110

4. A comprehensive, high-resolution map of a gene's fitness landscape.

Authors: Elad Firnberg; Jason W Labonte; Jeffrey J Gray; Marc Ostermeier
Journal: Mol Biol Evol Date: 2014-02-23 Impact factor: 16.240

5. Single-mutation fitness landscapes for an enzyme on multiple substrates reveal specificity is globally encoded.

Authors: Emily E Wrenbeck; Laura R Azouz; Timothy A Whitehead
Journal: Nat Commun Date: 2017-06-06 Impact factor: 14.919

6. Multiplex assessment of protein variant abundance by massively parallel sequencing.

Authors: Kenneth A Matreyek; Lea M Starita; Jason J Stephany; Beth Martin; Melissa A Chiasson; Vanessa E Gray; Martin Kircher; Arineh Khechaduri; Jennifer N Dines; Ronald J Hause; Smita Bhatia; William E Evans; Mary V Relling; Wenjian Yang; Jay Shendure; Douglas M Fowler
Journal: Nat Genet Date: 2018-05-21 Impact factor: 38.330

7. An improved platform for functional assessment of large protein libraries in mammalian cells.

Authors: Kenneth A Matreyek; Jason J Stephany; Melissa A Chiasson; Nicholas Hasle; Douglas M Fowler
Journal: Nucleic Acids Res Date: 2020-01-10 Impact factor: 16.971

8. High-resolution mapping of protein sequence-function relationships.

Authors: Douglas M Fowler; Carlos L Araya; Sarel J Fleishman; Elizabeth H Kellogg; Jason J Stephany; David Baker; Stanley Fields
Journal: Nat Methods Date: 2010-08-15 Impact factor: 28.547

9. Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus.

Authors: Wenhui Li; Michael J Moore; Natalya Vasilieva; Jianhua Sui; Swee Kee Wong; Michael A Berne; Mohan Somasundaran; John L Sullivan; Katherine Luzuriaga; Thomas C Greenough; Hyeryun Choe; Michael Farzan
Journal: Nature Date: 2003-11-27 Impact factor: 49.962

10. A computationally designed inhibitor of an Epstein-Barr viral Bcl-2 protein induces apoptosis in infected cells.

Authors: Erik Procko; Geoffrey Y Berguig; Betty W Shen; Yifan Song; Shani Frayo; Anthony J Convertine; Daciana Margineantu; Garrett Booth; Bruno E Correia; Yuanhua Cheng; William R Schief; David M Hockenbery; Oliver W Press; Barry L Stoddard; Patrick S Stayton; David Baker
Journal: Cell Date: 2014-06-19 Impact factor: 41.582