Lucas S P Rudden, Mahdi Hijazi, Patrick Barth.
Abstract
Following the hugely successful application of deep learning methods to protein structure prediction, an increasing number of design methods seek to leverage generative models to design proteins with improved functionality over native proteins or novel structure and function. The inherent flexibility of proteins, from side-chain motion to larger conformational reshuffling, poses a challenge to design methods, where the ideal approach must consider both the spatial and temporal evolution of proteins in the context of their functional capacity. In this review, we highlight existing methods for protein design before discussing how methods at the forefront of deep learning-based design accommodate flexibility and where the field could evolve in the future.
Keywords: deep learning; generative models; protein design; protein flexibility; protein switches
Year: 2022 PMID: 36032687 PMCID: PMC9399439 DOI: 10.3389/fmolb.2022.928534
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1. Three types of generative models are generally applied in DL-based protein design: (A) autoencoders/variational autoencoders (AE/VAEs), (B) generative adversarial networks (GANs), and (C) autoregressive models.
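As a rough illustration of the autoregressive case in (C), a sequence can be generated one residue at a time, each drawn from a distribution conditioned on the residues already emitted. The sketch below is a toy under stated assumptions: `next_residue_probs` is a hypothetical stand-in for a trained network, and its rule (down-weighting a repeat of the previous residue) is invented purely so the example runs. It does not correspond to any method in the review.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def next_residue_probs(prefix):
    """Hypothetical stand-in for a trained model's p(x_i | x_<i).

    Toy rule (an assumption, not a real model): repeating the
    previous residue is made five times less likely.
    """
    weights = {aa: 1.0 for aa in AMINO_ACIDS}
    if prefix:
        weights[prefix[-1]] = 0.2
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def sample_sequence(length, seed=0):
    """Sample a sequence left to right, one residue per step."""
    rng = random.Random(seed)
    seq = ""
    for _ in range(length):
        probs = next_residue_probs(seq)
        residues = list(probs)
        seq += rng.choices(residues, weights=[probs[r] for r in residues], k=1)[0]
    return seq


def log_likelihood(seq):
    """Sum of log p(x_i | x_<i): the autoregressive factorisation."""
    return sum(math.log(next_residue_probs(seq[:i])[seq[i]]) for i in range(len(seq)))


if __name__ == "__main__":
    s = sample_sequence(30, seed=1)
    print(s, log_likelihood(s))
```

The same factorisation serves both purposes: sampling draws from each conditional in turn, while scoring an existing sequence sums the log of those conditionals.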
FIGURE 2. Most DL protein design methods tackle design as either a (A) sequence generation or (B) structure generation problem, each accompanied by the general process outlined here. Examples of both methods used to assess the quality of generated samples and specific DL protein design examples are also indicated.
Summary of the key deep learning protein design methods discussed in this review, with the generation type and generative model type of each indicated by a *. A ∼ in the Structure column indicates that some minor structure design coincides with sequence design. The design target of each method is also provided.
| Method | Generation type (Sequence / Structure) | Generative model (VAE / GAN / Autoregressive) | Design target |
|---|---|---|---|
| | * | * | Antimicrobial peptides |
| | * | * | Membranolytic anticancer peptides |
| PepCVAE | * | * | Antimicrobial peptides |
| | * | * | Luciferase enzymes |
| ProteinGAN | * | * | MDH-like enzymes |
| | * | * | Metalloproteins |
| | * | * | Antimicrobial peptides |
| | * | * | Human antibodies |
| | * | * | Non-specific |
| ProteoGAN | * | * | Non-specific |
| ProteinSolver | * | * | Non-specific |
| | * | * | Non-specific |
| Ig-VAE | * | * | Immunoglobulins |
| | * | * | Non-specific |
| | Sequence *, Structure ∼ | Inverted structure prediction model | Non-specific |
| | Sequence *, Structure * | Inverted structure prediction model | Non-specific |
| | Sequence *, Structure ∼ | Inverted structure prediction model | Non-specific |
FIGURE 3. (A) Current design methods either (i) produce new sequences corresponding to some structure, with limited design-objective conditioning that could be leveraged for conformational flexibility design, or (ii) produce novel folds that confer some function and must be stabilised through sequence design. Both approaches inherently neglect conformational flexibility. (B) (i) The general goal of DL-based protein switch design is to connect multiple structures to one sequence, with the conformational change triggered by some controlled signal; i.e., given some stimulus (e.g., palatinate peptide), the contacts of a designed sequence in one state (red) shift to those of a new fold (blue), providing novel functional capacity. This could be achieved through (ii) conformational landscape optimisation of multiple states given some design objective, similar to Norn et al., or (iii) harnessing the implicit relationships between sequence and multiple structures contained within MSA data, as demonstrated by del Alamo et al. (2022). Here, co-evolving residues (denoted by the coloured blocks) in two different low-depth MSAs make distinct contacts (shown as C1, C2, etc.) that change the overall fold state.
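To make the notion of co-evolving residues in (B)(iii) concrete, one simple and widely used signal is the mutual information between two alignment columns: positions whose residues change together score high, while conserved or independent positions score near zero. The sketch below is only an illustration under stated assumptions. The four-sequence alignment is invented, gaps and phylogenetic bias are ignored, and this is not the method of del Alamo et al.

```python
import math
from collections import Counter


def column(msa, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Invented toy alignment: positions 1 and 3 vary together (K with E,
    # R with D), while positions 0 and 2 are fully conserved.
    toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD"]
    print(mutual_information(toy_msa, 1, 3))  # co-varying pair
    print(mutual_information(toy_msa, 0, 1))  # conserved vs variable
```

In this toy alignment the co-varying pair carries one bit of mutual information and the pair involving a conserved column carries none. Real alignments would need gap handling and a correction for shared ancestry before such scores could be read as contacts.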