
AlphaFold, Artificial Intelligence (AI), and Allostery.

Ruth Nussinov1,2, Mingzhen Zhang1, Yonglan Liu3, Hyunbum Jang1.   

Abstract

AlphaFold has burst into our lives. It is a powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota-human protein-protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.

Year:  2022        PMID: 35976160      PMCID: PMC9442638          DOI: 10.1021/acs.jpcb.2c04346

Source DB:  PubMed          Journal:  J Phys Chem B        ISSN: 1520-5207            Impact factor:   3.466


Introduction

AlphaFold has overcome age-long bottlenecks and forcefully bared the power of artificial intelligence (AI) in biological research.[1−3] AlphaFold has combined numerous deep learning innovations to predict the three-dimensional (3D) structures of proteins at or near experimental scale resolution, inspiring the community (including us) to rethink studies of function, evolution, and disease (e.g., refs (4−13)). The sheer volume of the rapidly generated accurate structures argues that new, ambitious, frontier-pushing studies will emerge. It also points to research projects that should be reconsidered. The richness of high quality data that are being compiled in databases (e.g., refs (5 and 14−25)) is already strengthening studies that require protein structures, such as mapping binding sites and interactions in signaling pathways, and identification of hot spots, including latent and rare cancer driver mutations.[26−34] The most profound impact will likely be in accelerating and improving production of new medications (e.g., ref (35)), and in generating data that can be used toward this vital aim (e.g., refs (5, 17, 18, and 36−39)). AI developments and applications[40] may further help foretell whether the signal propagating downstream will be strong enough to reach its genomic target to activate (suppress) gene expression,[41] and predict pathways.[42−49] Altogether, these powerful approaches and the databases that they create revamp and transform traditional and ongoing research involving the use of structures. They also embolden us to step back, rethink, and innovate our projects. AlphaFold’s achievements have been made possible by the protein databank (PDB), currently with a size nearing 200 000 experimentally determined structures. 
It has been trained on protein chains from the PDB and uses the input sequence to query databases of protein sequences to construct a multiple sequence alignment.[4] However, its striking success has not led us to a deeper mechanistic understanding of exactly how a protein sequence folds, thus not assisting in the folding of a protein from its sequence. Below, we first briefly describe the protein folding problem,[50,51] and strategies to predict protein structures. We describe key conceptual and computational developments and the transformative AlphaFold advances. We outline its strengths and some weaknesses. We emphasize what it has accomplished, and what it has not, and the magnitude of the challenges, underscoring the difference between the theoretical folding problem, which was not solved,[51−57] and practical predictions by incorporating additional evolution information that generally have been.[58−65] We proceed to AI approaches to the complementary problem of protein–protein interactions (PPIs) by these methods and others,[66−75] with the human–microbiome PPI as a relevant and topical example.[66] AI-powered prediction of human–microbe PPIs can accelerate research into questions such as how microbiota hijack cell signaling and provide drug targets.[76−80] We discuss how AI can reshape drug discovery, for example by amplifying repurposing of FDA-approved drugs,[81−91] an area which is already thriving. AI can also select combinations of drug targets, powerfully guiding and accelerating experiments by providing specific testable hypotheses. Machine learning has already proved its merit in the life and medical sciences.[1,92−98] Coupled with harnessed exascale computing,[99,100] advanced, AI-powered methods are set to revolutionize therapeutic development, providing prioritized drug combinations for the attending physicians. 
Finally, we note that AlphaFold, which predicts single ranked structures for a protein sequence, is unable to address directly allosteric mechanisms, which are based on the populations of conformational states in the ensembles.[101−117] Allostery, where the signal propagates dynamically with the shifts in the populations, underlies regulation and thus cell life.[118,119] Due to its higher specificity and consequently lower toxicity, which results from targeting nonconserved allosteric sites, allostery also increasingly features in allosteric drugs.[120−125] Can we then foresee AlphaFold assisting in unraveling the mechanisms of allosteric hotspot mutations and allosteric drug discovery? Indirectly it can and does, even in our hands. The rigid structures that AlphaFold predicts can be submitted to MD simulations that generate such ensembles (Figure 1). At the same time, as we discuss here, other AI-based strategies can assist directly in such efforts, most effectively via accelerating and enhancing MD simulations. Efforts are also likely to persist in exploiting AI toward prediction of allosteric binding sites. Nevertheless, it behooves us to recall that the effectiveness of allosteric sites is determined by both stable interactions at the site, which is something that AI can help with, and initiation of effective allosteric signals, which would be more challenging. Current approaches to predict allosteric binding sites address only the former. In that sense, they resemble the characterization of orthosteric sites, except that their scoring is based on statistics of allosteric sites.
Figure 1

Current strategy of allosteric drug discovery in computational structural biology employing the AlphaFold program with artificial intelligence (AI)-powered methods (top panel). Experimental instruments, such as X-ray crystallography, cryo-electron microscopy (cryo-EM), and nuclear magnetic resonance (NMR) can resolve protein structures, but often miss the coordinates of highly fluctuating regions in the protein structure. AlphaFold can predict the missing coordinates of these regions. The resulting structure can be subjected to molecular dynamics (MD) simulations that provide conformational dynamics, conformational changes, and folding characteristics of the protein. An example is shown for Src homology region 2-containing protein tyrosine phosphatase 2 (SHP2) (bottom panel). The X-ray structure of SHP2 (PDB ID: 4DGP) misses residues in two flexible regions, which can be predicted by AlphaFold. SHP2 contains two Src homology 2 (SH2) domains (nSH2 and cSH2) and a protein tyrosine phosphatase (PTP) domain.

Orthosteric drugs block the active site; allosteric drugs alter the population of the active state of the protein, including the active site, through binding at a site far away.[126] We suggested that allosteric drugs can consist of “anchor” and “driver” atoms, where the anchor atoms bind to the allosteric pocket without changing the conformation of the binding site. The interactions of the anchor atoms stabilize ligand binding, resembling protein–ligand binding at the orthosteric site. The binding of driver atoms “pulls” or “pushes” atoms in the protein pocket. This initiates the allosteric signal, which shifts the receptor population from the inactive to the active state. Driver atoms can trigger agonism and antagonism. AlphaFold cannot handle population shifts. AI strategies can but will need to go beyond prediction of stabilizing interactions. 
Finally, not surprisingly, prediction of the structures of intrinsically disordered proteins (IDPs) and regions (IDRs) is another problem where AlphaFold falls short. Disordered proteins (regions) are characterized by broad and heterogeneous ensembles where the differences in the relative conformational stabilities are small, or even minor, and the barriers are low.[127−135] The conformations interconvert, leading to low probabilities of AlphaFold’s reliably capturing those most favored, or the conformational distribution. Nevertheless, leveraging, learning, and mining of the conformations can exploit AI.[136−138] AI-powered algorithms, which are fed vast compiled data and enabled by the emerging massive compute power, are propelling a revolution in computational biology (Figure 1). Unlike quantum computing, in the case of AI and data-driven computing, the technological innovations at the requisite scales are already at hand.

Protein Folding versus Prediction of Protein Structure

Protein Folding

The protein folding problem embraces two questions:[51] first, the conceptual question of how a protein’s amino acid sequence dictates its 3D atomic structure, and second, how, starting from a single amino acid sequence, to successfully predict the 3D structure, without using information related to other available (homologous, same family) sequences nor structures of any related sequences. Such computational prediction methods are guided by the conceptual notion that this is how the protein folds in nature. Single sequence-based prediction in solution considers forces related to hydrogen bonds, ion pairs, van der Waals attractions,[139] and chiefly water-mediated hydrophobic interactions, with the hydrophobic effect the driving force for protein folding. This formal folding problem emerged six decades ago, alongside the first atomic-resolution protein (globins) structure. The structure led to thermodynamic questions of the balance of interatomic forces that determines the structure of the protein, how the protein can fold so quickly, that is the kinetics of the pathways, and the computational problem of protein structure prediction. The landmark thermodynamic hypothesis of Christian Anfinsen and his colleagues[140,141] stated that the native structure of a protein is its thermodynamically most stable structure, and it is determined only by its amino acid sequence and the conditions it is at, with kinetics playing no role. No other considerations are at play, that is, whether it is synthesized in the lab or on the ribosome or undergoes chaperone assisted folding. 
The folding paradigm stipulated that unfolded molecules will always spontaneously fold into the same shape; that is, the linear amino acid sequence specifies a protein’s folded native state.[58,142−144] Anfinsen’s thermodynamic hypothesis emphasized the shape of the energy landscape where the native state is the one with the lowest free energy.[141,145] Computationally, that description posed the problem of prediction of protein structure, forming the basis for approaches that dominated the field for scores of years. If only the sequence matters, along with the physicochemical forces, it should be possible for “good” algorithms to fold it. Assuming that the crystal structure represents the minimum energy state, the “goodness” of the predicted structure can then be assessed by comparison with it. Anfinsen’s description combines sampling of alternative conformations, ranking them by energy, and identifying the lowest energy state.[51,146−148] Subsequent efforts focused on prediction of secondary structures, although the dominant role of the hydrophobic interactions suggested that secondary structure is an outcome of the 3D structure and not its cause.[149,150] The small (5–10 kcal/mol) difference in the stability of the native structures as compared to the denatured states[151] compounded the challenge that prediction methods faced. Already early on, Cyrus Levinthal conceptualized the key problem facing the protein and the prediction algorithms:[152] the vast time scales for the protein to search the folding space and reach its most stable native state under biological conditions.[58] For prediction algorithms sampling backbone states, the search space size grows exponentially with chain length, making an exhaustive search impossible. 
Levinthal argued that there is no need to search this vast space since the energy landscape is funnel-like, rather than flat, and thus can guide sampling toward the biological conformational basin.[51,153] Packed hydrophobic cores optimize their van der Waals (vdW) interactions, restrict torsion angles, and abolish internal “holes”, with hydrogen bonds and salt bridges balancing the loss of interactions with water. Harold Scheraga employed physical chemistry to pioneer studies to decipher how amino acid sequences influence the 3D folding pathways, thermodynamics, and biological activity of proteins. Neither AlphaFold nor broadly other protein structure prediction algorithms consider folding pathways. Physical chemistry is accounted for implicitly; in the case of AlphaFold, via AI.
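The scale of the search problem that Levinthal pointed to can be illustrated with a short back-of-the-envelope calculation. The sketch below is only an illustration: the number of backbone states per residue (3) and the sampling rate (10^13 conformations per second) are assumed round numbers chosen for the example, not values taken from the text, and the function name is hypothetical.

```python
def levinthal_estimate(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to enumerate them all).

    states_per_residue and samples_per_second are illustrative assumptions.
    The estimate treats residues as independent, which real chains are not.
    """
    # Exponential growth of the search space with chain length.
    n_conformations = states_per_residue ** n_residues
    seconds = n_conformations / samples_per_second
    years = seconds / (365.25 * 24 * 3600)
    return n_conformations, years


if __name__ == "__main__":
    for n in (10, 50, 100):
        count, years = levinthal_estimate(n)
        print(f"{n:>3} residues: {count:.2e} conformations, ~{years:.2e} years")
```

Even under these generous assumptions, exhaustive enumeration for a 100-residue chain would take many orders of magnitude longer than any biological time scale, which is why a funnel-like landscape that guides the search is needed.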

Protein Structure Prediction

Prediction of protein structure can be template-based or template-free, the latter of which does not use global similarity to an experimental (protein data bank, PDB) structure.[58] Template-free modeling exploits physics-based energy functions. Both can exploit machine learning and AI to use data in the PDB. Template-based modeling selects a structural template and uses sequence alignment. Template-free modeling uses conformational sampling and ranking. It may start with multiple-sequence alignment to related sequences to predict local structural features, which will guide the 3D modeling followed by refining and ranking. Integrative modeling[154−156] that assembles structures from individual components may suffer from a high false-positive rate. Computational integrative approaches can combine data from experimental methods, bioinformatics, physics, and statistics for rapid and accurate structure determination of protein complexes. The algorithms can integrate experimental data, such as X-ray crystallography, NMR spectroscopy, 2D and 3D electron microscopy (EM), small-angle X-ray scattering (SAXS), mass spectrometry (MS), hydrogen–deuterium exchange (HDX), mutations, sequence conservation and covariation, and statistical analysis of known structures. Computationally, the algorithms can derive from computer vision, image processing, computational geometry, machine learning, robotics, and graph algorithms. Machine learning has, however, already been used toward protein structure prediction.[157−163] AlphaFold is not the first machine-learning model. Its remarkable success (with scores of near 90 even in the difficult targets in the 2020 Critical Assessment of Protein Structure Prediction, CASP) was influenced by its training not only on all the PDB structures but also on structures it predicted, and it uses the structure and correlation data to predict the pairs of amino acids that are in contact as well as all amino acid pairwise distances. 
It also ensured that the distances between the amino acids satisfy the triangle inequality, saving time at intermediate steps.[164] To date, AlphaFold illuminates half of the dark human proteins.[10] Still, questions remain, such as which structural states exist for a given protein, and what is the population of each state. Addressing these questions is vital to relate protein structure to function. This is where AlphaFold falls short. However, the models it produces can serve as input to generate ensembles, for example by MD simulations, which, if carried out over sufficiently long time scales and in parallel, should be able to produce them. Simulations can sample the relevant states, can enumerate possible state combinations (multistate models), and can determine the population sizes for the states.
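How state populations might be read off a simulation can be sketched as follows. The example assumes that each trajectory frame has already been assigned to a discrete conformational state by some clustering step that is not shown; the state names, frame counts, and function names are hypothetical. Populations are converted to relative free energies with the Boltzmann relation ΔG_i = −RT ln(p_i / p_ref).

```python
import math
from collections import Counter

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def state_populations(labels):
    """Fraction of frames assigned to each state label."""
    counts = Counter(labels)
    total = len(labels)
    return {state: n / total for state, n in counts.items()}


def relative_free_energies(populations, temperature=300.0):
    """Free energy of each state (kcal/mol) relative to the most populated one."""
    p_ref = max(populations.values())
    return {
        state: -R_KCAL * temperature * math.log(p / p_ref)
        for state, p in populations.items()
    }


# Hypothetical per-frame state assignments from a clustered MD trajectory.
trajectory = ["inactive"] * 900 + ["intermediate"] * 80 + ["active"] * 20
populations = state_populations(trajectory)
free_energies = relative_free_energies(populations)

if __name__ == "__main__":
    for state in populations:
        print(f"{state:>12}: p = {populations[state]:.3f}, "
              f"dG = {free_energies[state]:.2f} kcal/mol")
```

The estimate is only as good as the sampling: a state that the simulation never visits receives no population at all, which is one reason sufficiently long or enhanced simulations are needed.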

The Structure–Function Paradigm Overlooked Ensembles and Dynamic Energy Landscapes: AlphaFold Is Attuned but Is Unable to Address Them

The sequence–structure–function dogma was the touchstone of a generation. It dominated molecular biology for decades. It was introduced by physical chemists who explained that biological macromolecules function when they are folded. Thus, to understand how molecules function, one needs to consider their 3D structures, a transformative paradigm that became a tenet of modern biology. Today, it is broadly recognized that rigid molecules cannot perform a function, leading the way to the appreciation that to sustain life, molecular flexibility is a necessity. That, however, has not fully translated into an understanding of the powerful concept of the energy landscape.[165] That is, that biomolecules are dynamical objects that are always interconverting between a variety of structures with varying energies,[166] and that this is the origin of allosteric mechanisms.[167−169] This notion of flexibility as interconversion between conformations is critical for understanding biological processes and their regulation, such as protein activation as a shift of the ensemble from the inactive to the active state, how allosteric drugs work, cell signaling, and binding mechanisms through conformational selection rather than induced fit. The conceptual evolution from the classic structure–function paradigm to dynamic energy landscapes of biomolecular function and allosteric mechanisms poses a challenge to AlphaFold’s powerful predictions. To understand biological regulation, structure should be linked to function through protein ensembles in terms of populations and relative energies, which is the foundation of allostery (Figure 2). Despite their transformative power and vast, broad impact, the AlphaFold predictions are unable to address it directly. It is only through their sampling that this functional aim can be accomplished.
Figure 2

Structural ensembles for B-Raf activation. The snapshots for B-Raf kinase domains (top panels) are generated from the protein databank (PDB). The representative inactive OFF-state conformation (PDB ID: 3SKC) and active ON-state conformation (PDB ID: 6UAN) are highlighted in blue and red, respectively. The free energy landscape of B-Raf kinase domain depicting the population shift from OFF-state to ON-state upon activation (middle panel). Highlighted activation segments of αC-helix and A-loop representing the side by side comparisons between the single structure predicted by AlphaFold and the representative B-Raf conformations of both inactive OFF-state and active ON-state (bottom panels). The AlphaFold structure falls into neither the active ON-state nor the inactive OFF-state.

Around their native states, protein landscapes consist of rapidly interconverting conformations. The ensembles are “fuzzy”.[170,171] Events associated with their environments and functions, such as changes in pH, interactions with ions, water, and lipids, and binding of small or macromolecules, promote conformational changes. These are frustrated by their local restricted molecular environment.[172] The cooperative, accommodating structural changes shift the ensemble. The shifted, now populated states are frustrated by the conformations of their current neighboring residues. Binding and catalysis involve making and breaking covalent and noncovalent interactions at the interaction site. These propagate through frustration, influencing the conformational states of the ensemble. The shifts in the ensemble alter the relative stabilities, i.e., the populations of the states, thus influencing the allosteric transitions. 
Importantly, frustration does not create new conformations; instead, it alters the number of molecules populating each of them.[173] Frustration is thus a powerful tool harnessed by evolution for function.[174] Biomolecules must be described statistically, not statically.[166,175] Static descriptions were the norm for decades. Yet, a static description cannot capture function. It cannot describe protein activation from the inactive to the active state upon some activation event, such as binding a hormone, or being covalently modified by a post-translational modification, or the presence of oncogenic driver mutations. It is also unable to describe how high affinity binding to an activator shifts protein molecules to their active state.[176,177] It will further fail when attempting to describe how allosteric “rescue mutations”[178,179] work (albeit not other rescue mutations, e.g., refs (180−183)), how allosteric drugs are able to block the active site, and how mutations countering them can be overcome. All these processes, which take place in the cell, would not have been possible had the protein existed in a single structure or flipped between only two states, active and inactive. While there is a single conformation that the active enzyme should adopt for productive catalysis, there are multiple ways to inactivate it and thus many inactive states. The notion of a single structure bred the concept of the “lock-and-key” binding mechanism. This view was superseded by the “induced fit” mechanism which considered the presence of only two states, an active and an inactive state. 
In an induced fit scenario, the ligands bind to the single “open” protein structure and the interaction between a protein and a rigid binding partner induces a conformational change in the protein.[184] In contrast, the conformational selection mechanism[167−169,185] theorizes that the energy surface hosts a very large number of conformations, and the one that fits best is selected, with subsequent minor induced fit optimization, largely by side chains. AlphaFold exploits AI to make template-free predictions of protein structures from their sequences, equipping biologists with structures with good resolution. The predictions that it yields, like those obtained by homology modeling, are rigid. Flexibility is implicitly captured by the absence of, or low confidence levels of predicted structure for certain regions, as in the case of intrinsically disordered proteins. Thus, computational methods once relegated to the periphery of biology, are now at the forefront, driving “the second molecular biology revolution”. AlphaFold can drive breakthroughs in fundamental problems in the life sciences, including precision medicine, with promise to transform research and accelerate drug discovery. It is driven by deep learning innovations, which appear poised to transform MD simulations.
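The population shift that underlies conformational selection can be illustrated with a textbook thermodynamic linkage model. The sketch below deliberately reduces the ensemble to two states, which the discussion above stresses is an oversimplification of a real landscape; it is meant only to show how preferential binding to a pre-existing active conformation redistributes populations. All parameter values and names are hypothetical.

```python
def active_fraction(l0, ligand, kd_active, kd_inactive):
    """Fraction of protein in the active state for a two-state linkage model.

    l0          : [active]/[inactive] ratio without ligand (assumed value)
    ligand      : free ligand concentration
    kd_active   : dissociation constant for the active conformation
    kd_inactive : dissociation constant for the inactive conformation
    Concentrations and constants must share the same units.
    """
    # Statistical weights of the active and inactive states, each summed
    # over its unbound and ligand-bound forms.
    w_active = l0 * (1.0 + ligand / kd_active)
    w_inactive = 1.0 + ligand / kd_inactive
    return w_active / (w_active + w_inactive)


if __name__ == "__main__":
    # Hypothetical activator that binds the active state 100-fold more tightly.
    for conc in (0.0, 1.0, 10.0, 100.0, 1000.0):
        f = active_fraction(l0=0.01, ligand=conc, kd_active=1.0, kd_inactive=100.0)
        print(f"ligand = {conc:7.1f}: active fraction = {f:.3f}")
```

In this model the ligand creates no new conformation. The active state exists at low population before binding, and the ligand only changes how many molecules occupy it.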

Applications of Artificial Intelligence and Machine Learning

AI and Machine Learning in Simulations

Machine learning for molecular simulations, including tools, strategies, and principles, has been reviewed recently.[186] As can be seen from this excellent review, machine learning has already been making a significant impact on the development of approximate methods for complex atomic systems. The innovation in the development and integration of MD simulations with deep learning can reproduce, interpret, predict, and generate data relating to the behavior of biological macromolecules.[187−192] Deep learning methods can help MD simulations excel in their efficiency and scales, with AI bridging between deep learning technologies and simulations. Challenges toward broad usage include smooth connection of AI and MD and automation of workflows. Meeting these challenges could popularize novel deep learning tools in MD simulations toward efficiently exploiting both powerful methods. The number of publications in this area has been skyrocketing, emphasizing the recognition of the potency of AI and machine learning in simulation. As an example, MD simulations need to perform extensive sampling of the conformational space, which requires long time scales. Deep learning involving, e.g., variational autoencoders has been shown to be useful. The learned latent space in the variational autoencoders has been employed to generate unsampled protein conformations, and simulations starting from these conformations accelerated the sampling.[193] In another example, a deep learning framework with mixed classical and machine learning potentials (TorchMD) has been developed for molecular simulations.[188] The review cited above provides additional diverse examples. Deep learning has also already been exploited in structural modeling and design[60,194−196] and analysis[191] and in linking these to function.[197]

AI and Machine Learning in Prediction of Pathogen–Human Host PPIs

AI and deep learning are also being developed and applied to experimental determination and prediction of macromolecular structures,[198] as well as to PPIs.[199,200] Applications of AI approaches to human–microbiome protein–protein interactions have also been reviewed recently.[66] These interactions play important roles in human health and disease. There is rapidly growing evidence that microbes, including bacteria and viruses, impact human health. They can modulate human signaling and immune response by interacting with the human proteins. To decipher this modulation, it is important to identify the specific interactions, the human host proteins that are involved, and the structure of the complex. Identification of the interactions along with their structural details at atomic resolution permits understanding the mechanisms involved in pathogen survival and assists in drug discovery targeting these interactions. The interactions help the pathogens to elude and bypass the immune defense, with the pathogens hijacking host signaling. Mechanistically, pathogen proteins can have surfaces which resemble those of the host, allowing them to mimic and compete with host protein interactions (Figure 3). They bind to the host protein and rewire its physiological signaling. Data, including structural details, are scarce and large-scale experimental detection is challenging. Efficient and robust computational strategies to predict the interactions are thus vital. We have developed an algorithm and server to predict these human host–microbial PPIs (HMIs) based on their protein structures, which can be experimental or modeled. In large scale applications, AlphaFold can now be used toward this aim. Machine learning permits both large-scale, efficient, and generalizable application and addressing the complex dynamics of such relationships, which the machine learning algorithms can decipher.
Figure 3

Human–microbiome PPIs promote GTPase activation. Human cell division control protein 42 homologue (Cdc42) is a small GTPase of the Rho family, involved in the cell cycle. In human cells, it is activated by guanine-nucleotide exchange factors (GEFs), such as DOCK9 (PDB ID: 2WMO), which convert the inactive GDP-bound form to the active GTP-bound form.[207] Bacteria secrete toxins or effectors that mimic the GEF proteins, such as SopE (PDB: 1GZS) from Salmonella[208] and MAP (PDB: 3GCG) from Escherichia coli,[209] which can interact with Cdc42 and activate it. The interaction surfaces of these bacterial GEF mimics resemble those of the host protein, allowing them to mimic and compete with the host protein interactions. PPIs, protein–protein interactions; HMIs, host–(or human−) microbiome interactions. Ongoing work incorporates AI into the HMI prediction algorithm. If the structures of the human or microbial proteins are unavailable, AlphaFold can generate them.

Challenges in machine learning for PPI prediction relate to both data and method. Because microbial data, unlike human data, are limited, microbial sample sizes are small. In sequence-based algorithms, the dimensionality problem can be pronounced, where the difficulty grows exponentially as the feature size increases. Principal component analysis (PCA), uniform manifold approximation and projection (UMAP), or autoencoders can be used to embed the samples into lower-dimensional spaces,[201,202] and preprocessing and postprocessing pipelines can be employed for other data. In structure-based methods the problems may relate to the quantity and diversity of the representation. Data relating to host-microbe PPIs with 3D structures are sparse, which poses a problem for training and evaluating the computational methods. Additional problems include the lack of a gold-standard test data set. 
Evaluation metrics are also unclear, the PPI networks are sparse, and more.[66] DeepMind’s AlphaFold2 success in sequence-based protein structure prediction,[4] as well as the RoseTTAFold[70] open-source counterpart, and the publicly available AlphaFold2 prediction of all human proteins[203] are major steps that benefit the scientific community.
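As a rough illustration of the dimensionality reduction step mentioned above, the following minimal sketch embeds a small number of high-dimensional samples into a few principal components using PCA computed through a singular value decomposition. It is not the pipeline of any cited work: the function name `pca_embed`, the sample and feature counts, and the synthetic feature matrix (standing in for sequence-derived features) are all assumptions made for the example.

```python
import numpy as np


def pca_embed(features, n_components):
    """Project the rows of `features` onto the top principal components.

    Hypothetical helper for illustration only. Returns the low-dimensional
    embedding and the fraction of variance explained by each kept component.
    """
    # PCA operates on mean-centered data.
    centered = features - features.mean(axis=0)
    # Rows of `vt` are the principal axes, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    embedding = centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return embedding, explained[:n_components]


# Synthetic stand-in for a small-sample, many-feature data set:
# 40 samples, 400 features, generated from 3 latent factors plus weak noise.
rng = np.random.default_rng(0)
n_samples, n_features, n_latent = 40, 400, 3
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
features = latent @ mixing + 0.01 * rng.normal(size=(n_samples, n_features))

embedding, explained = pca_embed(features, n_components=3)
print(embedding.shape)
print(explained.sum())
```

In this synthetic setting nearly all of the variance lies in three components, so the 400-dimensional samples can be represented by three coordinates with little loss. Real sequence-derived features are unlikely to be this clean, and nonlinear methods such as UMAP or autoencoders may be preferable when the structure is not well captured by linear projections.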

Conclusions

AI and machine learning are appending projects. They are used in diverse applications, including biological networks.[204] They impact disease biology, drug discovery, microbiome research, and synthetic biology. Applications also include a machine learning pipeline for molecular complex detection in protein-interaction networks[205] and the assessment of the relevance of major signaling pathways in cancer survival.[46] Here, we briefly overviewed the immense impact of AlphaFold, and of AI in structural biology, with some examples. We highlighted what AlphaFold can and cannot accomplish and why. Allosteric mechanisms fall into the latter category. Nevertheless, this aim can also be accomplished through MD simulations of the models that AlphaFold produces. Still, although simulations would address this dynamics problem, at such scales their cost is prohibitive. A paradigm-shifting machine learning method is needed to model protein dynamics. AlphaFold and its underlying deep learning innovations have opened up the next frontiers in protein science,[164] including precision medicine.[206] Protein structures connect to cell biology, chemistry, biophysics, and medicine. To date, over 180 000 protein structures are available in the PDB, open to all researchers across the world. Still, structures of pathogens are not among them, and neither are many others that are essential to human health. The resource is now there, and with growing computational power, these will eventually be there as well. Nonetheless, the availability of the structures is insufficient. For us, as biophysicists, the key is which significant questions to ask. What should the focus of our research be, such that we do not repeat what was so well done but instead exploit the new capabilities to ask the really important questions?