Romanos Fasoulis¹, Georgios Paliouras², Lydia E Kavraki¹.
Abstract
The field of structural proteomics, which focuses on the structure–function relationship of proteins and protein complexes, is experiencing rapid growth. Since the early 2000s, structural databases such as the Protein Data Bank have been storing increasing amounts of protein structural data, and modeled structures are also becoming increasingly available. This, combined with recent advances in graph-based machine-learning models, enables the use of protein structural data in predictive models, with the goal of creating tools that will advance our understanding of protein function. Mirroring the rapid ongoing development of graph learning tools for molecular graphs, there is also a growing trend toward applying graph learning approaches to protein structures. In this short review paper, we survey studies that use graph learning techniques on proteins, examine their successes and shortcomings, and discuss future directions.
Keywords: deep learning; graph learning; graphs; machine learning; protein structure; proteomics
Year: 2021 PMID: 34665257 PMCID: PMC8786289 DOI: 10.1042/ETLS20210225
Source DB: PubMed Journal: Emerg Top Life Sci ISSN: 2397-8554
Fig. 1. Graphical representation of the binding interface of HLA-B*52:01 (PDB: 3W39, peptide removed).
Graphs are shown at the amino-acid level, with each color corresponding to a different amino acid; geometric information is expressed not as 3D coordinates but as relations (edges) in the graph. (A) The 3D cartoon model, highlighted in yellow, with the surface shown in white. (B) The 3D all-atom model, represented as sticks. (C) The distance graph, where an edge between two residues denotes that their distance is smaller than a cutoff (here 6 Å). Pairwise distances are calculated using α-carbon atoms as residue centroids. (D) The k-NN graph, where each residue is forced to connect to its k closest residue neighbors. (E) The Delaunay graph, created using Delaunay triangulation. (F) The intramolecular graph; each edge denotes a chemical bond (covalent or non-covalent).
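The caption describes two simple ways of turning residue positions into edges. The sketch below shows how the distance graph (panel C) and the k-NN graph (panel D) could be built from per-residue α-carbon coordinates. It is a minimal illustration under stated assumptions: the function names and toy coordinates are hypothetical, parsing a real PDB entry such as 3W39 is not shown, and the Delaunay graph of panel E is omitted (a computational-geometry library such as SciPy's `scipy.spatial.Delaunay` would typically be used for it).

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]
Edge = Tuple[int, int]


def distance_graph(ca: List[Coord], cutoff: float = 6.0) -> Set[Edge]:
    """Edge (i, j), i < j, whenever two C-alpha atoms are closer than `cutoff` (in Å)."""
    edges: Set[Edge] = set()
    for i in range(len(ca)):
        for j in range(i + 1, len(ca)):
            if math.dist(ca[i], ca[j]) < cutoff:
                edges.add((i, j))
    return edges


def knn_graph(ca: List[Coord], k: int) -> Set[Edge]:
    """Connect every residue to its k nearest residues; edges are stored as (min, max) pairs."""
    edges: Set[Edge] = set()
    for i, p in enumerate(ca):
        # Rank all other residues by C-alpha distance to residue i.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(ca) if j != i)
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical toy C-alpha coordinates (in Å), laid out along the x-axis.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (8.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
dist_edges = distance_graph(ca_coords, cutoff=6.0)
knn_edges = knn_graph(ca_coords, k=1)
print(sorted(dist_edges))  # [(0, 1), (1, 2)]: residue 3 is isolated
print(sorted(knn_edges))   # [(0, 1), (1, 2), (2, 3)]: k-NN forces residue 3 to connect
```

The toy example shows the practical difference between the two constructions: a distance cutoff can leave distant residues isolated, whereas a k-NN graph guarantees every residue at least k neighbors. Storing k-NN edges as undirected pairs is a design choice of this sketch; the raw k-NN relation is directed.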
Fig. 2. Convolution operations performed on a protein subgraph.
(A) The residue of interest here is GLY16; its initial embedding represents its biochemical features. (B) The first GNN layer. Biochemical features of GLY16's immediate neighbors are aggregated and transformed to calculate GLY16's updated embedding, injecting topological information in the process. (C) Multiple convolutional operations can be applied in series to take into account information from more distant neighbors. However, this may cause oversmoothing of the GLY16 embedding [42].
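The aggregate-then-transform step in panel B and the stacking in panel C can be made concrete with a small sketch. It assumes one common GCN-style update (element-wise mean over a node and its neighbors, then a shared weight matrix and ReLU); this is not the specific update of any method in the table. Residue names other than GLY16, the features, and the identity weight matrix are hypothetical, chosen so that the effect of aggregation alone is visible.

```python
from typing import Dict, List

Vector = List[float]


def gnn_layer(features: Dict[str, Vector], neighbors: Dict[str, List[str]],
              weight: List[Vector]) -> Dict[str, Vector]:
    """One GCN-style layer: mean-aggregate each node with its neighbors, then apply W and ReLU."""
    updated: Dict[str, Vector] = {}
    for node, h in features.items():
        group = [h] + [features[n] for n in neighbors[node]]
        # Aggregate: element-wise mean over the node itself and its immediate neighbors.
        agg = [sum(vals) / len(group) for vals in zip(*group)]
        # Transform: linear map with the shared weight matrix, followed by ReLU.
        updated[node] = [max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weight]
    return updated


def spread(features: Dict[str, Vector]) -> float:
    """Largest per-dimension gap between node embeddings (0 means all nodes look identical)."""
    return max(max(col) - min(col) for col in zip(*features.values()))


# Hypothetical subgraph around GLY16 with 2-dimensional toy features.
neighbors = {
    "ALA15": ["GLY16"],
    "GLY16": ["ALA15", "LYS17"],
    "LYS17": ["GLY16", "TRP50"],
    "TRP50": ["LYS17"],
}
features = {"ALA15": [1.0, 0.0], "GLY16": [0.0, 1.0], "LYS17": [1.0, 1.0], "TRP50": [0.0, 0.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity weights isolate the effect of aggregation

h = features
spreads = [spread(h)]
for _ in range(10):  # stack 10 layers
    h = gnn_layer(h, neighbors, identity)
    spreads.append(spread(h))

print(h["GLY16"])              # GLY16 embedding after 10 layers
print(spreads[0], spreads[-1])  # embeddings become nearly indistinguishable
```

After one layer, GLY16 already mixes in information from ALA15 and LYS17; after several layers, repeated averaging pulls every node toward the same vector, which is the oversmoothing effect mentioned in panel C.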
List of recent graph-learning-based methods for structural proteomics tasks
| Category | Study | Graph type | Edge type | GNN layer type | Year | Reference |
|---|---|---|---|---|---|---|
| Protein–ligand prediction | GraphBAR | Atom graph (protein–ligand) | Distance edges (protein–ligand) | GCN-based | 2021 | […] |
| | Lim et al. | Atom graph (protein–ligand) + atom graph (protein) + atom graph (ligand) | Distance edges (protein–ligand) + chemical bonds (protein) + chemical bonds (ligand) | GAT-based […] | 2019 | […] |
| | Torng et al. | Residue graph (protein) + atom graph (ligand) | Distance edges (protein) + chemical bonds (ligand) | GAE-based […] | 2019 | […] |
| | DGraphDTA | Residue graph (protein) + atom graph (ligand) | Distance edges (protein) + chemical bonds (ligand) | GCN-based | 2020 | […] |
| | GEFA | Residue graph (protein) + atom graph (ligand) | Distance edges (protein) + chemical bonds (ligand) | GCN-based + skip connections […] | 2020 | […] |
| Binding site identification | Fout et al. | Residue graph (proteins) | | GCN-based + edge features | 2017 | […] |
| | PECAN | Residue graph (proteins) | Distance edges (proteins) | GCN-based | 2020 | […] |
| | PepNN | Residue graph (proteins) | | Graph transformer […] | 2021 | […] |
| Docking scoring | EGCN | Residue graph (protein–ligand) | Distance edges (protein–ligand) | GCN-based + edge features | 2020 | […] |
| | GNN-DOVE | Atom graph (protein–ligand) + atom graph (protein) + atom graph (ligand) | Distance edges (protein–ligand) + chemical bonds (protein) + chemical bonds (ligand) | GAT-based | 2021 | […] |
| | InterPepRank | Residue graph (protein–ligand) | Distance edges (protein–ligand) + chemical bonds (protein–ligand) | Simonovsky et al. […] | 2020 | […] |
| Protein model quality assessment | GraphQA | Residue graph (protein) | Distance edges (protein) + covalent bonds (protein) | Graph Nets […] | 2021 | […] |
| | ProteinGCN | Atom graph (protein) | | Xie et al. […] | 2020 | […] |
| | VoroCNN | Atom graph (protein) | Voronota […] | GCN-based | 2021 | […] |
| | S-GCN | Residue graph (protein) | Voronota edges (protein) | Custom | 2020 | […] |
| Protein function prediction | DeepFRI | Residue graph (protein) | Distance edges (protein) | GAT-based | 2021 | […] |
| | PersGNN | Residue graph (protein) | Distance edges (protein) | GCN-based | 2020 | […] |
| | Gelman et al. | Residue graph (protein) | Distance edges (protein) | GCN-based | 2021 | […] |
| Protein design | ProteinSolver | Residue graph (protein) | Distance edges (protein) | Wang et al. […] | 2020 | […] |
| | MimNet | Residue graph (protein) | Distance edges (protein) | GCN-based + U-net […] | 2021 | […] |
| | Ingraham et al. | Residue graph (protein) | | Graph transformer | 2019 | […] |
Fig. 3. Protein–ligand interaction studies using GNNs.
(A) Given a sufficient number of protein–ligand complex structures, the graph of the whole complex can be given as input to a GNN architecture. (B) Even when protein–ligand complex data are not available, the protein structure and the ligand sequence can be processed as separate entities, retaining structural information during learning.
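The separate-entity strategy in panel B can be sketched as a two-branch model: each graph is encoded on its own, the two graph-level embeddings are concatenated, and a prediction head maps them to an affinity-like score. This is a minimal sketch under stated assumptions, not the architecture of any specific method in the table: the encoder uses plain mean-aggregation message passing with mean-pooling readout and no learned weights, and the function names, linear head, and toy graphs are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Graph = Tuple[List[Vector], List[Tuple[int, int]]]  # (node features, undirected edges)


def encode(graph: Graph, steps: int = 2) -> Vector:
    """Mean-aggregation message passing followed by mean pooling into one graph-level vector."""
    feats, edges = graph
    nbrs = [[i] for i in range(len(feats))]  # every node also aggregates its own features
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    dim = len(feats[0])
    for _ in range(steps):
        feats = [[sum(feats[j][d] for j in nbrs[i]) / len(nbrs[i]) for d in range(dim)]
                 for i in range(len(feats))]
    # Readout: average node embeddings so graphs of any size map to a fixed-size vector.
    return [sum(col) / len(feats) for col in zip(*feats)]


def predict_affinity(protein: Graph, ligand: Graph, head: Vector, bias: float = 0.0) -> float:
    """Panel B style: encode protein and ligand separately, concatenate, apply a linear head."""
    joint = encode(protein) + encode(ligand)  # list concatenation of the two embeddings
    return sum(w * x for w, x in zip(head, joint)) + bias


# Hypothetical toy inputs: a 2-residue protein graph and a 3-atom ligand graph.
protein = ([[1.0, 0.0], [0.0, 1.0]], [(0, 1)])
ligand = ([[2.0], [4.0], [6.0]], [(0, 1), (1, 2)])
score = predict_affinity(protein, ligand, head=[1.0, 1.0, 0.5])
print(score)  # 3.0
```

For the panel A setting, one would instead merge both node sets into a single graph and add intermolecular edges (for example, protein–ligand atom pairs within a distance cutoff) before running a single encoder.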
Fig. 4. Proof-of-concept pipeline for protein–peptide docking scoring using GNNs.
The filtering and ranking system, built from message-passing GNN modules, takes candidate conformations represented as graphs as input and lists the top-scoring ones.
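The filter-then-rank logic of this pipeline can be sketched independently of the scoring model. In the sketch below, the scorer is passed in as a function so that a trained message-passing GNN could be plugged in. The `contact_count` scorer is a hypothetical placeholder that simply counts protein-peptide contact edges, and the names, threshold, and toy poses are assumptions for illustration only.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Conformation:
    name: str
    contacts: List[Tuple[int, int]]  # protein-peptide contact edges of the conformation graph


def rank_conformations(confs: List[Conformation],
                       score: Callable[[Conformation], float],
                       threshold: float, top_k: int) -> List[Tuple[float, str]]:
    """Drop conformations scoring below `threshold`, then return the top_k by score."""
    scored = [(score(c), c.name) for c in confs]
    kept = [pair for pair in scored if pair[0] >= threshold]  # filtering stage
    return heapq.nlargest(top_k, kept)  # ranking stage, highest score first


def contact_count(conf: Conformation) -> float:
    """Hypothetical stand-in for a trained GNN scorer: more interface contacts, higher score."""
    return float(len(conf.contacts))


poses = [
    Conformation("pose_a", [(0, 10), (1, 11), (2, 12)]),
    Conformation("pose_b", [(0, 10)]),
    Conformation("pose_c", [(0, 10), (1, 11), (2, 12), (3, 13), (4, 14)]),
]
top = rank_conformations(poses, contact_count, threshold=2.0, top_k=2)
print(top)  # [(5.0, 'pose_c'), (3.0, 'pose_a')]
```

Keeping the filtering threshold separate from `top_k` mirrors the two stages in the figure: a cheap cutoff discards clearly poor poses, and ranking is applied only to the survivors.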