Gabriele Orlando, Daniele Raimondi, Ramon Duran-Romaña, Yves Moreau, Joost Schymkowitz, Frederic Rousseau.
Abstract
Structural bioinformatics suffers from a lack of interfaces connecting biological structures and machine learning methods, which makes the application of modern neural network architectures impractical. This hampers the development of structure-based bioinformatics methods and creates a bottleneck in biological research. Here we present PyUUL (https://pyuul.readthedocs.io/), a library that translates biological structures into 3D tensors, allowing an out-of-the-box application of state-of-the-art deep learning algorithms. The library converts biological macromolecules to data structures typical of computer vision, such as voxels and point clouds, for which extensive machine learning research has been performed. Moreover, PyUUL supports GPU and sparse calculation out of the box. Finally, we demonstrate how PyUUL can be used by researchers to address some typical bioinformatics problems, such as structure recognition and docking.
Year: 2022 PMID: 35181656 PMCID: PMC8857184 DOI: 10.1038/s41467-022-28327-3
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Overview of the functionalities of PyUUL.
A Different 3D data representations of bovine pancreatic trypsin inhibitor (PDB ID: 1BPI (10.2210/pdb1BPI/pdb)) available in PyUUL. From left to right: voxel representation (Subfigure 1), surface point-cloud representation (Subfigure 2), and volumetric point-cloud representation (Subfigure 3). B Schematic illustration of how the multichannel voxel representation works in PyUUL. The input 3D structure of a protein is represented in a voxel grid, where each channel shows the occupancy of nitrogen, carbon, or oxygen atoms. This 3-channel voxel representation can be used as input for a 3D convolutional neural network that iteratively samples the entire protein volume and processes these data to address problems such as protein pattern recognition or protein-ligand interaction learning.
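The multichannel voxel idea in panel B can be illustrated with a minimal sketch in plain Python. It does not use the PyUUL API: the function `voxelize`, the toy `ATOMS` list, and the `CHANNELS` mapping are hypothetical names introduced here, and the sketch marks only the voxel containing each atom center, whereas a real implementation would likely account for atomic radii.

```python
from math import floor

# Hypothetical toy atoms as (element, x, y, z) in angstroms; not real 1BPI coordinates.
ATOMS = [
    ("N", 0.0, 0.0, 0.0),
    ("C", 1.5, 0.0, 0.0),
    ("C", 2.5, 1.0, 0.5),
    ("O", 3.5, 1.0, 0.5),
]

# One channel per element, mirroring the N/C/O channels described in the caption.
CHANNELS = {"N": 0, "C": 1, "O": 2}


def voxelize(atoms, resolution=1.0):
    """Return (grid, origin), with grid indexed as grid[channel][i][j][k]."""
    coords = [atom[1:] for atom in atoms]
    # The grid origin is the minimum corner of the atoms' bounding box.
    origin = tuple(min(c[d] for c in coords) for d in range(3))
    dims = tuple(
        floor((max(c[d] for c in coords) - origin[d]) / resolution) + 1
        for d in range(3)
    )
    grid = [
        [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
        for _ in CHANNELS
    ]
    for element, *xyz in atoms:
        channel = CHANNELS.get(element)
        if channel is None:
            continue  # elements without a channel are ignored in this sketch
        i, j, k = (floor((xyz[d] - origin[d]) / resolution) for d in range(3))
        grid[channel][i][j][k] = 1.0  # binary occupancy at the atom center
    return grid, origin


grid, origin = voxelize(ATOMS)
# Print the tensor shape as (channels, x, y, z).
print(len(grid), len(grid[0]), len(grid[0][0]), len(grid[0][0][0]))
```

The resulting nested list has the (channels, x, y, z) layout that 3D convolutional layers typically expect once a batch dimension is added; converting it to a tensor type is left to the deep learning framework in use.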
Fig. 2 Application of PyUUL to structural bioinformatics problems.
A The learning process of a 3D convolutional neural network on a structural biology task: the picture shows the training procedure of a neural network on a voxel representation of a protein. The network iteratively learns to recognize alpha helices. The voxels that are predicted to be part of an alpha helix at each learning stage are highlighted in gray. Supplementary Movie 1 shows the complete evolution of the network during training. B The learning process of a 3D Siamese NN trained with metric learning. The network assigns a 2D vector (two latent features) to every protein according to its 3D representation obtained with PyUUL. Each point represents a protein and each color a different fold. During training, the network optimizes this signature so that proteins belonging to the same protein class have similar signatures. Supplementary Movie 2 shows the complete evolution of the network during training. C Optimization of the conformation of GTP. The figure shows a neural network optimizing the conformation of GTP in its binding pocket. This is done by first training a network to distinguish original GTP-protein complexes from those in which GTP has been randomly rotated, which defines a scoring function that describes how likely a protein-ligand complex is to be the optimal one. The parameters of the network are then frozen and the network is asked to move GTP to the right position in the pocket of its respective protein. This protein-ligand optimization can be done simply by calculating the gradient of the scoring function with respect to the GTP coordinates. Supplementary Movie 3 shows the complete evolution of the network during training.
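The optimization step in panel C, moving ligand coordinates along the gradient of a frozen scoring function, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `score` is a hypothetical quadratic stand-in for the trained network, `POCKET_POSE` and `START_POSE` are invented coordinates, and the gradient is estimated by finite differences rather than by backpropagation through a network.

```python
# Hypothetical optimal pose of three ligand atoms (angstroms); invented for illustration.
POCKET_POSE = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
# Hypothetical displaced starting pose of the same three atoms.
START_POSE = [(2.0, -1.0, 0.5), (3.0, 0.5, -0.5), (0.5, 2.0, 1.0)]


def score(coords):
    """Toy stand-in for the frozen network: higher is better, maximal at POCKET_POSE."""
    return -sum(
        (c - t) ** 2
        for atom, ref in zip(coords, POCKET_POSE)
        for c, t in zip(atom, ref)
    )


def gradient(f, coords, eps=1e-5):
    """Central finite-difference gradient of f with respect to every coordinate."""
    grad = []
    for i in range(len(coords)):
        row = []
        for d in range(3):
            plus = [list(a) for a in coords]
            minus = [list(a) for a in coords]
            plus[i][d] += eps
            minus[i][d] -= eps
            row.append((f(plus) - f(minus)) / (2 * eps))
        grad.append(row)
    return grad


def optimize(coords, steps=200, lr=0.1):
    """Gradient ascent on the score; the scoring function itself is never updated."""
    coords = [list(a) for a in coords]
    for _ in range(steps):
        grad = gradient(score, coords)
        for atom, g_atom in zip(coords, grad):
            for d in range(3):
                atom[d] += lr * g_atom[d]
    return coords


final = optimize(START_POSE)
print(score(START_POSE), score(final))
```

Note that this sketch moves every atom independently, which would distort a real ligand; a practical version would presumably restrict the update to rigid-body rotations and translations and obtain the gradient by automatic differentiation through the trained network.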