Ermal Elbasani (1), Soualihou Ngnamsie Njimbouom (1), Tae-Jin Oh (2,3,4), Eung-Hee Kim (5), Hyun Lee (1), Jeong-Dong Kim (6,7).
1. Department of Computer Science and Engineering, Sun Moon University, Asan, 31460, South Korea.
2. Genome-Based BioIT Convergence Institute, Sun Moon University, Asan, 31460, South Korea.
3. Department of Pharmaceutical Engineering and Biotechnology, Sun Moon University, Asan, 31460, South Korea.
4. Department of BT-Convergent Pharmaceutical Engineering, Sun Moon University, Asan, 31460, South Korea.
5. Department of Artificial Intelligence and Software Technology, Sun Moon University, Asan, 31460, South Korea.
6. Department of Computer Science and Engineering, Sun Moon University, Asan, 31460, South Korea. kjdvhu@gmail.com.
7. Genome-Based BioIT Convergence Institute, Sun Moon University, Asan, 31460, South Korea. kjdvhu@gmail.com.
Abstract
BACKGROUND: Compound-protein interaction prediction is necessary to investigate health regulatory functions and to promote drug discovery. Machine learning is becoming increasingly important in bioinformatics for applications such as analyzing protein-related data. Modeling the properties and functions of proteins is important but challenging, especially for sequence-based predictions. RESULT: We propose a method to model compounds and proteins for compound-protein interaction prediction. A graph neural network is used to represent the compounds, and a convolutional layer extended with a bidirectional recurrent neural network framework, using Long Short-Term Memory and Gated Recurrent Unit layers, is used for protein sequence vectorization. The convolutional layer captures regulatory protein functions, while the recurrent layer captures long-term dependencies between protein functions, thus improving the accuracy of interaction prediction with compounds. A database of 7000 sets of annotated compound-protein interactions, containing proteins of 1000 base length, is used for the implementation. The results indicate that the proposed model performs effectively and can yield satisfactory accuracy in compound-protein interaction prediction. CONCLUSION: The performance of GCRNN is based on the binary classification of interactions between proteins and compounds. The architectural design of the GCRNN model integrates a Bi-Recurrent layer on top of the CNN to learn dependencies between motifs in protein sequences and improve the accuracy of the predictions.
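The data flow described in the abstract (a graph neural network for the compound, a convolutional layer followed by a bidirectional recurrent layer for the protein, and a binary interaction output) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The weights are random and untrained, only a GRU cell is shown (an LSTM cell would take the same place), and every name and setting here is an assumption made for illustration: `DIM`, the kernel width of 3, one-hot residue encoding, sum-of-neighbors message passing with mean pooling, and the helpers `conv1d`, `bi_gru`, `gnn`, and `predict`.

```python
import math
import random

random.seed(0)
AMINO = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
DIM = 4                          # hypothetical hidden size

def rand_matrix(rows, cols):
    """Small random weight matrix (untrained, illustration only)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def one_hot(seq):
    return [[1.0 if a == c else 0.0 for c in AMINO] for a in seq]

def conv1d(xs, kernel, width=3):
    """Valid 1D convolution + ReLU: each window of `width` residues -> DIM features."""
    out = []
    for i in range(len(xs) - width + 1):
        window = [v for x in xs[i:i + width] for v in x]
        out.append([max(0.0, h) for h in matvec(kernel, window)])
    return out

def gru_step(p, x, h):
    """One GRU update with update gate z, reset gate r, and candidate state."""
    z = [sigmoid(a) for a in matvec(p["z"], x + h)]
    r = [sigmoid(a) for a in matvec(p["r"], x + h)]
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a) for a in matvec(p["h"], x + gated)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def bi_gru(xs, fwd, bwd):
    """Read the sequence in both directions and concatenate the final states."""
    hf, hb = [0.0] * DIM, [0.0] * DIM
    for x in xs:
        hf = gru_step(fwd, x, hf)
    for x in reversed(xs):
        hb = gru_step(bwd, x, hb)
    return hf + hb

def gnn(atom_feats, edges, w, rounds=2):
    """Message passing: each atom adds a transformed sum of its neighbors' states."""
    h = [list(f) for f in atom_feats]
    for _ in range(rounds):
        msgs = [[0.0] * DIM for _ in h]
        for a, b in edges:
            for k in range(DIM):
                msgs[a][k] += h[b][k]
                msgs[b][k] += h[a][k]
        h = [[hi + max(0.0, m) for hi, m in zip(hv, matvec(w, mv))]
             for hv, mv in zip(h, msgs)]
    return [sum(hv[k] for hv in h) / len(h) for k in range(DIM)]  # mean pooling

def predict(protein, atom_feats, edges, p):
    """Probability-like score in (0, 1) for the binary interaction class."""
    protein_vec = bi_gru(conv1d(one_hot(protein), p["conv"]), p["fwd"], p["bwd"])
    compound_vec = gnn(atom_feats, edges, p["gnn"])
    logit = sum(w * x for w, x in zip(p["out"], compound_vec + protein_vec))
    return sigmoid(logit)

params = {
    "conv": rand_matrix(DIM, 3 * len(AMINO)),
    "fwd": {k: rand_matrix(DIM, 2 * DIM) for k in "zrh"},
    "bwd": {k: rand_matrix(DIM, 2 * DIM) for k in "zrh"},
    "gnn": rand_matrix(DIM, DIM),
    "out": rand_matrix(1, 3 * DIM)[0],
}

# Toy inputs: a short made-up peptide and a three-atom chain (C-C-O).
atom_table = {"C": rand_matrix(1, DIM)[0], "O": rand_matrix(1, DIM)[0]}
atoms = [atom_table[s] for s in ("C", "C", "O")]
bonds = [(0, 1), (1, 2)]
score = predict("MKTAYIAKQR", atoms, bonds, params)
print(f"interaction score: {score:.3f}")
```

Because the weights are random, the printed score carries no biological meaning. The sketch only shows how the compound vector and the protein vector are produced separately and then combined for a binary decision. In practice such a model would be built and trained with a deep learning framework.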