Predicting protein-binding RNA nucleotides using the feature-based removal of data redundancy and the interaction propensity of nucleotide triplets.

Sungwook Choi, Kyungsook Han.

Abstract

Several learning approaches have been used to predict RNA-binding amino acids in a protein sequence, but there has been little attempt to predict protein-binding nucleotides in an RNA sequence. One of the reasons is that the differences between nucleotides in their interaction propensity are much smaller than those between amino acids. Another reason is that RNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding RNA nucleotides is much harder than predicting RNA-binding amino acids. We developed a new method that removes data redundancy in a training set of sequences based on their features. The new method constructs a larger and more informative training set than the standard redundancy removal method based on sequence similarity, and the constructed dataset is guaranteed to be redundancy-free. We computed the interaction propensity (IP) of nucleotide triplets by applying a new definition of IP to an extensive dataset of protein-RNA complexes, and developed a support vector machine (SVM) model to predict protein binding sites in RNA sequences. In a 5-fold cross-validation with 812 RNA sequences, the SVM model predicted protein-binding nucleotides with an accuracy of 86.4%, an F-measure of 84.8%, and a Matthews correlation coefficient of 0.66. With an independent dataset of 56 RNA sequences that were not used in training, the resulting accuracy was 68.1% with an F-measure of 71.7% and a Matthews correlation coefficient of 0.35. To the best of our knowledge, this is the first attempt to predict protein-binding RNA nucleotides in a given RNA sequence from the sequence data alone. The SVM model and datasets are freely available for academics at http://bclab.inha.ac.kr/primer.
Copyright © 2013 Elsevier Ltd. All rights reserved.
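
The abstract reports three evaluation measures: accuracy, F-measure, and the Matthews correlation coefficient (MCC). As a minimal sketch, assuming the standard textbook definitions of these measures for a binary binding/non-binding classifier, they can be computed from confusion-matrix counts as follows. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, F-measure and MCC from confusion-matrix counts.

    tp/fn: binding nucleotides predicted as binding / non-binding.
    tn/fp: non-binding nucleotides predicted as non-binding / binding.
    Standard definitions are assumed; the paper's own evaluation code
    is not shown in this record.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0

    # F-measure is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    f_denominator = 2 * tp + fp + fn
    f_measure = 2 * tp / f_denominator if f_denominator else 0.0

    # MCC ranges from -1 to 1; it is set to 0 when any marginal is empty.
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0

    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it penalizes both kinds of error. That may be why the drop on the independent dataset shows more sharply in MCC (0.66 to 0.35) than in accuracy (86.4% to 68.1%).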

Keywords:  Data redundancy removal; Interaction propensity; Protein-binding nucleotide; Protein–RNA interaction

Year:  2013        PMID: 24209914     DOI: 10.1016/j.compbiomed.2013.08.011

Source DB:  PubMed          Journal:  Comput Biol Med        ISSN: 0010-4825            Impact factor:   4.589


  6 in total

1.  A motif-based method for predicting interfacial residues in both the RNA and protein components of protein-RNA complexes.

Authors:  Usha Muppirala; Benjamin A Lewis; Carla M Mann; Drena Dobbs
Journal:  Pac Symp Biocomput       Date:  2016

2.  PNImodeler: web server for inferring protein-binding nucleotides from sequence data.

Authors:  Jinyong Im; Narankhuu Tuvshinjargal; Byungkyu Park; Wook Lee; De-Shuang Huang; Kyungsook Han
Journal:  BMC Genomics       Date:  2015-01-29       Impact factor: 3.969

3.  RPI-Bind: a structure-based method for accurate identification of RNA-protein binding sites.

Authors:  Jiesi Luo; Liang Liu; Suresh Venkateswaran; Qianqian Song; Xiaobo Zhou
Journal:  Sci Rep       Date:  2017-04-04       Impact factor: 4.379

4.  Predicting protein-binding regions in RNA using nucleotide profiles and compositions.

Authors:  Daesik Choi; Byungkyu Park; Hanju Chae; Wook Lee; Kyungsook Han
Journal:  BMC Syst Biol       Date:  2017-03-14

5.  A boosting approach for prediction of protein-RNA binding residues.

Authors:  Yongjun Tang; Diwei Liu; Zixiang Wang; Ting Wen; Lei Deng
Journal:  BMC Bioinformatics       Date:  2017-12-01       Impact factor: 3.169

Review 6.  Comprehensive Survey and Comparative Assessment of RNA-Binding Residue Predictions with Analysis by RNA Type.

Authors:  Kui Wang; Gang Hu; Zhonghua Wu; Hong Su; Jianyi Yang; Lukasz Kurgan
Journal:  Int J Mol Sci       Date:  2020-09-19       Impact factor: 5.923
