Prediction of protein-RNA binding sites by a random forest method with combined features.

Zhi-Ping Liu, Ling-Yun Wu, Yong Wang, Xiang-Sun Zhang, Luonan Chen.

Abstract

MOTIVATION: Protein-RNA interactions play a key role in a number of biological processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function and eukaryotic spliceosomes. Reliable identification of the RNA binding sites of a protein is therefore important for functional annotation and site-directed mutagenesis. Accumulated data on experimentally determined protein-RNA interactions reveal that an RNA binding residue with different neighboring amino acids often exhibits different preferences for its RNA partners. These preferences can in turn be assessed through the interdependence between the interacting amino acid fragment and RNA nucleotide.
RESULTS: In this work, we propose a novel classification method to identify the RNA binding sites in proteins by combining a new interacting feature (interaction propensity) with other sequence- and structure-based features. Specifically, the interaction propensity represents the binding specificity of a protein residue for its interacting RNA nucleotide, taking into account the residue's neighbors on both sides within a protein residue triplet. The sequence- and structure-based features of the residues are combined to discriminate the interaction propensity of amino acids with RNA. We predict RNA interacting residues in proteins using a random forest classifier. Experiments show that our method detects annotated protein-RNA interaction sites with high accuracy: on a dataset of 205 non-homologous RNA binding proteins, it achieves an accuracy of 84.5%, an F-measure of 0.85 and an AUC of 0.92 in predicting RNA binding residues. In a comparison study, it also outperforms several existing RNA binding residue predictors, such as RNABindR, BindN, RNAProB and PPRint, as well as alternative machine learning methods, such as support vector machines, naive Bayes and neural networks. Furthermore, we provide biological insights into the roles of sequence and structure in protein-RNA interactions, both by evaluating each feature's contribution to predictive accuracy and by analyzing the binding patterns of interacting residues.
AVAILABILITY: All source data and code are available at http://www.aporc.org/doc/wiki/PRNA or http://www.sysbio.ac.cn/datatools.asp
CONTACT: lnchen@sibs.ac.cn
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
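The abstract describes interaction propensity only qualitatively: the binding specificity of a residue for an RNA nucleotide, conditioned on the residue's two neighbors. The sketch below is a hypothetical illustration, not the authors' formula. It assumes the propensity is the log-odds (log2 of observed over expected frequency) of a (triplet, nucleotide) contact, estimated from a list of observed contacts, and it pads chain termini with 'X'. The names `residue_triplets`, `interaction_propensity` and `propensity_features` are illustrative only; the authors' released code defines the actual score.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# A residue triplet: (left neighbour, central residue, right neighbour).
Triplet = Tuple[str, str, str]


def residue_triplets(sequence: str) -> List[Triplet]:
    """Return the (left, centre, right) triplet for every residue.

    Chain termini are padded with 'X' (an assumption) so that every
    residue, including the first and last, gets exactly one triplet.
    """
    padded = "X" + sequence + "X"
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


def interaction_propensity(
        contacts: Iterable[Tuple[Triplet, str]]) -> Dict[Tuple[Triplet, str], float]:
    """Hypothetical log-odds propensity for each observed (triplet, nucleotide) contact.

    IP(t, n) = log2( P(t, n) / (P(t) * P(n)) ), with all probabilities
    estimated as relative frequencies over the contact list.
    Positive values mean the pair occurs more often than expected by chance.
    """
    contacts = list(contacts)
    total = len(contacts)
    joint = Counter(contacts)
    triplet_counts = Counter(t for t, _ in contacts)
    nucleotide_counts = Counter(n for _, n in contacts)
    scores = {}
    for (t, n), count in joint.items():
        expected = (triplet_counts[t] / total) * (nucleotide_counts[n] / total)
        scores[(t, n)] = math.log2((count / total) / expected)
    return scores


def propensity_features(sequence: str,
                        scores: Dict[Tuple[Triplet, str], float],
                        nucleotides: str = "ACGU") -> List[List[float]]:
    """One feature vector per residue: its triplet's propensity for each nucleotide.

    Unseen (triplet, nucleotide) pairs default to 0.0, i.e. "no preference".
    """
    return [[scores.get((t, n), 0.0) for n in nucleotides]
            for t in residue_triplets(sequence)]


if __name__ == "__main__":
    # Toy contacts (hypothetical, not from the paper's dataset).
    contacts = [
        (("A", "R", "G"), "A"), (("A", "R", "G"), "A"), (("A", "R", "G"), "U"),
        (("K", "K", "L"), "U"), (("K", "K", "L"), "G"),
    ]
    scores = interaction_propensity(contacts)
    print(propensity_features("ARG", scores))
```

On the toy contacts, only the central residue of "ARG" gets a non-zero row: a positive score for A (log2(5/3)) and a negative one for U (log2(5/6)). In the described method, per-residue propensities like these are combined with sequence- and structure-based features before classification.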

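The abstract reports accuracy, F-measure and AUC for RNA binding residue prediction. A minimal sketch of these standard metrics clarifies what each figure measures, with RNA binding residues as the positive class. The 0.5 decision threshold and the toy labels and scores are assumptions for illustration only, not the paper's data or evaluation protocol. AUC is computed through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen binding residue scores higher than a randomly chosen non-binding one, with ties counted as half.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of residues whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive (binding) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = RNA binding residue, 0 = non-binding (hypothetical values).
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
    y_pred: List[int] = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(y_true, y_pred), f_measure(y_true, y_pred), auc(y_true, scores))
```

On the toy data, accuracy and F-measure are both 2/3 and AUC is 7/9. AUC is computed from the raw classifier scores, so unlike the other two metrics it does not depend on the chosen threshold.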
Year:  2010        PMID: 20483814     DOI: 10.1093/bioinformatics/btq253

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  56 in total

1.  Highly accurate and high-resolution function prediction of RNA binding proteins by fold recognition and binding affinity prediction.

Authors:  Huiying Zhao; Yuedong Yang; Yaoqi Zhou
Journal:  RNA Biol       Date:  2011-11-01       Impact factor: 4.652

2.  Proteome-wide prediction of protein-protein interactions from high-throughput data. (Review)

Authors:  Zhi-Ping Liu; Luonan Chen
Journal:  Protein Cell       Date:  2012-06-22       Impact factor: 14.870

3.  Sequence conservation in the prediction of catalytic sites.

Authors:  Yongchao Dou; Xingbo Geng; Hongyun Gao; Jialiang Yang; Xiaoqi Zheng; Jun Wang
Journal:  Protein J       Date:  2011-04       Impact factor: 2.371

4.  Protein-protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique.

Authors:  Xiaoying Wang; Bin Yu; Anjun Ma; Cheng Chen; Bingqiang Liu; Qin Ma
Journal:  Bioinformatics       Date:  2019-07-15       Impact factor: 6.937

5.  Methods for Molecular Modelling of Protein Complexes.

Authors:  Tejashree Rajaram Kanitkar; Neeladri Sen; Sanjana Nair; Neelesh Soni; Kaustubh Amritkar; Yogendra Ramtirtha; M S Madhusudhan
Journal:  Methods Mol Biol       Date:  2021

6.  Individually double minimum-distance definition of protein-RNA binding residues and application to structure-based prediction.

Authors:  Wen Hu; Liu Qin; Menglong Li; Xuemei Pu; Yanzhi Guo
Journal:  J Comput Aided Mol Des       Date:  2018-11-26       Impact factor: 3.686

7.  Incorporating significant amino acid pairs and protein domains to predict RNA splicing-related proteins with functional roles.

Authors:  Justin Bo-Kai Hsu; Kai-Yao Huang; Tzu-Ya Weng; Chien-Hsun Huang; Tzong-Yi Lee
Journal:  J Comput Aided Mol Des       Date:  2014-01-19       Impact factor: 3.686

8.  Prediction of nucleic acid binding probability in proteins: a neighboring residue network based score.

Authors:  Zhichao Miao; Eric Westhof
Journal:  Nucleic Acids Res       Date:  2015-05-04       Impact factor: 16.971

9.  Random forests for genomic data analysis. (Review)

Authors:  Xi Chen; Hemant Ishwaran
Journal:  Genomics       Date:  2012-04-21       Impact factor: 5.736

10.  Prediction of RNA binding proteins comes of age from low resolution to high resolution. (Review)

Authors:  Huiying Zhao; Yuedong Yang; Yaoqi Zhou
Journal:  Mol Biosyst       Date:  2013-10