Mehdi Poursheikhali Asghari, Parviz Abdolmaleki.
Abstract
BACKGROUND: Nucleic acid-binding proteins play major roles in biological processes such as transcription, splicing, and translation. Predicting the nucleic acid-binding function of proteins is therefore a step toward their full functional annotation. The aim of our research was to improve nucleic acid-binding function prediction.
Keywords: DNA-binding proteins; Machine-learning algorithms; RNA-binding proteins
Year: 2019 PMID: 30800250 PMCID: PMC6359699
Source DB: PubMed Journal: Avicenna J Med Biotechnol ISSN: 2008-2835
Performance measures of nine different classification algorithms applied to the RNA-binding protein chains and Ctrl chains in a LOOCV analysis
| 0.828 | 0.969 | 0.944 | 0.996 | 0.941 | |
| 0.840 | 0.971 | 0.951 | 0.993 | 0.945 | |
| 0.786 | 0.969 | 0.939 | 1.000 | 0.939 | |
| 0.836 | 0.971 | 0.949 | 0.993 | 0.943 | |
| 0.811 | 0.968 | 0.941 | 0.997 | 0.939 | |
| 0.819 | 0.969 | 0.941 | 1.000 | 0.940 | |
| | 0.968 | 0.939 | 1.000 | 0.939 | |
| 0.819 | 0.968 | 0.938 | 1.000 | 0.938 | |
| 0.699 | 0.969 | 0.942 | 0.998 | 0.940 | |
| 0.780 | 0.370 | 0.310 | 0.450 | 0.910 |
Abbreviations: ADTree, Alternating Decision Tree; K-NN, K-Nearest Neighbor; L1 RLR, L1 Regularized Logistic Regression; L2 RLR, L2 Regularized Logistic Regression; MLPClassifier, Multilayer Perceptron Classifier; Random Forest; RBFClassifier, Radial Basis Function Classifier; SMO, Sequential Minimal Optimization; Neural Network; AUC, Area Under the receiver operating characteristic Curve; LOOCV, Leave-One-Out Cross-Validation.
Data obtained from the work of Ahmad and Sarai (53).
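The column headers of the table above did not survive in this copy, so the following is only an observation, not the authors' labeling. In the rows checked, the second value agrees (to within rounding) with the harmonic mean of the third and fourth values, which is what an F-measure computed from a precision-like and a recall-like column would give. A minimal sketch of that check; the helper name `harmonic_mean` and the column interpretation are assumptions:

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two positive numbers (the F1 formula when a and b are precision and recall)."""
    return 2 * a * b / (a + b)

# (second, third, fourth) values copied from three rows of the table above:
# the first row, the second row, and the last row.
rows = [
    (0.969, 0.944, 0.996),
    (0.971, 0.951, 0.993),
    (0.370, 0.310, 0.450),
]

# Largest gap between the reported second column and the recomputed harmonic mean.
max_gap = max(
    abs(second - harmonic_mean(third, fourth))
    for second, third, fourth in rows
)
print(f"largest discrepancy: {max_gap:.4f}")
```

A small discrepancy is expected because the published values are rounded to two or three decimals.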
Performance measures of nine different classification algorithms applied to the DNA-binding protein chains and Ctrl chains in a LOOCV procedure
| 0.816 | 0.977 | 0.957 | 0.998 | 0.956 | |
| 0.829 | 0.972 | 0.945 | 1.000 | 0.945 | |
| 0.838 | 0.972 | 0.949 | 0.997 | 0.947 | |
| 0.842 | 0.972 | 0.949 | 0.997 | 0.946 | |
| 0.846 | 0.972 | 0.945 | 1.000 | 0.945 | |
| 0.824 | 0.972 | 0.946 | 0.999 | 0.946 | |
| 0.852 | 0.972 | 0.949 | 0.997 | 0.946 | |
| 0.812 | 0.978 | 0.957 | 1.000 | 0.958 | |
| 0.832 | 0.972 | 0.945 | 1.000 | 0.945 | |
| 0.720 | 0.220 | 0.200 | 0.260 | 0.900 |
Abbreviations: ADTree, Alternating Decision Tree; K-NN, K-Nearest Neighbor; L1 RLR, L1 Regularized Logistic Regression; L2 RLR, L2 Regularized Logistic Regression; MLPClassifier, Multilayer Perceptron Classifier; Random Forest; RBFClassifier, Radial Basis Function Classifier; SMO, Sequential Minimal Optimization; Neural Network; AUC, Area Under the receiver operating characteristic Curve; LOOCV, Leave-One-Out Cross-Validation.
Data obtained from the work of Ahmad and Sarai (53).
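The table captions refer to leave-one-out cross-validation (LOOCV): each protein chain is held out once, the classifier is trained on all remaining chains, and the held-out chain is then predicted. The sketch below illustrates only the procedure. The toy two-dimensional data, the 1-nearest-neighbor rule, and all function names are assumptions for illustration; they are not the study's features, classifiers, or software.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label); 1 = positive, 0 = negative

def nearest_neighbor_label(train: Sequence[Sample], query: Sequence[float]) -> int:
    """Label of the training sample closest to `query` (squared Euclidean distance)."""
    closest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)))
    return closest[1]

def loocv_predictions(data: List[Sample]) -> List[int]:
    """Hold out each sample in turn, train on the rest, and predict the held-out one."""
    predictions = []
    for i, (features, _label) in enumerate(data):
        training_set = data[:i] + data[i + 1:]
        predictions.append(nearest_neighbor_label(training_set, features))
    return predictions

def precision_recall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and accuracy from paired true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy

# Hypothetical toy data: negatives near the origin, positives near (1, 1),
# plus one positive placed deliberately close to the negative cluster.
toy_data: List[Sample] = [
    ([0.0, 0.1], 0),
    ([0.1, 0.0], 0),
    ([0.2, 0.2], 0),
    ([1.0, 0.9], 1),
    ([0.9, 1.0], 1),
    ([0.3, 0.2], 1),
]

labels = [label for _features, label in toy_data]
predictions = loocv_predictions(toy_data)
precision, recall, accuracy = precision_recall_accuracy(labels, predictions)
print(predictions, precision, recall, accuracy)
```

Because every sample serves as the test case exactly once, LOOCV yields one prediction per chain, and the summary measures are computed over that full set of predictions.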
Performance measures of nine different classification algorithms applied to the RNA-binding protein chains and DNA-binding protein chains in a LOOCV process
| 0.575 | 0.715 | 0.614 | 0.856 | 0.640 | |
| 0.609 | 0.699 | 0.541 | 0.988 | 0.551 | |
| 0.605 | 0.699 | 0.539 | 0.994 | 0.548 | |
| 0.607 | 0.695 | 0.546 | 0.956 | 0.558 | |
| | 0.701 | 0.557 | 0.944 | 0.574 | |
| 0.546 | 0.697 | 0.553 | 0.944 | 0.568 | |
| 0.615 | 0.699 | 0.566 | 0.913 | 0.584 | |
| 0.495 | 0.696 | 0.533 | 1.000 | 0.538 | |
| 0.607 | 0.691 | 0.528 | 1.000 | 0.528 | |
| 0.580 | 0.690 | 0.530 | 1.000 | 0.530 |
Abbreviations: ADTree, Alternating Decision Tree; K-NN, K-Nearest Neighbor; L1 RLR, L1 Regularized Logistic Regression; L2 RLR, L2 Regularized Logistic Regression; MLPClassifier, Multilayer Perceptron Classifier; Random Forest; RBFClassifier, Radial Basis Function Classifier; SMO, Sequential Minimal Optimization; Neural Network; AUC, Area Under the receiver operating characteristic Curve; LOOCV, Leave-One-Out Cross-Validation.
Data obtained from the work of Ahmad and Sarai (53).
Figure 1. ROC curves of nine machine-learning algorithms applied to the dataset of RNA-binding protein chains versus ctrl protein chains (2601 protein chains) using the LOOCV test. Abbreviations: ADTree, Alternating Decision Tree; K-NN, K-Nearest Neighbor; L1 RLR, L1 Regularized Logistic Regression; L2 RLR, L2 Regularized Logistic Regression; MLPClassifier, Multilayer Perceptron Classifier; RBFClassifier, Radial Basis Function Classifier; SMO, Sequential Minimal Optimization; LOOCV, Leave-One-Out Cross-Validation.
Figure 3. ROC curves of nine machine-learning algorithms applied to the dataset of RNA-binding protein chains versus DNA-binding protein chains (303 protein chains) using the LOOCV test. Abbreviations: ADTree, Alternating Decision Tree; K-NN, K-Nearest Neighbor; L1 RLR, L1 Regularized Logistic Regression; L2 RLR, L2 Regularized Logistic Regression; MLPClassifier, Multilayer Perceptron Classifier; RBFClassifier, Radial Basis Function Classifier; SMO, Sequential Minimal Optimization; LOOCV, Leave-One-Out Cross-Validation.
Figure 4. The overall workflow for practical implementation. First, the query protein is represented numerically by three kinds of features. Second, a first round of classification is performed with the best-selected classifier trained on the combined full dataset (i.e., the RBFClassifier). Third, if the query protein is predicted to be nucleic acid-binding (i.e., RNA- or DNA-binding), a second round of classification is performed with the best-selected classifier trained on the nucleic acid-binding proteins dataset (i.e., the MLPClassifier). The final prediction identifies the query protein as either RNA-binding or DNA-binding. Abbreviations: MLPClassifier, Multilayer Perceptron Classifier; RBFClassifier, Radial Basis Function Classifier.
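The two-round workflow in the Figure 4 caption amounts to a simple cascade: a first classifier decides whether the query is nucleic acid-binding at all, and only positive queries reach a second classifier that separates RNA-binding from DNA-binding. The sketch below shows that control flow only. The function names, the returned label strings, and the threshold rules standing in for the trained RBFClassifier and MLPClassifier are all hypothetical.

```python
from typing import Callable, Sequence

Features = Sequence[float]
BinaryClassifier = Callable[[Features], bool]

def predict_binding_function(
    features: Features,
    is_nucleic_acid_binding: BinaryClassifier,
    is_rna_binding: BinaryClassifier,
) -> str:
    """Two-round cascade: round 1 screens for nucleic acid binding, round 2 picks RNA vs DNA."""
    # Round 1: stop early if the query is not predicted to bind nucleic acids.
    if not is_nucleic_acid_binding(features):
        return "not nucleic acid-binding"
    # Round 2: only queries that passed round 1 are assigned RNA- or DNA-binding.
    return "RNA-binding" if is_rna_binding(features) else "DNA-binding"

# Hypothetical stand-ins for the two trained classifiers (simple threshold rules).
def toy_round_one(features: Features) -> bool:
    return features[0] > 0.5

def toy_round_two(features: Features) -> bool:
    return features[1] > 0.5

print(predict_binding_function([0.9, 0.8], toy_round_one, toy_round_two))
```

Passing the classifiers in as callables keeps the cascade independent of any particular learning library, so either round can be swapped without touching the control flow.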