Chi-Hua Tung, Ching-Hsuan Chien, Chi-Wei Chen, Lan-Ying Huang, Yu-Nan Liu, Yen-Wei Chu.
Abstract
Many proteins exist in nature as oligomers with various quaternary structural attributes rather than as single chains. Predicting these attributes is an essential task in computational biology for the advancement of proteomics. However, existing methods do not integrate heterogeneous sequence encodings, and they do not address prediction accuracy for subunit categories with limited data. To this end, we propose QUATgo, a tool that can predict protein oligomers, including those with more than 12 subunits. Three kinds of sequence encoding are used: dipeptide composition, which is used here for the first time to predict protein quaternary structural attributes; protein half-life characteristics; and functional domain composition, for which we modified the encoding method proposed in earlier work to solve the problem of large feature vectors. QUATgo addresses the problem of insufficient data for a single subunit category with a two-stage architecture, and 10-fold cross-validation is used to test the predictive accuracy of the classifier. QUATgo achieves 49.0% cross-validation accuracy and 31.1% independent test accuracy. In a case study, QUATgo reaches 61.5% accuracy in predicting the quaternary structure of influenza virus hemagglutinin proteins. Finally, QUATgo is freely accessible to the public as a web server at http://predictor.nchu.edu.tw/QUATgo.
Year: 2020 PMID: 32348325 PMCID: PMC7190164 DOI: 10.1371/journal.pone.0232087
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
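The dipeptide composition encoding named in the abstract can be illustrated with a minimal sketch. It assumes the common definition (the frequency of each of the 400 ordered pairs of the 20 standard amino acids among adjacent residue pairs); the exact normalization used by QUATgo is not given here, and the names `AMINO_ACIDS`, `DIPEPTIDES`, and `dipeptide_composition` are hypothetical.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the feature vector.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for a protein sequence.

    Assumption: adjacent pairs containing a non-standard residue (e.g. 'X')
    are skipped, and frequencies are normalized by the number of counted pairs.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Usage: "ACAC" contains the adjacent pairs AC, CA, AC.
features = dipeptide_composition("ACAC")
print(len(features), features[DIPEPTIDES.index("AC")])
```

Because the vector length is fixed at 400 regardless of sequence length, proteins of different sizes can be fed to the same classifier.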
Number of sequences in each subunit class before and after CD-HIT processing.
| Subunit class | Homomer (before CD-HIT) | Heteromer (before CD-HIT) | Homomer (after CD-HIT) | Heteromer (after CD-HIT) |
|---|---|---|---|---|
| Monomer | 10461 | 10461 | 1815 | 1815 |
| 2 mer | 5995 | 993 | 2115 | 278 |
| 3 mer | 747 | 245 | 312 | 96 |
| 4 mer | 1907 | 486 | 655 | 156 |
| 5 mer | 64 | 7 | 63 | 7 |
| 6 mer | 474 | 91 | 201 | 91 |
| 7 mer | 20 | 9 | 20 | 9 |
| 8 mer | 170 | 51 | 170 | 51 |
| 9 mer | 0 | 32 | 0 | 32 |
| 10 mer | 53 | 20 | 53 | 20 |
| 11 mer | 4 | 1 | 4 | 1 |
| 12 mer | 98 | 78 | 98 | 78 |
| mt12 mer | 74 | 79 | 74 | 79 |
| Total | 20067 | 12553 | 5580 | 2713 |
mt12 mer = more than 12 mer.
Fig 1. System flow.
Cross-validation accuracy of the stage 1 classifier with different learning algorithms.
| Algorithms | ACC |
|---|---|
| LIBSVM | 0.203 |
| Simple Logistic | 0.410 |
| Hoeffding | 0.343 |
| Random Forest | 0.468 |
| Random Tree | 0.351 |
| REPtree | 0.315 |
| Decision Stump | 0.101 |
| DecisionTable | 0.280 |
| LogitBoost | 0.381 |
| J48 | 0.350 |
| OneR | 0.243 |
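The accuracies above come from cross-validation, and the abstract specifies 10 folds. The following is a minimal, unstratified sketch of how k-fold index splits can be generated. It is not the authors' pipeline; the function name `k_fold_indices` and the `seed` parameter are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Assumption: a plain shuffled split without class stratification.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Usage: each sample lands in exactly one test fold.
for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

With strongly imbalanced classes, such as the subunit counts reported earlier, a stratified split that preserves class proportions in each fold is usually preferable.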
The cross-validation results of each classifier in stage 1 with different feature selection methods.
| Selection method/classifier | Simple Logistic | LogitBoost | Random Forest |
|---|---|---|---|
| libsvm_fs | 0.388 | 0.379 | 0.486 |
| mid_maxrel | 0.373 | 0.383 | 0.458 |
| mid_mrmr | 0.235 | 0.340 | 0.391 |
| miq_maxrel | 0.373 | 0.383 | 0.458 |
| miq_mrmr | 0.391 | 0.388 | 0.485 |
| all_fs_rk | 0.479 | 0.376 | 0.386 |
| Weka attribute selection | 0.368 | 0.371 | 0.490 |
Prediction results of the best stage 1 classifier with cross-validation.
| Class | Sn | Sp | Precision | ACC | MCC |
|---|---|---|---|---|---|
| Monomer | 0.480 | 0.927 | 0.304 | 0.899 | 0.330 |
| Homo-2 mer | 0.320 | 0.945 | 0.281 | 0.906 | 0.250 |
| Homo-3 mer | 0.220 | 0.945 | 0.212 | 0.900 | 0.162 |
| Homo-4 mer | 0.060 | 0.960 | 0.091 | 0.904 | 0.024 |
| Homo-6 mer | 0.360 | 0.955 | 0.346 | 0.918 | 0.309 |
| Homo-8 mer | 0.480 | 0.971 | 0.522 | 0.940 | 0.469 |
| Homo-10 mer | 0.760 | 0.980 | 0.717 | 0.966 | 0.720 |
| Homo-12 mer | 0.660 | 0.975 | 0.635 | 0.955 | 0.623 |
| Homo-mt12 mer | 0.760 | 0.992 | 0.864 | 0.978 | 0.798 |
| Hetero-2 mer | 0.540 | 0.937 | 0.365 | 0.913 | 0.399 |
| Hetero-3 mer | 0.360 | 0.955 | 0.346 | 0.918 | 0.309 |
| Hetero-4 mer | 0.140 | 0.959 | 0.184 | 0.908 | 0.112 |
| Hetero-6 mer | 0.420 | 0.979 | 0.568 | 0.944 | 0.459 |
| Hetero-8 mer | 0.620 | 0.995 | 0.886 | 0.971 | 0.727 |
| Hetero-12 mer | 0.780 | 0.993 | 0.886 | 0.980 | 0.821 |
| Hetero-mt12 mer | 0.880 | 0.989 | 0.846 | 0.983 | 0.854 |
| Avg | 0.490 | 0.966 | 0.503 | 0.936 | 0.460 |
mt12 mer = more than 12 mer.
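The per-class columns in the table above (Sn, Sp, Precision, ACC, MCC) follow the standard one-vs-rest definitions. The sketch below computes them from confusion counts. The counts in the usage example are hypothetical: they were chosen so that the results come out close to the Homo-mt12 mer row, under the assumption of 50 samples per class and 800 samples in total. The paper's actual confusion matrix is not given here.

```python
import math


def one_vs_rest_metrics(tp, fp, tn, fn):
    """Binary metrics for one class treated as positive and all others as negative.

    Raises ZeroDivisionError if a denominator is zero (e.g. no negative samples).
    """
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    precision = tp / (tp + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"Sn": sn, "Sp": sp, "Precision": precision, "ACC": acc, "MCC": mcc}


# Hypothetical counts (an assumption, not taken from the source).
metrics = one_vs_rest_metrics(tp=38, fp=6, tn=744, fn=12)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that ACC stays high for every class because the true negatives dominate in a 16-class one-vs-rest setting, which is why MCC and Sn are more informative here.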
Prediction results of the best stage 1 classifier with independent testing.
| Class | Sn | Sp | Precision | ACC | MCC |
|---|---|---|---|---|---|
| Monomer | 0.472 | 0.818 | 0.381 | 0.752 | 0.269 |
| Homo-2 mer | 0.218 | 0.882 | 0.304 | 0.755 | 0.114 |
| Homo-3 mer | 0.271 | 0.922 | 0.278 | 0.856 | 0.195 |
| Homo-4 mer | 0.154 | 0.955 | 0.450 | 0.802 | 0.174 |
| Homo-6 mer | 0.245 | 0.952 | 0.239 | 0.911 | 0.195 |
| Homo-8 mer | 0.483 | 0.951 | 0.320 | 0.929 | 0.358 |
| Homo-10 mer | 0.667 | 0.990 | 0.071 | 0.990 | 0.216 |
| Homo-12 mer | 0.500 | 0.963 | 0.202 | 0.954 | 0.298 |
| Homo-mt12 mer | 0.750 | 0.987 | 0.353 | 0.985 | 0.508 |
| Hetero-2 mer | 0.285 | 0.962 | 0.417 | 0.903 | 0.294 |
| Hetero-3 mer | 0.370 | 0.936 | 0.093 | 0.926 | 0.158 |
| Hetero-4 mer | 0.104 | 0.950 | 0.081 | 0.916 | 0.048 |
| Hetero-6 mer | 0.512 | 0.981 | 0.304 | 0.974 | 0.382 |
| Hetero-8 mer | 1.000 | 0.988 | 0.030 | 0.988 | 0.173 |
| Hetero-12 mer | 0.893 | 0.993 | 0.595 | 0.992 | 0.726 |
| Hetero-mt12 mer | 0.776 | 0.995 | 0.731 | 0.990 | 0.748 |
| Avg | 0.481 | 0.951 | 0.303 | 0.914 | 0.303 |
mt12 mer = more than 12 mer.
Cross-validation accuracy of the stage 2 classifier with different learning algorithms.
| Algorithms | ACC |
|---|---|
| LIBSVM | 0.700 |
| Simple Logistic | 0.925 |
| Hoeffding | 0.950 |
| Random Forest | 0.950 |
| Random Tree | 0.950 |
| REPtree | 0.950 |
| Decision Stump | 0.950 |
| DecisionTable | 0.925 |
| LogitBoost | 0.950 |
| J48 | 0.950 |
| OneR | 0.900 |
The cross-validation results of each stage 2 classifier with different feature selection methods.
| Selection method (10-fold cross-validation) | Hoeffding | Random Forest | Random Tree | REPtree | Decision Stump | LogitBoost | J48 |
|---|---|---|---|---|---|---|---|
| libsvm_fs | 0.95 | 0.925 | 0.9 | 0.9 | 0.85 | 0.9 | 0.875 |
| mid_maxrel | 0.925 | 0.95 | 0.95 | 0.975 | 0.95 | 0.95 | 0.95 |
| mid_mrmr | 0.95 | 0.925 | 0.925 | 0.95 | 0.95 | 0.925 | 0.95 |
| miq_maxrel | 0.925 | 0.95 | 0.95 | 0.975 | 0.95 | 0.95 | 0.95 |
| miq_mrmr | 0.95 | 0.95 | 0.925 | 0.975 | 0.975 | 0.975 | 0.95 |
| all_fs_rk | 0.95 | 0.95 | 0.95 | 0.975 | 0.975 | 0.975 | 0.95 |
| Weka attribute selection | 0.95 | 1 | 0.95 | 0.975 | 0.975 | 1 | 0.975 |
Prediction results of the best stage 2 classifiers with cross-validation and independent testing.
| Independent testing | Random Forest | LogitBoost |
|---|---|---|
| Sn | 0.959 | 0.796 |
| Sp | - | - |
| Precision | 1.000 | 1.000 |
| ACC | 0.959 | 0.796 |
| MCC | - | - |
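In the table above, Sp and MCC are reported as "-", Sn equals ACC, and Precision is 1.000. One reading consistent with these values (an assumption, not stated in the source) is that the independent test set contains only positive samples, so there are no negatives on which to compute specificity, and the MCC denominator is zero. The sketch below returns `None` for undefined metrics. The counts are hypothetical, chosen only to land near the Random Forest column.

```python
import math


def safe_div(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def binary_metrics(tp, fp, tn, fn):
    """Binary metrics that report None instead of failing when undefined."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn": safe_div(tp, tp + fn),
        "Sp": safe_div(tn, tn + fp),
        "Precision": safe_div(tp, tp + fp),
        "ACC": safe_div(tp + tn, tp + fp + tn + fn),
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical positives-only test set: 49 samples, 47 predicted correctly.
print(binary_metrics(tp=47, fp=0, tn=0, fn=2))
```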
Comparison with other prediction tools.
| mer | Precision (QuarIdent) | Precision (QUATgo) | Precision (QuarBingo) | Sn (QuarIdent) | Sn (QUATgo) | Sn (QuarBingo) | Sp (QuarIdent) | Sp (QUATgo) | Sp (QuarBingo) | ACC (QuarIdent) | ACC (QUATgo) | ACC (QuarBingo) | MCC (QuarIdent) | MCC (QUATgo) | MCC (QuarBingo) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Monomer | 0.438 | 0.331 | 0.143 | 0.320 | 0.430 | 0.880 | 0.953 | 0.900 | 0.390 | 0.887 | 0.851 | 0.441 | 0.314 | 0.294 | 0.171 |
| Homo-2 mer | 0.224 | 0.223 | 0.221 | 0.520 | 0.270 | 0.190 | 0.792 | 0.892 | 0.923 | 0.764 | 0.827 | 0.847 | 0.223 | 0.149 | 0.121 |
| Homo-3 mer | 0.313 | 0.260 | 0.641 | 0.310 | 0.260 | 0.410 | 0.922 | 0.915 | 0.973 | 0.858 | 0.847 | 0.915 | 0.233 | 0.175 | 0.470 |
| Homo-4 mer | 0.452 | 0.278 | 0.588 | 0.380 | 0.200 | 0.100 | 0.947 | 0.940 | 0.992 | 0.888 | 0.863 | 0.900 | 0.353 | 0.162 | 0.213 |
| Homo-6 mer | 0.689 | 0.313 | 0.500 | 0.310 | 0.250 | 0.010 | 0.984 | 0.937 | 0.999 | 0.914 | 0.866 | 0.897 | 0.425 | 0.206 | 0.059 |
| Homo-8 mer | 0.887 | 0.686 | N/A | 0.550 | 0.350 | 0.000 | 0.992 | 0.982 | 1.000 | 0.946 | 0.916 | 0.897 | 0.674 | 0.452 | N/A |
| Homo-10 mer | N/A | 0.105 | N/A | 0.000 | 0.667 | 0.000 | 1.000 | 0.982 | 1.000 | 0.997 | 0.981 | 0.997 | N/A | 0.260 | N/A |
| Homo-12 mer | 1.000 | 0.500 | 0.263 | 0.604 | 0.500 | 0.750 | 1.000 | 0.974 | 0.890 | 0.980 | 0.950 | 0.883 | 0.769 | 0.474 | 0.399 |
| Hetero-2 mer | 0.386 | 0.387 | 0.538 | 0.220 | 0.290 | 0.140 | 0.960 | 0.947 | 0.986 | 0.883 | 0.879 | 0.899 | 0.232 | 0.270 | 0.237 |
| Hetero-3 mer | 0.400 | 0.174 | 0.333 | 0.043 | 0.348 | 0.022 | 0.997 | 0.917 | 0.998 | 0.951 | 0.890 | 0.951 | 0.119 | 0.192 | 0.075 |
| Hetero-4 mer | 0.429 | 0.193 | 0.333 | 0.060 | 0.110 | 0.010 | 0.991 | 0.947 | 0.998 | 0.895 | 0.860 | 0.896 | 0.129 | 0.074 | 0.042 |
| Hetero-6 mer | 0.000 | 0.391 | 0.250 | 0.000 | 0.439 | 0.024 | 0.997 | 0.970 | 0.997 | 0.954 | 0.947 | 0.956 | -0.012 | 0.387 | 0.066 |
| Hetero-8 mer | 0.000 | 0.063 | N/A | 0.000 | 1.000 | 0.000 | 0.999 | 0.984 | 1.000 | 0.998 | 0.984 | 0.999 | -0.001 | 0.248 | N/A |
| Hetero-12 mer | 0.000 | 0.813 | 0.000 | 0.000 | 0.929 | 0.000 | 0.998 | 0.994 | 0.995 | 0.969 | 0.992 | 0.966 | -0.008 | 0.864 | -0.012 |
| Avg | 0.401 | 0.337 | 0.346 | 0.237 | 0.432 | 0.181 | 0.966 | 0.949 | 0.939 | 0.920 | 0.904 | 0.889 | 0.265 | 0.301 | 0.167 |
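The Avg row appears to be a mean over the classes for which a metric is defined, with N/A entries skipped rather than counted as zero. As a check, the sketch below averages the QuarBingo precision values copied from the table; the helper name `mean_ignoring_na` is hypothetical, and `None` stands in for N/A.

```python
def mean_ignoring_na(values):
    """Average the defined entries; None marks an N/A cell."""
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else None


# QuarBingo precision per class, top to bottom, as listed in the table.
quarbingo_precision = [
    0.143, 0.221, 0.641, 0.588, 0.500, None, None,
    0.263, 0.538, 0.333, 0.333, 0.250, None, 0.000,
]

avg = mean_ignoring_na(quarbingo_precision)
print(f"{avg:.3f}")  # compare with the Avg row entry of 0.346
```

Because tools with N/A entries are averaged over fewer classes, the Avg values of different tools are not computed over identical class sets and should be compared with that in mind.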