Paul D Yoo, Abdur R Sikder, Bing Bing Zhou, Albert Y Zomaya.
Abstract
BACKGROUND: Protein domains carry some of the most useful information for understanding protein structure and function. Recent research on protein domain boundary prediction has relied mainly on widely known machine learning techniques, such as Artificial Neural Networks and Support Vector Machines. In this study, we propose a new machine learning model (IGRN) that can achieve accurate and reliable classification with significantly reduced computation. The IGRN was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information and an inter-domain linker index to detect possible domain boundaries for a target sequence.
Year: 2008 PMID: 18315843 PMCID: PMC2259413 DOI: 10.1186/1471-2105-9-S1-S12
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Bias and variance dilemma of parametric and non-parametric models.
The prediction accuracy, generalisation variance and computational efficiency of machine learning models.
| | IGRN II | IGRN | GRNN | RBFN | MLP |
| Win 7 | 68.3 | 65.6 | 67.1 | 65.4 | 66.7 |
| Win 11 | 66.7 | 65.4 | 65.1 | 61.3 | 63.8 |
| Win 19 | 67.1 | 63.7 | 65.4 | 62.1 | 66.5 |
| Win 27 | 65.5 | 64.5 | 63.8 | 59.7 | 62.1 |
| Avg Accuracy | 67.0 | 64.79 | 65.36 | 62.13 | 64.78 |
| St. Dev | 1.16 | 0.86 | 1.37 | 2.41 | 2.22 |
| Training Time | 43.2 | 31.5 | 91.3 | 28.1 | 91.8 |
| Testing Time | 7.4 | 5.3 | 8.7 | 5.2 | 6.3 |
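The rows labelled Win 7 to Win 27 refer to window sizes. The text here does not describe how the windows were encoded, so the following is only a minimal sketch of one common scheme, in which the per-residue feature vectors inside a centred window are concatenated into one input vector. The function name `window_features`, the zero padding at the sequence ends, and the toy feature values are all assumptions made for illustration, not the authors' procedure.

```python
def window_features(per_residue, window=7, pad_value=0.0):
    """Concatenate per-residue feature vectors over a centred sliding window.

    per_residue: list of equal-length feature vectors, one per residue
                 (e.g. PSSM scores plus structural descriptors).
    window:      odd window size (7, 11, 19 and 27 appear in the table).
    pad_value:   filler for positions beyond the sequence ends (an assumption).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n_feat = len(per_residue[0]) if per_residue else 0
    pad = [pad_value] * n_feat
    rows = []
    for i in range(len(per_residue)):
        flat = []
        for j in range(i - half, i + half + 1):
            # Use padding when the window runs past either end of the sequence.
            src = per_residue[j] if 0 <= j < len(per_residue) else pad
            flat.extend(src)
        rows.append(flat)
    return rows


# Toy input: three residues with two hypothetical features each.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
encoded = window_features(toy, window=3)
```

Each residue then yields a vector of length `window * n_features`, so a larger window gives the model more sequence context at the cost of a larger input.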
Accuracy of domain boundary placement on the CASP7 benchmark dataset.
| | 1-domain | 2-domains | 3-domains | Overall Accuracy (%) |
| DOMpro | 84.4 | 0.0 | 0.0 | 62.64 |
| DomPred | 85.9 | 9.5 | 33.0 | 66.28 |
| DomSSEA | 80.5 | 19.1 | 33.0 | 62.06 |
| DomCut | 85.3 | 9.5 | 7.5 | 65.21 |
| DomainDiscovery | 80.5 | 31.0 | 29.2 | 67.34 |
| IGRN II | 87.9 | 19.1 | 7.5 | 69.44 |
| IGRN | 85.2 | 21.9 | 7.5 | 68.10 |
| GRNN | 88.3 | 14.8 | 0.0 | 68.52 |
Figure 2. Comparison of domain boundary prediction scores between IGRN and GRNN. The domain boundary is at residue 155.
Figure 3. A basic architecture of GRNN, where f(x) represents an arbitrary radial basis function.
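To make the GRNN architecture concrete, the following is a minimal sketch of the standard general regression neural network estimate: a kernel-weighted average of the training targets. It assumes a Gaussian radial basis function with a single smoothing parameter `sigma`. It illustrates the baseline GRNN only, not the IGRN proposed by the authors, and the function name `grnn_predict` and the toy data are hypothetical.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=1.0):
    """Return the GRNN estimate for one query vector.

    Each training pattern acts as one radial basis unit. The output is the
    sum of targets weighted by the kernel responses, divided by the sum of
    the kernel responses.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y in zip(train_x, train_y):
        # Squared Euclidean distance between the query and this pattern.
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        # Gaussian radial basis response (assumed kernel choice).
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        numerator += w * y
        denominator += w
    if denominator == 0.0:
        # All kernel responses underflowed; fall back to the mean target.
        return sum(train_y) / len(train_y)
    return numerator / denominator


# Toy data: 1 marks a boundary-like pattern, 0 a non-boundary pattern.
train_x = [[0.0], [1.0]]
train_y = [0.0, 1.0]
midpoint = grnn_predict(train_x, train_y, [0.5], sigma=1.0)
near_zero = grnn_predict(train_x, train_y, [0.0], sigma=0.1)
```

A GRNN of this form has no iterative weight training: the training patterns are stored as they are, and `sigma` is the only parameter to tune. A small `sigma` makes the estimate follow the nearest stored pattern, while a large `sigma` smooths it towards the mean of the targets.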