Paul D Yoo, Bing Bing Zhou, Albert Y Zomaya.
Abstract
BACKGROUND: In this paper, we introduce a novel approach to protein domain boundary prediction that integrates inter-range interactions. It involves (1) the design of a modular kernel algorithm that can effectively exploit information about non-local interactions between amino acids, and (2) the development of a novel profile that provides suitable information to the algorithm. A key feature of this profiling technique is that it uses multiple structural alignments of remote homologues to create an extended sequence profile and combines the structural information with suitable chemical information that plays an important role in protein stability. This profile can capture the sequence characteristics of an entire structural superfamily and extends the range of profiles generated from sequence similarity alone.
Year: 2009 PMID: 19958485 PMCID: PMC2788374 DOI: 10.1186/1471-2164-10-S3-S21
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Predictive performance of machine-learning models
| Models | Sensitivity | Specificity | Correlation- | Accuracy | ||
|---|---|---|---|---|---|---|
| HMEHE | 0.77 ± 0.015 | 0.79 ± 0.026 | 0.78 ± 0.002 | 0.78 ± 0.012 | 0.56 ± 0.016 | 0.78 ± 0.015 |
| HMEPSSM | 0.74 ± 0.019 | 0.74 ± 0.018 | 0.75 ± 0.010 | 0.73 ± 0.045 | 0.48 ± 0.023 | 0.74 ± 0.016 |
| SVMHE | 0.71 ± 0.008 | 0.73 ± 0.010 | 0.70 ± 0.003 | 0.74 ± 0.017 | 0.44 ± 0.011 | 0.72 ± 0.020 |
| SVMPSSM | 0.71 ± 0.004 | 0.67 ± 0.008 | 0.65 ± 0.012 | 0.72 ± 0.006 | 0.37 ± 0.007 | 0.69 ± 0.003 |
| MLPHE | 0.69 ± 0.009 | 0.72 ± 0.012 | 0.61 ± 0.027 | 0.75 ± 0.019 | 0.40 ± 0.013 | 0.70 ± 0.025 |
| MLPPSSM | 0.67 ± 0.017 | 0.71 ± 0.032 | 0.61 ± 0.013 | 0.76 ± 0.027 | 0.37 ± 0.022 | 0.68 ± 0.011 |
Values are the mean ± standard deviation on the test data, obtained using an ANOVA test with the optimal settings for each model.
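The measures named in the table header can be computed from the counts of a binary confusion matrix (boundary versus non-boundary residues). The following is a minimal sketch, assuming that the correlation column refers to the Matthews correlation coefficient; the record does not state which definition was used. The function name `binary_metrics` and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from confusion counts.

    tp/fn count boundary residues predicted correctly/incorrectly;
    tn/fp count non-boundary residues predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Fraction of true boundary residues that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Fraction of true non-boundary residues that were recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Fraction of all residues classified correctly.
    accuracy = (tp + tn) / total

    # Matthews correlation coefficient (assumed definition of "correlation").
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=77, fp=21, tn=79, fn=23)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting the correlation coefficient alongside accuracy is useful when boundary residues are much rarer than non-boundary residues, because accuracy alone can look high for a predictor that rarely predicts a boundary.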
Accuracy of domain boundary placement on the Benchmark_3 and CASP8 datasets
| Predictors | B_3 (1-domain) | B_3 (2-domain) | B_3 (3-domain and more) | CASP8 (single-domain) | CASP8 (multi-domain) |
|---|---|---|---|---|---|
| MKA | 82.1% | 50.9% | 31.5% | 86.4% | 51.9% |
| DomNet | 83.0% | 54.3% | 21.0% | 84.7% | 46.8% |
| DOMpro | 79.2% | 48.1% | 29.8% | 75.1% | 40.6% |
| DomPred | 86.7% | 9.5% | 31.7% | 87.4% | 29.2% |
Benchmark_3: 1-domain (39.1%), 2-domain (39.9%), and 3-domain and more (21%); CASP8: single-domain (68.6%) and multi-domain (31.4%).
Figure 1. A basic architecture of the hierarchical mixture of experts network.
Hydrophobicity scale: nonpolar → polar distribution of amino-acid chains, pH 7 (kcal/mol)
| No. | Amino acid | Feature value | No. | Amino acid | Feature value |
|---|---|---|---|---|---|
| 1 | I | 4.92 | 11 | Y | -0.14 |
| 2 | L | 4.92 | 12 | T | -2.57 |
| 3 | V | 4.04 | 13 | S | -3.40 |
| 4 | P | 4.04 | 14 | H | -4.66 |
| 5 | F | 2.98 | 15 | Q | -5.54 |
| 6 | M | 2.35 | 16 | K | -5.55 |
| 7 | W | 2.33 | 17 | N | -6.64 |
| 8 | A | 1.81 | 18 | E | -6.81 |
| 9 | C | 1.28 | 19 | D | -8.72 |
| 10 | G | 0.94 | 20 | R | -14.92 |
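A scale of this kind is typically applied as a per-residue lookup that turns an amino-acid sequence into a numeric feature vector. The sketch below illustrates that idea with the values from the table above; the names `HYDROPHOBICITY` and `encode_hydrophobicity` are hypothetical, and the record does not describe how the authors implemented this step.

```python
# Feature values (kcal/mol, pH 7) copied from the hydrophobicity table above.
HYDROPHOBICITY = {
    "I": 4.92, "L": 4.92, "V": 4.04, "P": 4.04, "F": 2.98,
    "M": 2.35, "W": 2.33, "A": 1.81, "C": 1.28, "G": 0.94,
    "Y": -0.14, "T": -2.57, "S": -3.40, "H": -4.66, "Q": -5.54,
    "K": -5.55, "N": -6.64, "E": -6.81, "D": -8.72, "R": -14.92,
}


def encode_hydrophobicity(sequence):
    """Map a one-letter amino-acid sequence to a list of feature values.

    Raises ValueError for residues outside the 20 standard amino acids,
    since the table defines no value for them.
    """
    values = []
    for position, residue in enumerate(sequence.upper()):
        if residue not in HYDROPHOBICITY:
            raise ValueError(f"unknown residue {residue!r} at position {position}")
        values.append(HYDROPHOBICITY[residue])
    return values


# Illustrative three-residue fragment (not from the paper).
print(encode_hydrophobicity("MKV"))
```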
Rose hydrophobicity scale
| No. | Amino acid | Feature value | No. | Amino acid | Feature value |
|---|---|---|---|---|---|
| 1 | A | 0.74 | 11 | L | 0.85 |
| 2 | R | 0.64 | 12 | K | 0.52 |
| 3 | N | 0.63 | 13 | M | 0.85 |
| 4 | D | 0.62 | 14 | F | 0.88 |
| 5 | C | 0.91 | 15 | P | 0.64 |
| 6 | Q | 0.62 | 16 | S | 0.66 |
| 7 | E | 0.62 | 17 | T | 0.70 |
| 8 | G | 0.72 | 18 | W | 0.85 |
| 9 | H | 0.78 | 19 | Y | 0.76 |
| 10 | I | 0.88 | 20 | V | 0.86 |
SARAH1 Scale
| No. | Amino acid | Binary code | No. | Amino acid | Binary code |
|---|---|---|---|---|---|
| 1 | C | 1,1,0,0,0 | 11 | G | 0,0,0,-1,-1 |
| 2 | F | 1,0,1,0,0 | 12 | T | 0,0,-1,0,-1 |
| 3 | I | 1,0,0,1,0 | 13 | S | 0,0,-1,-1,0 |
| 4 | V | 1,0,0,0,1 | 14 | R | 0,-1,0,0,-1 |
| 5 | L | 0,1,1,0,0 | 15 | P | 0,-1,0,-1,0 |
| 6 | W | 0,1,0,1,0 | 16 | N | 0,-1,-1,0,0 |
| 7 | M | 0,1,0,0,1 | 17 | D | -1,0,0,0,-1 |
| 8 | H | 0,0,1,1,0 | 18 | Q | -1,0,0,-1,0 |
| 9 | Y | 0,0,1,0,1 | 19 | E | -1,0,-1,0,0 |
| 10 | A | 0,0,0,1,1 | 20 | K | -1,-1,0,0,0 |
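In the table above, each amino acid receives a five-position code with exactly two non-zero entries: two +1 entries for the first ten residues and two -1 entries for the last ten. The sketch below shows how such a code could be used to encode a sequence as a flat numeric vector; the names `SARAH1` and `encode_sarah1` are hypothetical, and the concatenation of per-residue codes is an assumption rather than a documented detail of the authors' pipeline.

```python
# Five-position codes copied from the SARAH1 table above.
SARAH1 = {
    "C": (1, 1, 0, 0, 0),   "F": (1, 0, 1, 0, 0),   "I": (1, 0, 0, 1, 0),
    "V": (1, 0, 0, 0, 1),   "L": (0, 1, 1, 0, 0),   "W": (0, 1, 0, 1, 0),
    "M": (0, 1, 0, 0, 1),   "H": (0, 0, 1, 1, 0),   "Y": (0, 0, 1, 0, 1),
    "A": (0, 0, 0, 1, 1),   "G": (0, 0, 0, -1, -1), "T": (0, 0, -1, 0, -1),
    "S": (0, 0, -1, -1, 0), "R": (0, -1, 0, 0, -1), "P": (0, -1, 0, -1, 0),
    "N": (0, -1, -1, 0, 0), "D": (-1, 0, 0, 0, -1), "Q": (-1, 0, 0, -1, 0),
    "E": (-1, 0, -1, 0, 0), "K": (-1, -1, 0, 0, 0),
}


def encode_sarah1(sequence):
    """Concatenate the five-position code of every residue in the sequence."""
    encoded = []
    for residue in sequence.upper():
        encoded.extend(SARAH1[residue])  # KeyError for non-standard residues
    return encoded


# Sanity checks on the table itself: 20 distinct codes, two non-zero entries each.
assert len(set(SARAH1.values())) == 20
assert all(sum(abs(x) for x in code) == 2 for code in SARAH1.values())

# Illustrative two-residue fragment (not from the paper).
print(encode_sarah1("CK"))
```

Compared with a 20-position one-hot encoding, a five-position code keeps the input dimension per residue small, which matters when a window of many residues is fed into a network.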
Figure 2. The flowchart of MKA showing the stepwise procedure we performed: (1) data collection, building Benchmark_3 and pre-processing the datasets; (2) profile construction, such as PSSM, Sarah1 and EH-profile; (3) the information obtained in (2) was combined and normalised to fall in the interval [-1, 1] before being fed into the networks; (4) target levels were assigned to each profile (positive, +1, for domain boundary residues and negative, -1, for non-boundary residues); (5) a hold-out method was used to divide the combined dataset into seven subsets (training and testing sets); (6) each model was trained on each set; (7) each model was simulated on the test set to obtain predicted outputs; and (8) post-processing was applied to find the predicted domain boundary locations. Steps (6) to (8) were repeated until we obtained the most suitable kernel and the optimal hyperparameters of the HME for the Benchmark_3 dataset.
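The data preparation steps in this caption (normalisation to the interval [-1, 1], assignment of +1/-1 targets, and division into seven subsets) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: min-max scaling, a seeded random partition, and all function names are hypothetical choices, since the caption specifies neither the scaling formula nor how the subsets were drawn.

```python
import random


def scale_to_unit_interval(column):
    """Min-max scale a list of numbers to [-1, 1] (assumed normalisation)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zero.
        return [0.0] * len(column)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]


def label_residues(length, boundary_positions):
    """Return +1 for domain boundary residues and -1 for all others."""
    boundaries = set(boundary_positions)
    return [1 if i in boundaries else -1 for i in range(length)]


def split_into_subsets(items, k=7, seed=0):
    """Randomly partition items into k disjoint subsets of near-equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# Illustrative values only (not from the paper).
features = scale_to_unit_interval([4.92, 0.94, -14.92])
targets = label_residues(length=5, boundary_positions=[2])
subsets = split_into_subsets(range(20), k=7)

# Each subset can serve once as the held-out test set, with the rest for training.
for test_index, test_set in enumerate(subsets):
    train_set = [x for j, s in enumerate(subsets) if j != test_index for x in s]
    print(test_index, len(train_set), len(test_set))
```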