Dongmei Li, Xiuzhen Hu, Xingxing Liu, Zhenxing Feng, Changjiang Ding.
Abstract
Enzymes are proteins with catalytic functions, and their β-hairpins contain many binding sites that are essential for enzyme function. As the number of known enzyme sequences grows, bioinformatics methods that identify β-hairpins in enzymes quickly and accurately become increasingly important for annotating enzyme structure and function. In this work, the proposed method, which combines minimum redundancy maximum relevance (mRMR) feature selection with a support vector machine (SVM), was trained and tested on a non-redundant enzyme β-hairpin database containing 2818 β-hairpins and 1098 non-β-hairpins. With 5-fold cross-validation on the training dataset, the overall accuracy was 90.08% and the Matthews correlation coefficient (Mcc) was 0.74; on the independent test dataset, the overall accuracy was 88.93% and the Mcc was 0.76. The method was further validated on 845 β-hairpins with ligand binding sites, where 5-fold cross-validation on the training dataset and the independent test on the test dataset gave overall accuracies of 85.82% (Mcc of 0.71) and 84.78% (Mcc of 0.70), respectively. These reasonably high accuracies indicate that the method is an effective tool for further studies of β-hairpins in enzyme structures. As a novel contribution to enzyme function prediction, β-hairpins with ligand binding sites were also predicted. Based on this work, a web server was constructed to predict β-hairpin motifs in enzymes (http://202.207.29.251:8080/).
Keywords: Enzymes; Ligand binding site; Minimum redundancy maximum; Support vector machine; β-Hairpin motif
Year: 2016 PMID: 28855832 PMCID: PMC5562482 DOI: 10.1016/j.sjbs.2016.11.014
Source DB: PubMed Journal: Saudi J Biol Sci ISSN: 2213-7106 Impact factor: 4.219
Fig. 1. The distribution of the numbers of motifs with different loop lengths.
Fig. 2. Hydropathy characteristics for amino acids.
The number of features of six groups after selection by mRMR.
| Feature | Original number | Selected number |
|---|---|---|
| 1. AACP | 315 | 74 |
| 2. HCP | 105 | 30 |
| 3. PSSP | 60 | 23 |
| 4. ACC | 20 | 5 |
| 5. HC | 6 | 4 |
| 6. AACD | 400 | 109 |
| Total | 906 | 245 |
AACP: amino acid compositions of each position; HCP: hydropathy characteristics for amino acid of each position; PSSP: predicted secondary structures of each position; ACC: amino acid composition; HC: hydropathy characteristics for amino acid; AACD: amino acid contiguous dipeptides composition.
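The two composition-based groups, ACC (20 features) and AACD (400 features), can be illustrated with a minimal sketch. It assumes that each feature is a frequency normalized by segment length and that non-standard residues are skipped; neither detail is stated here, and the function names and the example segment are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(segment):
    """Return 20 residue frequencies for a sequence segment (ACC-style)."""
    residues = [r for r in segment.upper() if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(segment):
    """Return 400 contiguous-dipeptide frequencies (AACD-style)."""
    seq = segment.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Assumption: pairs containing a non-standard residue are ignored.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(valid)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [valid.count(d) / total for d in DIPEPTIDES]


# Hypothetical segment, used only to show the shapes of the two vectors.
acc = amino_acid_composition("ACAC")
aacd = dipeptide_composition("ACAC")
print(len(acc), len(aacd))
```

Concatenating the two vectors gives 420 of the 906 original features; the position-specific groups (AACP, HCP, PSSP) would need the motif alignment, which is not described in this section.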
The predictive results with different dimensions of features selected by mRMR.
| Dimension | Acc (%) | Mcc | Sn (%) | Sp (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|
| 20 | 86.97 | 0.67 | 92.49 | 72.81 | 89.72 | 79.08 |
| 50 | 86.59 | 0.66 | 91.43 | 74.18 | 90.08 | 77.13 |
| 100 | 87.28 | 0.68 | 92.17 | 74.72 | 90.34 | 78.81 |
| 150 | 89.04 | 0.72 | 93.98 | 76.36 | 91.07 | 83.18 |
| 200 | 89.96 | 0.74 | 94.94 | 77.18 | 91.44 | 85.60 |
| 245 | 90.08 | 0.74 | 95.47 | 76.22 | 91.15 | 86.78 |
| 300 | 89.77 | 0.73 | 95.47 | 75.13 | 90.78 | 86.61 |
| 350 | 89.73 | 0.73 | 96.22 | 73.08 | 90.17 | 88.28 |
| 400 | 89.19 | 0.74 | 97.01 | 72.50 | 88.97 | 91.41 |
| 450 | 88.16 | 0.69 | 97.87 | 63.25 | 87.23 | 92.04 |
| 500 | 84.45 | 0.59 | 98.88 | 47.40 | 82.83 | 94.29 |
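The dimension sweep depends on mRMR ranking the features so that the top-k subset can be evaluated for each k. The following is a minimal sketch of greedy mRMR selection, assuming discrete feature values and the difference criterion (relevance minus mean redundancy). The variant and any discretization used by the authors are not stated here, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedy mRMR: repeatedly pick the feature with the highest
    relevance to the labels minus mean redundancy with the selected set.

    columns: list of feature columns, each a list of discrete values.
    Returns the indices of the selected columns in selection order.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: column 1 duplicates column 0, column 2 is weaker
# but carries information that column 0 does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 1, 1],
]
print(mrmr_select(columns, labels, 2))  # [0, 2]
```

The duplicate column is skipped in the second step because its redundancy with the first pick outweighs its relevance, which is the behaviour that lets a reduced subset match or exceed the full feature set.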
Fig. 3. Flowchart of the prediction process for 5-fold cross-validation and independent test.
The prediction results for 5-fold cross-validation and independent test.
| Dataset / method | Acc (%) | Mcc | Sn (%) | Sp (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|
| Training dataset | 90.08 | 0.74 | 95.47 | 76.22 | 91.15 | 86.78 |
| Testing dataset | 88.93 | 0.76 | 90.61 | 85.18 | 93.16 | 80.29 |
| Hu’s (ArchDB) | 83.1 | 0.59 | 91.3 | 64.3 | 85.4 | 76.4 |
| Hu’s (EVA) | 80.7 | 0.61 | 83.4 | 77.4 | 81.8 | 79.3 |
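Overall accuracy and Mcc, the two measures quoted in the abstract, follow directly from the confusion matrix of a binary predictor. The sketch below uses the standard definitions and also returns sensitivity, specificity, and the positive and negative predictive values. The function name and the counts are hypothetical and are not taken from the study.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification measures from confusion counts.

    tp/fn: positives (e.g. β-hairpins) predicted correctly / missed.
    tn/fp: negatives predicted correctly / wrongly called positive.
    """
    def ratio(num, den):
        # Return 0.0 for an empty denominator instead of raising.
        return num / den if den else 0.0

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "acc": ratio(tp + tn, tp + fn + tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "sn": ratio(tp, tp + fn),   # sensitivity
        "sp": ratio(tn, tn + fp),   # specificity
        "ppv": ratio(tp, tp + fp),  # positive predictive value
        "npv": ratio(tn, tn + fn),  # negative predictive value
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = binary_metrics(tp=90, fn=10, tn=40, fp=10)
print(round(m["acc"], 4), round(m["mcc"], 2))  # 0.8667 0.7
```

Because Mcc uses all four cells of the confusion matrix, it is less flattering than accuracy on a dataset where one class dominates, as is the case with 2818 β-hairpins against 1098 non-β-hairpins.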
The testing results of β-hairpins in the enzyme experimental sequence dataset.
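| Secondary structure source | Acc (%) | Mcc | Sn (%) | Sp (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|
| DSSP | 85.93 | 0.74 | 79.15 | 97.57 | 98.24 | 73.18 |
| PSIPRED | 70.67 | 0.41 | 73.07 | 68.64 | 66.27 | 75.14 |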
Fig. 4. A sequence-level testing sample [PDB: protein 1OID (A)] from the testing set.
The predictive results of β-hairpins with ligand binding sites for 5-fold cross-validation and independent test.
| Dataset | Acc (%) | Mcc | Sn (%) | Sp (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|
| Training dataset | 85.82 | 0.71 | 82.09 | 88.66 | 84.79 | 86.53 |
| Testing dataset | 84.78 | 0.70 | 85.39 | 86.05 | 81.13 | 89.34 |