James R Green, Michael J Korenberg, Mohammed O Aboul-Magd.
Abstract
BACKGROUND: Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing alpha-helices, beta-strands, and non-regular structures) from primary sequence data. The approach makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification.
Year: 2009 PMID: 19615046 PMCID: PMC2720391 DOI: 10.1186/1471-2105-10-222
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Sequence-to-structure PCI classifier. Creating a 3-ary sequence-to-structure PCI-MSE classifier from 3 binary PCI-MSE classifiers.
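The composition described in the Figure 1 caption can be sketched as follows. This is a minimal illustration only: it assumes each binary model emits one real-valued score per residue and that the 3-ary label is the class whose binary model scores highest. The function name `combine_binary_outputs`, the H/E/C labels, and the highest-score rule are assumptions made for the example, not the decision rule stated in the source.

```python
from typing import Sequence

# Class labels are assumed for illustration: helix, strand, non-regular.
CLASSES = ("H", "E", "C")


def combine_binary_outputs(
    helix: Sequence[float],
    strand: Sequence[float],
    coil: Sequence[float],
) -> str:
    """Merge three per-residue binary scores into one 3-state string.

    Assumption: the winning class at each residue is the one whose
    binary model gives the highest score (ties go to the earlier class).
    """
    if not (len(helix) == len(strand) == len(coil)):
        raise ValueError("all three score tracks must have the same length")

    labels = []
    for scores in zip(helix, strand, coil):
        best = max(range(len(CLASSES)), key=lambda i: scores[i])
        labels.append(CLASSES[best])
    return "".join(labels)


if __name__ == "__main__":
    # Hypothetical scores for a three-residue chain.
    print(combine_binary_outputs([0.9, 0.1, 0.2], [0.05, 0.8, 0.3], [0.1, 0.2, 0.7]))
```

Keeping the three binary models separate until this final step leaves their raw scores available for a later combining stage, rather than only the final label.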
Optimal PCI architectural parameters
| 3 ≤ ( | [ | [5,50] | |||
| 10 | 9 | 3 | 35 | ||
| 9 | 15 | 3 | 50 | ||
| 5 | 7 | 2 | 40 | ||
| 4 | 7 | 6 | 50 | ||
| 4 | 1 | 7 | 45 | ||
| 1 | 3 | 7 | 20 | ||
Optimal binary PCI architectural parameters following GA optimization over S1 dataset.
PCI accuracy over S5 test dataset
| 0.661 | 0.572 | 0.530 | 73.9% | 2.72 | 61.8 | |
| 0.693 | 0.595 | 0.547 | 75.5% | 1.89 | 67.1 | |
Sequence-to-structure (see Figure 1) and cascaded (see Figure 2) PCI classifier results over S5 test dataset of 543 chains.
Figure 2. Cascaded PCI classifier. Cascaded PCI classifier formed from PCI sequence-to-structure models followed by a cascaded sequence-to-structure (post-PCI) classifier.
Figure 3. PCI consensus classifier structure. A 6-input cascaded PCI classifier is used to combine 3 outputs from binary PCI sequence-to-structure models with 3 distance outputs from PSIPRED [6].
Combination of PCI with PSIPRED
| 0.727 | 0.646 | 0.585 | 77.8% | 1.49 | 68.9 | |
| 0.740 | 0.647 | 0.592 | 78.5% | 1.12 | 69.8 | |
Prediction accuracy over the S5 test dataset (543 chains) for PSIPRED alone and for the post-PCI combination of PSIPRED with 3 binary PCI model outputs.
Results over the final EVA test
| 0.619 | 76.09 | 2.65 | 0.631 | 75.96 | 2.82 | 71.4 | |
| 0.619 | 76.09 | 2.65 | 0.631 | 75.96 | 2.82 | 71.4 | |
| 0.577 | 72.65 | 3.70 | 0.594 | 72.86 | 3.38 | 66.8 | |
| 0.651 | 77.72 | 2.36 | 0.659 | 77.70 | 2.49 | 75.3 | |
| 0.668 | 79.02 | 2.01 | 78.86 | 2.12 | 76.1 | ||
| 0.633 | 76.74 | 2.62 | 0.634 | 76.50 | 2.74 | 73.9 | |
| 0.651 | 77.95 | 1.86 | 0.644 | 77.45 | 2.05 | 73.0 | |
| 0.616 | 76.07 | 3.24 | 0.622 | 76.15 | 3.20 | 70.6 | |
| 0.643 | 77.69 | 2.43 | 0.642 | 77.58 | 2.46 | 72.0 | |
| 0.632 | 76.45 | 2.70 | 0.624 | 76.31 | 2.69 | 72.0 | |
| 0.676 | 79.44 | 2.13 | 0.658 | 79.36 | 2.20 | 75.5 | |
| 0.656 | |||||||
| 0.679 | 1.98 | 0.659 | 2.10 | 75.7 | |||
Results over the final EVA test set of 125 new protein chains dissimilar to all training data. CC denotes the average Matthews' correlation coefficient observed for the three classes. 'Avg per residue' results are calculated over the pool of all residues in the dataset whereas 'Avg per chain' results are compiled for each chain prior to computing the average. PSIPRED-local refers to the output of PSIPRED v2.45 run locally when provided with PSSM data generated from the filtered NCBI non-redundant nr database as frozen on 3 May 2004. PSIPRED-live indicates the performance of the actual PSIPRED server [31] as of the day that each protein chain was added to the EVA system. Bad-Score-Rule in the last row shows the results when a rule is applied to combine postPCI with PSIPRED to optimize Q3 score at the cost of BAD score (see text for details).
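The difference between the 'Avg per residue' and 'Avg per chain' figures, and the class-averaged CC, can be illustrated with a short sketch. It assumes the usual definition of Q3 (the percentage of residues assigned the correct one of three states) and the standard Matthews formula computed one-vs-rest per class. The function names, the H/E/C alphabet, and the toy chains are hypothetical and are not taken from the source.

```python
from math import sqrt
from typing import List, Tuple

Chain = Tuple[str, str]  # (observed states, predicted states), equal length


def q3_per_residue(chains: List[Chain]) -> float:
    """Q3 over the pool of all residues in the dataset."""
    correct = sum(o == p for obs, pred in chains for o, p in zip(obs, pred))
    total = sum(len(obs) for obs, _ in chains)
    return 100.0 * correct / total


def q3_per_chain(chains: List[Chain]) -> float:
    """Q3 computed for each chain first, then averaged over chains."""
    scores = [
        100.0 * sum(o == p for o, p in zip(obs, pred)) / len(obs)
        for obs, pred in chains
    ]
    return sum(scores) / len(scores)


def matthews_cc(chains: List[Chain], cls: str) -> float:
    """One-vs-rest Matthews correlation for a single class, pooled over residues."""
    tp = fp = tn = fn = 0
    for obs, pred in chains:
        for o, p in zip(obs, pred):
            if o == cls and p == cls:
                tp += 1
            elif o != cls and p == cls:
                fp += 1
            elif o == cls and p != cls:
                fn += 1
            else:
                tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_cc(chains: List[Chain], classes: str = "HEC") -> float:
    """Average of the per-class Matthews coefficients."""
    return sum(matthews_cc(chains, c) for c in classes) / len(classes)


if __name__ == "__main__":
    # Toy data: one long, perfectly predicted chain and one short, half-wrong chain.
    toy = [("HHHHEECC", "HHHHEECC"), ("HC", "CC")]
    print(q3_per_residue(toy), q3_per_chain(toy), mean_cc(toy))
```

In the toy data the short, poorly predicted chain pulls the per-chain average well below the per-residue figure, which is why the two averages are reported separately.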
Figure 4. PCI-SS webserver screen capture. Screen capture of human-readable PCI-SS webserver results page. Users can download predicted structure in tab-separated value or XML format. Users can also view a comparison of the consensus prediction against the component classifiers, and can download PSI-BLAST search results and PSSM data.