Lele Hu, Tao Huang, Xiaohe Shi, Wen-Cong Lu, Yu-Dong Cai, Kuo-Chen Chou.
Abstract
BACKGROUND: With the huge number of uncharacterized protein sequences generated in the post-genomic age, it is highly desirable to develop effective computational methods for quickly and accurately predicting their functions. The information thus obtained would be very useful for both basic research and drug development in a timely manner. METHODOLOGY/PRINCIPAL
Year: 2011 PMID: 21283518 PMCID: PMC3023709 DOI: 10.1371/journal.pone.0014556
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Functional classification of mouse proteins.
| Functional number | Functional Category | Number of proteins |
| 1 | METABOLISM | 2714 |
| 2 | ENERGY | 605 |
| 3 | CELL CYCLE AND DNA PROCESSING | 1123 |
| 4 | TRANSCRIPTION | 2128 |
| 5 | PROTEIN SYNTHESIS | 490 |
| 6 | PROTEIN FATE (folding, modification, destination) | 2490 |
| 7 | PROTEIN WITH BINDING FUNCTION OR COFACTOR REQUIREMENT (structural or catalytic) | 8414 |
| 8 | REGULATION OF METABOLISM AND PROTEIN FUNCTION | 1115 |
| 9 | CELLULAR TRANSPORT, TRANSPORT FACILITIES AND TRANSPORT ROUTES | 2411 |
| 10 | CELLULAR COMMUNICATION/SIGNAL TRANSDUCTION MECHANISM | 4077 |
| 11 | CELL RESCUE, DEFENSE AND VIRULENCE | 778 |
| 12 | INTERACTION WITH THE ENVIRONMENT | 1492 |
| 13 | SYSTEMIC INTERACTION WITH THE ENVIRONMENT | 2086 |
| 14 | TRANSPOSABLE ELEMENTS, VIRAL AND PLASMID PROTEINS | 11 |
| 15 | CELL FATE | 1313 |
| 16 | DEVELOPMENT (Systemic) | 1044 |
| 17 | BIOGENESIS OF CELLULAR COMPONENTS | 980 |
| 18 | CELL TYPE DIFFERENTIATION | 370 |
| 19 | TISSUE DIFFERENTIATION | 426 |
| 20 | ORGAN DIFFERENTIATION | 559 |
| 21 | SUBCELLULAR LOCALIZATION | 9767 |
| 22 | CELL TYPE LOCALIZATION | 274 |
| 23 | TISSUE LOCALIZATION | 366 |
| 24 | ORGAN LOCALIZATION | 620 |
Biochemical and physicochemical features of proteins and their dimensionality.
| Property name | Number of features | | | Total |
| | C | T | D | |
| Hydrophobicity | 3 | 3 | 15 | 21 |
| Secondary structure | 3 | 3 | 15 | 21 |
| Solvent accessibility | 1 | 1 | 5 | 7 |
| Normalized van der Waals Volume | 3 | 3 | 15 | 21 |
| Polarity | 3 | 3 | 15 | 21 |
| Polarizability | 3 | 3 | 15 | 21 |
| Amino Acid Composition | 20 (not split into C/T/D) | | | 20 |
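As a quick consistency check, the per-property counts in this table can be summed to confirm that they add up to the 132 hybrid features mentioned in the Figure 1 caption. The dictionary layout and variable names below are illustrative only and are not taken from the paper.

```python
# Feature counts per property, copied from the table above.
# Each tuple is (C, T, D); amino acid composition is a single group of 20.
ctd_counts = {
    "Hydrophobicity": (3, 3, 15),
    "Secondary structure": (3, 3, 15),
    "Solvent accessibility": (1, 1, 5),
    "Normalized van der Waals Volume": (3, 3, 15),
    "Polarity": (3, 3, 15),
    "Polarizability": (3, 3, 15),
}
amino_acid_composition = 20

# Row totals (the "Total" column) and the overall feature dimension.
totals = {name: sum(parts) for name, parts in ctd_counts.items()}
total_features = sum(totals.values()) + amino_acid_composition

print(totals["Solvent accessibility"])  # 7
print(total_features)                   # 132
```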
The 24-order prediction accuracies of the three methods on the training/test sets.
| | Network-based method | Hybrid-property based method | Motif-based method | Overall prediction | | | | | | |
| Order | | | | | | | | | | |
| 1 | 75.93% | 78.21% | 47.15% | 35.37% | 57.12% | 42.67% | 31.83% | 32.69% | 69.07% | 70.23% |
| 2 | 64.38% | 70.11% | 40.71% | 32.52% | 51.79% | 40.04% | 30.48% | 27.88% | 58.74% | 63.11% |
| 3 | 50.52% | 53.91% | 35.26% | 24.80% | 45.65% | 39.57% | 28.48% | 30.77% | 46.89% | 48.48% |
| 4 | 31.09% | 35.10% | 26.14% | 21.54% | 34.68% | 32.33% | 24.22% | 29.81% | 29.91% | 32.58% |
| 5 | 20.07% | 24.21% | 20.16% | 24.39% | 25.64% | 27.07% | 22.82% | 18.27% | 20.09% | 24.24% |
| 6 | 14.71% | 17.60% | 14.07% | 16.67% | 17.95% | 20.96% | 16.87% | 15.87% | 14.56% | 17.42% |
| 7 | 11.33% | 12.76% | 11.86% | 13.41% | 14.42% | 18.42% | 14.46% | 12.98% | 11.45% | 12.88% |
| 8 | 8.37% | 9.68% | 10.70% | 15.85% | 10.88% | 14.19% | 14.41% | 12.98% | 8.92% | 10.83% |
| 9 | 6.82% | 9.87% | 8.97% | 14.63% | 9.11% | 13.16% | 12.11% | 12.98% | 7.33% | 10.76% |
| 10 | 6.16% | 6.61% | 8.27% | 13.01% | 8.18% | 11.75% | 13.21% | 8.65% | 6.66% | 7.80% |
| 11 | 4.76% | 5.49% | 7.00% | 6.50% | 6.69% | 12.31% | 11.41% | 11.06% | 5.30% | 5.68% |
| 12 | 4.65% | 5.87% | 6.33% | 5.28% | 5.95% | 10.15% | 9.61% | 9.62% | 5.05% | 5.76% |
| 13 | 3.86% | 4.56% | 5.77% | 5.28% | 5.30% | 9.40% | 8.96% | 8.65% | 4.32% | 4.70% |
| 14 | 3.66% | 3.54% | 6.30% | 3.66% | 5.42% | 9.02% | 7.11% | 8.17% | 4.29% | 3.56% |
| 15 | 3.04% | 4.10% | 4.33% | 3.25% | 4.43% | 8.74% | 8.56% | 7.21% | 3.34% | 3.94% |
| 16 | 2.64% | 3.35% | 4.22% | 2.85% | 3.67% | 8.74% | 6.91% | 1.92% | 3.02% | 3.26% |
| 17 | 2.36% | 2.51% | 3.52% | 1.22% | 3.77% | 8.83% | 5.21% | 2.40% | 2.64% | 2.27% |
| 18 | 2.13% | 1.86% | 4.26% | 2.44% | 3.19% | 6.86% | 5.46% | 1.92% | 2.64% | 1.97% |
| 19 | 1.67% | 2.23% | 3.87% | 3.66% | 2.84% | 6.11% | 5.61% | 1.92% | 2.20% | 2.50% |
| 20 | 1.63% | 2.05% | 2.78% | 2.03% | 2.34% | 4.32% | 4.50% | 0.96% | 1.90% | 2.05% |
| 21 | 1.59% | 1.49% | 2.74% | 4.47% | 2.07% | 4.51% | 3.55% | 0.48% | 1.87% | 2.05% |
| 22 | 1.46% | 1.30% | 1.83% | 0.41% | 1.64% | 4.51% | 4.50% | 0.48% | 1.55% | 1.14% |
| 23 | 1.07% | 1.12% | 1.90% | 1.22% | 1.10% | 3.57% | 3.20% | 0.48% | 1.27% | 1.14% |
| 24 | 0.78% | 1.12% | 0.49% | 0.41% | 0.06% | 0.66% | 2.75% | 0.48% | 0.71% | 0.98% |
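One plausible way to read this table: each protein receives a ranked list of the 24 functional categories, and the order-j accuracy is the fraction of proteins whose j-th ranked category is among their true functions. That definition is an assumption here, since the extracted text does not state it, and the function name and toy data below are hypothetical.

```python
from typing import List, Sequence, Set


def order_accuracies(ranked: Sequence[Sequence[int]],
                     truth: Sequence[Set[int]],
                     n_orders: int) -> List[float]:
    """Fraction of proteins whose j-th ranked category is a true one, j = 1..n_orders."""
    n = len(ranked)
    return [
        sum(1 for r, t in zip(ranked, truth) if r[j] in t) / n
        for j in range(n_orders)
    ]


# Toy data (not from the paper): 4 proteins, 3 categories.
ranked = [[1, 2, 3], [2, 1, 3], [3, 1, 2], [1, 3, 2]]  # predicted ranking per protein
truth = [{1}, {1, 2}, {2}, {1, 3}]                     # true category set per protein

acc = order_accuracies(ranked, truth, 3)
print(acc)  # [0.75, 0.5, 0.25]
```

Under this reading, a good predictor concentrates the true functions in the first few orders, which would match the steep drop in accuracy from order 1 to order 24 in the table.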
Figure 1. The IFS curve of the 132 hybrid features used in the hybrid-property based method.
The first-order prediction accuracy of the hybrid-property based method varies as the number of features increases. The curve peaks when 90 features are used.
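The idea behind such a curve can be sketched as follows, assuming IFS stands for incremental feature selection: take a ranked feature list, score the predictor with the top k features for each k, and keep the k with the highest score. The `evaluate` callback and the toy scoring function are hypothetical stand-ins. The extracted text does not say how the features were ranked or scored, and the toy scorer is deliberately built to peak at 90 so that it mirrors the figure; it is not real data.

```python
from typing import Callable, List, Sequence, Tuple


def ifs_curve(ranked_features: Sequence[int],
              evaluate: Callable[[Sequence[int]], float]) -> List[Tuple[int, float]]:
    """Score the top-k prefix of a ranked feature list for k = 1..len(ranked_features)."""
    return [(k, evaluate(ranked_features[:k]))
            for k in range(1, len(ranked_features) + 1)]


def best_k(curve: List[Tuple[int, float]]) -> int:
    """Return the smallest k that attains the maximum score on the curve."""
    top = max(score for _, score in curve)
    return next(k for k, score in curve if score == top)


# Hypothetical scorer: accuracy rises until 90 features, then declines.
def toy_evaluate(features: Sequence[int]) -> float:
    k = len(features)
    return 0.5 - abs(k - 90) / 1000


curve = ifs_curve(list(range(132)), toy_evaluate)
print(len(curve))     # 132
print(best_k(curve))  # 90
```

In practice `evaluate` would wrap whatever classifier and validation scheme is being used, which makes each point on the curve as expensive as one full evaluation run.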
Figure 2. Distribution of hybrid-property subtypes among the 90 optimized features.
The X-axis lists the seven kinds of biochemical and physicochemical attributes, and the Y-axis gives the number of times each attribute occurs among the 90 selected features.
The jackknife test prediction accuracies of the network-based method with STRING data and IntAct data on the selected 1939 proteins.
| Order | STRING data | IntAct data |
| 1 | 83.50% | 57.50% |
| 2 | 75.45% | 53.12% |
| 3 | 62.09% | 41.98% |
| 4 | 40.23% | 33.94% |
| 5 | 25.84% | 27.64% |
| 6 | 18.05% | 19.65% |
| 7 | 13.67% | 16.25% |
| 8 | 12.27% | 12.94% |
| 9 | 9.95% | 11.04% |
| 10 | 7.79% | 10.06% |
| 11 | 6.34% | 9.80% |
| 12 | 6.29% | 9.33% |
| 13 | 5.83% | 8.46% |
| 14 | 5.16% | 7.68% |
| 15 | 3.92% | 7.68% |
| 16 | 3.82% | 8.05% |
| 17 | 2.94% | 8.56% |
| 18 | 3.09% | 7.01% |
| 19 | 1.70% | 8.20% |
| 20 | 1.81% | 8.51% |
| 21 | 1.39% | 6.29% |
| 22 | 1.08% | 6.19% |
| 23 | 0.83% | 6.14% |
| 24 | 0.88% | 7.89% |