Hafeez Ur Rehman, Nouman Azam, JingTao Yao, Alfredo Benso.
Abstract
Knowledge of protein functions plays an essential role in understanding biological cells and has a significant impact on human life in areas such as personalized medicine, better crops and improved therapeutic interventions. Owing to the expense and inherent difficulty of biological experiments, intelligent methods are generally relied upon for the automatic assignment of functions to proteins. Technological advancements in biology are improving our understanding of biological processes and regularly yield new features and characteristics that better describe the role of proteins. These anticipated features should not be neglected or overlooked when designing more effective classification techniques. A key issue in this context, one that is not being sufficiently addressed, is how to build effective classification models and approaches for protein function prediction that incorporate and take advantage of the ever-evolving biological information. In this article, we propose a three-way decision making approach that makes provision for seeking and incorporating future information. We consider probabilistic rough set based models such as Game-Theoretic Rough Sets (GTRS) and Information-Theoretic Rough Sets (ITRS) for inducing three-way decisions. An architecture for protein function classification with probabilistic rough set based three-way decisions is proposed and explained. Experiments are carried out on a Saccharomyces cerevisiae dataset obtained from the UniProt database, with the corresponding functional classes extracted from the Gene Ontology (GO) database. The results indicate that as the level of biological information increases, the number of deferred cases is reduced while a similar level of accuracy is maintained.
Year: 2017 PMID: 28234929 PMCID: PMC5325230 DOI: 10.1371/journal.pone.0171702
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
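The three-way decision described in the abstract can be illustrated with a minimal Python sketch. It assumes the usual probabilistic rough set formulation, in which a protein is accepted for a function when the conditional probability of the function given the protein's description is at least a threshold alpha, rejected when it is at most a threshold beta, and deferred otherwise. The function name `three_way_decide` and the default threshold values are illustrative assumptions. In the article the thresholds are determined with GTRS or ITRS, which this sketch does not implement.

```python
def three_way_decide(prob, alpha=0.7, beta=0.3):
    """Return 'accept', 'reject' or 'defer' for P(function | description).

    alpha and beta are hypothetical thresholds with 0 <= beta < alpha <= 1.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if prob >= alpha:
        return "accept"   # positive region
    if prob <= beta:
        return "reject"   # negative region
    return "defer"        # boundary region: wait for more information


# Example probabilities chosen only for illustration.
for p in (0.9, 0.5, 0.1):
    print(p, three_way_decide(p))
```

The deferred option is what allows a decision to be revisited once further biological information becomes available.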
Fig 1. Logical view of the architecture with three-way decisions for protein function classification.
Fig 2. Physical view of the architecture with three-way decisions.
An information table for proteins.
| Objects | Localization available at | Interacting proteins available at | No. of Domains available at | Function |
|---|---|---|---|---|
| | Mitochondria | 0 | 0 | Yes |
| | Mitochondria | 0 | 1 | No |
| | Cytoplasm | 1 | 0 | Yes |
| | Cytoplasm | 2 | 0 | No |
| | Cytoplasm | 2 | 0 | No |
| | Cytoplasm | 0 | 0 | Yes |
| | Cytoplasm | 2 | 1 | No |
| | Mitochondria | 0 | 1 | No |
Property of the three regions with evolving information.
| | Localization time | Interacting proteins time | No. of Domains time |
|---|---|---|---|
| POS(C) | ∅ | { | { |
| NEG(C) | ∅ | { | { |
| BND(C) | { | { | ∅ |
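How the three regions change as attributes become available can be sketched in Python using the rows of the information table above, read as localization, number of interacting proteins, number of domains, and function. The sketch assumes the simplest (Pawlak) setting: an equivalence class goes to POS(C) if all its members have the function, to NEG(C) if none do, and to BND(C) otherwise. The object labels `o1` to `o8` and the function name `three_regions` are placeholders introduced here for illustration.

```python
from collections import defaultdict

# Rows of the information table; the labels o1..o8 are assumed placeholders.
# Each tuple is (localization, interacting proteins, no. of domains, function).
TABLE = {
    "o1": ("Mitochondria", 0, 0, "Yes"),
    "o2": ("Mitochondria", 0, 1, "No"),
    "o3": ("Cytoplasm", 1, 0, "Yes"),
    "o4": ("Cytoplasm", 2, 0, "No"),
    "o5": ("Cytoplasm", 2, 0, "No"),
    "o6": ("Cytoplasm", 0, 0, "Yes"),
    "o7": ("Cytoplasm", 2, 1, "No"),
    "o8": ("Mitochondria", 0, 1, "No"),
}


def three_regions(table, n_attrs):
    """Split objects into (POS, NEG, BND) using the first n_attrs attributes."""
    classes = defaultdict(list)
    for obj, row in table.items():
        classes[row[:n_attrs]].append(obj)  # group objects by description

    pos, neg, bnd = set(), set(), set()
    for members in classes.values():
        labels = {table[o][-1] for o in members}
        if labels == {"Yes"}:
            pos.update(members)   # every member has the function
        elif labels == {"No"}:
            neg.update(members)   # no member has the function
        else:
            bnd.update(members)   # mixed class: decision deferred
    return pos, neg, bnd


steps = ["Localization", "+ Interacting proteins", "+ No. of Domains"]
for k, name in enumerate(steps, start=1):
    pos, neg, bnd = three_regions(TABLE, k)
    print(name, sorted(pos), sorted(neg), sorted(bnd))
```

With localization alone every object falls in the boundary region, and with all three attributes the boundary region is empty, which agrees with the ∅ entries in the table.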
Fig 3. The three regions with evolving information.
The sub-figures from left to right should be read as a, b and c respectively.
Fig 4. Visualization of iterative three-way decision making algorithm.
Results of accuracy and generality for GTRS.
| Features | GTRS( | | GTRS | | GTRS | |
|---|---|---|---|---|---|---|
| | Accuracy | Generality | Accuracy | Generality | Accuracy | Generality |
| | 0.8031 | 0.2377 | | 0.2276 | 0.7969 | 0.2913 |
| | | 0.3108 | 0.8077 | 0.2888 | | 0.3467 |
| | 0.7808 | 0.6654 | 0.7853 | 0.6737 | 0.7807 | 0.6724 |
| | 0.7815 | | 0.7797 | | 0.7807 | |
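Accuracy and generality are not defined in this excerpt. The following minimal Python sketch assumes the definitions commonly used in the three-way decision literature: accuracy is the fraction of correct decisions among the objects that were accepted or rejected, and generality is the fraction of all objects for which a definite (non-deferred) decision was made. The function name `accuracy_and_generality` and the toy data are hypothetical and are not taken from the article's experiments.

```python
def accuracy_and_generality(pos, neg, bnd, positives):
    """Compute (accuracy, generality) of a three-way classification.

    pos, neg, bnd: sets of accepted, rejected and deferred objects.
    positives: set of objects that truly have the function.
    """
    decided = pos | neg
    total = len(decided) + len(bnd)
    if total == 0:
        raise ValueError("no objects to evaluate")
    # Correct decisions: accepted true positives plus rejected true negatives.
    correct = len(pos & positives) + len(neg - positives)
    accuracy = correct / len(decided) if decided else 0.0
    generality = len(decided) / total
    return accuracy, generality


# Toy data, invented for illustration only.
pos = {"a", "b", "c"}
neg = {"d", "e"}
bnd = {"f", "g", "h"}
positives = {"a", "b", "d", "f"}

acc, gen = accuracy_and_generality(pos, neg, bnd, positives)
print(f"accuracy={acc:.3f}, generality={gen:.3f}")
```

Under these assumed definitions, moving objects out of the boundary region raises generality, and accuracy then depends on how many of the newly decided objects are classified correctly.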
Results of accuracy and generality for ITRS.
| Features | ITRS | | ITRS | |
|---|---|---|---|---|
| | Accuracy | Generality | Accuracy | Generality |
| | | 0.6008 | | 0.5972 |
| | 0.8043 | 0.6296 | 0.8101 | 0.631 |
| | 0.7927 | 0.7394 | 0.7878 | 0.7411 |
| | 0.791 | | 0.7865 | |
Fig 5. Results of the positive, negative and boundary regions.
Fig 6. Accuracy and generality results of the GTRS based approaches.
Fig 7. Accuracy and generality results of the ITRS based approaches.
Comparison of the proposed three-way classification method with top-performing methods in the field.
The target classes comprise broader Gene Ontology terms for Saccharomyces cerevisiae proteins.
| Method | Generality | Accuracy |
|---|---|---|
| Three-way decisions using GTRS | 68% | 78.40% |
| INGA (Interaction Network GO Annotator) tool [ | 60% | 57% |
| Jones-UCL [ | 62% | 59.5% |
| Argot [ | 61% | 59.4% |
| BLAST Annotation Transfer (baseline method) [ | 78% | 38% |