Kamal Taha¹, Youssef Iraqi², Amira Al Aamri².
Abstract
BACKGROUND: A large number of computational methods have been proposed for predicting protein functions. Most of these methods predict the functions of an unannotated protein p from already annotated proteins whose characteristics are similar to those of p. Recent Information Extraction methods take advantage of the rapid growth of the biomedical literature to predict protein functions: they extract, from biomedical texts, biological molecule terms that directly describe protein functions. However, they consider only terms that are explicitly mentioned and co-occur with proteins in the texts. We observe that some important biological molecule terms pertaining to functional categories may co-occur with proteins only implicitly. Therefore, methods that rely solely on explicitly mentioned terms may miss vital functional information that the texts convey implicitly.
Year: 2019 PMID: 30736739 PMCID: PMC6368809 DOI: 10.1186/s12859-019-2594-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
A sample of known protein characteristics, expressed as premises in the style of predicate logic and used as specification rules. The abbreviations in Table 3 are used to form these premises. Ri denotes premise number i. The following logic symbols are used: “∧” for conjunction, “∨” for disjunction, and “→” for implication.
Notations and abbreviations of the terms used in the formation of the premises presented in Table 1
| Notation | Meaning |
|---|---|
| ST(Px) | Structure of protein Px |
| FD(Px) | Folding of protein Px |
| Ly | Ligand y |
| F(Px) | Function of protein Px |
| AAS(Px) | Amino Acid Sequence of protein Px |
| CBND(Px, Ly) | Covalent bond between Ligand y and protein Px |
| PPI(Px, Py) | Protein-Protein Interaction of proteins Px and Py |
| NCBND(Px, Py) | Non-covalent bond between proteins Px and Py |
| PCF(Px, Py) | Protein Complex of Functions of proteins Px and Py |
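The premises of Table 1 are not reproduced in this excerpt, but the worked examples cite eight of them (R1, R2, R3, R6, R7, R10, R12 and R13). The Python sketch below encodes only those eight in the notation above and derives every atom they entail by forward chaining. It is an illustration under stated assumptions, not the authors' PL-PPF implementation: atoms are modeled as tuples such as `("ST", "Px")`, a nested implication A → (B → C) is stored as the logically equivalent rule (A ∧ B) → C, and all helper names are hypothetical.

```python
from itertools import permutations

# Atoms are tuples: (predicate, protein, ...), e.g. ("ST", "Px") for ST(Px).
# A rule is (set_of_premises, conclusion). A nested implication
# A -> (B -> C) is stored as ({A, B}, C), which is logically equivalent.

def unary_rules(p):
    """Single-protein premises cited in the worked examples (hypothetical encoding)."""
    return [
        ({("FD", p), ("ST", p)}, ("F", p)),   # R1: FD(p) -> (ST(p) -> F(p))
        ({("AAS", p)}, ("ST", p)),            # R2: AAS(p) -> ST(p)
        ({("AAS", p)}, ("F", p)),             # R3: AAS(p) -> F(p)
        ({("ST", p)}, ("AAS", p)),            # R13: ST(p) -> AAS(p)
    ]

def binary_rules(p, q):
    """Two-protein premises cited in the worked examples (hypothetical encoding)."""
    return [
        ({("PPI", p, q)}, ("PCF", p, q)),                  # R6
        ({("PCF", p, q), ("F", p)}, ("F", q)),             # R7
        ({("AAS", p), ("AAS", q), ("ST", p)}, ("F", q)),   # R10
        ({("NCBND", p, q)}, ("PPI", p, q)),                # R12
    ]

def forward_chain(facts, proteins):
    """Return every atom derivable from `facts` with the rules above."""
    rules = [r for p in proteins for r in unary_rules(p)]
    for p, q in permutations(proteins, 2):   # every ordered pair of distinct proteins
        rules.extend(binary_rules(p, q))
    derived = set(facts)
    changed = True
    while changed:                           # repeat until a fixed point is reached
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

# The facts of example 4: a non-covalent bond between Px and Py, and F(Px).
derived = forward_chain({("NCBND", "Px", "Py"), ("F", "Px")}, ["Px", "Py"])
print(("F", "Py") in derived)  # prints True
```

Forward chaining is chosen here because it mirrors how the worked examples accumulate conclusions from given premises; it does not record which rule produced each atom.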
The standard inference rules for predicate logic
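| Rule (premises ∴ conclusion) | Name |
|---|---|
| p → q, ¬q ∴ ¬p | Modus Tollens |
| p, p → q ∴ q | Modus Ponens |
| p ∧ q ∴ p | Simplification |
| p, q ∴ p ∧ q | Conjunction |
| p ∨ q, ¬p ∴ q | Disjunctive Syllogism |
| p ∴ p ∨ q | Disjunctive Amplification |
| ¬p → (q ∧ ¬q) ∴ p | Contradiction |
| p ∧ q, p → (q → r) ∴ r | Conditional Proof |
| p ∨ q, p → r, q → r ∴ r | Proof by Cases |
| p → q, q → r ∴ p → r | Law of Syllogism |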
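Each rule in the table maps premises of a fixed shape to a conclusion, so it can be applied mechanically. The Python sketch below is not taken from the paper: it represents formulas as nested tuples (an assumption made for illustration) and implements a subset of the rules, each of which raises `ValueError` when its premises do not have the required shape.

```python
# Formulas: atoms are strings; compound formulas are tuples
# ("not", a), ("and", a, b), ("or", a, b) or ("->", a, b).

def neg(a): return ("not", a)
def conj(a, b): return ("and", a, b)
def disj(a, b): return ("or", a, b)
def implies(a, b): return ("->", a, b)

def _is(formula, op):
    """True if `formula` is a compound formula with main connective `op`."""
    return isinstance(formula, tuple) and formula[0] == op

def _require(condition, rule):
    if not condition:
        raise ValueError(f"{rule} does not apply to these premises")

def modus_ponens(p, p_imp_q):
    _require(_is(p_imp_q, "->") and p_imp_q[1] == p, "Modus Ponens")
    return p_imp_q[2]

def modus_tollens(p_imp_q, not_q):
    _require(_is(p_imp_q, "->") and not_q == neg(p_imp_q[2]), "Modus Tollens")
    return neg(p_imp_q[1])

def simplification(p_and_q):
    _require(_is(p_and_q, "and"), "Simplification")
    return p_and_q[1]

def conjunction(p, q):
    return conj(p, q)

def disjunctive_syllogism(p_or_q, not_p):
    _require(_is(p_or_q, "or") and not_p == neg(p_or_q[1]), "Disjunctive Syllogism")
    return p_or_q[2]

def disjunctive_amplification(p, q):
    return disj(p, q)

def conditional_proof(p_and_q, p_imp_q_imp_r):
    # p ∧ q and p → (q → r) yield r
    _require(_is(p_and_q, "and") and _is(p_imp_q_imp_r, "->")
             and p_imp_q_imp_r[1] == p_and_q[1]
             and _is(p_imp_q_imp_r[2], "->")
             and p_imp_q_imp_r[2][1] == p_and_q[2], "Conditional Proof")
    return p_imp_q_imp_r[2][2]

def law_of_syllogism(p_imp_q, q_imp_r):
    _require(_is(p_imp_q, "->") and _is(q_imp_r, "->")
             and p_imp_q[2] == q_imp_r[1], "Law of Syllogism")
    return implies(p_imp_q[1], q_imp_r[2])

print(modus_tollens(implies("p", "q"), neg("q")))  # prints ('not', 'p')
```

Contradiction and Proof by Cases are omitted to keep the sketch short; they would follow the same pattern of checking the shape of the premises before returning the conclusion.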
Inferring the function of protein Pu described in example 1
| Step | Justification |
|---|---|
| 1. FD(Px) | Given premise (based on its co-occurrence with Pu) |
| 2. ST(Px) | Given premise (based on its co-occurrence with Pu) |
| 3. FD(Px) ∧ ST(Px) | Conjunction using steps 1 and 2 |
| 4. FD(Px) → (ST(Px) → F(Px)) | Premise R1 from Table 1 |
| 5. F(Px) | Conditional Proof using steps 3 and 4 |
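Because steps 1, 2 and 4 involve only ground atoms about Px, the derivation of example 1 can be cross-checked semantically: F(Px) must be true under every truth assignment that makes the three premises true. The Python sketch below does this by exhaustive truth-table enumeration, treating FD(Px), ST(Px) and F(Px) as propositional variables (a simplification; the names are illustrative).

```python
from itertools import product

def implies(a, b):
    """Material implication on truth values."""
    return (not a) or b

def entails(premises, conclusion, atoms):
    """True if every assignment satisfying all premises also satisfies conclusion."""
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False        # counterexample found
    return True

ATOMS = ["FD", "ST", "F"]       # FD(Px), ST(Px), F(Px) as propositional variables
PREMISES = [
    lambda e: e["FD"],                                      # step 1
    lambda e: e["ST"],                                      # step 2
    lambda e: implies(e["FD"], implies(e["ST"], e["F"])),   # step 4 (R1)
]
print(entails(PREMISES, lambda e: e["F"], ATOMS))       # prints True
print(entails(PREMISES[:2], lambda e: e["F"], ATOMS))   # prints False
```

The second call shows that R1 is essential: without it, the assignment FD = ST = true, F = false satisfies the premises but not the conclusion.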
Inferring the function of protein Pu described in example 2
| Step | Justification |
|---|---|
| 1. AAS(Px) | Given premise (based on its co-occurrence with Pu) |
| 2. AAS(Py) | Given premise (based on its co-occurrence with Pu) |
| 3. AAS(Px) ∧ AAS(Py) | Conjunction using steps 1 and 2 |
| 4. AAS(Px) → ST(Px) | Premise R2 from Table 1 |
| 5. ST(Px) | Modus Ponens using steps 1 and 4 |
| 6. (AAS(Px) ∧ AAS(Py)) ∧ ST(Px) | Conjunction using steps 3 and 5 |
| 7. (AAS(Px) ∧ AAS(Py)) → (ST(Px) → F(Py)) | Premise R10 from Table 1 |
| 8. F(Py) | Conditional Proof using steps 6 and 7 |
| 9. AAS(Py) → ST(Py) | Premise R2 from Table 1 |
| 10. ST(Py) | Modus Ponens using steps 2 and 9 |
| 11. (AAS(Px) ∧ AAS(Py)) ∧ ST(Py) | Conjunction using steps 3 and 10 |
| 12. (AAS(Px) ∧ AAS(Py)) → (ST(Py) → F(Px)) | Premise R10 from Table 1 (with Px and Py exchanged) |
| 13. F(Px) | Conditional Proof using steps 11 and 12 |
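A derivation such as example 2 can also be checked line by line: each step must either be a premise or follow from earlier steps by the rule it names. The following Python sketch is a hypothetical checker for the four rules used in example 2 (premise, Conjunction, Modus Ponens and Conditional Proof). It returns the number of the first step whose justification fails, or `None` if every step is valid. Step 12 is encoded as an instance of R10 with Px and Py exchanged, which assumes the premises hold for any pair of proteins.

```python
def imp(a, b):
    return ("->", a, b)

def conj(a, b):
    return ("and", a, b)

def follows(rule, formula, cited):
    """Return True if `formula` follows from the `cited` formulas by `rule`."""
    if rule == "premise":
        return not cited                        # premises cite no earlier step
    if rule == "conjunction" and len(cited) == 2:
        return formula == conj(cited[0], cited[1])
    if rule == "modus_ponens" and len(cited) == 2:
        p, p_imp_q = cited
        return p_imp_q == imp(p, formula)
    if rule == "conditional_proof" and len(cited) == 2:
        p_and_q, nested = cited                 # p ∧ q and p → (q → r) give r
        return (isinstance(p_and_q, tuple) and p_and_q[0] == "and"
                and nested == imp(p_and_q[1], imp(p_and_q[2], formula)))
    return False                                # unknown rule or wrong arity

def check_proof(steps):
    """`steps` holds (formula, rule, cited step numbers). Return the number of
    the first invalid step, or None if every step is justified."""
    for i, (formula, rule, refs) in enumerate(steps, start=1):
        if any(r < 1 or r >= i for r in refs):  # only earlier steps may be cited
            return i
        cited = [steps[r - 1][0] for r in refs]
        if not follows(rule, formula, cited):
            return i
    return None

ax, ay, sx, sy, fx, fy = "AAS(Px)", "AAS(Py)", "ST(Px)", "ST(Py)", "F(Px)", "F(Py)"
both = conj(ax, ay)
example2 = [
    (ax, "premise", []),                            # 1
    (ay, "premise", []),                            # 2
    (both, "conjunction", [1, 2]),                  # 3
    (imp(ax, sx), "premise", []),                   # 4: R2
    (sx, "modus_ponens", [1, 4]),                   # 5
    (conj(both, sx), "conjunction", [3, 5]),        # 6
    (imp(both, imp(sx, fy)), "premise", []),        # 7: R10
    (fy, "conditional_proof", [6, 7]),              # 8
    (imp(ay, sy), "premise", []),                   # 9: R2
    (sy, "modus_ponens", [2, 9]),                   # 10
    (conj(both, sy), "conjunction", [3, 10]),       # 11
    (imp(both, imp(sy, fx)), "premise", []),        # 12: R10, Px and Py exchanged (assumption)
    (fx, "conditional_proof", [11, 12]),            # 13
]
print(check_proof(example2))  # prints None
```

Changing a cited step number, for instance making step 5 cite steps 2 and 4, makes `check_proof` report step 5 as invalid.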
Inferring the function of protein Pu described in example 3
| Step | Justification |
|---|---|
| 1. ST(Px) | Given premise (based on its co-occurrence with Pu) |
| 2. ST(Px) → AAS(Px) | Premise R13 from Table 1 |
| 3. AAS(Px) | Modus Ponens using steps 1 and 2 |
| 4. AAS(Px) → F(Px) | Premise R3 from Table 1 |
| 5. F(Px) | Modus Ponens using steps 3 and 4 |
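Example 3 can also be read backward: to obtain F(Px), one looks for a premise that concludes F(Px) (R3), then tries to establish its antecedent AAS(Px) (via R13), and so on until a given fact is reached. The Python sketch below implements that goal-directed search (backward chaining) over R2, R3 and R13. The `seen` set keeps the search from looping on the cycle AAS(Px) → ST(Px) → AAS(Px) formed by R2 and R13. The function and the rule encoding are hypothetical, not part of PL-PPF.

```python
# Each rule: (name, premises, conclusion); atoms are strings about protein Px.
RULES = [
    ("R2", ["AAS(Px)"], "ST(Px)"),
    ("R13", ["ST(Px)"], "AAS(Px)"),
    ("R3", ["AAS(Px)"], "F(Px)"),
]

def prove(goal, facts, rules, seen=frozenset()):
    """Backward chaining: return proof lines ending in `goal`, or None."""
    if goal in facts:
        return [f"{goal} (given)"]
    if goal in seen:               # goal already being pursued: avoid cycles
        return None
    for name, premises, conclusion in rules:
        if conclusion != goal:
            continue
        lines = []
        for premise in premises:
            sub = prove(premise, facts, rules, seen | {goal})
            if sub is None:
                break
            lines.extend(sub)
        else:                      # every premise was proved
            return lines + [f"{goal} (from {name})"]
    return None

for line in prove("F(Px)", {"ST(Px)"}, RULES):
    print(line)
# ST(Px) (given)
# AAS(Px) (from R13)
# F(Px) (from R3)
```

With no given facts, the same call returns `None`, because the only way to reach AAS(Px) or ST(Px) is through the R2/R13 cycle.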
Inferring the function of protein Pu described in example 4
| Step | Justification |
|---|---|
| 1. NCBND(Px, Py) | Given premise (based on its co-occurrence with Pu) |
| 2. F(Px) | Given premise (based on its co-occurrence with Pu) |
| 3. NCBND(Px, Py) → PPI(Px, Py) | Premise R12 from Table 1 |
| 4. PPI(Px, Py) → PCF(Px, Py) | Premise R6 from Table 1 |
| 5. NCBND(Px, Py) → PCF(Px, Py) | Law of Syllogism using steps 3 and 4 |
| 6. PCF(Px, Py) | Modus Ponens using steps 1 and 5 |
| 7. PCF(Px, Py) ∧ F(Px) | Conjunction using steps 2 and 6 |
| 8. PCF(Px, Py) → (F(Px) → F(Py)) | Premise R7 from Table 1 |
| 9. F(Py) | Conditional Proof using steps 7 and 8 |
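The premises are stated for arbitrary proteins Px and Py, so they behave like universally quantified rules that are instantiated for the proteins at hand. The Python sketch below makes this explicit for example 4: R12, R6 and R7 are written with variables (`?x`, `?y`, a notation chosen here for illustration), and a small pattern matcher finds the substitutions that let each rule fire. It is a simplified stand-in for first-order inference, not the paper's implementation.

```python
# Atoms are tuples (predicate, arg, ...); arguments starting with "?" are variables.
RULES = [
    ([("NCBND", "?x", "?y")], ("PPI", "?x", "?y")),       # R12
    ([("PPI", "?x", "?y")], ("PCF", "?x", "?y")),         # R6
    ([("PCF", "?x", "?y"), ("F", "?x")], ("F", "?y")),    # R7
]

def match(pattern, fact, subst):
    """Extend `subst` so that `pattern` equals `fact`; return None on failure."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    subst = dict(subst)
    for p, f in zip(pattern[1:], fact[1:]):
        if p.startswith("?"):
            if subst.setdefault(p, f) != f:   # variable already bound to another value
                return None
        elif p != f:
            return None
    return subst

def solutions(premises, facts, subst):
    """Yield every substitution under which all `premises` are in `facts`."""
    if not premises:
        yield subst
        return
    for fact in facts:
        extended = match(premises[0], fact, subst)
        if extended is not None:
            yield from solutions(premises[1:], facts, extended)

def substitute(atom, subst):
    return tuple(subst.get(term, term) for term in atom)

def forward_chain(facts, rules):
    """Fire rules until no new ground atom can be derived."""
    facts = set(facts)
    while True:
        new = {substitute(conclusion, s)
               for premises, conclusion in rules
               for s in solutions(premises, facts, {})} - facts
        if not new:
            return facts
        facts |= new

derived = forward_chain({("NCBND", "Px", "Py"), ("F", "Px")}, RULES)
print(("F", "Py") in derived)  # prints True
```

The three rounds of rule firing correspond to steps 3 to 6 (PPI, then PCF) and steps 7 to 9 (F(Py)) of the table.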
Number of GO terms and proteins downloaded for the experiments
| | GO | SGD |
|---|---|---|
| | 70 | 30 |
| | 584,973 | 604,625 |
| | 62,386 | 16,576 |
ᵃ We selected for the evaluations only proteins that (1) are associated with at least one PubMed publication, based on their entries in UniProtKB [6], and (2) have one of the following experimental evidence codes: IC, IDA, IPI, IEP, EXP, TAS, IMP, or IGI.
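The two selection criteria above amount to a simple filter over protein entries. The Python sketch below assumes a hypothetical, pre-parsed record for each UniProtKB entry (the `ProteinEntry` fields are illustrative, not UniProtKB's schema) and reads criterion (2) as requiring at least one annotation with one of the listed evidence codes.

```python
from dataclasses import dataclass, field

# Evidence codes named in the selection criteria.
ACCEPTED_CODES = {"IC", "IDA", "IPI", "IEP", "EXP", "TAS", "IMP", "IGI"}

@dataclass
class ProteinEntry:
    """Hypothetical, pre-parsed view of a UniProtKB entry."""
    accession: str
    pubmed_ids: list = field(default_factory=list)
    evidence_codes: set = field(default_factory=set)

def is_selected(entry: ProteinEntry) -> bool:
    """Criterion (1): at least one PubMed publication.
    Criterion (2): at least one annotation with an accepted evidence code."""
    return bool(entry.pubmed_ids) and bool(entry.evidence_codes & ACCEPTED_CODES)

entries = [
    ProteinEntry("prot_A", ["pmid_1"], {"IDA", "IEA"}),
    ProteinEntry("prot_B", [], {"EXP"}),           # no publication
    ProteinEntry("prot_C", ["pmid_2"], {"IEA"}),   # no accepted evidence code
]
print([e.accession for e in entries if is_selected(e)])  # prints ['prot_A']
```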
Fig. 1 The systems’ performance for predicting GO functions after applying 5-fold cross-validation
Fig. 2 The systems’ performance for predicting SGD functions after applying 5-fold cross-validation
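Figs. 1 and 2 report results after 5-fold cross-validation. The excerpt does not say how the folds were formed, so the Python sketch below simply assumes a random partition of the proteins into five disjoint folds, each used once as the test set while the remaining four form the training set. The function name and the seed are illustrative.

```python
import random

def k_fold_splits(proteins, k=5, seed=0):
    """Yield (train, test) lists; each protein lands in exactly one test fold."""
    items = list(proteins)
    random.Random(seed).shuffle(items)          # random assignment (assumed)
    folds = [items[i::k] for i in range(k)]     # k disjoint, near-equal folds
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, folds[i]

proteins = [f"protein_{n}" for n in range(23)]
for train, test in k_fold_splits(proteins):
    print(len(train), len(test))   # prints "18 5" three times, then "19 4" twice
```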
Number and percentage of valid and invalid co-occurrences identified by PL-PPF in the GO and SGD datasets
| | GO | SGD |
|---|---|---|
| Number of valid co-occurrences identified | 39,928 | 9,614 |
| Number of invalid co-occurrences identified | 22,458 | 6,962 |
| Percentage of valid co-occurrences identified | 64% | 58% |
| | | |
| Number of valid co-occurrences identified | 2,152 | 858 |
| Number of invalid co-occurrences identified | 1,986 | 1,090 |
| Percentage of valid co-occurrences identified | 52% | 44% |
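Each percentage in the table is the share of valid co-occurrences among all co-occurrences identified, valid / (valid + invalid), rounded to the nearest whole percent. The short Python check below recomputes the four percentages from the counts; the two groups of rows are labeled only "block 1" and "block 2" because the table does not name them.

```python
# (valid, invalid) counts copied from the table; the blocks are unnamed in the source.
COUNTS = {
    ("block 1", "GO"): (39_928, 22_458),
    ("block 1", "SGD"): (9_614, 6_962),
    ("block 2", "GO"): (2_152, 1_986),
    ("block 2", "SGD"): (858, 1_090),
}

def valid_percentage(valid: int, invalid: int) -> int:
    """Share of valid co-occurrences, rounded to a whole percent."""
    return round(100 * valid / (valid + invalid))

for (block, dataset), (valid, invalid) in COUNTS.items():
    print(block, dataset, valid + invalid, f"{valid_percentage(valid, invalid)}%")
# block 1 GO 62386 64%
# block 1 SGD 16576 58%
# block 2 GO 4138 52%
# block 2 SGD 1948 44%
```

In the first block, the totals of 62,386 (GO) and 16,576 (SGD) are the same figures that appear in the last row of the dataset table.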
Fig. 3 Recall of the systems for predicting the functional categories of the set of GO terms positioned at the same hierarchical level of the GO ontology
Fig. 4 Precision of the systems for predicting the functional categories of the set of GO terms positioned at the same hierarchical level of the GO ontology
Fig. 5 Recall of the systems for predicting the functional categories of GO terms using a cumulative set of proteins whose associated biomedical texts are used as the training dataset
Fig. 6 Precision of the systems for predicting the functional categories of GO terms using a cumulative set of proteins whose associated biomedical texts are used as the training dataset
Fig. 7 The protein-centric maximum F-measure (Fmax) of PL-PPF and DeepGO [15] for predicting the functional categories of the GO dataset described in [15]
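Fig. 7 uses the protein-centric maximum F-measure. Under the standard CAFA definition (assumed here, since the excerpt does not restate it), precision at a threshold t is averaged over the proteins that have at least one term predicted with score ≥ t, recall is averaged over all benchmark proteins, and Fmax is the largest harmonic mean of the two over all thresholds. A minimal Python sketch with hypothetical input dictionaries and term identifiers:

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric maximum F-measure (CAFA-style definition, assumed).

    predictions: protein -> {term: score in [0, 1]}
    truth:       protein -> non-empty set of true terms (the benchmark proteins)
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recall_sum = [], 0.0
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:                          # precision: covered proteins only
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)   # recall: all benchmark proteins
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best

truth = {"prot_A": {"GO:a", "GO:b"}, "prot_B": {"GO:c"}}
predictions = {"prot_A": {"GO:a": 0.9, "GO:d": 0.4}, "prot_B": {"GO:c": 0.8}}
print(round(fmax(predictions, truth), 3))  # prints 0.857
```

In the toy data, the best threshold lies in (0.4, 0.8]: precision is 1.0 and recall is 0.75, giving 2 · 1.0 · 0.75 / 1.75 ≈ 0.857.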
The percentages of valid explicit and implicit terms that PL-PPF identified in the datasets. For each of the GO and SGD datasets used in the experiments, the table presents the percentages of valid terms in the Biological Process and Molecular Function ontologies identified by PL-PPF
| | GO: Biological Process | GO: Molecular Function | SGD: Biological Process | SGD: Molecular Function |
|---|---|---|---|---|
| % of explicit terms | 72% | 64% | 62% | 76% |
| % of implicit terms | 28% | 36% | 38% | 24% |