Jonathan Mercier, Adrien Josso, Claudine Médigue, David Vallenet.
Abstract
BACKGROUND: High-quality functional annotation is essential for understanding the phenotypic consequences encoded in a genome. Despite improvements in bioinformatics methods, millions of sequences in databanks are not assigned reliable functions. The curation of protein functions in the context of biological processes is a way to evaluate and improve their annotation.
Keywords: Curation; Expert system; Genome annotation; Knowledge representation; Metabolic pathways; Paraconsistent logic
Year: 2018 PMID: 29642842 PMCID: PMC5896057 DOI: 10.1186/s12859-018-2126-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Illustration of the GROOLS conceptual graph model. This figure gives an example of a combination of entities and relations defined in the GROOLS model. The model is made of two types of concepts (rectangle nodes): Prior Knowledge (labelled PK) and Observation (labelled O). Relations between concepts are represented by edges named PK-PK (between two PK concepts) or O-PK (between O and PK concepts). Four types of relations are available in the model (i.e. 'subtype', 'part', 'expectation' and 'prediction') and are labelled with the '-of' suffix or the 'has-' prefix according to the edge direction. This generic model is used to represent any biological process as a conceptual graph with different combinations of concepts and relations.
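The model described in this caption can be sketched as a small typed graph. This is a minimal illustration, not the GROOLS implementation: the class names (`Kind`, `Relation`, `Concept`, `ConceptualGraph`), the example concept names, and in particular the `ALLOWED` mapping of relation types to edge kinds (subtype/part on PK-PK edges, expectation/prediction on O-PK edges) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    PRIOR_KNOWLEDGE = "PK"
    OBSERVATION = "O"


class Relation(Enum):
    SUBTYPE = "subtype"
    PART = "part"
    EXPECTATION = "expectation"
    PREDICTION = "prediction"


# Assumption: which relation types may appear on which kind of edge.
# The caption only states that edges are PK-PK or O-PK.
ALLOWED = {
    (Kind.PRIOR_KNOWLEDGE, Kind.PRIOR_KNOWLEDGE): {Relation.SUBTYPE, Relation.PART},
    (Kind.OBSERVATION, Kind.PRIOR_KNOWLEDGE): {Relation.EXPECTATION, Relation.PREDICTION},
}


@dataclass(frozen=True)
class Concept:
    name: str
    kind: Kind


@dataclass
class ConceptualGraph:
    # Each edge is stored as a (source, relation, target) triple.
    edges: list = field(default_factory=list)

    def relate(self, source: Concept, relation: Relation, target: Concept) -> None:
        """Add an edge, rejecting combinations outside PK-PK and O-PK."""
        allowed = ALLOWED.get((source.kind, target.kind), set())
        if relation not in allowed:
            raise ValueError(
                f"{relation.value} is not allowed on a "
                f"{source.kind.value}-{target.kind.value} edge"
            )
        self.edges.append((source, relation, target))


# Hypothetical concepts, used only to exercise the sketch.
pathway = Concept("pathway", Kind.PRIOR_KNOWLEDGE)
step = Concept("reaction step", Kind.PRIOR_KNOWLEDGE)
annotation = Concept("annotated enzyme", Kind.OBSERVATION)

graph = ConceptualGraph()
graph.relate(step, Relation.PART, pathway)           # PK-PK edge
graph.relate(annotation, Relation.PREDICTION, step)  # O-PK edge
```

Storing edges as plain triples keeps the sketch close to the caption's vocabulary; a real implementation would likely also record the edge direction labels ('-of' or 'has-').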
The sixteen truth value sets and their attributes
| Truth value set | Degree of truth | Degree of falsehood | Degree of information | Belief rank | Approximation: prediction | Approximation: expectation |
|---|---|---|---|---|---|---|
| | 0 | 0 | 0 | 13 | None | None |
| | 0 | 0 | 1 | 14 | None | None |
| | 0 | | 1 | 16 | False | False |
| | | 0 | 1 | 1 | True | True |
| | | | 1 | 5 | Both | Both |
| | 0 | | 2 | 15 | None | False |
| | | 0 | 2 | 3 | None | True |
| | | | 2 | 10 | Both | Both |
| | | | 2 | 6 | Both | Both |
| | | | 2 | 11 | Both | Both |
| | | | 2 | 2 | Both | Both |
| | | | 3 | 7 | Both | Both |
| | | | 3 | 9 | Both | Both |
| | | | 3 | 12 | Both | Both |
| | | | 3 | 4 | Both | Both |
| | | | 4 | 8 | Both | Both |
Conclusion table
| Expectation ∖Prediction | True | False | Both | None |
|---|---|---|---|---|
| True | Confirmed P. | Unexpected A. | Contradictory A. | Missing |
| False | Unexpected P. | Confirmed A. | Contradictory P. | Absent |
| Both | Ambiguous P. | Ambiguous A. | Ambiguous C. | Ambiguous |
| None | Unconfirmed P. | Unconfirmed A. | Unconfirmed C. | Unexplained |
P.: Presence, A.: Absence, C.: Contradiction
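The conclusion table reads as a lookup from an (expectation, prediction) pair to a conclusion label. A minimal sketch of that lookup follows; the names `CONCLUSIONS` and `conclude` are hypothetical, and the abbreviations P., A. and C. are expanded according to the legend above.

```python
# Keys are (expectation, prediction); rows of the table are expectations,
# columns are predictions.
CONCLUSIONS = {
    ("True", "True"): "Confirmed presence",
    ("True", "False"): "Unexpected absence",
    ("True", "Both"): "Contradictory absence",
    ("True", "None"): "Missing",
    ("False", "True"): "Unexpected presence",
    ("False", "False"): "Confirmed absence",
    ("False", "Both"): "Contradictory presence",
    ("False", "None"): "Absent",
    ("Both", "True"): "Ambiguous presence",
    ("Both", "False"): "Ambiguous absence",
    ("Both", "Both"): "Ambiguous contradiction",
    ("Both", "None"): "Ambiguous",
    ("None", "True"): "Unconfirmed presence",
    ("None", "False"): "Unconfirmed absence",
    ("None", "Both"): "Unconfirmed contradiction",
    ("None", "None"): "Unexplained",
}


def conclude(expectation: str, prediction: str) -> str:
    """Return the conclusion label for one Prior Knowledge concept."""
    return CONCLUSIONS[(expectation, prediction)]


print(conclude("True", "None"))   # Missing
print(conclude("False", "True"))  # Unexpected presence
```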
GROOLS statistics on pathways and functional units for Genome Properties and UniPathway. Statistics are computed for the 14 studied organisms
| | Exp. / Pred. | Genome Properties, UniProtKB | Genome Properties, UniProtKB (falsehood) | UniPathway, UniProtKB | UniPathway, MicroScope | UniPathway, UniProtKB (specific) | UniPathway, MicroScope (specific) |
|---|---|---|---|---|---|---|---|
| Pathway conclusions | | | | | | | |
| Confirmed presence | True / True | 195 | 195 | 241 | 277 | 339 | 393 |
| Unexpected absence | True / False | 0 | 12 | 0 | 0 | 0 | 0 |
| Contradictory absence | True / Both | 0 | 56 | 0 | 5 | 0 | 5 |
| Missing | True / None | 68 | 0 | 232 | 191 | 137 | 80 |
| Unexpected presence | False / True | 13 | 13 | 33 | 50 | 79 | 121 |
| Confirmed absence | False / False | 0 | 66 | 0 | 0 | 0 | 0 |
| Contradictory presence | False / Both | 0 | 40 | 0 | 0 | 0 | 0 |
| Absent | False / None | 106 | 0 | 275 | 258 | 229 | 187 |
| Ambiguous presence | Both / True | 15 | 15 | 47 | 68 | 121 | 178 |
| Ambiguous absence | Both / False | 0 | 13 | 0 | 0 | 0 | 0 |
| Ambiguous contradiction | Both / Both | 0 | 24 | 0 | 3 | 0 | 3 |
| Ambiguous | Both / None | 37 | 0 | 243 | 218 | 172 | 120 |
| Unconfirmed presence | None / True | 1222 | 1222 | 728 | 946 | 1130 | 1409 |
| Unconfirmed absence | None / False | 0 | 3453 | 0 | 0 | 0 | 0 |
| Unconfirmed contradiction | None / Both | 0 | 726 | 0 | 5 | 0 | 4 |
| Unexplained | None / None | 4238 | 59 | 2359 | 2137 | 1951 | 1658 |
| Pathway statistics | | | | | | | |
| True positive | | 184 | 184 | 200 | 238 | 272 | 310 |
| True negative | | 106 | 66 | 188 | 176 | 159 | 130 |
| False positive | | 13 | 53 | 19 | 31 | 48 | 77 |
| False negative | | 65 | 65 | 190 | 152 | 118 | 80 |
| Precision | | 93.40% | 77.64% | 91.32% | 88.48% | 85.00% | 80.10% |
| Recall | | 73.90% | 73.90% | 51.28% | 61.03% | 69.74% | 79.49% |
| Accuracy | | 78.80% | 67.93% | 64.99% | 69.35% | 72.19% | 73.70% |
| F1 score | | 82.51% | 75.72% | 65.68% | 72.23% | 76.62% | 79.79% |
| Functional unit statistics | | | | | | | |
| True positive | | 1236 | 1236 | 969 | 1719 | 982 | 1742 |
| True negative | | 710 | 710 | 568 | 505 | 568 | 504 |
| False positive | | 117 | 117 | 100 | 139 | 100 | 140 |
| False negative | | 396 | 396 | 511 | 342 | 465 | 283 |
| Precision | | 91.35% | 91.35% | 90.65% | 92.52% | 90.76% | 92.56% |
| Recall | | 75.74% | 75.74% | 65.47% | 83.41% | 67.86% | 86.02% |
| Accuracy | | 79.14% | 79.14% | 71.55% | 82.22% | 73.29% | 84.15% |
| F1 score | | 82.81% | 82.81% | 76.03% | 87.73% | 77.66% | 89.17% |
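The precision, recall, accuracy and F1 rows are consistent with the standard definitions applied to the true/false positive/negative counts in the same column. The sketch below recomputes them for the first column of the pathway statistics; the function name `classification_metrics` is hypothetical, and the table does not state how conclusions are mapped to the four counts, so the counts are taken as given.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary classification metrics, as percentages rounded to 2 places."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": round(100 * precision, 2),
        "recall": round(100 * recall, 2),
        "accuracy": round(100 * accuracy, 2),
        "f1": round(100 * f1, 2),
    }


# Pathway statistics, first column of the table (Genome Properties, UniProtKB).
metrics = classification_metrics(tp=184, tn=106, fp=13, fn=65)
print(metrics)
# {'precision': 93.4, 'recall': 73.9, 'accuracy': 78.8, 'f1': 82.51}
```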
Fig. 2 Evaluation of pathway and functional unit predictions using GROOLS reasoning. Accuracy for pathway and functional unit predictions was evaluated for Genome Properties (GP) and UniPathway (UPA) in 14 organisms. For Genome Properties, GROOLS falsehood mode is activated. For UniPathway, the specific mode is activated and annotations are from MicroScope.
Fig. 3 GROOLS results for cysteine biosynthesis in Kytococcus sedentarius. Cysteine biosynthesis in Kytococcus sedentarius was evaluated using Genome Properties and falsehood mode reasoning. Rounded boxes are Prior Knowledge concepts and ovals are Observations. Edges between Observations and Prior Knowledge concepts are labelled by the type of Observation, whereas labels between Prior Knowledge concepts correspond to relation types. The color code corresponds to TRUE (green), FALSE (red), BOTH (purple) and NONE (white) values. For Prior Knowledge concepts, the colored left part corresponds to the expectation value and the right part to the prediction value. An interactive view of this figure is available at http://www.genoscope.cns.fr/agc/grools/paper/fig3.html
Fig. 4 GROOLS results for asparagine biosynthesis in Acinetobacter baylyi. Asparagine biosynthesis in Acinetobacter baylyi was evaluated using UniPathway and UniProtKB. For GROOLS reasoning, the specific mode was used. Part a corresponds to the original pathway definition in UniPathway. Part b corresponds to the enhanced pathway definition with an additional variant. See Fig. 3 for the legend and these links (http://www.genoscope.cns.fr/agc/grools/paper/fig4A.html, http://www.genoscope.cns.fr/agc/grools/paper/fig4B.html) for an interactive view.