Pietro Di Lena, Giacomo Domeniconi, Luciano Margara, Gianluca Moro.
Abstract
BACKGROUND: Functional annotation of genes and gene products is a major challenge in the post-genomic era. Nowadays, gene function curation is largely based on manual assignment of Gene Ontology (GO) annotations to genes by using published literature. The annotation task is extremely time-consuming; therefore, there is increasing interest in automated tools that can assist human experts.
Year: 2015 PMID: 26511083 PMCID: PMC4625458 DOI: 10.1186/s12859-015-0777-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 GOTA workflow
Graphical representation of Eq. 1
Performances over a test set of 15,000 publications
| Method | Info | IT | CAFA | BC | TREC | | |
|---|---|---|---|---|---|---|---|
| GOTA | PM | | | | | | |
| GOTA | T+A | 0.42 | | | 0.68 | 0.39 | 0.45 |
| GOTA | T | 0.41 | 0.63 | 0.42 | 0.68 | 0.39 | 0.44 |
| RandFR | N/A | 0.20 | 0.33 | 0.20 | 0.33 | 0.18 | 0.15 |
| RandIC | N/A | 0.21 | 0.27 | 0.18 | 0.31 | 0.03 | 0.08 |
| GOTA | PM | 0.37 | 0.64 | 0.41 | 0.67 | 0.38 | 0.44 |
| GOTA | T+A | 0.35 | 0.62 | 0.40 | 0.66 | 0.36 | 0.41 |
| GOTA | T | 0.35 | 0.62 | 0.40 | 0.66 | 0.36 | 0.41 |
| GOTA | PM | 0.28 | 0.41 | 0.30 | 0.49 | 0.16 | 0.17 |
| GOTA | T+A | 0.24 | 0.37 | 0.27 | 0.46 | 0.11 | 0.12 |
| GOTA | T | 0.22 | 0.35 | 0.26 | 0.44 | 0.09 | 0.10 |
Method: the method used for classification. RandFR and RandIC are baseline predictors based on the distribution of GO terms in the training set
Info: the information used in prediction: PM = title, abstract, references and publication year (PubMed); T+A = title and abstract; T = title; N/A = no information
Metric definitions are in the “Evaluation metrics” section. In the top section of the table, the best result for each metric is highlighted in italic
Performance comparison over species-specific knowledge bases
| Species | KB | Info | IT | CAFA | BC | TREC | | |
|---|---|---|---|---|---|---|---|---|
| Human | Human | PM | | | | | | |
| Human | Full | PM | 0.44 | | 0.44 | | 0.44 | 0.48 |
| Human | Human | T | 0.42 | 0.60 | 0.45 | 0.66 | 0.46 | 0.47 |
| Human | Full | T | 0.44 | 0.61 | 0.44 | 0.68 | 0.45 | 0.47 |
| Mouse | Mouse | PM | | | | | | |
| Mouse | Full | PM | | 0.61 | 0.44 | 0.66 | 0.43 | 0.42 |
| Mouse | Mouse | T | 0.42 | 0.63 | 0.44 | 0.65 | 0.43 | 0.42 |
| Mouse | Full | T | 0.44 | 0.60 | 0.43 | 0.64 | 0.42 | 0.41 |
| Rat | Rat | PM | | | | | | 0.44 |
| Rat | Full | PM | 0.34 | 0.61 | 0.37 | 0.67 | 0.33 | 0.42 |
| Rat | Rat | T | 0.37 | 0.62 | 0.40 | 0.67 | 0.34 | 0.42 |
| Rat | Full | T | 0.33 | 0.61 | 0.37 | 0.66 | 0.33 | 0.42 |
| Yeast | Yeast | PM | | | | | | |
| Yeast | Full | PM | 0.43 | 0.70 | | 0.75 | 0.39 | 0.49 |
| Yeast | Yeast | T | 0.41 | 0.68 | 0.44 | 0.74 | 0.37 | 0.45 |
| Yeast | Full | T | 0.41 | 0.68 | 0.44 | 0.73 | 0.35 | 0.46 |
Species: only publications related to the specified species are considered for the evaluation. Human: 3575 publications; Mouse: 2825 publications; Rat: 2380 publications; Yeast: 1290 publications
KB: the knowledge base used for prediction. Full = all available publications in the KB. Human/Mouse/Rat/Yeast = only publications related to Human/Mouse/Rat/Yeast
Info: the information used in prediction: PM = title, abstract, references and publication year (PubMed); T = title
Metric definitions are in the “Evaluation metrics” section. For each metric and species, the best result is highlighted in italic
Performance comparison with different approaches
| Method | Info | IT | CAFA | BC | TREC | | |
|---|---|---|---|---|---|---|---|
| GOTA | PM | | | | | | |
| GOTA | T+A | 0.37 | 0.68 | 0.41 | | 0.35 | 0.48 |
| GOTA | T | 0.39 | 0.66 | 0.39 | 0.70 | 0.34 | 0.44 |
| GOCat | T+A | 0.34 | 0.64 | 0.37 | 0.69 | 0.29 | 0.40 |
| GOCat | T | 0.30 | 0.64 | 0.36 | 0.69 | 0.28 | 0.40 |
| RandFR | N/A | 0.08 | 0.21 | 0.10 | 0.23 | 0.03 | 0.05 |
| RandIC | N/A | 0.22 | 0.23 | 0.19 | 0.30 | 0.00 | 0.01 |
Method: the method used for prediction. RandFR and RandIC are baseline predictors based on the distribution of GO terms in the training set
Info: the information used in prediction: PM = title, abstract, references and publication year (PubMed); T+A = title and abstract; T = title; N/A = no information
Metric definitions are in the “Evaluation metrics” section. For each metric, the best result is highlighted in italic
1-to-1 comparison between GOTA (PM) and GOCat (T+A)
| Metric | GOTA = GOCat | GOTA > GOCat | GOTA < GOCat |
|---|---|---|---|
| | 0.44 (0.13) | 0.36 (0.15) | 0.20 (0.07) |
| | 0.29 (0.16) | 0.41 (0.15) | 0.30 (0.07) |
| | 0.41 (0.25) | 0.36 (0.17) | 0.23 (0.09) |
| | 0.42 (0.13) | 0.37 (0.16) | 0.20 (0.07) |
| | 0.61 (0.21) | 0.25 (0.17) | 0.13 (0.08) |
Metric definitions are in the “Evaluation metrics” section
GOTA = GOCat: fraction of publications on which GOTA and GOCat get exactly the same score. In parentheses, the fraction of publications on which the score is equal to 1 (the maximum)
GOTA > GOCat: fraction of publications on which GOTA gets a score strictly higher than GOCat’s. In parentheses, the fraction of publications on which the score is equal to 1 (the maximum)
GOTA < GOCat: fraction of publications on which GOCat gets a score strictly higher than GOTA’s. In parentheses, the fraction of publications on which the score is equal to 1 (the maximum)
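The three fractions described in these notes can be computed from two aligned lists of per-publication scores. The following is a minimal sketch, not the authors' evaluation code: the function name `one_to_one`, the outcome labels, and the toy scores are hypothetical. It also assumes that the value in parentheses counts publications where the higher (or shared) score equals 1, which is one reading of the notes.

```python
from typing import Dict, Sequence, Tuple


def one_to_one(
    gota: Sequence[float], gocat: Sequence[float], top: float = 1.0
) -> Dict[str, Tuple[float, float]]:
    """For each outcome, return (fraction of publications,
    fraction of publications where the higher or shared score equals `top`)."""
    if len(gota) != len(gocat) or not gota:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(gota)
    # counts[outcome] = [publications with this outcome, of which best score == top]
    counts = {"equal": [0, 0], "gota_higher": [0, 0], "gocat_higher": [0, 0]}
    for a, b in zip(gota, gocat):
        # Exact comparison: assumes both scores come from the same metric code,
        # so identical predictions yield bit-identical floats (an assumption).
        if a == b:
            key, best = "equal", a
        elif a > b:
            key, best = "gota_higher", a
        else:
            key, best = "gocat_higher", b
        counts[key][0] += 1
        if best == top:
            counts[key][1] += 1
    return {k: (c / n, m / n) for k, (c, m) in counts.items()}


# Toy scores for four publications (illustrative values, not from the paper).
result = one_to_one([1.0, 0.5, 0.5, 0.0], [1.0, 0.5, 0.25, 1.0])
for outcome, (fraction, at_max) in result.items():
    print(f"{outcome}: {fraction:.2f} ({at_max:.2f})")
```

Because every publication falls into exactly one of the three outcomes, the three main fractions for a metric sum to 1, up to rounding in the printed values.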