Simon B Rice, Goran Nenadic, Benjamin J Stapley.
Abstract
BACKGROUND: Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents.
Year: 2005 PMID: 15960835 PMCID: PMC1869015 DOI: 10.1186/1471-2105-6-S1-S22
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Subtasks in BioCreAtIvE Task 2. We considered Task 2 subtasks as nested problems: for solving Task 2.3, tasks 2.2 and 2.1 are approached as subtasks.
Figure 2. Examples of terminological features (synterms). Various lexical representations of terms were conflated in sets called synterms, which were used as features for assignment of protein function.
Figure 3. Extracting terminological features. Term features were extracted through several steps, including extraction of term candidates and acronyms, their inflectional and orthographic normalisation, and estimation of termhoods for synterms.
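The conflation of term variants into synterms described in the captions above can be illustrated with a minimal sketch. The normalisation rules, the function names `normalise` and `build_synterms`, the acronym table and the example terms are all assumptions made for illustration; they are not the rules or data used by the authors, and termhood estimation is not covered.

```python
import re
from collections import defaultdict


def normalise(term: str) -> str:
    """Map a term variant to a crude canonical key (illustrative rules only)."""
    # Orthographic normalisation: lower-case, treat hyphens and slashes as spaces.
    key = re.sub(r"[-/]", " ", term.lower())
    key = re.sub(r"\s+", " ", key).strip()
    # Very rough inflectional normalisation: strip a plural "s" from the last word.
    words = key.split(" ")
    last = words[-1]
    if len(last) > 3 and last.endswith("s") and not last.endswith("ss"):
        words[-1] = last[:-1]
    return " ".join(words)


def build_synterms(variants, acronyms=None):
    """Group term variants that share a canonical key into one synterm set."""
    acronyms = acronyms or {}
    synterms = defaultdict(set)
    for variant in variants:
        # Hypothetical acronym table: expand an acronym before normalising it.
        expanded = acronyms.get(variant, variant)
        synterms[normalise(expanded)].add(variant)
    return dict(synterms)


# Made-up example variants, not taken from the paper.
variants = [
    "DNA-binding domain",
    "DNA binding domains",
    "dna binding domain",
    "DBD",
    "protein kinases",
    "Protein kinase",
]
acronyms = {"DBD": "DNA-binding domain"}
synterms = build_synterms(variants, acronyms)
```

Each key of `synterms` then stands for one feature, so that a protein co-occurring with any variant in the set receives the same feature.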
Task 2.1: precision of passage selection. The number and precision of selected passages (paragraphs) that were relevant to a given (protein,GO term) pair.
| relevance to | | Submission 1 passages | Submission 1 precision | Submission 2 passages | Submission 2 precision |
|---|---|---|---|---|---|
| high | high | 59 | 12.9% | 125 | 14.0% |
| high | general | 19 | 4.1% | 38 | 4.2% |
| general | high | 28 | 6.1% | 69 | 7.7% |
Task 2.2: precision of GO term prediction and passage selection from a single specified document. The number and precision of selected pairs (GO term,passage) that were relevant to a given protein. The prediction of the GO terms was based only on a specified document.
| relevance of | | Submission 1 pairs | Submission 1 precision | Submission 2 pairs | Submission 2 precision |
|---|---|---|---|---|---|
| high | high | 3 | 0.6% | 16 | 3.3% |
| high | general | 2 | 0.4% | 2 | 0.4% |
| general | high | 8 | 1.6% | 26 | 5.4% |
Task 2.3: precision of GO term prediction and passage selection from a corpus. The number and precision of selected pairs (GO term,passage) that were relevant to a given protein. The prediction of GO terms was based on a set of retrieved documents.
| relevance of | | Submission 1 pairs | Submission 1 precision | Submission 2 pairs | Submission 2 precision |
|---|---|---|---|---|---|
| high | high | 11 | 30.6% | 11 | 21.2% |
| high | general | 0 | 0% | 0 | 0% |
| general | high | 7 | 19.4% | 6 | 11.5% |
Results for specific proteins from Task 2.3. Individual prediction results for proteins evaluated in Task 2.3.
| protein PAC | protein name | predictions evaluated | high | high | general |
|---|---|---|---|---|---|
| Q99728 | | 14 | 10 (71.4%) | 0 | 4 (28.6%) |
| P08247 | synaptophysin | 3 | 0 | 0 | 3 (100%) |
| P30153 | | 6 | 1 (16.7%) | 0 | 0 |
| Q9BYW1 | | 11 | 0 | 0 | 0 |
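As a quick arithmetic check on the per-protein table, the sketch below sums its three relevance columns and recomputes one percentage. The comparison against the Submission 1 counts for Task 2.3 (11, 0, 7) is an observation from the numbers, not something the source states, and the variable names are illustrative.

```python
# Per-protein counts as listed: (evaluated, then the three relevance columns).
per_protein = {
    "Q99728": (14, 10, 0, 4),
    "P08247": (3, 0, 0, 3),
    "P30153": (6, 1, 0, 0),
    "Q9BYW1": (11, 0, 0, 0),
}

# Submission 1 pair counts from the Task 2.3 table, in row order.
task_2_3_submission_1 = (11, 0, 7)


def precision(relevant: int, evaluated: int) -> float:
    """Fraction of evaluated predictions judged relevant (0.0 if none evaluated)."""
    return relevant / evaluated if evaluated else 0.0


# Sum each relevance column over the listed proteins.
column_totals = tuple(
    sum(row[i] for row in per_protein.values()) for i in (1, 2, 3)
)
evaluated_total = sum(row[0] for row in per_protein.values())

# Recompute the first percentage reported for Q99728 (10 of 14 predictions).
q99728_precision = round(100 * precision(10, 14), 1)

# True when the per-protein columns add up to the Submission 1 row counts.
totals_match = column_totals == task_2_3_submission_1
```

Under these assumptions the column sums agree with the Submission 1 counts, which suggests that the three relevance columns follow the same row order as the Task 2.3 table.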