Daniel Hanisch, Katrin Fundel, Heinz-Theodor Mevissen, Ralf Zimmer, Juliane Fluck.
Abstract
BACKGROUND: Identification of gene and protein names in biomedical text is a challenging task as the corresponding nomenclature has evolved over time. This has led to multiple synonyms for individual genes and proteins, as well as names that may be ambiguous with other gene names or with general English words. The Gene List Task of the BioCreAtIvE challenge evaluation enables comparison of systems addressing the problem of protein and gene name identification on common benchmark data.
Year: 2005 PMID: 15960826 PMCID: PMC1869006 DOI: 10.1186/1471-2105-6-S1-S14
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. First example of the impact of token classes. Candidate synonym I is a correct synonym match, whereas candidate II is not. Appropriate weighting of tokens allows the difference to be detected correctly.
Figure 2. Second example of the impact of token classes. Both candidates are wrong matches because the significant token "receptor" is present in the text. Naive matching would accept both candidates.
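The effect described in the two captions can be illustrated with a minimal sketch. It is not ProMiner's actual scoring: the set of "significant" tokens, the function names, and the example sentence are all assumptions made for illustration, and the rule is reduced to checking only the tokens directly adjacent to a matched synonym.

```python
# Hypothetical set of tokens treated as "significant"; not taken from ProMiner.
SIGNIFICANT_TOKENS = {"receptor", "inhibitor", "kinase"}


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def naive_match(synonym, text):
    """Accept whenever the synonym tokens occur contiguously in the text."""
    syn, toks = tokenize(synonym), tokenize(text)
    return any(toks[i:i + len(syn)] == syn
               for i in range(len(toks) - len(syn) + 1))


def class_aware_match(synonym, text):
    """Accept a contiguous match only if no significant token sits
    directly before or after the matched span (a simplifying assumption)."""
    syn, toks = tokenize(synonym), tokenize(text)
    for i in range(len(toks) - len(syn) + 1):
        if toks[i:i + len(syn)] != syn:
            continue
        # Tokens immediately to the left and right of the matched span.
        neighbours = toks[max(i - 1, 0):i] + toks[i + len(syn):i + len(syn) + 1]
        if not any(t in SIGNIFICANT_TOKENS for t in neighbours):
            return True
    return False


# Invented example sentence, used only to exercise the two functions.
text = "the IL 2 receptor is expressed"
print(naive_match("IL 2", text))
print(class_aware_match("IL 2", text))
print(class_aware_match("IL 2 receptor", text))
```

In this toy setting the naive matcher accepts "IL 2" even though the text names the receptor, while the class-aware variant rejects it and accepts only the synonym that covers the significant token.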
Results of the BioCreative competition. Summary of submitted search runs. The table contains details of parameter sets (D# denotes disambiguation threshold, O(+/-) use of organism disambiguation, S(+/-) significance of dash at end of synonym) and resulting performance. Values where ProMiner achieved the best rank of all participants are printed in bold.
| | Fly 1 | Fly 2 | Fly 3 | Mouse 1 | Mouse 2 | Mouse 3 | Yeast 1 | Yeast 2 |
| Disambiguation (D#) | D3 | D1 | D3 | D3 | D1 | D5 | D3 | D1 |
| Organism O(+/-) | O+ | O+ | O+ | O- | O+ | O- | O- | O- |
| Dash significance S(+/-) | S- | S+ | S+ | S- | S+ | S+ | S- | S- |
| F-measure | 0.781 | 0.787 | 0.771 | 0.776 | 0.897 | 0.899 | ||
| Precision | 0.728 | 0.744 | 0.752 | 0.809 | 0.766 | 0.951 | 0.966 | |
| Recall | 0.8 | 0.834 | 0.79 | 0.746 | 0.814 | 0.848 | 0.84 | |
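The F-measure values in the table can be related to the precision and recall rows with a short sketch. It assumes the conventional balanced F-measure (the harmonic mean of precision and recall); the function name `f_measure` and the `beta` parameter are illustrative and not taken from the source. Under that assumption, the last two precision and recall entries (0.951 with 0.848, and 0.966 with 0.84) reproduce the last two F-measure entries.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta=1.0 gives the balanced F-measure; assuming this is the variant
    reported in the table.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Last two precision/recall pairs from the table.
print(round(f_measure(0.951, 0.848), 3))  # 0.897
print(round(f_measure(0.966, 0.84), 3))   # 0.899
```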
Figure 3. Impact of ProMiner components for the fly organism. For each result, the F-measure as determined from the published gold-standards is given in brackets. Details on this figure are provided in the Results and Discussion section.
Figure 4. Impact of ProMiner components for the mouse organism. For each result, the F-measure as determined from the published gold-standards is given in brackets. Details on this figure are provided in the Results and Discussion section.
Examples of false-positive matches in the BioCreative competition. Samples of false-positive matches in the best submitted mouse search run. Detected matches are printed in bold.
| Description | Examples |
| Unspecific synonym | |
| Wrong context | |
| Unknown ambiguity | high dose set at MTD or |
| Unclear why marked as incorrect | |