Lynette Hirschman, Marc Colosimo, Alexander Morgan, Alexander Yeh.
Abstract
BACKGROUND: Our goal in BioCreAtIvE has been to assess the state of the art in text mining, with emphasis on tasks that reflect real biological applications, e.g., the curation process for model organism databases. This paper summarizes BioCreAtIvE Task 1B, the "Normalized Gene List" task, which was inspired by the gene list supplied for each curated paper in a model organism database. The task was to produce the correct list of unique gene identifiers for the genes and gene products mentioned in sets of abstracts from three model organisms (Yeast, Fly, and Mouse).
Year: 2005 PMID: 15960823 PMCID: PMC1869004 DOI: 10.1186/1471-2105-6-S1-S11
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Abstract with gene list and synonym list excerpt.
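Figure 1 pairs an abstract with its gene list and an excerpt of the synonym list. To make the input and output of the task concrete, here is a minimal Python sketch of a naive dictionary-lookup baseline: it scans an abstract for any synonym in a lexicon and returns the set of unique identifiers those synonyms map to. The lexicon, the identifiers (`GENE:0001`, ...), and the function names are hypothetical; this is not the method of any participating system, and it deliberately ignores context and ambiguity (a synonym mapped to several identifiers contributes all of them).

```python
import re

def tokenize(text: str) -> list[str]:
    """Lower-case and split on anything that is not a letter, digit or hyphen."""
    return re.findall(r"[a-z0-9\-]+", text.lower())

def build_index(lexicon: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each tokenized synonym to the set of gene identifiers it names."""
    index: dict[str, set[str]] = {}
    for synonym, gene_ids in lexicon.items():
        key = " ".join(tokenize(synonym))  # normalize keys like the text
        if key:
            index.setdefault(key, set()).update(gene_ids)
    return index

def normalized_gene_list(abstract: str, index: dict[str, set[str]],
                         max_words: int = 6) -> set[str]:
    """Return the unique identifiers of every lexicon synonym found in the text.

    Candidate phrases are runs of 1..max_words tokens; matching is exact
    after the shared tokenization, hence case-insensitive.
    """
    tokens = tokenize(abstract)
    found: set[str] = set()
    for start in range(len(tokens)):
        for length in range(1, max_words + 1):
            if start + length > len(tokens):
                break
            phrase = " ".join(tokens[start:start + length])
            found |= index.get(phrase, set())
    return found

# Hypothetical lexicon: synonym -> identifiers (made-up IDs, not real database entries).
LEXICON = {
    "Cdc28": {"GENE:0001"},
    "cell division cycle 28": {"GENE:0001"},
    "Act1": {"GENE:0002"},
    "p34": {"GENE:0001", "GENE:0003"},  # an ambiguous synonym
}

if __name__ == "__main__":
    text = "Cdc28 (cell division cycle 28) phosphorylates targets; ACT1 was unaffected."
    print(sorted(normalized_gene_list(text, build_index(LEXICON))))
```

Because the output is a set, an identifier mentioned several times (here via both "Cdc28" and its long name) appears once, matching the task's requirement of a list of unique identifiers per abstract.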
Task 1B training and test data sets
| Training (noisy annotation) | 5000 | 5000 | 5000 |
| Development test (hand corrected) | 108 | 110 | 250 |
| Blind Test (extensively corrected) | 250 | 250 | 250 |
Task 1B results on Yeast gene list task
| Run | F-measure | Precision | Recall | TP | FP | FN |
| user5_1B_1 | 0.819 | 0.948 | 0.721 | 442 | 24 | 171 |
| user5_1B_2 | 0.848 | 0.915 | 0.79 | 484 | 45 | 129 |
| user5_1B_3 | 0.848 | 0.969 | 0.754 | 462 | 15 | 151 |
| user6_1B_1 | 0.857 | 0.912 | 0.809 | 496 | 48 | 117 |
| user6_1B_2 | 0.858 | 0.907 | 0.814 | 499 | 51 | 114 |
| user8_1B_1 | 0.921 | 0.950 | 0.894 | 548 | 29 | 65 |
| user8_1B_2 | 0.910 | 0.950 | 0.873 | 535 | 28 | 78 |
| user16_1B_1 | 0.897 | 0.951 | 0.848 | 520 | 27 | 93 |
| user16_1B_2 | 0.899 | 0.966 | 0.840 | 515 | 18 | 98 |
| user16_1B_3 | 0.897 | 0.951 | 0.848 | 520 | 27 | 93 |
| user18_1B_1 | 0.904 | 0.94 | 0.871 | 534 | 34 | 79 |
| user19_1B_1 | 0.773 | 0.646 | 0.962 | 590 | 324 | 23 |
| user19_1B_2 | 0.77 | 0.642 | 0.962 | 590 | 329 | 23 |
| user19_1B_3 | 0.763 | 0.661 | 0.902 | 553 | 284 | 60 |
| user24_1B_1 | 0.897 | 0.917 | 0.878 | 538 | 49 | 75 |
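The Precision, Recall and F-measure columns can be recomputed from the TP, FP and FN counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-measure is their harmonic mean. For user5_1B_1, 442 / 466 ≈ 0.948 and 442 / 613 ≈ 0.721, giving F ≈ 0.819. The counts appear to be pooled over all abstracts of an organism (TP + FN is 613 for every Yeast run), so the sketch below assumes that: it compares gold and predicted identifier sets per abstract, sums the counts, and then computes the three scores. The function names are illustrative, not the official BioCreAtIvE scorer.

```python
def count_matches(gold: dict[str, set[str]],
                  predicted: dict[str, set[str]]) -> tuple[int, int, int]:
    """Pool TP, FP and FN over abstracts; each maps abstract ID -> set of gene IDs."""
    tp = fp = fn = 0
    for doc_id in gold.keys() | predicted.keys():
        g = gold.get(doc_id, set())
        p = predicted.get(doc_id, set())
        tp += len(g & p)  # identifiers listed by both
        fp += len(p - g)  # predicted but not in the gold list
        fn += len(g - p)  # in the gold list but missed
    return tp, fp, fn

def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and balanced F-measure (harmonic mean of P and R)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

if __name__ == "__main__":
    # Counts for run user5_1B_1 from the Yeast table above.
    p, r, f = precision_recall_f(442, 24, 171)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.948 R=0.721 F=0.819
```

Pooling (micro-averaging) weights every gold identifier equally, so abstracts that mention many genes influence the score more than abstracts that mention one.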
Task 1B results on Fly gene list task
| Run | F-measure | Precision | Recall | TP | FP | FN |
| user5_1B_1 | 0.661 | 0.592 | 0.748 | 321 | 221 | 108 |
| user5_1B_2 | 0.612 | 0.659 | 0.571 | 245 | 127 | 184 |
| user5_1B_3 | 0.602 | 0.693 | 0.531 | 228 | 101 | 201 |
| user8_1B_1 | 0.665 | 0.638 | 0.695 | 298 | 169 | 131 |
| user8_1B_2 | 0.726 | 0.692 | 0.765 | 328 | 146 | 101 |
| user16_1B_1 | 0.781 | 0.728 | 0.841 | 361 | 135 | 68 |
| user16_1B_2 | 0.815 | 0.831 | 0.800 | 343 | 70 | 86 |
| user16_1B_3 | 0.787 | 0.744 | 0.834 | 358 | 123 | 71 |
| user18_1B_1 | 0.417 | 0.463 | 0.380 | 163 | 189 | 266 |
| user19_1B_1 | 0.284 | 0.224 | 0.389 | 167 | 580 | 262 |
| user23_1B_1 | 0.440 | 0.315 | 0.732 | 314 | 684 | 115 |
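Because recall is TP / (TP + FN), the sum TP + FN should equal the number of gold-standard identifiers in the test set and therefore be the same for every run on a given organism; TP + FP, by contrast, is the number of identifiers each run returned. A quick consistency check on the Fly counts, copied from the table above (the helper name `gold_list_size` is hypothetical):

```python
# (TP, FP, FN) for each Fly run, copied from the table above.
FLY_RUNS: dict[str, tuple[int, int, int]] = {
    "user5_1B_1": (321, 221, 108),
    "user5_1B_2": (245, 127, 184),
    "user5_1B_3": (228, 101, 201),
    "user8_1B_1": (298, 169, 131),
    "user8_1B_2": (328, 146, 101),
    "user16_1B_1": (361, 135, 68),
    "user16_1B_2": (343, 70, 86),
    "user16_1B_3": (358, 123, 71),
    "user18_1B_1": (163, 189, 266),
    "user19_1B_1": (167, 580, 262),
    "user23_1B_1": (314, 684, 115),
}

def gold_list_size(runs: dict[str, tuple[int, int, int]]) -> int:
    """Return TP + FN, raising ValueError if the runs disagree on it."""
    sizes = {tp + fn for tp, _fp, fn in runs.values()}
    if len(sizes) != 1:
        raise ValueError(f"inconsistent gold totals: {sorted(sizes)}")
    return sizes.pop()

if __name__ == "__main__":
    print(gold_list_size(FLY_RUNS))  # 429
```

The same check gives 613 for every Yeast run and 540 for every Mouse run.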
Task 1B results on Mouse gene list task
| Run | F-measure | Precision | Recall | TP | FP | FN |
| user5_1B_1 | 0.672 | 0.767 | 0.598 | 323 | 98 | 217 |
| user5_1B_2 | 0.737 | 0.811 | 0.676 | 365 | 85 | 175 |
| user5_1B_3 | 0.619 | 0.798 | 0.506 | 273 | 69 | 267 |
| user6_1B_1 | 0.739 | 0.813 | 0.678 | 366 | 84 | 174 |
| user6_1B_2 | 0.745 | 0.785 | 0.709 | 383 | 105 | 157 |
| user8_1B_1 | 0.744 | 0.828 | 0.676 | 365 | 76 | 175 |
| user8_1B_2 | 0.661 | 0.635 | 0.689 | 372 | 214 | 168 |
| user16_1B_1 | 0.772 | 0.750 | 0.794 | 429 | 143 | 111 |
| user16_1B_2 | 0.777 | 0.807 | 0.750 | 405 | 97 | 135 |
| user16_1B_3 | 0.791 | 0.765 | 0.819 | 442 | 136 | 98 |
| user18_1B_1 | 0.686 | 0.728 | 0.648 | 350 | 131 | 190 |
| user19_1B_1 | 0.580 | 0.428 | 0.898 | 485 | 648 | 55 |
| user19_1B_2 | 0.571 | 0.418 | 0.898 | 485 | 674 | 55 |
| user19_1B_3 | 0.606 | 0.489 | 0.798 | 431 | 451 | 109 |
| user24_1B_1 | 0.767 | 0.735 | 0.802 | 433 | 156 | 107 |
| user24_1B_2 | 0.776 | 0.764 | 0.787 | 425 | 131 | 115 |
Figure 2. Task 1B results for all organisms: precision vs. recall.
Lexical resources: synonymy for Yeast, Mouse, and Fly
| Yeast | 7,928 | 14,756 | 1.861 (1.01) | 1.001 (0.05) |
| Mouse | 52,594 | 130,548 | 2.482 (1.12) | 2.772 (2.57) |
| Fly | 27,749 | 81,711 | 2.944 (3.88) | 1.470 (0.97) |
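In the synonymy table, the fourth column equals the number of terms divided by the number of identifiers (14,756 / 7,928 ≈ 1.861 for Yeast; 130,548 / 52,594 ≈ 2.482 for Mouse), so it appears to be the mean number of synonyms per gene identifier, with a standard deviation in parentheses. A minimal sketch of that statistic, assuming a lexicon stored as identifier → set of synonyms and assuming the terms column counts (identifier, synonym) pairs. The toy data are hypothetical, and the source does not say whether a population or sample standard deviation was used, so `statistics.pstdev` is an assumption.

```python
import statistics

def synonymy_stats(lexicon: dict[str, set[str]]) -> tuple[int, int, float, float]:
    """Return (#identifiers, #terms, mean synonyms per ID, population SD).

    #terms counts (identifier, synonym) pairs, so #terms / #identifiers
    equals the mean.
    """
    counts = [len(synonyms) for synonyms in lexicon.values()]
    return (len(lexicon), sum(counts),
            statistics.fmean(counts), statistics.pstdev(counts))

if __name__ == "__main__":
    toy = {  # hypothetical identifier -> synonyms
        "GENE:0001": {"Cdc28", "cell division cycle 28"},
        "GENE:0002": {"Act1"},
        "GENE:0003": {"p34", "Xyz1", "Xyz-1"},
    }
    print(synonymy_stats(toy))
```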
Lexical resources for Yeast, Fly and Mouse: identifiers, terms, and ambiguity
| Yeast | 7,928 | 14,756 | 168 | 1.013 (0.14) | 2 | 0.00014 |
| Mouse | 52,594 | 130,548 | 1919 | 1.017 (0.18) | 205 | 0.00171 |
| Fly | 27,749 | 81,711 | 2736 | 1.085 (1.03) | 396 | 0.00650 |
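The second lexical table reports ambiguity, but this record does not define its last four columns. One common way to quantify synonym ambiguity, shown here as an illustration rather than a reconstruction of those columns, is to invert the lexicon and count terms that name more than one identifier. The sketch assumes the same identifier → synonyms layout as a plain Python dict and compares terms case-insensitively; both choices, and the function name, are assumptions.

```python
from collections import defaultdict

def ambiguity_stats(lexicon: dict[str, set[str]]) -> tuple[int, int, float]:
    """Return (#distinct terms, #ambiguous terms, mean identifiers per term).

    A term is ambiguous when, after lower-casing, it is a synonym of more
    than one identifier.
    """
    ids_per_term: dict[str, set[str]] = defaultdict(set)
    for gene_id, synonyms in lexicon.items():
        for term in synonyms:
            ids_per_term[term.lower()].add(gene_id)  # invert: term -> IDs
    sizes = [len(ids) for ids in ids_per_term.values()]
    ambiguous = sum(1 for n in sizes if n > 1)
    mean_ids = sum(sizes) / len(sizes) if sizes else 0.0
    return len(sizes), ambiguous, mean_ids

if __name__ == "__main__":
    toy = {  # hypothetical identifier -> synonyms
        "GENE:0001": {"ABC", "p1"},
        "GENE:0002": {"abc"},
        "GENE:0003": {"xyz", "P1"},
    }
    print(ambiguity_stats(toy))
```

Whether to fold case is a real design choice: folding merges spellings such as "ABC" and "abc", which raises the ambiguity count for lexicons where capitalization distinguishes genes.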
Figure 3. Distribution of ambiguous synonyms in the Fly, Mouse and Yeast Task 1B lexical resources.