Robert Bossy, Julien Jourde, Alain-Pierre Manine, Philippe Veber, Erick Alphonse, Maarten van de Guchte, Philippe Bessières, Claire Nédellec.
Abstract
BACKGROUND: We present the BioNLP 2011 Shared Task Bacteria Track, the first Information Extraction challenge entirely dedicated to bacteria. It includes three tasks that cover different levels of biological knowledge. The Bacteria Gene Renaming supporting task aims to extract gene renaming and gene name synonymy from PubMed abstracts. The Bacteria Gene Interaction task is a gene/protein interaction extraction task on individual sentences. The interactions are categorized into ten sub-types, giving a detailed account of genetic regulation at the molecular level. Finally, the Bacteria Biotopes task focuses on the localization and environment of bacteria mentioned in textbook articles. We describe how the three corpora were created, including document acquisition and manual annotation, as well as the metrics used to evaluate the participants' submissions.
Year: 2012; PMID: 22759457; PMCID: PMC3384254; DOI: 10.1186/1471-2105-13-S11-S3
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Figure 1. Example of gene renaming relations.
Figure 2. Example of gene interaction relations.
Figure 3. Common types of renaming sentences.
Rename corpus size.
| Documents | (1,146 + 246) 1,392 | 252 (15%) |
| Gene names | 18,503 | 3,375 (15%) |
| Renamings | 373 | 88 (24%) |
Negative examples of the Rename task.
Entity names of diverse types (gene, protein, operon) are underlined.
Positive examples of the Rename task.
The underlined names are the former names, and the emphasized names are the new names.
List of molecular entities and actions in the Gene Interaction corpus.
| cotA | |
| sigX-ypuN | |
| class III heat shock genes | |
| yvyD gene product | |
| CotA | |
| SigK RNA polymerase | |
| DNA-binding protein | |
| upstream site | |
| promoter regions | |
| regulon | |
| activity | level | presence | |
| expression | |
| transcription | |
List of relations in the Gene Interaction corpus.
Distribution of entities and actions in the Gene Interaction corpus.
| Documents | (105+15) 120 | 42 |
| 219 | 85 | |
| 173 | 56 | |
| 53 | 21 | |
| 49 | 10 | |
| 45 | 22 | |
| 43 | 14 | |
| 29 | 6 | |
| 22 | 8 | |
| 19 | 4 | |
| 12 | 3 | |
| 11 | 2 | |
| 10 | 3 | |
| 6 | 5 |
Distribution of the relations in the Gene Interaction corpus.
| 208 | 64 | |
| 173 | 47 | |
| 44 | 8 | |
| 39 | 4 | |
| 36 | 4 | |
| 36 | 8 | |
| 23 | 6 | |
| 17 | 2 | |
| 14 | 2 | |
| 12 | 1 |
Figure 4. Example of bacteria biotopes relations.
Figure 5. Example of a coreference.
Bacteria Biotope corpus size.
| Documents | 78 (65 + 13) | 27 (26%) |
| Bacteria | 538 | 121 (18%) |
| Environment | 62 | 16 (21%) |
| Host | 486 | 101 (17%) |
| HostPart | 217 | 84 (28%) |
| Geographical | 111 | 25 (18%) |
| Water | 70 | 21 (23%) |
| Food | 46 | 0 (0%) |
| Medical | 24 | 2 (8%) |
| Soil | 26 | 20 (43%) |
| Coreferences | 484 | 100 (17%) |
| Total entities | 1,580 | 390 |
| Localization | 998 | 250 (20%) |
| Part of Host | 204 | 78 (28%) |
| Total relations | 1,202 | 328 |
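One reading of the percentages in this table, assumed here rather than stated by the source, is that each gives the second column's count as a share of both columns combined. The sketch below checks that reading against a few rows; the function name `share_of_total` and the interpretation of the two columns are illustrative assumptions.

```python
def share_of_total(first_col: int, second_col: int) -> int:
    """Second column's count as a rounded percentage of both columns combined.

    Assumption: the parenthesised percentages in the table are computed
    this way. The source does not state the formula.
    """
    return round(100 * second_col / (first_col + second_col))


# (label, first column count, second column count, percentage shown in the table)
rows = [
    ("Documents", 78, 27, 26),
    ("Bacteria", 538, 121, 18),
    ("HostPart", 217, 84, 28),
    ("Localization", 998, 250, 20),
]

for label, first, second, reported in rows:
    computed = share_of_total(first, second)
    print(f"{label}: computed {computed}%, reported {reported}%")
```

For these rows the computed and reported values agree, which supports the assumed reading without confirming it.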
Inter Annotator Agreement for the Bacteria Biotopes corpus.
| 81 | ||||
| 88 | 76 | 77 | 77 | |
| 85 | 78 | 83 | 80 | |
| 88 | 73 | 75 | 74 | |
| 54 | 51 | 52 | 52 | |
| 69 | 46 | 52 | 49 | |
| 87 | 74 | 80 | 77 | |
| 83 | 75 | 67 | 71 | |
| 83 | 71 | 70 | 71 | |
| 72 | 76 | 74 | ||
| 73 | 75 | 74 | ||
Bacteria entity recall of the participants of the Bacteria Biotope task.
| Bibliome | 84 |
| JAIST | 55 |
| UTurku | 16 |
Participant scores at the Rename task.
| Univ. of Turku | |||
| Concordia Univ. | 74.4 | 65.9 | 69.9 |
| INRA | 57.0 | 73.9 | 64.4 |
University of Turku global scores at the Gene Interaction task.
| Global Precision | 85 |
| Global Recall | 71 |
| Global F-score | 77 |
| Interaction Precision | 75 |
| Interaction Recall | 56 |
| Interaction F-score | 64 |
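The F-scores above are consistent with the balanced harmonic mean of precision and recall (F1). The following minimal sketch assumes that definition, which this section does not spell out, and recomputes the two F-scores from the reported precision and recall values; `f_score` is an illustrative helper name.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the task uses the balanced F1; the formula is not
    given in this section.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values (in percent) taken from the table above.
global_f = f_score(85, 71)
interaction_f = f_score(75, 56)

print(round(global_f), round(interaction_f))  # prints: 77 64
```

Both rounded values match the reported Global F-score (77) and Interaction F-score (64).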
University of Turku detailed scores at the Gene Interaction task.
| | Precision | Recall | F-score |
| Global | 85 | 71 | 77 |
| | 94 | 92 | 93 |
| | 75 | 75 | 75 |
| | 75 | 56 | 64 |
| | 100 | 100 | 100 |
| | 100 | 100 | 100 |
| | 100 | 100 | 100 |
| | 100 | 50 | 67 |
| | 100 | 17 | 29 |
| | 67 | 50 | 57 |
| | 100 | 100 | 100 |
Figure 6. Examples of commonly missed gene interactions.
Results of the participants of the Bacteria Biotope task.
| Bibliome | 45 | ||
| JAIST | 27 | 42 | 33 |
| UTurku | 17 | 26 |
Location entity recall of the participants of the Bacteria Biotope task.
| Host | 82 | 49 | 28 |
| Host part | 72 | 36 | 28 |
| Geographical | 29 | 60 | 53 |
| Environment | 53 | 10 | 11 |
| Water | 83 | 32 | 2 |
| Soil | 86 | 37 | 34 |
Relaxed scores of the participants of the Bacteria Biotope task.
| Bibliome | 54 | 54 | 54 | +9 |
| JAIST | 29 | 45 | 35 | +2 |
| UTurku | 19 | 56 | 28 | +2 |
Detailed scores of the participants of the Bacteria Biotope task.
| Host | 48 | 30 | 43 | 36 | 15 | 23 | |||
| HostPart | 42 | 18 | 28 | 9 | 40 | 15 | |||
| Geographical | 13 | 38 | 19 | 35 | 32 | 36 | |||
| Environment | 24 | 5 | 0 | 0 | 6 | 11 | |||
| Water | 19 | 27 | 23 | 1 | 7 | 2 | |||
| Soil | 21 | 42 | 28 | 12 | 21 | 15 | |||
| PartOf | 23 | 79 | 36 | 31 | 61 | 41 | |||