Robert Bossy, Wiktoria Golik, Zorana Ratkovic, Dialekti Valsamou, Philippe Bessières, Claire Nédellec.
Abstract
BACKGROUND: We present the two Bacteria Track tasks of the BioNLP 2013 Shared Task (ST): Gene Regulation Network (GRN) and Bacteria Biotope (BB). These tasks were previously introduced in the 2011 BioNLP-ST Bacteria Track as Bacteria Gene Interaction (BI) and Bacteria Biotope (BB). The Bacteria Track was motivated by the need to develop specific BioNLP tools for fine-grained event extraction in bacterial biology. The 2013 tasks extend the 2011 versions to better address the needs of biological knowledge modeling, and new evaluation metrics were designed for these new goals. Moving beyond a list of gene interactions, the goal of the GRN task is to build a gene regulation network from the extracted gene interactions. Like BB'11, BB'13 is dedicated to the extraction of bacteria biotopes, i.e. information about bacterial environments. BB'13 extends the BB'11 typology to a wide diversity of biotopes, as defined by the OntoBiotope ontology. The detection of entities and the extraction of events are addressed by distinct subtasks in order to measure the progress achieved by the participant systems since 2011.
Year: 2015 PMID: 26202448 PMCID: PMC4511173 DOI: 10.1186/1471-2105-16-S10-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Sketch of the molecular mechanisms that regulate bacterial gene transcription. (1) RNA polymerase must bind a transcription factor called a sigma factor to be able to transcribe DNA into RNA. (2.a) The sigma factor specifically recognizes a promoter DNA sequence motif located upstream of the gene and guides RNA polymerase to it. (2.b) DNA is transcribed into RNA from the Transcriptional Start Site (TSS), while the sigma factor is released and becomes available to another RNA polymerase. (3.a) A transcriptional regulator binds to a specific motif around the promoter site and, in this example, (3.b) activates transcription.
Figure 2. From molecular mechanisms to biological interactions. 1) The network arc between gerE and cwlH in Ex. 3 is inferred from the Interaction:Regulation event. Since the interaction target is an event, the target is reduced to its participant (cwlH). 2) a) Bind_to and Site_of represent low-level biological phenomena from which the Master_of_promoter relation of 2) b) can be deduced; c) the Interaction:Binding relation can be deduced from Master_of_Promoter and Promoter_of. 3) Combining the interactions from examples 1) and 2) produces the network with three edges and two arcs.
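The inference steps of Figure 2 can be read as rewriting rules over extracted relations. The Python sketch below illustrates two of them under stated assumptions: relations are plain (type, agent, target) tuples, an event target is looked up in a hypothetical `events` table that maps event ids to their participant, and Interaction:Binding is assumed to link the agent of Master_of_promoter to the gene named by Promoter_of. The names `Relation`, `resolve_target`, `infer_binding` and `build_network`, and the genes `sigX`, `pB` and `geneB`, are illustrative only, not part of the task's official tools.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    kind: str    # e.g. "Interaction:Regulation", "Master_of_promoter"
    agent: str
    target: str

def resolve_target(target, events):
    # If the target is an event id, reduce it to the event's participant.
    # Assumes the hypothetical `events` table contains no cycles.
    while target in events:
        target = events[target]
    return target

def infer_binding(relations):
    # Assumed rule: Master_of_promoter(agent, promoter) + Promoter_of(promoter, gene)
    #               => Interaction:Binding(agent, gene)
    promoter_of = {}
    for r in relations:
        if r.kind == "Promoter_of":
            promoter_of.setdefault(r.agent, []).append(r.target)
    inferred = []
    for r in relations:
        if r.kind == "Master_of_promoter":
            for gene in promoter_of.get(r.target, []):
                inferred.append(Relation("Interaction:Binding", r.agent, gene))
    return inferred

def build_network(relations, events):
    # Every Interaction:* relation (extracted or inferred) becomes a directed arc.
    arcs = set()
    for r in list(relations) + infer_binding(relations):
        if r.kind.startswith("Interaction:"):
            arcs.add((r.agent, resolve_target(r.target, events), r.kind))
    return arcs

events = {"E1": "cwlH"}  # hypothetical event E1 whose participant is cwlH
rels = [
    Relation("Interaction:Regulation", "gerE", "E1"),
    Relation("Master_of_promoter", "sigX", "pB"),   # hypothetical names
    Relation("Promoter_of", "pB", "geneB"),
]
print(sorted(build_network(rels, events)))
# [('gerE', 'cwlH', 'Interaction:Regulation'), ('sigX', 'geneB', 'Interaction:Binding')]
```

Low-level relations such as Master_of_promoter are kept only as premises here; only Interaction relations end up as network arcs.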
Statistics of the GRN corpus.
| Item | Count |
|---|---|
| Sentences | 201 |
| Words | 4,936 |
| Molecular events | 495 |
| Biological Interactions | 334 |
| Events | 819 |
| Entities | 917 |
| Network nodes | 133 |
| Network edges | 242 |
Figure 3. Typology of errors in the GRN network.
Figure 4. Examples of the provided information and the expected prediction in BB'13.
Statistics of the Bacteria Biotope corpus annotation.
| Item | Count |
|---|---|
| Documents | 131 |
| Words | 43,851 |
| Bacteria | 2,220 |
| Geographical | 288 |
| Habitat | 2,675 |
| OntoBiotope categories | 2,097 |
| Total entities | 5,183 |
| Localization | 1,837 |
| Part of Host | 475 |
| Total events | 2,312 |
Distribution of discontinuous annotations in the BB'13 corpus.
| Linguistic structure | Frequency |
|---|---|
| and | 68 |
| or | 11 |
| enumeration | 6 |
| range | 5 |
| combination of insertion and coordination | 1 |
| tmesis (insertion) | 6 |
| Total | 97 |
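Discontinuous annotations such as the coordination cases counted above need more than one character span per entity. The sketch below assumes a brat-style standoff line in which fragments are separated by semicolons (an assumption about the file layout, not a statement of the official BB'13 format); the function names and the example sentence are hypothetical.

```python
def parse_entity_line(line):
    # Assumed layout: "<id><TAB><type> <start> <end>[;<start> <end>...]<TAB><text>"
    ent_id, annotation, text = line.rstrip("\n").split("\t")
    etype, offsets = annotation.split(" ", 1)
    # Each ';'-separated fragment is a "start end" pair of character offsets.
    fragments = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
    return ent_id, etype, fragments, text

def fragment_text(document, fragments):
    # Join the fragments with single spaces to rebuild the surface form.
    return " ".join(document[start:end] for start, end in fragments)

# "cattle intestines" is the discontinuous reading of a coordination ("and").
doc = "cattle and pig intestines"
ent_id, etype, frags, text = parse_entity_line("T1\tHabitat 0 6;15 25\tcattle intestines")
print(etype, frags, fragment_text(doc, frags))  # Habitat [(0, 6), (15, 25)] cattle intestines
```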
Distribution of the event arguments in the text.
| Event type | Intra-sentence events (fraction) | Intra-paragraph events (fraction) |
|---|---|---|
| Localization | 0.54 | 0.94 |
| PartOf | 0.63 | 0.98 |
| All events | 0.56 | 0.95 |
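The intra-sentence and intra-paragraph rates above measure how often both arguments of an event fall inside the same text unit. A minimal sketch, assuming each event is reduced to the start offsets of its two arguments and that sentence or paragraph boundaries are given as sorted start offsets (both representations, and the offsets in the example, are hypothetical):

```python
from bisect import bisect_right

def unit_index(offset, unit_starts):
    # Index of the text unit (sentence or paragraph) containing the offset;
    # unit_starts is sorted ascending and begins at 0.
    return bisect_right(unit_starts, offset) - 1

def intra_unit_rate(events, unit_starts):
    """Fraction of events whose two arguments fall in the same unit."""
    if not events:
        return 0.0
    same = sum(1 for a, b in events
               if unit_index(a, unit_starts) == unit_index(b, unit_starts))
    return same / len(events)

# Hypothetical offsets: sentences start at 0, 40 and 90; paragraphs at 0 and 90.
events = [(5, 20), (10, 60), (45, 95), (50, 70)]
print(intra_unit_rate(events, [0, 40, 90]))  # 0.5
print(intra_unit_rate(events, [0, 90]))      # 0.75
```

As in the table, the paragraph rate is never lower than the sentence rate, because every sentence lies inside a single paragraph.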
Rate of candidate arguments that belong to an event (Localization or PartOf).
| Percentage of entities involved in events | 63% | 44% | 65% | 55% |
Figure 5. OntoBiotope category assignment with the AlvisAE annotation editor.
Automatic pre-annotation scores by AlvisNLP.
| Step | SER | Recall | Precision | F1 |
|---|---|---|---|---|
| Detection & categorization | 0.44 | 0.58 | 0.78 | 0.66 |
| Entity detection | 0.37 | 0.64 | 0.86 | 0.74 |
| Entity categorization | 0.34 | 0.67 | 0.90 | 0.77 |
| Categorization with reference entity | NA | 0.90 | 0.90 | 0.90 |
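SER (slot error rate) counts substitutions S, deletions D and insertions I against the number N of reference items, SER = (S + D + I) / N, so lower is better and, unlike F1, it can exceed 1 when a system over-predicts. The sketch below uses the unweighted textbook form, with recall and precision derived from the same counts; the weighting actually used in the BB'13 and GRN evaluations may differ, so treat it as an illustration only. The counts in the example are invented.

```python
def slot_error_rate(correct, substitutions, deletions, insertions):
    """Unweighted SER with recall, precision and F1 from the same counts."""
    n_ref = correct + substitutions + deletions     # items in the reference
    n_pred = correct + substitutions + insertions   # items in the prediction
    if n_ref == 0 or n_pred == 0:
        raise ValueError("need at least one reference and one predicted item")
    ser = (substitutions + deletions + insertions) / n_ref
    recall = correct / n_ref
    precision = correct / n_pred
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return ser, recall, precision, f1

# Invented counts: 6 correct, 2 substitutions, 2 deletions, 1 insertion.
print(slot_error_rate(6, 2, 2, 1))  # SER 0.5, recall 0.6, precision ~0.667, F1 ~0.632
```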
Figure 6. Example of two possible matches between the reference and the prediction.
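As Figure 6 illustrates, a prediction can often be aligned with the reference in more than one way, so a scorer must pick one alignment before counting errors. The sketch below assumes a Jaccard character-overlap similarity between (start, end) spans and picks, by exhaustive search, the one-to-one alignment with the highest total similarity. Both choices are illustrative assumptions, not the official scorer, and exhaustive search only scales to a handful of entities; an assignment algorithm such as the Hungarian method is the usual replacement.

```python
from itertools import permutations

def jaccard(a, b):
    # Character-overlap similarity between two half-open (start, end) spans;
    # disjoint spans get 0 because their intersection is empty.
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union else 0.0

def best_matching(reference, predicted):
    """Try every one-to-one alignment and keep the most similar one."""
    size = max(len(reference), len(predicted))
    refs = list(reference) + [None] * (size - len(reference))
    preds = list(predicted) + [None] * (size - len(predicted))
    best, best_score = [], -1.0
    for perm in permutations(preds):
        pairs = [(r, p) for r, p in zip(refs, perm)
                 if r is not None and p is not None and jaccard(r, p) > 0]
        score = sum(jaccard(r, p) for r, p in pairs)
        if score > best_score:
            best, best_score = pairs, score
    return best, best_score

# One predicted span overlapping two reference spans: two candidate matches.
pairs, score = best_matching([(0, 10), (12, 20)], [(0, 15)])
print(pairs, score)  # pairs (0, 10) with (0, 15); the alternative scores only 0.15
```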
Official scores of the GRN task.
| Participant | Submission | ML algorithm | SER | Recall | Precision | Shape SER | Effect SER |
|---|---|---|---|---|---|---|---|
| U. of Ljubljana | Interaction + low-level | LC-CRF | 0.73 | 0.34 | 0.68 | 0.60 | 0.74 |
| K.U.Leuven | Network | SVM | 0.83 | 0.23 | 0.50 | 0.64 | 0.83 |
| TEES-2.1 | Interaction + low-level | SVM | 0.86 | 0.23 | 0.54 | 0.74 | 0.84 |
| IRISA | Interaction | kNN | 0.91 | 0.41 | 0.40 | 0.51 | 0.87 |
| EVEX | Network | - | 0.92 | 0.13 | 0.44 | 0.79 | 0.91 |
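The GRN scores compare predicted and reference networks arc by arc. One plausible reading, assumed here rather than taken from the task definition, treats each arc as a (source, target, interaction type) triple: an arc with the right source and target but the wrong type counts as a substitution, a missing arc as a deletion, and a spurious arc as an insertion. The gene names and interaction types in the example are hypothetical.

```python
def network_ser(reference, predicted):
    """Compare two networks given as sets of (source, target, type) arcs."""
    # Assumption: at most one interaction type per ordered (source, target) pair.
    ref_types = {(s, t): k for s, t, k in reference}
    pred_types = {(s, t): k for s, t, k in predicted}
    correct = sum(1 for pair, k in pred_types.items() if ref_types.get(pair) == k)
    substitutions = sum(1 for pair, k in pred_types.items()
                        if pair in ref_types and ref_types[pair] != k)
    deletions = sum(1 for pair in ref_types if pair not in pred_types)
    insertions = sum(1 for pair in pred_types if pair not in ref_types)
    ser = (substitutions + deletions + insertions) / len(ref_types)
    return ser, correct, substitutions, deletions, insertions

reference = {("A", "B", "Inhibition"), ("C", "A", "Transcription"), ("D", "C", "Regulation")}
predicted = {("A", "B", "Activation"), ("C", "A", "Transcription"), ("B", "D", "Binding")}
print(network_ser(reference, predicted))  # (1.0, 1, 1, 1, 1)
```

The example shows why SER and recall can diverge: one arc is fully correct, yet the substitution, deletion and insertion together push SER to 1.0.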
Participation in the Bacteria Biotope task.
| | LIMSI | LIPN | TEES 2.1 | IRISA | Boun |
|---|---|---|---|---|---|
| Sub-task 1 | ✓ | ✓ | | ✓ | ✓ |
| Sub-task 2 | ✓ | | ✓ | ✓ | ✓ |
| Sub-task 3 | ✓ | | ✓ | | |
Sub-task 1 results in BB'13. The six score columns form three pairs: (a) official results, (b) habitat detection, and (c) category assignment with relaxed habitat boundaries.
| Participant | (a) | (a) | (b) | (b) | (c) | (c) |
|---|---|---|---|---|---|---|
| LIPN | 0.661 | 0.608 | 0.629 | 0.639 | 0.550 | 0.718 |
| Boun | 0.676 | 0.595 | 0.617 | 0.653 | 0.554 | 0.715 |
| LIMSI | 0.678 | 0.444 | 0.467 | 0.714 | 0.637 | 0.496 |
| IRISA | 0.932 | 0.574 | 0.895 | 0.603 | 0.814 | 0.668 |
Sub-task 1 entity detection results, with relaxed boundaries.
| Participant | SER | Recall | Precision | F1 |
|---|---|---|---|---|
| LIMSI | 0.308 | 0.716 | 0.920 | 0.819 |
| Boun | 0.479 | 0.824 | 0.804 | 0.814 |
| LIPN | 0.487 | 0.803 | 0.803 | 0.803 |
| IRISA | 0.775 | 0.909 | 0.601 | 0.724 |
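The relaxed-boundary scores above accept a predicted entity that overlaps a reference entity even when its boundaries differ. A minimal sketch of the two criteria, assuming half-open (start, end) character spans and greedy one-to-one matching (both are simplifying assumptions; the official evaluation may align entities differently), with invented spans:

```python
def spans_match(ref, pred, relaxed):
    if relaxed:
        # Relaxed: the two half-open spans share at least one character.
        return ref[0] < pred[1] and pred[0] < ref[1]
    # Strict: boundaries must be identical.
    return ref == pred

def detection_scores(reference, predicted, relaxed=False):
    """Greedy one-to-one matching; returns (recall, precision)."""
    unmatched = list(predicted)
    hits = 0
    for ref in reference:
        for pred in unmatched:
            if spans_match(ref, pred, relaxed):
                unmatched.remove(pred)  # safe: we stop iterating right after
                hits += 1
                break
    recall = hits / len(reference) if reference else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision

reference = [(0, 6), (15, 25), (30, 40)]
predicted = [(0, 6), (11, 25), (50, 55)]
print(detection_scores(reference, predicted))                # strict: 1 of 3 matched
print(detection_scores(reference, predicted, relaxed=True))  # relaxed: 2 of 3 matched
```

The span (11, 25) misses the strict criterion only because of its start boundary, which is the kind of error the relaxed scores forgive.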
Official results of sub-task 2 of the BB'13 task.
| Participant | Recall | Precision | F1 | F1 PartOf | F1 Localization |
|---|---|---|---|---|---|
| TEES 2.1 | 0.28 | 0.82 | 0.42 | 0.22 | 0.49 |
| IRISA | 0.36 | 0.46 | 0.40 | 0.20 | 0.45 |
| Boun | 0.21 | 0.38 | 0.27 | 0.20 | 0.29 |
| LIMSI | 0.04 | 0.19 | 0.06 | 0.00 | 0.07 |
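When both the reference and the prediction refer to the same entity identifiers, an event can be checked by exact comparison of (type, first argument, second argument) triples. The sketch below computes micro-averaged recall, precision and F1, plus a per-type F1 analogous to the PartOf and Localization columns; the triple representation and the entity ids are assumptions for illustration.

```python
def prf(reference, predicted):
    """Micro recall, precision and F1 over exact event triples."""
    ref, pred = set(reference), set(predicted)
    tp = len(ref & pred)
    recall = tp / len(ref) if ref else 0.0
    precision = tp / len(pred) if pred else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return recall, precision, f1

def per_type_f1(reference, predicted):
    """F1 restricted to each event type (e.g. Localization, PartOf)."""
    types = {e[0] for e in reference} | {e[0] for e in predicted}
    return {t: prf([e for e in reference if e[0] == t],
                   [e for e in predicted if e[0] == t])[2]
            for t in sorted(types)}

# Hypothetical events: (type, first argument id, second argument id).
reference = {("Localization", "T1", "T2"), ("Localization", "T1", "T3"), ("PartOf", "T4", "T3")}
predicted = {("Localization", "T1", "T2"), ("PartOf", "T3", "T4")}
print(prf(reference, predicted))          # recall 1/3, precision 0.5, F1 0.4
print(per_type_f1(reference, predicted))  # Localization ~0.667, PartOf 0.0
```

The PartOf prediction fails only because its arguments are swapped, which exact triple matching treats as a wrong event.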
Results of sub-task 2 measured on intra-sentence events, for all events and for Localization events only. The last three columns give the difference with the corresponding full-task scores.
| Participant | Recall | Precision | F1 | Recall diff. | Precision diff. | F1 diff. |
|---|---|---|---|---|---|---|
| All events | | | | | | |
| TEES 2.1 | 0.51 | 0.82 | 0.63 | +0.23 | 0 | +0.21 |
| IRISA | 0.37 | 0.38 | 0.37 | +0.01 | -0.08 | -0.03 |
| Boun | 0.19 | 0.29 | 0.23 | -0.02 | -0.09 | -0.04 |
| LIMSI | 0.03 | 0.17 | 0.05 | -0.01 | -0.02 | -0.01 |
| Localization events | | | | | | |
| TEES 2.1 | 0.66 | 0.82 | 0.73 | +0.31 | 0 | +0.24 |
| IRISA | 0.48 | 0.38 | 0.42 | +0.04 | -0.08 | -0.03 |
| Boun | 0.18 | 0.27 | 0.22 | -0.05 | -0.11 | -0.07 |
| LIMSI | 0.03 | 0.30 | 0.06 | -0.01 | 0.01 | -0.01 |
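Scores on sub-task 3 of the BB'13 task: recall (R), precision (P) and F1 under the official setting, with relaxed biotope boundaries, and with relaxed bacteria boundaries.
| Participant | Official R | Official P | Official F1 | Relaxed biotope R | Relaxed biotope P | Relaxed biotope F1 | Relaxed bacteria R | Relaxed bacteria P | Relaxed bacteria F1 |
|---|---|---|---|---|---|---|---|---|---|
| TEES 2.1 | 0.12 | 0.18 | 0.14 | 0.41 | 0.61 | 0.49 | 0.28 | 0.52 | 0.36 |
| LIMSI | 0.04 | 0.12 | 0.06 | 0.09 | 0.82 | 0.15 | 0.07 | 0.71 | 0.10 |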
Scores on biotope detection in sub-task 3 of the BB'13 task.
| Participant | SER | Recall | Precision | F1 |
|---|---|---|---|---|
| LIMSI | 0.32 | 0.68 | 1.00 | 0.81 |
| TEES 2.1 | 0.50 | 0.57 | 0.76 | 0.65 |
Scores on intra-sentence Localization extraction in sub-task 3 with relaxed entity boundaries.
| Participant | Recall | Precision | F1 |
|---|---|---|---|
| LIMSI | 0.08 | 0.79 | 0.15 |
| TEES 2.1 | 0.72 | 0.56 | 0.63 |