Jin-Dong Kim, Ngan Nguyen, Yue Wang, Jun'ichi Tsujii, Toshihisa Takagi, Akinori Yonezawa.
Abstract
BACKGROUND: The Genia task, when it was introduced in 2009, was the first community-wide effort to address fine-grained, structural information extraction from biomedical literature. Arranged for the second time as one of the main tasks of BioNLP Shared Task 2011, it aimed to measure the progress of the community since 2009 and to evaluate how well the technology generalizes to full-text papers. The Protein Coreference task was arranged as one of the supporting tasks, motivated by one of the lessons of the 2009 task: the abundance of coreference structures in natural language text hinders further improvement on the Genia task.
Year: 2012 PMID: 22759455 PMCID: PMC3384256 DOI: 10.1186/1471-2105-13-S11-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Event types and their arguments for the GE task.
| Event Type | Primary Argument | Secondary Argument |
|---|---|---|
| Gene_expression | Theme(Protein) | |
| Transcription | Theme(Protein) | |
| Protein_catabolism | Theme(Protein) | |
| Phosphorylation | Theme(Protein) | Site(Entity) |
| Localization | Theme(Protein) | AtLoc(Entity), ToLoc(Entity) |
| Binding | Theme(Protein)+ | Site(Entity)+ |
| Regulation | Theme(Protein/Event), Cause(Protein/Event) | Site(Entity), CSite(Entity) |
| Positive_regulation | Theme(Protein/Event), Cause(Protein/Event) | Site(Entity), CSite(Entity) |
| Negative_regulation | Theme(Protein/Event), Cause(Protein/Event) | Site(Entity), CSite(Entity) |
The type of each filler entity is specified in parentheses. Arguments that may be filled more than once per event are marked with "+".
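The argument table can be read as a small schema. The Python sketch below encodes it as a dictionary and checks a candidate event against it. The names `Role`, `GE_SCHEMA` and `check_event`, the `(role, filler_type)` representation of arguments, and the rule that every event needs a Theme are assumptions made for illustration; they are not part of the task definition or its official tools.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """One argument slot of an event type (hypothetical encoding)."""
    fillers: frozenset        # filler types allowed in this slot
    repeatable: bool = False  # True where the table marks the slot with "+"


PROTEIN = Role(frozenset({"Protein"}))
PROTEINS = Role(frozenset({"Protein"}), repeatable=True)
ENTITY = Role(frozenset({"Entity"}))
ENTITIES = Role(frozenset({"Entity"}), repeatable=True)
PROTEIN_OR_EVENT = Role(frozenset({"Protein", "Event"}))

# Regulation and its two subtypes share one argument structure.
REGULATION = {"Theme": PROTEIN_OR_EVENT, "Cause": PROTEIN_OR_EVENT,
              "Site": ENTITY, "CSite": ENTITY}

GE_SCHEMA = {
    "Gene_expression": {"Theme": PROTEIN},
    "Transcription": {"Theme": PROTEIN},
    "Protein_catabolism": {"Theme": PROTEIN},
    "Phosphorylation": {"Theme": PROTEIN, "Site": ENTITY},
    "Localization": {"Theme": PROTEIN, "AtLoc": ENTITY, "ToLoc": ENTITY},
    "Binding": {"Theme": PROTEINS, "Site": ENTITIES},
    "Regulation": REGULATION,
    "Positive_regulation": REGULATION,
    "Negative_regulation": REGULATION,
}


def check_event(event_type, args):
    """Return a list of problems; an empty list means the event fits the schema.

    `args` is a list of (role, filler_type) pairs, e.g. [("Theme", "Protein")].
    """
    slots = GE_SCHEMA.get(event_type)
    if slots is None:
        return [f"unknown event type {event_type!r}"]
    problems = []
    counts = Counter(role for role, _ in args)
    if counts["Theme"] == 0:  # assumption: every event has at least one Theme
        problems.append("missing Theme")
    for role, filler in args:
        slot = slots.get(role)
        if slot is None:
            problems.append(f"{event_type} takes no {role} argument")
        elif filler not in slot.fillers:
            problems.append(f"{role} cannot be filled by {filler}")
    for role, n in counts.items():
        slot = slots.get(role)
        if slot is not None and n > 1 and not slot.repeatable:
            problems.append(f"{role} appears {n} times but is not repeatable")
    return problems


print(check_event("Binding", [("Theme", "Protein"), ("Theme", "Protein")]))  # []
print(check_event("Phosphorylation", [("Theme", "Event")]))
# ['Theme cannot be filled by Event']
```

Sharing one `REGULATION` dictionary across the three regulation types mirrors the identical rows in the table, and the `repeatable` flag is the only place the "+" marker is encoded.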
Figure 1. Event annotation example.
Figure 2. Protein coreference annotation.
Statistics of the benchmark data sets for the GE and CO tasks.
| Item | Training: Abstract | Training: Full | Tuning: Abstract | Tuning: Full | Test: Abstract | Test: Full |
|---|---|---|---|---|---|---|
| Articles | 800 | 5 | 150 | 5 | 260 | 4 |
| Words | 176146 | 29583 | 33827 | 30305 | 57256 | 21791 |
| Proteins | 9300 | 2325 | 2080 | 2610 | 3589 | 1712 |
| Coreferences | 2247 | - | 463 | - | 714 | - |
| Relative pronouns | 1193 | - | 254 | - | 349 | - |
| Pronouns | 738 | - | 149 | - | 269 | - |
| Definite NPs | 296 | - | 58 | - | 91 | - |
| Appositions | 9 | - | 1 | - | 3 | - |
| Others | 11 | - | 1 | - | 2 | - |
| Events | 8615 | 1695 | 1795 | 1455 | 3193 | 1294 |
| Gene_expression | 1738 | 527 | 356 | 393 | 722 | 280 |
| Transcription | 576 | 91 | 82 | 76 | 137 | 37 |
| Protein_catabolism | 110 | 0 | 21 | 2 | 14 | 1 |
| Phosphorylation | 169 | 23 | 47 | 64 | 139 | 50 |
| (with Site) | (67) | (0) | (27) | (12) | (81) | (15) |
| Localization | 265 | 16 | 53 | 14 | 174 | 17 |
| (with Loc) | (116) | (12) | (32) | (10) | (111) | (2) |
| Binding | 887 | 101 | 249 | 126 | 349 | 153 |
| (with Site) | (138) | (34) | (50) | (114) | (24) | (79) |
| Regulation | 961 | 152 | 173 | 123 | 292 | 96 |
| (with Site) | (57) | (8) | (39) | (17) | (11) | (3) |
| Positive_regulation | 2847 | 538 | 618 | 382 | 987 | 466 |
| (with Site) | (175) | (7) | (75) | (47) | (37) | (7) |
| Negative_regulation | 1062 | 247 | 196 | 275 | 379 | 194 |
| (with Site) | (27) | (9) | (6) | (18) | (10) | (7) |
The event and coreference annotations are used for the GE and CO tasks, respectively.
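In this table each event-type row counts a disjoint share of the events in its column, while the parenthesized "(with Site)" and "(with Loc)" rows are subsets of the type row above them. The Python sketch below, using counts copied from the table, checks that the type rows add up to the Events and Coreferences totals for every split. The variable and function names are illustrative.

```python
SPLITS = ["Training/Abstract", "Training/Full", "Tuning/Abstract",
          "Tuning/Full", "Test/Abstract", "Test/Full"]

# Event counts per type, one value per split. The "(with Site)" rows are left
# out because they are subsets of the type row above them, not separate types.
EVENTS_BY_TYPE = {
    "Gene_expression":     [1738, 527, 356, 393, 722, 280],
    "Transcription":       [576, 91, 82, 76, 137, 37],
    "Protein_catabolism":  [110, 0, 21, 2, 14, 1],
    "Phosphorylation":     [169, 23, 47, 64, 139, 50],
    "Localization":        [265, 16, 53, 14, 174, 17],
    "Binding":             [887, 101, 249, 126, 349, 153],
    "Regulation":          [961, 152, 173, 123, 292, 96],
    "Positive_regulation": [2847, 538, 618, 382, 987, 466],
    "Negative_regulation": [1062, 247, 196, 275, 379, 194],
}
EVENT_TOTALS = [8615, 1695, 1795, 1455, 3193, 1294]

# Coreference annotation exists only for the abstract splits.
COREF_SPLITS = ["Training/Abstract", "Tuning/Abstract", "Test/Abstract"]
COREF_BY_TYPE = {
    "Relative pronouns": [1193, 254, 349],
    "Pronouns":          [738, 149, 269],
    "Definite NPs":      [296, 58, 91],
    "Appositions":       [9, 1, 3],
    "Others":            [11, 1, 2],
}
COREF_TOTALS = [2247, 463, 714]


def column_sums(rows):
    """Add up a {row_label: [count per column]} table column by column."""
    return [sum(column) for column in zip(*rows.values())]


def mismatched_columns(rows, totals, labels):
    """Labels of the columns whose row counts do not add up to the stated total."""
    return [label for label, got, want in zip(labels, column_sums(rows), totals)
            if got != want]


print(mismatched_columns(EVENTS_BY_TYPE, EVENT_TOTALS, SPLITS))       # []
print(mismatched_columns(COREF_BY_TYPE, COREF_TOTALS, COREF_SPLITS))  # []
```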
Statistics of annotations in different sections of text
| Item | Abstract | Full paper: All | Full paper: TIAB | Full paper: Intro. | Full paper: R/D/C | Full paper: Methods | Full paper: Caption |
|---|---|---|---|---|---|---|---|
| Words | 267229 | 80962 | 3538 | 7878 | 43420 | 19406 | 6720 |
| Proteins | 14969 | 6580 | 336 | 597 | 3980 | 916 | 751 |
| (Density: P/W) | (5.60%) | (8.13%) | (9.50%) | (7.58%) | (9.17%) | (4.72%) | (11.18%) |
| Event triggers | 11057 | 3280 | 216 | 312 | 2659 | 136 | 173 |
| Events | 13603 | 4436 | 272 | 427 | 3261 | 198 | 278 |
| (Density: E/W) | (5.09%) | (5.48%) | (7.69%) | (5.42%) | (7.51%) | (1.02%) | (4.14%) |
| (Density: E/P) | (90.87%) | (67.42%) | (80.95%) | (71.52%) | (81.93%) | (21.62%) | (37.02%) |
| (Avg. Coord.: E/T) | (1.23) | (1.27) | (1.26) | (1.37) | (1.23) | ||
| Gene_expression | 2816 | 1193 | 62 | 98 | 841 | 80 | 112 |
| Transcription | 795 | 204 | 7 | 7 | 140 | 30 | 20 |
| Protein_catabolism | 145 | 3 | 0 | 0 | 3 | 0 | 0 |
| Phosphorylation | 355 | 137 | 12 | 12 | 101 | 10 | 2 |
| Localization | 492 | 47 | 3 | 15 | 22 | 7 | 0 |
| Binding | 1485 | 380 | 16 | 74 | 266 | 6 | 18 |
| Regulation | 1426 | 371 | 35 | 30 | 281 | 4 | 21 |
| Positive_regulation | 4452 | 1385 | 98 | 131 | 1087 | 15 | 54 |
| Negative_regulation | 1637 | 716 | 39 | 60 | 520 | 46 | 51 |
The Abstract column shows the statistics of the abstract collection (1210 titles and abstracts), and the following columns show those of the full paper collection (14 full papers). TIAB = title and abstract, Intro. = introduction and background, R/D/C = results, discussions, and conclusions, Methods = methods, materials, and experimental procedures. Some minor sections (supporting information, supplementary material, and synopsis) are ignored. Density = relative density of annotation (P/W = Proteins/Words, E/W = Events/Words, and E/P = Events/Proteins). Avg. Coord. = average number of coordinated events (E/T = Events/Triggers).
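The density rows follow directly from the raw counts. A minimal Python sketch of the ratios defined in the note, checked against the Abstract and TIAB columns; the function name `densities` is illustrative.

```python
def densities(words, proteins, events, triggers):
    """Annotation densities as defined in the table note.

    P/W, E/W and E/P are returned as percentages; E/T (average number of
    events per trigger) is a plain ratio. Values are rounded to two decimals.
    """
    return {
        "P/W": round(100 * proteins / words, 2),
        "E/W": round(100 * events / words, 2),
        "E/P": round(100 * events / proteins, 2),
        "E/T": round(events / triggers, 2),
    }


# Raw counts copied from the Abstract and TIAB columns of the table.
abstract = densities(words=267229, proteins=14969, events=13603, triggers=11057)
tiab = densities(words=3538, proteins=336, events=272, triggers=216)
print(abstract)  # {'P/W': 5.6, 'E/W': 5.09, 'E/P': 90.87, 'E/T': 1.23}
print(tiab)      # {'P/W': 9.5, 'E/W': 7.69, 'E/P': 80.95, 'E/T': 1.26}
```

Rounding to two decimals matches the precision printed in the table.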
Figure 3. Event distribution in different sections. Contour lines are drawn at 5% intervals. For example, in both the Methods and Caption sections, about 40% of the events are of type Gene_expression.
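Figure 3 shows, for each section, what share of its events falls into each type. As a rough check of the 40% example in the caption, the Python sketch below computes those shares from the per-section counts in the annotation statistics table. It reproduces only the underlying percentages, not the contour plot, and its names are illustrative.

```python
# Per-type event counts for two full-paper sections, copied from the table above.
METHODS = {
    "Gene_expression": 80, "Transcription": 30, "Protein_catabolism": 0,
    "Phosphorylation": 10, "Localization": 7, "Binding": 6,
    "Regulation": 4, "Positive_regulation": 15, "Negative_regulation": 46,
}
CAPTION = {
    "Gene_expression": 112, "Transcription": 20, "Protein_catabolism": 0,
    "Phosphorylation": 2, "Localization": 0, "Binding": 18,
    "Regulation": 21, "Positive_regulation": 54, "Negative_regulation": 51,
}


def type_shares(counts):
    """Fraction of a section's events that belong to each event type."""
    total = sum(counts.values())
    return {event_type: n / total for event_type, n in counts.items()}


for name, counts in (("Methods", METHODS), ("Caption", CAPTION)):
    share = type_shares(counts)["Gene_expression"]
    print(f"{name}: {share:.1%} of events are Gene_expression")
# Methods: 40.4% of events are Gene_expression
# Caption: 40.3% of events are Gene_expression
```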
Teams who participated in the GE and CO tasks
| Team | '09 | Task | Background | Reference |
|---|---|---|---|---|
| FAUST | √ | GE | 3C | [ |
| UMASS | √ | GE | 1C | [ |
| UTurku | √ | GE, CO | 1BI | [ |
| MSR-NLP | √ | GE | 4C | [ |
| ConcordU | | GE, CO | 2C | [ |
| UWMadison | √ | GE | 2C | [ |
| Stanford | √ | GE | 3C+1.5L | [ |
| BMI@ASU | | GE | 3C | [ |
| CCP-BTMG | √ | GE | 3BI | [ |
| TM-SCS | | GE | 1C | [ |
| XABioNLP | | GE | 4C | [ |
| HCMUS | | GE | 6L | [ |
| UUtah | | CO | 1C | [ |
| UZurich | | CO | 1C | [ |
| USzeged | | CO | 2C | - |
| UCD | | CO | 4C | - |
The '09 column indicates whether at least one team member participated in BioNLP-ST 2009. In the Background column, C=Computer Scientist, BI=Bioinformatician, B=Biologist, L=Linguist.
System profiles
| Team | NLP: Lexical Proc. | NLP: Syntactic Proc. | GE task: Trig. | GE task: Arg. | GE task: group | CO task: Mark. | CO task: Coref. |
|---|---|---|---|---|---|---|---|
| FAUST | SnowBall, CNLP | McCCJ+SD | Stacking (UMASS + Stanford) | Stacking (UMASS + Stanford) | Stacking (UMASS + Stanford) | - | - |
| UMASS | SnowBall, CNLP | McCCJ+SD | Joint infer., Dual Decomposition | Joint infer., Dual Decomposition | Joint infer., Dual Decomposition | - | - |
| UTurku | Porter | McCCJ+SD | SVM | SVM | SVM | SVM | SVM |
| MSR-NLP | Porter | McCCJ+SD, Enju | SVM | MaxEnt | rules | - | - |
| ConcordU | - | McCCJ+SD | dic | rules | rules | rules | rules |
| UWMadison | Morpha, Porter | McCCJ+SD | Joint infer., SEARN | Joint infer., SEARN | Joint infer., SEARN | - | - |
| Stanford | Morpha, CNLP | McCCJ+SD | MaxEnt | MSTParser | MSTParser | - | - |
| BMI@ASU | Porter, WordNet | Stanford+SD | SVM | SVM | UTurku | - | - |
| CCP-BTMG | Porter, WordNet | Stanford+SD | Subgraph Isomorphism | Subgraph Isomorphism | Subgraph Isomorphism | - | - |
| TM-SCS | Stanford | Stanford | dic | rules | rules | - | - |
| XABioNLP | KAF | - | rules | rules | rules | - | - |
| HCMUS | OpenNLP | - | dic, rules | rules | rules | - | - |
| UUtah | GTag | Enju | - | - | - | SVM | Reconcile |
| UZurich | LingPipe | Pro3Gres | - | - | - | rules | rules |
| USzeged | CTag, Morpha | McCCJ | - | - | - | rules | SVM |
| UCD | GTag, LingPipe | - | - | - | - | rules | SVM |
Proc.=Processing, Trig.=Trigger detection, Arg.=Argument linking, group=Argument grouping, Mark.=Markable detection, Coref.=Coreference linking, SnowBall=SnowBall Stemmer, CNLP=Stanford CoreNLP (tokenization), CTag=CNC Tagger, GTag=Genia Tagger, KAF=Kyoto Annotation Format, McCCJ=McClosky-Charniak-Johnson Parser, Stanford=Stanford Parser, SD=Stanford Dependency Conversion.
Evaluation results of Task 1 on the (W)hole, (A)bstract, and (F)ull paper collections
| Team | Part | Simple Event | Binding | Regulation | All |
|---|---|---|---|---|---|
| UTurku09 | A | 64.21/77.45/70.21 | 40.06/49.82/44.41 | 35.63/45.87/40.11 | 46.73/58.48/51.95 |
| Miwa10 | A | 70.44 | 52.62 | 40.60 | 48.62/58.96/53.29 |
| W | 44.20/53.71/48.49 | ||||
| FAUST | A | ||||
| F | 75.58/78.23/76.88 | 40.97/44.70/42.75 | 47.92/58.47/52.67 | ||
| 66.16/81.04/72.85 | 45.53/58.09/51.05 | 39.38/58.18/46.97 | 50.00/67.53/57.46 | ||
| 77.36/71.93/74.55 | 29.63/36.36/32.65 | 41.77/50.77/45.83 | 51.57/56.94/54.13 | ||
| 76.34/80.00/78.13 | 39.00/43.82/41.27 | 32.66/45.64/38.07 | 45.98/57.20/50.98 | ||
| 48.00/75.00/58.54 | 46.88/68.18/55.56 | ||||
| W | 67.01/81.40/73.50 | 37.52/52.67/43.82 | 48.49/64.08/55.20 | ||
| UMass | A | 64.21/80.74/71.54 | 43.52/60.89/50.76 | 38.78/55.07/45.51 | 48.74/65.94/56.05 |
| F | 34.72/47.51/40.12 | ||||
| 64.21/80.74/71.54 | 43.52/60.89/50.76 | 38.78/55.07/45.51 | 48.74/65.94/56.05 | ||
| 79.25/82.35/80.77 | 44.44/48.00/46.15 | 35.44/56.00/43.41 | 51.57/65.08/57.54 | ||
| 75.95/83.97/79.76 | 34.00/40.96/37.16 | 32.29/42.89/36.85 | 45.09/56.04/49.98 | ||
| 48.00/85.71/61.54 | 46.88/78.95/58.82 | ||||
| W | 68.22/76.47/72.11 | 42.97/43.60/43.28 | 38.72/47.64/42.72 | 49.56/57.65/53.30 | |
| UTurku | A | 64.97/76.72/70.36 | 45.24/50.00/47.50 | 40.41/49.01/44.30 | 50.06/59.48/54.37 |
| F | 78.18/75.82/76.98 | 37.50/31.76/34.39 | 34.99/44.46/39.16 | 48.31/53.38/50.72 | |
| 64.97/76.72/70.36 | 45.24/50.00/47.50 | 40.41/49.01/44.30 | 50.06/59.48/54.37 | ||
| 84.91/67.16/75.00 | 25.93/30.43/28.00 | 30.38/30.77/30.57 | 47.80/45.24/46.48 | ||
| 77.48/78.99/78.23 | 36.00/30.51/33.03 | 34.68/45.54/39.38 | 47.19/54.18/50.44 | ||
| 60.00/75.00/66.67 | 59.38/50.00/54.29 | ||||
| W | 68.99/74.30/71.54 | 42.36/40.47/41.39 | 36.64/44.08/40.02 | 48.64/54.71/51.50 | |
| MSR-NLP | A | 65.99/74.71/70.08 | 43.23/44.51/43.86 | 37.14/45.38/40.85 | 48.52/56.47/52.20 |
| F | 78.18/73.24/75.63 | 40.28/32.77/36.14 | 35.52/41.34/38.21 | 48.94/50.77/49.84 | |
| 65.99/74.71/70.08 | 43.23/44.51/43.86 | 37.14/45.38/40.85 | 48.52/56.47/52.20 | ||
| 83.02/57.89/68.22 | 40.74/25.00/30.99 | 35.44/53.85/42.75 | 52.20/48.26/50.15 | ||
| 78.24/76.49/77.36 | 37.00/35.24/36.10 | 35.78/40.21/37.86 | 48.18/50.93/49.52 | ||
| 60.00/75.00/66.67 | 62.50/66.67/64.52 | ||||
| W | 59.99/ | 29.33/49.66/36.88 | 35.72/45.85/40.16 | 43.55/59.58/50.32 | |
| ConcordU | A | 56.51/ | 29.97/49.76/37.41 | 36.24/47.09/40.96 | 43.09/60.37/50.28 |
| F | 70.65/ | 27.78/49.38/35.56 | 34.58/43.22/38.42 | 44.71/57.75/50.40 | |
| 56.51/ | 29.97/49.76/37.41 | 36.24/47.09/40.96 | 43.09/60.37/50.28 | ||
| 58.49/ | 22.22/50.00/30.77 | 31.65/40.98/35.71 | 38.99/56.88/46.27 | ||
| 71.37/ | 28.00/53.85/36.84 | 33.76/44.12/38.25 | 43.99/58.76/50.32 | ||
| 72.00/94.74/81.82 | 65.62/51.22/57.53 | ||||
| W | 57.33/71.34/63.57 | 34.01/44.77/38.66 | 16.39/25.37/19.91 | 32.73/45.84/38.19 | |
| TM-SCS | A | 53.65/71.66/61.36 | 36.02/49.41/41.67 | 18.29/27.07/21.83 | 33.36/47.09/39.06 |
| F | 68.57/70.59/69.57 | 29.17/35.00/31.82 | 12.20/21.02/15.44 | 31.14/42.83/36.06 | |
| 53.65/71.66/61.36 | 36.02/49.41/41.67 | 18.29/27.07/21.83 | 33.36/47.09/39.06 | ||
| 71.70/67.86/69.72 | 18.52/31.25/23.26 | 12.66/27.78/17.39 | 33.33/49.07/39.70 | ||
| 66.03/69.76/67.84 | 32.00/37.65/34.59 | 11.38/19.68/14.42 | 29.44/41.20/34.34 | ||
| 62.50/57.14/59.70 | |||||
The full paper collection is further classified into titles/abstracts (F), introductions (F), results/discussions/conclusions (F), and methods (F). Performance is reported as recall/precision/F-score. Some notable figures are underlined.
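Each cell reports recall, precision and f-score. The f-scores in these tables are consistent with the balanced F-score, the harmonic mean F = 2RP/(R + P); the Python sketch below recomputes one of them from its cell. The helper names `f_score` and `parse_rpf` are illustrative.

```python
def f_score(recall, precision):
    """Balanced F-score: harmonic mean of recall and precision (percent in, percent out)."""
    if recall + precision == 0:
        return 0.0  # avoids division by zero for all-zero rows
    return 2 * recall * precision / (recall + precision)


def parse_rpf(cell):
    """Split a 'recall/precision/f-score' table cell into three floats."""
    recall, precision, f = (float(part) for part in cell.split("/"))
    return recall, precision, f


# UMass, abstract collection, all events (Task 1 table).
r, p, reported = parse_rpf("48.74/65.94/56.05")
print(round(f_score(r, p), 2), reported)  # 56.05 56.05
```

The zero guard matters for rows such as the 00.00/00.00/00.00 entries in the Task 2 table.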
Evaluation results of Task 2 on the (W)hole, (A)bstract, and (F)ull paper collections
| Team | Part | Sites (222) | Locations (66) | All (288) |
|---|---|---|---|---|
| UT+DBCLS09 | A | 23.08/88.24/36.59 | 32.14/72.41/44.52 | |
| W | 32.88/70.87/44.92 | 36.36/75.00/48.98 | ||
| FAUST | A | 43.51/71.25/54.03 | ||
| F | 17.58/69.57/28.07 | - | 17.39/66.67/27.59 | |
| W | 31.98/71.00/44.10 | 32.99/72.52/45.35 | ||
| UMass | A | 42.75/70.00/53.08 | 36.92/77.42/50.00 | 40.82/72.07/52.12 |
| F | 16.48/75.00/27.03 | - | 16.30/75.00/26.79 | |
| W | 32.88/62.93/43.20 | 22.73/83.33/35.71 | 30.56/65.67/41.71 | |
| BMI@ASU | A | 37.40/67.12/48.04 | 23.08/83.33/36.14 | 32.65/70.33/44.60 |
| F | 26.37/55.81/35.82 | - | 26.09/55.81/35.56 | |
| W | 00.00/00.00/00.00 | 30.90/65.44/41.98 | ||
| UTurku | A | 00.00/00.00/00.00 | 32.14/69.23/43.90 | |
| F | - | |||
Performance is reported as recall/precision/F-score. Some notable figures are underlined.
Evaluation results of Site extraction for different event types
| Team | Part | Phospho. (67) | Binding (84) | Reg. (71) |
|---|---|---|---|---|
| UT+DBCLS09 | A | 71.43/71.43/71.43 | 04.76/50.00/08.70 | 12.96/58.33/21.21 |
| W | 71.64/84.21/77.42 | 05.95/38.46/10.31 | ||
| FAUST | A | 71.43/81.63/76.19 | 04.76/14.29/07.14 | |
| F | 06.35/66.67/11.59 | 23.53/44.44/30.77 | ||
| W | 76.12/79.69/77.86 | 04.76/36.36/08.42 | 22.54/64.00/33.33 | |
| UMass | A | 76.79/76.79/76.79 | 04.76/14.29/07.14 | 22.22/70.59/33.80 |
| F | 04.76/75.00/08.96 | |||
| W | 52.24/97.22/67.96 | 20.24/53.12/29.31 | 29.58/43.75/35.29 | |
| BMI@ASU | A | 53.57/96.77/68.97 | 31.48/51.52/39.08 | |
| F | 45.45/100.0/62.50 | 23.81/65.22/34.88 | 23.53/26.67/25.00 | |
| W | 28.17/44.44/34.48 | |||
| UTurku | A | 09.52/18.18/12.50 | 31.48/54.84/40.00 | |
| F | 63.64/100.0/77.78 | 17.65/21.43/19.35 | ||
Performance is reported as recall/precision/F-score. Some notable figures are underlined.
Evaluation results of Task 3 on the (W)hole, (A)bstract, and (F)ull paper collections
| Team | Part | Negation | Speculation | All |
|---|---|---|---|---|
| ConcordU09 | A | 14.98/50.75/23.13 | 16.83/50.72/25.27 | 15.86/50.74/24.17 |
| W | 17.86/32.54/23.06 | |||
| UTurku | A | 19.23/38.46/25.64 | ||
| F | 15.00/23.08/18.18 | 19.28/30.85/23.73 | ||
| W | 18.77/44.26/26.36 | 19.97/40.89/26.83 | ||
| ConcordU | A | 18.06/46.59/26.03 | 20.46/42.79/27.68 | |
| F | 21.21/38.24/27.29 | |||
Performance is reported as recall/precision/F-score. Some notable figures are underlined.
Evaluation results of the CO task
| Team | Relative pronoun | Pronoun | DNP | All |
|---|---|---|---|---|
| UUtah | 56.0/71.2/62.7 | 12.0/79.0/20.8 | ||
| UZurich | 46.7/71.4/56.5 | 04.1/12.5/06.2 | 21.5/55.5/31.0 | |
| ConcordU | - | - | 19.4/63.2/29.7 | |
| UTurku | 29.3/73.3/41.9 | 12.8/72.7/21.8 | 01.4/14.3/02.5 | 14.4/67.2/23.8 |
| USzeged | - | - | - | 03.2/03.5/03.3 |
| UCD | - | - | - | 00.7/00.3/00.4 |
Performance is reported as recall/precision/F-score. DNP = definite noun phrase. Some notable figures are underlined.