Alejandro Piad-Morffis, Yoan Gutiérrez, Yudivian Almeida-Cruz, Rafael Muñoz.
Abstract
The massive amount of biomedical information published online requires the development of automatic knowledge discovery technologies to make effective use of this content. To foster and support such technologies, the research community creates linguistic resources, such as annotated corpora, and designs shared evaluation campaigns and academic competitive challenges. This work describes an ecosystem that facilitates research and development in knowledge discovery in the biomedical domain, specifically in the Spanish language. To this end, several resources are developed and shared with the research community, including a novel semantic annotation model, an annotated corpus of 1045 sentences, and computational resources to build and evaluate automatic knowledge discovery techniques. Furthermore, a research task is defined with objective evaluation criteria, and an online evaluation environment is set up and maintained, enabling researchers interested in this task to obtain immediate feedback and compare their results with the state of the art. As a case study, we analyze the results of a competitive challenge based on these resources and provide guidelines for future research. The constructed ecosystem provides an effective learning and evaluation environment to encourage research in knowledge discovery in Spanish biomedical documents.
Keywords: Annotated corpora; Entity recognition; Knowledge discovery; Natural language processing; Relation extraction; Semantic annotation models
Year: 2020 PMID: 32712157 PMCID: PMC7377985 DOI: 10.1016/j.jbi.2020.103517
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
Comparison between the eHealth-KD v2 corpus and other corpora with respect to the characteristics that define our proposal.
| # | Characteristics | Ixa MedGS | DrugSemantics | DDI | Bio AMR | YAGO | ConceptNet | eHealth-KD v1 | eHealth-KD v2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | general-purpose annotation | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| 2 | independence of syntax | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| 3 | ontological knowledge | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| 4 | composite concepts | ✓ | ✓ | ✓ | |||||
| 5 | attributes | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| 6 | contextual relations | ✓ | ✓ | ||||||
| 7 | causality/entailment | ✓ | ✓ | ✓ | ✓ | | | | |
Qualitative comparison of popular annotation tools. Adapted from Table 3 in Neves and Ševa [18]. The symbol ≈ indicates that the corresponding feature is only partially supported.
| Characteristics | GATE Teamware | Knowtator | WebAnno | Brat | BioQRator | CATMA | Prodigy | TextAE | LightTag | Djangology | MyMiner | WAT-SL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| multi-label annotations | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| relation annotations | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ≈ | |||||
| allows custom model | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| collaborative interface | ✓ | ≈ | ≈ | ≈ | ≈ | ≈ | ✓ | ✓ | ≈ | |||
| web-based interface | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| can be self-hosted | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| open source license | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| citation |
Summary statistics for the eHealth-KD v2 corpus. Labels marked with have been incorporated in this version of the corpus.
| Metric | Total | Trial | Training | Validation | Test |
|---|---|---|---|---|---|
| Sentences | 1045 | 45 | 600 | 100 | 300 |
| Entities | 6612 | 292 | 3818 | 604 | 1898 |
| Concept | 4092 | 181 | 2381 | 368 | 1162 |
| Action | 1742 | 82 | 976 | 167 | 517 |
| Predicate | 563 | 27 | 330 | 45 | 161 |
| Reference | 215 | 2 | 131 | 24 | 58 |
| Relations | 6049 | 232 | 3504 | 537 | 1776 |
| target | 1729 | 88 | 974 | 166 | 501 |
| subject | 894 | 49 | 511 | 74 | 260 |
| in-context | 677 | 28 | 403 | 67 | 179 |
| is-a | 566 | 0 | 337 | 56 | 173 |
| in-place | 400 | 19 | 251 | 25 | 105 |
| causes | 367 | 0 | 219 | 27 | 121 |
| domain | 364 | 20 | 201 | 28 | 115 |
| argument | 343 | 16 | 201 | 28 | 98 |
| entails | 167 | 0 | 89 | 14 | 64 |
| in-time | 165 | 12 | 89 | 24 | 40 |
| has-property | 159 | 0 | 91 | 21 | 47 |
| same-as | 124 | 0 | 85 | 6 | 33 |
| part-of | 94 | 0 | 53 | 1 | 40 |
| Attributes | 585 | 28 | 311 | 69 | 177 |
| diminished | 18 | 1 | 8 | 2 | 7 |
| emphasized | 124 | 4 | 69 | 10 | 41 |
| negated | 164 | 4 | 94 | 24 | 42 |
| uncertain | 279 | 19 | 140 | 33 | 87 |
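The counts in the table above are internally consistent: each label's trial, training, validation, and test counts add up to its total, and the per-label counts add up to the category subtotal rows (6612 entities, 6049 relations, 585 attributes). A minimal Python sketch of that check follows; the numbers are transcribed from the table, while the dictionary and function names are illustrative choices, not part of any released eHealth-KD tooling.

```python
# Counts transcribed from the eHealth-KD v2 summary statistics table.
# Tuple order: (total, trial, training, validation, test).
SENTENCES = (1045, 45, 600, 100, 300)

ENTITIES = {
    "Concept": (4092, 181, 2381, 368, 1162),
    "Action": (1742, 82, 976, 167, 517),
    "Predicate": (563, 27, 330, 45, 161),
    "Reference": (215, 2, 131, 24, 58),
}

RELATIONS = {
    "target": (1729, 88, 974, 166, 501),
    "subject": (894, 49, 511, 74, 260),
    "in-context": (677, 28, 403, 67, 179),
    "is-a": (566, 0, 337, 56, 173),
    "in-place": (400, 19, 251, 25, 105),
    "causes": (367, 0, 219, 27, 121),
    "domain": (364, 20, 201, 28, 115),
    "argument": (343, 16, 201, 28, 98),
    "entails": (167, 0, 89, 14, 64),
    "in-time": (165, 12, 89, 24, 40),
    "has-property": (159, 0, 91, 21, 47),
    "same-as": (124, 0, 85, 6, 33),
    "part-of": (94, 0, 53, 1, 40),
}

ATTRIBUTES = {
    "diminished": (18, 1, 8, 2, 7),
    "emphasized": (124, 4, 69, 10, 41),
    "negated": (164, 4, 94, 24, 42),
    "uncertain": (279, 19, 140, 33, 87),
}


def splits_match_total(row):
    """True if the trial, training, validation and test counts sum to the total."""
    total, *splits = row
    return total == sum(splits)


def category_totals(rows):
    """Column-wise sum over all labels of one category."""
    return tuple(sum(column) for column in zip(*rows.values()))


all_rows = [SENTENCES, *ENTITIES.values(), *RELATIONS.values(), *ATTRIBUTES.values()]
print(all(splits_match_total(row) for row in all_rows))  # True
print(category_totals(ENTITIES))    # (6612, 292, 3818, 604, 1898)
print(category_totals(RELATIONS))   # (6049, 232, 3504, 537, 1776)
print(category_totals(ATTRIBUTES))  # (585, 28, 311, 69, 177)
```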
Fig. 1. Example annotation of three sentences, showing the most relevant entities and relations defined in the model. Adapted from Piad-Morffis et al. [41]. Top: the original text in Spanish. Bottom: an English translation, for reference.
Fig. 2. Conceptual schema of the annotation model. Each semantic role defined in the annotation model is represented as a circle, and the relations allowed between each pair of roles are represented as rectangles. Adapted from Piad-Morffis et al. [41].
Fig. 3. Schematic representation of the annotation process.
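Fig. 2 describes the annotation model as semantic roles connected by typed relations, and the corpus statistics table lists the full label inventory. The Python sketch below encodes that inventory as plain data classes that reject unknown labels. It is a hypothetical representation: the class and field names are assumptions, character offsets as a list of spans are an assumed encoding, and it does not enforce which relations are allowed between each pair of roles, since that mapping is only shown graphically in Fig. 2.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Label inventories as listed in the eHealth-KD v2 statistics table.
ENTITY_LABELS = {"Concept", "Action", "Predicate", "Reference"}
RELATION_LABELS = {
    "target", "subject", "in-context", "is-a", "in-place", "causes", "domain",
    "argument", "entails", "in-time", "has-property", "same-as", "part-of",
}
ATTRIBUTE_LABELS = {"diminished", "emphasized", "negated", "uncertain"}


@dataclass
class Entity:
    id: int
    text: str
    label: str
    # Hypothetical encoding: (start, end) character offsets, end exclusive.
    # A list lets one entity cover several non-contiguous pieces.
    spans: List[Tuple[int, int]]
    attributes: Set[str] = field(default_factory=set)

    def __post_init__(self):
        if self.label not in ENTITY_LABELS:
            raise ValueError(f"unknown entity label: {self.label!r}")
        unknown = self.attributes - ATTRIBUTE_LABELS
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")


@dataclass
class Relation:
    label: str
    source: Entity
    destination: Entity

    def __post_init__(self):
        if self.label not in RELATION_LABELS:
            raise ValueError(f"unknown relation label: {self.label!r}")


@dataclass
class Sentence:
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def surface(self, entity: Entity) -> str:
        """Rebuild an entity's text from its spans, joining pieces with a space."""
        return " ".join(self.text[start:end] for start, end in entity.spans)


# Illustrative annotation only (not taken from the corpus or its guidelines).
s = Sentence("El asma causa tos.")
asma = Entity(1, "asma", "Concept", [(3, 7)])
tos = Entity(2, "tos", "Concept", [(14, 17)])
s.entities += [asma, tos]
s.relations.append(Relation("causes", asma, tos))
print(s.surface(asma), s.surface(tos))  # asma tos
```

Validating labels at construction time keeps malformed annotations from propagating into training or evaluation code.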
Summary of the inter-annotator agreement score at different stages of the annotation process, for all entity and relation types.
| Agreement | Stage 2 | Stage 3 | Stage 4 |
|---|---|---|---|
| Entities | 0.7050 | 0.8159 | 0.9854 |
| | 0.6989 | 0.8011 | 0.9892 |
| | 0.7810 | 0.8737 | 0.9929 |
| | 0.4324 | 0.6641 | 0.9569 |
| | 0.7315 | 0.7990 | 0.9390 |
| Relations | 0.5146 | 0.7162 | 0.9692 |
| | 0.6053 | 0.7782 | 0.9592 |
| | 0.4006 | 0.6465 | 0.9917 |
| | 0.6530 | 0.8004 | 0.9761 |
| | 0.1030 | 0.4321 | 0.9623 |
| | 0.3684 | 0.6007 | 0.9737 |
| | 0.4195 | 0.6499 | 0.9584 |
| | 0.4165 | 0.6497 | 0.9407 |
| | 0.3677 | 0.6151 | 0.9346 |
| | 0.5439 | 0.7373 | 0.9750 |
| | 0.3016 | 0.5000 | 0.8710 |
| | 0.4662 | 0.6641 | 0.9242 |
| | 0.5469 | 0.7294 | 0.9784 |
| | 0.6574 | 0.8139 | 0.9821 |
| Attributes | 0.4663 | 0.6537 | 0.9499 |
| | 1.0000 | 1.0000 | 1.0000 |
| | 1.0000 | 1.0000 | 1.0000 |
| | 0.9746 | 0.9888 | 1.0000 |
| | 0.9370 | 0.9742 | 1.0000 |
| Global agreement | 0.6190 | 0.7667 | 0.9765 |
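The table reports agreement per stage and per label, but this record does not state which agreement measure produced the scores. One common choice for span and relation annotations is exact-match F1 between two annotators, treating one as reference and the other as prediction (the score is symmetric). The Python sketch below computes it overall and per label; it is an assumed measure for illustration, not necessarily the one behind the reported numbers, and the tuple layout of the annotations is hypothetical.

```python
from collections import defaultdict


def f1_agreement(a, b):
    """Exact-match F1 between two sets of annotations; symmetric in a and b."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty annotations trivially agree
    return 2 * len(a & b) / (len(a) + len(b))


def per_label_agreement(a, b):
    """F1 agreement per label; items are tuples whose last element is the label."""
    by_label = defaultdict(lambda: (set(), set()))
    for item in a:
        by_label[item[-1]][0].add(item)
    for item in b:
        by_label[item[-1]][1].add(item)
    return {label: f1_agreement(x, y) for label, (x, y) in sorted(by_label.items())}


# Hypothetical entity annotations as (start, end, label).
annotator_1 = {(3, 7, "Concept"), (8, 13, "Action"), (14, 17, "Concept")}
annotator_2 = {(3, 7, "Concept"), (8, 13, "Predicate"), (14, 17, "Concept")}
print(f1_agreement(annotator_1, annotator_2))         # 0.6666666666666666
print(per_label_agreement(annotator_1, annotator_2))  # {'Action': 0.0, 'Concept': 1.0, 'Predicate': 0.0}
```

A label that only one annotator used scores 0.0, which is why rare or ambiguous labels tend to drag per-label agreement down in early stages.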
Results of the baseline strategies in each scenario.
| Team | End-to-end | Subtask A | Subtask B |
|---|---|---|---|
| Human | 0.727 | 0.861 | 0.735 |
| Dummy | 0.424 | 0.546 | 0.123 |
| Random | 0.116 | 0.205 | 0.014 |
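This record does not describe how the dummy baseline works. One simple strategy consistent with the following caption, which attributes score differences to the order of the training sentences, is a memorization baseline: remember the label of every entity text seen in training, tag exact matches at test time, and let later sentences overwrite earlier ones. The Python sketch below illustrates that idea for entities only; it is an assumption, not the challenge's actual baseline, and the function names and data layout are hypothetical.

```python
import re


def train(annotated_sentences):
    """Map each lower-cased entity text to the label it was seen with last."""
    memory = {}
    for _text, entities in annotated_sentences:
        for surface, label in entities:
            memory[surface.lower()] = label  # later sentences overwrite earlier ones
    return memory


def predict(memory, sentence):
    """Return (start, end, label) for every whole-word match of a memorized text."""
    lowered = sentence.lower()
    found = []
    for surface, label in memory.items():
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", lowered):
            found.append((match.start(), match.end(), label))
    return sorted(found)


# Toy training data (hypothetical): "tos" appears with two different labels.
training = [
    ("El asma causa tos.", [("asma", "Concept"), ("tos", "Concept")]),
    ("La tos persiste.", [("tos", "Action")]),
]
print(predict(train(training), "Tos y asma."))        # [(0, 3, 'Action'), (6, 10, 'Concept')]
print(predict(train(training[::-1]), "Tos y asma."))  # [(0, 3, 'Concept'), (6, 10, 'Concept')]
```

The two calls use the same data in opposite order and produce different labels for "tos", which is the kind of order sensitivity such a baseline would exhibit.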
Results in each scenario, sorted by Scenario 1 (End-to-end). The top results per scenario are highlighted in bold. Results that use the baseline implementation are represented by . The dummy baseline implementation provided in the challenge differs slightly from the one in Table 5 because the training sentences were processed in a different order. Adapted from Piad-Morffis et al. [40].
| Team | Techniques | End-to-end | Subtask A | Subtask B |
|---|---|---|---|---|
| Human | | 0.727 | 0.861 | 0.735 |
| Baseline (b) | | 0.430 | 0.546 | 0.123 |
| TALP-UPC | |||||
| coin_flipper | 0.787 | ||||
| LASTUS-TALN | 0.229 | ||||
| NLP_UNED | 0.547 | 0.754 | |||
| HULAT-TaskAB | | 0.541 | 0.775 | 0.123 |
| UH-Maja-KD | 0.518 | 0.433 | |||
| LSI2_UNED | | 0.493 | 0.731 | 0.123 |
| IxaMed | | 0.486 | 0.682 | 0.435 |
| HULAT-TaskA | | 0.430 | 0.790 | 0.123 |
| VSP | 0.428 | 0.546 | |||
Relative impact of the characteristics of each system in the overall score, per scenario, as defined by a linear regression model fitted on each system’s performance. Tag labels correspond to the techniques used by each system as reported in Table 6. Highlighted in bold are the most significant weights in each scenario. Adapted from Piad-Morffis et al. [40].
| Technique | End-to-end | Subtask A | Subtask B |
|---|---|---|---|
| Attention-based architecture | −0.015 | −0.002 | |
| Character embeddings | −0.088 | −0.006 | −0.129 |
| Convolutional networks | 0.019 | −0.018 | −0.140 |
| Conditional random fields | 0.010 | 0.011 | −0.103 |
| Custom embeddings | −0.012 | −0.008 | −0.087 |
| Dataset augmentation | −0.016 | | |
| Hand-crafted rules | | | |
| Joint solution (end-to-end) | 0.015 | 0.081 | |
| NLP features | 0.021 | −0.004 | 0.021 |
| Overlapping entities | −0.002 | | |
| Pretrained embeddings | 0.012 | 0.008 | 0.010 |
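The caption describes each weight as a coefficient of a linear regression fitted on the systems' scores, with one input per technique. A minimal Python sketch of such a fit follows, assuming binary indicator features (1 if a system uses the technique) plus an intercept, solved by ordinary least squares through the normal equations. The systems and scores in the example are invented for illustration and are not the challenge results; the original analysis may have used a different fitting procedure.

```python
def fit_linear(features, scores):
    """Ordinary least squares with an intercept, solved via the normal equations.

    features: one list of 0/1 technique indicators per system.
    Returns [intercept, weight_1, ..., weight_k].
    """
    rows = [[1.0, *map(float, x)] for x in features]  # prepend an intercept column
    k = len(rows[0])
    # Augmented system [X^T X | X^T y].
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, scores))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        if abs(system[col][col]) < 1e-12:
            raise ValueError("rank-deficient design: too few systems or collinear techniques")
        for r in range(k):
            if r != col:
                factor = system[r][col] / system[col][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    return [system[i][k] / system[i][i] for i in range(k)]


# Hypothetical systems: indicators for (pretrained embeddings, hand-crafted rules).
techniques = [[0, 0], [1, 0], [0, 1], [1, 1]]
end_to_end = [0.40, 0.50, 0.35, 0.45]
intercept, w_embeddings, w_rules = fit_linear(techniques, end_to_end)
print(round(intercept, 3), round(w_embeddings, 3), round(w_rules, 3))  # 0.4 0.1 -0.05
```

With more techniques than participating systems, or with techniques that always co-occur, X^T X is singular and this sketch raises an error; a regularized fit such as ridge regression would be needed in that case.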
Fig. 4. Correlation between the number of instances identified by one or more systems and the relative frequency of labels in the training set. Adapted from Piad-Morffis et al. [40].
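Fig. 4 relates how often a label occurs in the training set to how many of its instances the participating systems identified. The Python sketch below shows the underlying computation under stated assumptions: relative label frequencies are taken from the training column of the statistics table (relation labels only, as an example), and they are compared with per-label counts of identified instances using Pearson's coefficient, which is an assumed choice. The identified counts come from the systems' outputs, which this record does not reproduce, so the function takes them as input.

```python
from math import sqrt

# Training-set relation counts from the eHealth-KD v2 statistics table.
TRAINING_RELATIONS = {
    "target": 974, "subject": 511, "in-context": 403, "is-a": 337,
    "in-place": 251, "causes": 219, "domain": 201, "argument": 201,
    "entails": 89, "in-time": 89, "has-property": 91, "same-as": 85,
    "part-of": 53,
}


def relative_frequencies(counts):
    """Share of each label in the total count."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def frequency_correlation(training_counts, identified_counts):
    """Correlate training frequency with identified instances, label by label."""
    freqs = relative_frequencies(training_counts)
    labels = sorted(freqs)
    return pearson([freqs[label] for label in labels],
                   [identified_counts.get(label, 0) for label in labels])
```

A label missing from `identified_counts` is treated as never identified, so labels no system found still count toward the correlation.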