Sampo Pyysalo, Sophia Ananiadou.
Abstract
MOTIVATION: Anatomical entities ranging from subcellular structures to organ systems are central to biomedical science, and mentions of these entities are essential to understanding the scientific literature. Despite extensive efforts to automatically analyze various aspects of biomedical text, there have been only a few studies focusing on anatomical entities, and no dedicated methods for learning to automatically recognize anatomical entity mentions in free-form text have been introduced.
Year: 2013 PMID: 24162468 PMCID: PMC3957068 DOI: 10.1093/bioinformatics/btt580
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Annotation example showing tagged anatomical entity mentions
Entity types and implicit ontological structure
| Type | Examples |
|---|---|
| A | cell, heart, blood |
| A | cell, heart, head |
| O | human, drosophila |
| O | head, limb, hand |
| A | vascular system |
| O | liver, heart, lung |
| M | artery, cornea |
| T | epithelium, bone |
| C | epithelial cell |
| D | embryo, fetus |
| C | nucleus, plasmid |
| B | cyclin, insulin |
| O | blood, serum, urine |
| I | lumen, bone cavity |
| P | wound, ulcer, edema |
| C | tumor, carcinoma |
Note: Indentation corresponds to is-a structure. Not annotated: implicit structure only. Not annotated: out of scope.
Fig. 2. AnatomyTagger architecture. Parallel arcs indicate mutually independent stages of processing. Processing stages drawn shaded involve models or tools newly introduced in this study
Fig. 3. Term matching and classification using OBO resources. Example simplified from FMA; figure modified from Pyysalo
Corpus statistics
| Name | Entities | Tokens | Documents | Sources | Annotated anatomical entity types |
|---|---|---|---|---|---|
| AnatEM | 13 701 | 245 448 | 1212 | Abstracts, full text extracts | 12 anatomical entity types ( |
| AnEM | 3135 | 91 420 | 500 | Abstracts, full text extracts | Same as AnatEM, but no |
| MLEE | 3599 | 56 588 | 262 | Abstracts | Same as AnatEM, but no |
| CellFinder | 3667 | 55 362 | 10 | Full texts | C |
| CRAFT | 14 248 | 587 299 | 67 | Full texts | C |
| JNLPBA | 12 969 | 522 869 | 2404 | Abstracts | C |
Note: Annotated types and entity mention counts shown for anatomical entities as defined in Section 2.1. Statistics for publicly available part of corpus.
Development test results (F-scores)
| Method | Single-class | Multiclass |
|---|---|---|
| Baseline | 86.93 | 81.23 |
| Truecasing | 87.12 | 81.45 |
| Non-local features | 87.81 | 81.82 |
| UMLS, tokens | 89.07 | 82.79 |
| UMLS, longest phrases | 88.57 | 82.64 |
| UMLS, all phrases | 89.65 | 83.50 |
| OBO, tokens | 87.58 | 81.76 |
| OBO, longest phrases | 88.81 | 82.56 |
| OBO, all phrases | 88.40 | 82.44 |
| Brown, news, c = 100 | 87.20 | 80.92 |
| Brown, news, c = 320 | 87.71 | 81.23 |
| Brown, news, c = 1000 | 86.58 | 80.68 |
| Brown, news, c = 3200 | 87.11 | 80.80 |
| Brown, bio, c = 100 | 87.44 | 81.67 |
| Brown, bio, c = 320 | 89.56 | 82.03 |
| Brown, bio, c = 1000 | 88.94 | 81.78 |
| Brown, bio, c = 3200 | 88.55 | 81.33 |
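The development results are easier to compare as gains over the baseline row. The following is a minimal sketch that copies the F-scores from the table above and reports, for each setting, the difference from the baseline. The names `DEV_F`, `gain_over_baseline`, and `best_method` are illustrative and not part of the original study.

```python
# Development-set F-scores copied from the table above:
# method -> (single-class F, multiclass F).
DEV_F = {
    "Baseline": (86.93, 81.23),
    "Truecasing": (87.12, 81.45),
    "Non-local features": (87.81, 81.82),
    "UMLS, tokens": (89.07, 82.79),
    "UMLS, longest phrases": (88.57, 82.64),
    "UMLS, all phrases": (89.65, 83.50),
    "OBO, tokens": (87.58, 81.76),
    "OBO, longest phrases": (88.81, 82.56),
    "OBO, all phrases": (88.40, 82.44),
    "Brown, news, c = 100": (87.20, 80.92),
    "Brown, news, c = 320": (87.71, 81.23),
    "Brown, news, c = 1000": (86.58, 80.68),
    "Brown, news, c = 3200": (87.11, 80.80),
    "Brown, bio, c = 100": (87.44, 81.67),
    "Brown, bio, c = 320": (89.56, 82.03),
    "Brown, bio, c = 1000": (88.94, 81.78),
    "Brown, bio, c = 3200": (88.55, 81.33),
}


def gain_over_baseline(method: str) -> tuple[float, float]:
    """Return (single-class, multiclass) F-score difference from the baseline."""
    base_single, base_multi = DEV_F["Baseline"]
    single, multi = DEV_F[method]
    return round(single - base_single, 2), round(multi - base_multi, 2)


def best_method(column: int) -> str:
    """Return the method with the highest F-score in the given column (0 or 1)."""
    return max(DEV_F, key=lambda name: DEV_F[name][column])


if __name__ == "__main__":
    for name in DEV_F:
        if name != "Baseline":
            single_gain, multi_gain = gain_over_baseline(name)
            print(f"{name}: {single_gain:+.2f} single-class, {multi_gain:+.2f} multiclass")
    print("Best single-class:", best_method(0))
    print("Best multiclass:", best_method(1))
```

Each setting is compared against the baseline row only, so the differences say nothing about how the features behave in combination.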
Comparative evaluation on test data (F-scores)
| Method | Single-class | Multiclass |
|---|---|---|
| BioContext | 38.97 | — |
| MetaMap | 67.34 | — |
| Illinois | 81.01 | 75.22 |
| Gimli | 86.75 | — |
| NERsuite | 89.20 | 83.50 |
| AnatomyTagger |
Highest results highlighted in bold.
Tagged entity counts in PMC OA documents
| Type | Count |
|---|---|
| O | 2 429 093 |
| A | 511 191 |
| O | 5 101 355 |
| M | 6 855 622 |
| T | 2 541 481 |
| C | 16 062 208 |
| D | 478 429 |
| C | 4 824 697 |
| O | 3 717 117 |
| I | 699 962 |
| P | 702 526 |
| C | 4 546 971 |
| Total | 48 470 652 |
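As a consistency check, the per-type counts can be summed and compared with the reported total. The sketch below copies the counts from the table above in row order. Type labels are left out because they appear abbreviated in the table, and the names `RAW_COUNTS` and `parse_count` are illustrative.

```python
# Per-type tagged entity counts copied from the table above, in row order.
RAW_COUNTS = [
    "2 429 093", "511 191", "5 101 355", "6 855 622",
    "2 541 481", "16 062 208", "478 429", "4 824 697",
    "3 717 117", "699 962", "702 526", "4 546 971",
]
REPORTED_TOTAL = "48 470 652"


def parse_count(text: str) -> int:
    """Convert a space-grouped number such as '2 429 093' to an int."""
    # str.split() with no arguments splits on any whitespace,
    # including thin or non-breaking spaces used as digit separators.
    return int("".join(text.split()))


computed_total = sum(parse_count(c) for c in RAW_COUNTS)

if __name__ == "__main__":
    print("Computed total:", computed_total)
    print("Matches reported total:", computed_total == parse_count(REPORTED_TOTAL))
```

The twelve rows sum to the reported total, so the count column appears to have survived extraction intact.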