Haodi Li, Qingcai Chen, Buzhou Tang, Xiaolong Wang, Hua Xu, Baohua Wang, Dong Huang.
Abstract
BACKGROUND: Most state-of-the-art biomedical entity normalization systems, such as rule-based systems, merely rely on morphological information of entity mentions, but rarely consider their semantic information. In this paper, we introduce a novel convolutional neural network (CNN) architecture that regards biomedical entity normalization as a ranking problem and benefits from semantic information of biomedical entities.
Keywords: Biomedical entity normalization; Convolutional neural network
Year: 2017 PMID: 28984180 PMCID: PMC5629610 DOI: 10.1186/s12859-017-1805-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Examples of candidates of some biomedical entity mentions generated by the rules in the three categories

| Category | Rule in [ | Example mention | Example candidate |
|---|---|---|---|
| Exact-Match (CI) | Dictionary-based | Coronary artery disease | Coronary artery disease |
| Exact-Match II (CII) | Abbreviation expansion | “Sch” in “Suxamethonium chloride (Sch)” | Suxamethonium chloride |
| | Subject Object | Weight gain | Gain weight |
| | Numbers replacement | Vitamin b2 | Vitamin b ii |
| | Hyphenation | 5-hydroxytriptamine | 5 hydroxytriptamine |
| | Suffixation | Hypotensive | Hypotension |
| | Synonyms | Renal cell carcinoma | Renal cell cancer |
| | Stemming | Hypotensive | {Hypotension, Hypotensin} |
| | Composite | Optic and peripheral neuropathy | {Optic neuropathy, Peripheral neuropathy} |
| Partial-Match (CIII) | Partial match | Cardiac injury | {Cardiac arrest, Cardiac disorders, ⋯, Renal injury} |
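
A few of the rules in the table can be sketched in code to make the candidate-generation idea concrete. This is a minimal illustration, not the authors' implementation: the toy dictionary `KB_NAMES`, the helper names, the regular expressions, and the "exact match first, then rewriting rules" ordering are all assumptions made for this example, and only four of the listed rules are covered.

```python
import re

# Hypothetical toy dictionary of knowledge-base entity names (illustration only).
KB_NAMES = {
    "coronary artery disease",
    "vitamin b ii",
    "5 hydroxytriptamine",
    "optic neuropathy",
    "peripheral neuropathy",
}

# Assumed digit-to-roman mapping for the "numbers replacement" rule.
ROMAN = {"1": "i", "2": "ii", "3": "iii", "4": "iv", "5": "v"}


def exact_match(mention):
    """Dictionary-based rule: the mention itself is an entity name."""
    m = mention.lower()
    return {m} if m in KB_NAMES else set()


def numbers_replacement(mention):
    """Rewrite a trailing digit as a roman numeral, e.g. 'b2' -> 'b ii'."""
    variant = re.sub(
        r"(?<=[a-z])([1-5])\b",
        lambda x: " " + ROMAN[x.group(1)],
        mention.lower(),
    )
    return {variant} & KB_NAMES


def hyphenation(mention):
    """Replace hyphens with spaces."""
    return {mention.lower().replace("-", " ")} & KB_NAMES


def composite(mention):
    """Split 'A and B head' into 'A head' and 'B head'."""
    m = re.fullmatch(r"(\w+) and (\w+) (.+)", mention.lower())
    if not m:
        return set()
    a, b, head = m.groups()
    return {f"{a} {head}", f"{b} {head}"} & KB_NAMES


def generate_candidates(mention):
    """Try the exact-match rule first, then pool the rewriting rules."""
    candidates = exact_match(mention)
    if candidates:
        return candidates
    for rule in (numbers_replacement, hyphenation, composite):
        candidates |= rule(mention)
    return candidates
```

With this toy dictionary, `generate_candidates("Optic and peripheral neuropathy")` yields both `optic neuropathy` and `peripheral neuropathy`, mirroring the Composite row above. A mention that no rule can rewrite into a dictionary entry returns an empty set, which is where a partial-match rule would take over.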
Fig. 1 Comparison of the system of D’Souza & Ng and ours. The figure contains two subfigures: (a) “Workflow of the system proposed by D’Souza & Ng.” and (b) “Workflow of our system”
Fig. 2 Architecture of the CNN-based ranking module
Detailed information of the three benchmark datasets used in our study

| | | ShARe/CLEF | NCBI |
|---|---|---|---|
| KB | Name | SNOMED CT | MESH & OMIM |
| | #ent | 126525 | 11915 |
| Training set | #doc | 199 | 692 |
| | #men | 5816 | 5921 |
| | #ID | 968 | 707 |
| | #NIL | 1638 | 0 |
| Test set | #doc | 99 | 100 |
| | #men | 5351 | 964 |
| | #ID | 796 | 201 |
| | #NIL | 1736 | 0 |
Comparison of our CNN-based ranking biomedical entity normalization system with other systems (accuracy)
| System | ShARe/CLEF (%) | NCBI (%) |
|---|---|---|
| CNN-based ranking | 90.30 | 86.10 |
| CNN-based ranking# | 90.21 | 85.53 |
| Baseline | 89.53 | 84.65 |
| D’Souza & Ng’s system | 90.75 | 84.65 |
| UWM | 89.50 | NA |
| DNorm | NA | 82.20 |
| TaggerOne | NA | 88.8 |
Normalization process of the rule-based baseline system and our CNN-based ranking system for some entity mentions
| Mention | Baseline system | CNN-based ranking system |
|---|---|---|
| Tremulousness | Normalized into ‘tremors’ by sieve 7. | Normalized into ‘tremulous’ by ranking entities in E=E2={tremors, tremulous, neonatal tremor, NIL} with scores {0.9311, 0.9652, 0.0191, 0.0526}. |
| Tremulous | Normalized into ‘tremulous’ by sieve 1. | Normalized into ‘tremulous’ by ranking entities in E=E2={tremulous } with scores {1.0}. |
| Metaplastic polyps of the colorectum | Normalized into ‘polyps’ by sieve 10. | Normalized into ‘colorectal polyps’ by ranking entities in E=E3={adenomatous polyps in the colon, colonic polyps, adenomatous polyps of the colon and rectum, colorectal polyps, polyps, adenomatous polyps } with scores {0.0088, 0.0079, 0.0133, 0.0656, 0.0077, 0.0010}. |
| Colonic polyps | Normalized into ‘colonic polyps’ by sieve 1. | Normalized into ‘colonic polyps’ by ranking entities in E=E3={colonic polyps} with scores {1.0}. |
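
The last step in the right-hand column, choosing an entity from a scored candidate set, can be sketched as follows. This is a minimal illustration that assumes the ranking module returns one score per candidate and that the highest-scoring candidate is selected. The function name `pick_entity` is hypothetical, and the scores are copied from the table rather than computed by a CNN.

```python
def pick_entity(candidates, scores):
    """Return the candidate with the highest ranking score.

    Assumes `candidates` and `scores` are parallel, non-empty sequences.
    """
    if not candidates or len(candidates) != len(scores):
        raise ValueError("candidates and scores must be non-empty and aligned")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best_index]


# Candidate set E3 and scores for "Metaplastic polyps of the colorectum",
# taken from the table above.
polyp_candidates = [
    "adenomatous polyps in the colon",
    "colonic polyps",
    "adenomatous polyps of the colon and rectum",
    "colorectal polyps",
    "polyps",
    "adenomatous polyps",
]
polyp_scores = [0.0088, 0.0079, 0.0133, 0.0656, 0.0077, 0.0010]

chosen = pick_entity(polyp_candidates, polyp_scores)
```

Here `chosen` is `colorectal polyps`, the candidate with the score 0.0656, which matches the table row. A candidate set with a single member, such as the one for "Colonic polyps", trivially returns that member.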