Qikang Wei, Tao Chen, Ruifeng Xu, Yulan He, Lin Gui.
Abstract
The recognition of disease and chemical named entities in scientific articles is an important subtask of information extraction in the biomedical domain. Because disease names are diverse and complex, recognizing disease named entities is harder than recognizing chemical named entities. Although some remarkable chemical named entity recognition systems, such as ChemSpot and tmChem, are available online, publicly available systems for recognizing disease named entities are rare. This article presents a system for disease named entity recognition (DNER) and normalization. First, two separate DNER models are developed: one is based on a conditional random fields (CRF) model with a rule-based post-processing module, and the other is based on bidirectional recurrent neural networks (Bi-RNN). Then the named entities recognized by each DNER model are fed into a support vector machine (SVM) classifier that combines the results. Finally, each recognized disease named entity is normalized to a Medical Subject Headings (MeSH) disease name using a vector space model based method. Experimental results show that, using 1000 PubMed abstracts for training, the proposed system achieves an F1-measure of 0.8428 at the mention level and 0.7804 at the concept level on the test data of the chemical-disease relation (CDR) task in BioCreative V.
Database URL: http://219.223.252.210:8080/SS/cdr.html
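The normalization step named in the abstract, mapping a recognized mention to a MeSH disease name with a vector space model, can be illustrated with a minimal sketch. It assumes character-trigram count vectors and cosine similarity; the abstract does not specify the authors' features or weighting, so the function names (`char_ngrams`, `cosine`, `normalize`) and the three-entry lexicon are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Toy lexicon for illustration only (an assumption, not the system's dictionary).
LEXICON = {
    "D006973": "Hypertension",
    "D007022": "Hypotension",
    "D003924": "Diabetes Mellitus, Type 2",
}


def char_ngrams(text, n=3):
    """Return a bag of character n-grams for a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(mention, lexicon=LEXICON):
    """Return (concept_id, score) for the lexicon name most similar to the mention."""
    query = char_ngrams(mention)
    scored = ((cid, cosine(query, char_ngrams(name))) for cid, name in lexicon.items())
    return max(scored, key=lambda pair: pair[1])
```

With this toy lexicon, `normalize("hypertensive")` selects the `Hypertension` entry because the mention shares more trigrams with it than with `Hypotension`. A real system would likely need a similarity threshold to reject mentions that match nothing well.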
Year: 2016 PMID: 27777244 PMCID: PMC5088735 DOI: 10.1093/database/baw140
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Statistics of the released data of the DNER subtask in the CDR task
| Dataset | Abstracts | Disease mentions | Disease concept IDs |
|---|---|---|---|
| Training set | 500 | 4182 | 1965 |
| Development set | 500 | 4244 | 1865 |
| Test set | 500 | 4424 | 1988 |
Figure 1. System architecture.
Figure 2. Flow chart of the CRF-based DNER model.
Figure 3. A sample output of our abbreviation resolution.
Figure 4. An illustration of the Bi-RNN.
Figure 5. Architecture of the SVM-based model.
DNER performance of the CRF-based model (the best result is highlighted in bold)
| Features of the CRF model | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|
| Experiment A | 82.52 | 72.22 | 77.03 |
| Experiment B | 85.01 | 80.65 | 82.77 |
| Experiment C | 85.11 | 80.76 | **82.88** |
DNER performance of the Bi-RNN-based model (the best result is highlighted in bold)
| RNN model | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|
| RNN with Vectors A | 70.76 | 67.40 | 69.04 |
| Bi-RNN with Vectors A | 74.96 | 75.74 | 75.35 |
| Bi-RNN with Vectors B | 77.47 | 79.09 | **78.27** |
DNER performance of output fusion by SVM (the best result is highlighted in bold)
| Model | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|
| CRF model Experiment C | 85.11 | 80.76 | 82.88 |
| Bi-RNN using Vectors B | 77.47 | 79.09 | 78.27 |
| Output fusion by SVM | 85.28 | 83.30 | **84.28** |
| Baseline | 40.88 | 59.95 | 48.61 |
The baseline provided by the task organizers is based on dictionary lookup.
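The F-measure columns in this record are consistent with the balanced F1 score, the harmonic mean of precision and recall. The sketch below recomputes it from precision and recall pairs reported in the tables; the helper name `f_measure` and the `REPORTED` dictionary are introduced here for illustration only and are not part of the described system.

```python
def f_measure(precision, recall):
    """Balanced F1: harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs copied from the tables in this record.
REPORTED = {
    "CRF model Experiment C": (85.11, 80.76),
    "Bi-RNN using Vectors B": (77.47, 79.09),
    "Output fusion by SVM": (85.28, 83.30),
    "Baseline (mention level)": (40.88, 59.95),
    "Our approach (concept level)": (76.57, 79.57),
}

if __name__ == "__main__":
    for name, (p, r) in REPORTED.items():
        print(f"{name}: F = {f_measure(p, r):.2f}")
```

For the SVM fusion row, the recomputed value rounds to 84.28, matching the mention-level F1-measure of 0.8428 stated in the abstract.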
Concept-level DNER performance (disease named entity normalization)
| Model | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|
| Our approach | 76.57 | 79.57 | 78.04 |
| Baseline | 42.71 | 67.46 | 52.30 |