Richard Tzong-Han Tsai, Jorng-Tzong Horng, Po-Ting Lai, Wei-Liang Lu, Ting-Rung Kuo, Chia-Ru Chung, Jen-Chieh Han.
Abstract
BACKGROUND: Research on disease-disease association (DDA), such as comorbidity and complication, provides important insights into disease treatment and drug discovery, and a large body of literature has been published in the field. However, with current search tools, it is not easy for researchers to retrieve information on the latest DDA findings. First, comorbidity and complication keywords pull up large numbers of PubMed studies. Second, diseases are not highlighted in search results. Finally, DDAs are not identified, as no disease-disease association extraction (DDAE) dataset or tools are currently available.
Keywords: biological relation extraction; biomedical natural language processing; convolutional neural networks; deep learning; disease-disease association
Year: 2019 PMID: 31769759 PMCID: PMC6913619 DOI: 10.2196/14502
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Disease-disease association extraction examples.
Figure 2. Disease-disease association extraction dataset construction process. MeSH: Medical Subject Headings.
Figure 3. Large margin context-aware convolutional neural network (LC-CNN) architecture. BOW: Bag of words; POS: Part of speech; NE: Named Entity.
Summary of disease-disease association extraction dataset.
| Type | Training set, n | Test set, n | Total, n |
| Abstracts | 400 | 121 | 521 |
| Sentences | 4820 | 1549 | 6369 |
| Diseases | 9522 | 2824 | 12,346 |
| Total pairs | 9086 | 2419 | 11,505 |
| Positive pairs | 2538 | 623 | 3161 |
| Negative pairs | 126 | 35 | 161 |
| Null pairs | 6422 | 1761 | 8183 |
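The pair counts in the table above show how unevenly the three classes are distributed. As a quick illustration, the sketch below recomputes each class's share of all pairs from the reported counts. The dictionary layout and the function name `class_shares` are assumptions made for this example and are not part of the published dataset or its tooling.

```python
# Pair counts copied from the dataset summary table: (training set, test set).
PAIR_COUNTS = {
    "positive": (2538, 623),
    "negative": (126, 35),
    "null": (6422, 1761),
}


def class_shares(counts):
    """Return each class's share (in %) of all pairs, training and test combined."""
    totals = {label: train + test for label, (train, test) in counts.items()}
    grand_total = sum(totals.values())
    return {label: 100.0 * n / grand_total for label, n in totals.items()}


if __name__ == "__main__":
    for label, share in class_shares(PAIR_COUNTS).items():
        print(f"{label}: {share:.1f}% of all pairs")
```

With the reported counts, null pairs account for about 71% of all pairs and negative pairs for under 2%.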
Figure 4. Precision and recall formula.
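The figure itself is not reproduced in this record. For reference, the standard definitions of precision (P), recall (R), and F1-measure (F) are given below, where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives. That the figure uses exactly these standard forms is an assumption.

```latex
\begin{align}
P &= \frac{TP}{TP + FP} \\
R &= \frac{TP}{TP + FN} \\
F &= \frac{2 \cdot P \cdot R}{P + R}
\end{align}
```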
Performances of different models. P: Precision; R: Recall; F: F1-Measure.
| Input | Model | Tuning set P (%) | Tuning set R (%) | Tuning set F (%) | Test set P (%) | Test set R (%) | Test set F (%) |
| SR | LSTM | 65.53 | 70.15 | 67.76 | 66.13 | 63.95 | 65.02 |
| SR | BiLSTM | 73.78 | 70.12 | 71.90 | 65.16 | 65.64 | 65.40 |
| SR | CNN | 75.31 | 75.39 | 75.35 | 74.86 | 71.84 | 73.32 |
| CR | CR (cross-entropy) | 79.26 | 72.55 | 75.76 | 77.78 | 77.19 | 77.49 |
| CR | SVM | 74.86 | 81.03 | 77.86 | 78.44 | 82.29 | 80.32 |
| SR+CR | SCNN | 79.23 | 88.30 | 83.52 | 75.31 | 87.44 | 80.93 |
| SR+CR | LC-CNN | 82.58 | 87.72 | 85.07 | 82.36 | 85.00 | 84.18 |
| Sentence+pair | BERT | 77.23 | 80.27 | 78.72 | 79.24 | 85.23 | 82.12 |
| Sentence+pair | BioBERT | 80.22 | 83.75 | 81.95 | 80.24 | 85.35 | 82.27 |
SR: sentence representation.
LSTM: long short-term memory.
BiLSTM: bidirectional long short-term memory.
CNN: convolutional neural network.
CR: context representation.
SVM: support vector machine.
SCNN: syntax convolutional neural network.
LC-CNN: large margin context-aware convolutional neural network.
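As a worked example of how the F column relates to P and R, the sketch below computes the F1-measure as the harmonic mean of precision and recall and applies it to the LSTM tuning-set row of the table above. The function name `f1_measure` is hypothetical, and it is assumed that the reported F values follow this standard definition.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit, e.g. %)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # LSTM on the tuning set: P = 65.53, R = 70.15 (reported F = 67.76).
    print(round(f1_measure(65.53, 70.15), 2))
```

Because the harmonic mean is pulled toward the smaller of the two scores, a model with high recall but low precision (or the reverse) gets a lower F than one with balanced scores.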
Performance of combined classifiers. P: Precision; R: Recall; F: F1-Measure.
| Method | P (%) | R (%) | F (%) |
| Baseline 1 (CR, cross-entropy) | 77.78 | 77.19 | 77.49 |
| Baseline 2 (SVM) | 78.44 | 82.29 | 80.32 |
| Baseline 3 (CNN) | 74.86 | 71.84 | 73.32 |
| SCNN | 75.31 | 87.44 | 80.93 |
| LC-CNN | 82.36 | 85.00 | 84.18 |
| SVM+CNN (2-stage) | 74.45 | 72.26 | 73.34 |
CR: context representation.
SVM: support vector machine.
CNN: convolutional neural network.
SCNN: syntax convolutional neural network.
LC-CNN: large margin context-aware convolutional neural network.
The effect of different composite embedding vectors on large margin context-aware convolutional neural network performance. P: Precision; R: Recall; F: F1-Measure.
| Method | P (%) | R (%) | F (%) |
| LC-CNN (PubMed) | 82.36 | 85.00 | 84.18 |
| LC-CNN (news) | 79.80 | 87.36 | 83.41 |
| LC-CNN (no pretrain) | 77.83 | 86.58 | 81.97 |
| LC-CNN (PubMed − POS) | 80.23 | 84.26 | 82.19 |
| LC-CNN (PubMed − distance) | 77.68 | 87.08 | 82.11 |
LC-CNN: large margin context-aware convolutional neural network.
POS: part of speech.
The distribution of sampled large margin context-aware convolutional neural network error cases.
| Type | Category | Description | Ratio (%) |
| FP | Symptom/subclass | A disease is a symptom/subclass of another disease | 28 |
| FP | Co-occur | The 2 diseases co-occur in the sentence | 24 |
| FP | Negation | The 2 diseases are in a negative relation | 8 |
| FP | Others | The error cannot be categorized | 40 |
| FN | Simple FN | There is an obvious relation keyword for the disease pair | 23 |
| FN | Negation | The 2 diseases are in a negative relation | 16 |
| FN | Others | No obvious relation keyword, or the statement of the DDA is too complicated | 61 |
FP: false positive.
FN: false negative.
DDA: disease-disease association.