Erniu Wang, Fan Wang, Zhihao Yang, Lei Wang, Yin Zhang, Hongfei Lin, Jian Wang.
Abstract
BACKGROUND: Extracting the interactions between chemicals and proteins from the biomedical literature is important for many biomedical tasks such as drug discovery, precision medicine, and knowledge graph construction. Several computational methods have been proposed for automatic chemical-protein interaction (CPI) extraction. However, the majority of these proposed models cannot effectively learn semantic and syntactic information from complex sentences in biomedical texts.
Keywords: chemical-protein interaction; dependency structure; graph convolutional network; long-range syntactic
Year: 2020 PMID: 32348257 PMCID: PMC7267994 DOI: 10.2196/17643
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. The overall architecture of our model. Bi-LSTM: bi-directional long short-term memory; GCN: graph convolutional network; POS: part-of-speech; MLP: multilayer perceptron; sub: subject; obj: object; hc: representation of chemical; hs: representation of sentence; hp: representation of protein; f: max-pooling function.
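The pooling step named in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the GCN emits one hidden vector per token, that f is an element-wise max over a set of token vectors, and that hc, hs, and hp are concatenated before the MLP. The concatenation order and the function names `max_pool` and `relation_features` are hypothetical and not taken from the paper.

```python
def max_pool(vectors):
    """Element-wise max over a non-empty list of equal-length vectors (f in Figure 1)."""
    return [max(column) for column in zip(*vectors)]


def relation_features(token_states, chem_idx, prot_idx):
    """Build one feature vector for a chemical-protein pair.

    token_states: per-token hidden vectors (assumed to be the GCN outputs).
    chem_idx / prot_idx: token positions of the chemical and protein mentions.
    """
    hs = max_pool(token_states)                          # sentence representation
    hc = max_pool([token_states[i] for i in chem_idx])   # chemical representation
    hp = max_pool([token_states[i] for i in prot_idx])   # protein representation
    # Assumption: the three vectors are concatenated in this order for the MLP.
    return hc + hs + hp


# Toy input: three tokens with 2-dimensional hidden states.
states = [[1, 0], [0, 2], [3, -1]]
features = relation_features(states, chem_idx=[0], prot_idx=[1, 2])
```

With hidden size d, the resulting vector has length 3d under this assumption, which is the input width the MLP classifier would need.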
Figure 2. Examples of the ChemProt corpus. CPI: chemical-protein interaction.
The chemical-protein relation (CPR) groups.
| Group | Evaluated in the BioCreative VI ChemProt shared task? | ChemProt relations |
| CPR:1 | No | PART_OF |
| CPR:2 | No | REGULATOR|DIRECT_REGULATOR|INDIRECT_REGULATOR |
| CPR:3 | Yes | UPREGULATOR|ACTIVATOR|INDIRECT_UPREGULATOR |
| CPR:4 | Yes | DOWNREGULATOR|INHIBITOR|INDIRECT_DOWNREGULATOR |
| CPR:5 | Yes | AGONIST|AGONIST-ACTIVATOR|AGONIST-INHIBITOR |
| CPR:6 | Yes | ANTAGONIST |
| CPR:7 | No | MODULATOR|MODULATOR-ACTIVATOR|MODULATOR-INHIBITOR |
| CPR:8 | No | COFACTOR |
| CPR:9 | Yes | SUBSTRATE|PRODUCT_OF|SUBSTRATE_PRODUCT_OF |
| CPR:10 | No | NOT |
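The grouping in the table above amounts to a lookup from fine-grained ChemProt relation labels to CPR groups, with only five groups counted in the shared task. The sketch below copies the table into a dictionary; the helper `to_evaluated_group` and its choice to return `None` for non-evaluated groups are illustrative assumptions, not part of the corpus tooling.

```python
# CPR groups and their ChemProt relation labels, copied from the table above.
CPR_GROUPS = {
    "CPR:1": ["PART_OF"],
    "CPR:2": ["REGULATOR", "DIRECT_REGULATOR", "INDIRECT_REGULATOR"],
    "CPR:3": ["UPREGULATOR", "ACTIVATOR", "INDIRECT_UPREGULATOR"],
    "CPR:4": ["DOWNREGULATOR", "INHIBITOR", "INDIRECT_DOWNREGULATOR"],
    "CPR:5": ["AGONIST", "AGONIST-ACTIVATOR", "AGONIST-INHIBITOR"],
    "CPR:6": ["ANTAGONIST"],
    "CPR:7": ["MODULATOR", "MODULATOR-ACTIVATOR", "MODULATOR-INHIBITOR"],
    "CPR:8": ["COFACTOR"],
    "CPR:9": ["SUBSTRATE", "PRODUCT_OF", "SUBSTRATE_PRODUCT_OF"],
    "CPR:10": ["NOT"],
}

# Groups marked "Yes" in the "Evaluated" column.
EVALUATED = {"CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9"}

# Inverted index: relation label -> CPR group.
RELATION_TO_GROUP = {
    relation: group
    for group, relations in CPR_GROUPS.items()
    for relation in relations
}


def to_evaluated_group(relation):
    """Return the CPR group of a relation, or None if it is unknown or not evaluated.

    Hypothetical helper: treating non-evaluated groups as None is a design
    choice made for this sketch.
    """
    group = RELATION_TO_GROUP.get(relation)
    return group if group in EVALUATED else None
```

For example, `to_evaluated_group("INHIBITOR")` yields `"CPR:4"`, whereas `to_evaluated_group("COFACTOR")` yields `None` because CPR:8 was not evaluated.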
Statistics of the ChemProt corpus.
| Annotations | Training, n | Development, n | Test, n |
| Document | 1020 | 612 | 800 |
| Chemicals | 13,017 | 8004 | 10,810 |
| Proteins | 12,752 | 7567 | 10,019 |
| CPRa:3 | 768 | 550 | 665 |
| CPR:4 | 2254 | 1094 | 1661 |
| CPR:5 | 173 | 116 | 195 |
| CPR:6 | 235 | 199 | 293 |
| CPR:9 | 727 | 457 | 644 |
| Evaluated CPIsb | 4157 | 2416 | 3458 |
| Evaluated CPIs in one sentence | 4122 | 2412 | 3444 |
aCPR: chemical-protein relation.
bCPI: chemical-protein interaction.
Figure 3. Illustrative examples of chemical-protein relation (CPR) classes.
Hyperparameter setting.
| Hyperparameter | Tuned range | Optimal |
| Word embedding dimension | [100,200,300] | 200 |
| POSa embedding dimension | [10,20,30,40] | 20 |
| Entity type embedding dimension | [40,50,60,70,80] | 60 |
| GCNb hidden units | [100,200,300] | 200 |
| LSTMc hidden units | [100,200,300] | 200 |
| Learning rate | [0.1,0.2,0.3,0.4] | 0.3 |
| Dropout rate | [0.4,0.5,0.6] | 0.5 |
aPOS: part-of-speech.
bGCN: graph convolutional network.
cLSTM: long short-term memory.
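The optimal settings in the table above can be gathered into a single configuration object, which makes the values easy to pass around and check. This is a sketch only: the class name `HyperParams`, its field names, and the `token_input_dim` property are hypothetical, and the property assumes the word, POS, and entity type embeddings are concatenated per token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperParams:
    # Defaults are the "Optimal" column of the hyperparameter table.
    word_emb_dim: int = 200
    pos_emb_dim: int = 20
    entity_type_emb_dim: int = 60
    gcn_hidden: int = 200
    lstm_hidden: int = 200
    learning_rate: float = 0.3
    dropout: float = 0.5

    @property
    def token_input_dim(self):
        # Assumption: the three embeddings are concatenated for each token.
        return self.word_emb_dim + self.pos_emb_dim + self.entity_type_emb_dim


# The "Tuned range" column, keyed by the same (hypothetical) field names.
TUNED_RANGES = {
    "word_emb_dim": [100, 200, 300],
    "pos_emb_dim": [10, 20, 30, 40],
    "entity_type_emb_dim": [40, 50, 60, 70, 80],
    "gcn_hidden": [100, 200, 300],
    "lstm_hidden": [100, 200, 300],
    "learning_rate": [0.1, 0.2, 0.3, 0.4],
    "dropout": [0.4, 0.5, 0.6],
}

params = HyperParams()
```

Under the concatenation assumption, each token would enter the Bi-LSTM as a 200 + 20 + 60 = 280-dimensional vector.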
Figure 4. Comparison of different pruning distances.
Performance evaluation of different embedding features.
| Embedding feature | Precision (%) | Recall (%) | F-score (%) | Δ (%) |
| Word | 57.64 | 61.62 | 59.56 | —a |
| Word+POSb | 58.49 | 63.06 | 60.69 | +1.13 |
| Word+Entity type | 64.06 | 61.05 | 62.52 | +2.96 |
| Word+POS+Entity type | 63.79 | 66.62 | 65.17 | +5.61 |
aNot applicable.
bPOS: part-of-speech.
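The F-score and Δ columns above are consistent with the usual harmonic mean of precision and recall, with Δ measured against the word-only row. The short check below recomputes both from the reported precision and recall; that the authors used exactly this balanced F1 formula is an assumption, although it reproduces every row of the table to two decimals.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall (both in %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
ROWS = {
    "Word": (57.64, 61.62),
    "Word+POS": (58.49, 63.06),
    "Word+Entity type": (64.06, 61.05),
    "Word+POS+Entity type": (63.79, 66.62),
}

# Recompute the F-score column, rounded to two decimals as in the table.
F_SCORES = {name: round(f_score(p, r), 2) for name, (p, r) in ROWS.items()}

# Improvement of each feature set over the word-only baseline.
DELTAS = {
    name: round(score - F_SCORES["Word"], 2)
    for name, score in F_SCORES.items()
    if name != "Word"
}
```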
Comparison with the baseline method.
| Model | Precision (%) | Recall (%) | F-score (%) |
| CNNa | 42.47 | 69.43 | 52.70 |
| GCNb | 48.77 | 63.69 | 55.24 |
| Bi-LSTMc | 60.59 | 60.34 | 60.46 |
| Bi-LSTM+CNN | 57.77 | 64.73 | 61.05 |
| Bi-LSTM+GCN (our model) | 63.79 | 66.62 | 65.17 |
aCNN: convolutional neural network.
bGCN: graph convolutional network.
cBi-LSTM: bi-directional long short-term memory.
Comparison with other existing methods.
| Model | Precision (%) | Recall (%) | F-score (%) |
| Verga et al | 48.00 | 54.10 | 50.80 |
| Matos | 57.38 | 47.22 | 51.81 |
| Liu et al | 57.4 | 48.7 | 52.7 |
| Lung et al | 63.52 | 51.21 | 56.71 |
| Corbett and Boyle | 62.97 | 62.20 | 62.58 |
| Mehryary et al | 59.05 | 67.76 | 63.10 |
| Peng et al | 72.66 | 57.35 | 64.10 |
| Our model | 63.79 | 66.62 | 65.17 |