Abstract
Chemical named entity recognition (NER) has traditionally been dominated by conditional random field (CRF)-based approaches, but given the success of the artificial neural network techniques known as "deep learning", we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using rich per-token features and neural word embeddings, and producing a sequence of tags using bidirectional long short-term memory (LSTM) networks, a type of recurrent neural net. The second system eschews the rich feature set, and even tokenisation, in favour of character labelling using neural character embeddings and multiple LSTM layers. The third system is an ensemble that combines the results of the first two systems. Our original BioCreative V.5 competition entry was placed in the top group with the highest F scores, and subsequent work using transfer learning has achieved a final F score of 90.33% on the test data (precision 91.47%, recall 89.21%).
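As a quick consistency check on the figures quoted above, the following minimal sketch recomputes the F score from the stated precision and recall. It assumes the F score is the usual balanced F1 (the harmonic mean of precision and recall), which the abstract does not spell out; the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F score: harmonic mean of precision and recall.

    Assumption: the abstract's "F score" is the balanced F1 measure.
    Inputs and output share the same units (here, percent).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted in the abstract, in percent.
final_f = f_score(91.47, 89.21)
print(f"{final_f:.2f}")  # 90.33
```

Under that assumption the recomputed value agrees with the reported 90.33%.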
Keywords: Chemicals; Deep learning; Named entity recognition
Year: 2018 PMID: 30523437 PMCID: PMC6755713 DOI: 10.1186/s13321-018-0313-8
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
The layers used in the traditional network
| Layer | Type | Input(s) | No. of output neurons | Notes |
|---|---|---|---|---|
| | Embedding | | 300 | |
| | Conv1D | | 256 | Width = 3, activation = relu, dropout of 0.5 |
| | Concatenate | | 556 | |
| | Bidirectional LSTM | | 64 per direction, total 128 | Dropout of 0.5 |
| | TimeDistributed Dense | | 5 | Activation = softmax |
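The output widths in this table can be checked with simple arithmetic. The sketch below assumes that the Concatenate layer joins the Embedding and Conv1D outputs (the table's Input(s) column did not survive, but 300 + 256 = 556 is consistent with that reading) and that a bidirectional layer concatenates its two directions. The helper names are hypothetical.

```python
def concat_width(*widths: int) -> int:
    """Concatenating feature vectors along the feature axis adds their widths."""
    return sum(widths)


def bidirectional_width(units_per_direction: int) -> int:
    """A bidirectional layer emits forward and backward outputs side by side."""
    return 2 * units_per_direction


embedding = 300  # Embedding row
conv1d = 256     # Conv1D row
# Assumption: Concatenate takes the Embedding and Conv1D outputs as inputs.
merged = concat_width(embedding, conv1d)
bilstm = bidirectional_width(64)  # "64 per direction, total 128"
print(merged, bilstm)  # 556 128
```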
Layers in minimalist network
| Layer | Type | Input(s) | No. of output neurons | Notes |
|---|---|---|---|---|
| | LSTM | | 128 | |
| | LSTM | | 128 | Reversed |
| | Concatenate | | | Dropout of 0.5 |
| | Bidirectional LSTM | | 64 per direction, total 128 | Dropout of 0.5 |
| | Bidirectional LSTM | | 64 per direction, total 128 | Dropout of 0.5 |
| | TimeDistributed(Dense) | | 5 | Activation = softmax |
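The minimalist network labels individual characters, and its final layer has five softmax outputs. A minimal sketch of what per-character labelling could look like follows. The five labels used here (S, B, I, E, O for single-character, begin, inside, end and outside) are an assumption chosen to match the five outputs; the table does not name the labels, and `char_labels` is a hypothetical helper.

```python
def char_labels(text, spans):
    """Assign one label per character of `text`.

    `spans` is a list of (start, end) half-open character offsets of entities.
    Assumed label set: S (single-character entity), B (begin), I (inside),
    E (end), O (outside any entity).
    """
    labels = ["O"] * len(text)
    for start, end in spans:
        if end - start == 1:
            labels[start] = "S"
        else:
            labels[start] = "B"
            for i in range(start + 1, end - 1):
                labels[i] = "I"
            labels[end - 1] = "E"
    return labels


text = "add NaCl"
print("".join(char_labels(text, [(4, 8)])))  # OOOOBIIE
```

Because every character receives a label, no tokeniser is needed before the network sees the text.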
“Predictive transfer” network
| Layer | Type | Input(s) | No. of output neurons | Notes |
|---|---|---|---|---|
| | LSTM | | 128 | |
| | LSTM | | 128 | Reversed |
| | TimeDistributed(Dense) | | 91 | Activation = softmax |
| | TimeDistributed(Dense) | | 91 | Activation = softmax |
“Dictionary transfer” network
| Layer | Type | Input(s) | No. of output neurons | Notes |
|---|---|---|---|---|
| | LSTM | | 128 | |
| | LSTM | | 128 | Reversed |
| | Concatenate | | | Dropout of 0.5 |
| | Bidirectional LSTM | | 64 per direction, total 128 | Dropout of 0.5 |
| | Bidirectional LSTM | | 64 per direction, total 128 | Dropout of 0.5 |
| | GlobalMaxPooling1D | | 128 | |
| | Dense | | 3 | Activation = sigmoid |
Results of official BioCreative V.5 submissions
| System | Official test: F (%) | Official test: Precision (%) | Official test: Recall (%) | Internal evaluation: F (%) | Internal evaluation: Precision (%) | Internal evaluation: Recall (%) |
|---|---|---|---|---|---|---|
| Traditional | 89.19 | 88.67 | 89.71 | 87.03 | 86.48 | 87.58 |
| Minimalist | 89.01 | 88.65 | 89.36 | 86.64 | 84.79 | 88.58 |
| Ensemble | | | | | | |
The official test was part of the BioCreative competition; the internal evaluations were performed by us on the 1/5 of the training data that was held out from training.
Entries in italics are the best results in that column
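The internal evaluation holds out one fifth of the training data. A minimal sketch of such a split is shown below. How the held-out fifth was actually selected is not stated here, so the every-fifth-item rule and the name `split_holdout` are assumptions for illustration.

```python
def split_holdout(items, folds=5, holdout_fold=0):
    """Split `items` into (train, heldout), holding out 1/`folds` of them.

    Assumption: items are assigned to folds by position (index modulo `folds`);
    the source does not say how its held-out fifth was chosen.
    """
    train, heldout = [], []
    for index, item in enumerate(items):
        if index % folds == holdout_fold:
            heldout.append(item)
        else:
            train.append(item)
    return train, heldout


docs = [f"doc{i}" for i in range(10)]
train, heldout = split_holdout(docs)
print(len(train), len(heldout))  # 8 2
```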
Results of systems described in this paper
| System | Official test: F score (%) | Official test: Precision (%) | Official test: Recall (%) | Internal evaluation: F score (%) | Internal evaluation: Precision (%) | Internal evaluation: Recall (%) |
|---|---|---|---|---|---|---|
| 1: Traditional | 89.04 | 89.57 | 88.52 | 86.75 | 86.03 | 87.49 |
| 2: Minimalist | 88.71 | 88.04 | | 86.85 | 85.10 | 88.68 |
| 3: Ensemble of 1 and 2 | 90.11 | 88.69 | 88.02 | 88.02 | 86.89 | |
| 4: Traditional with custom embeddings | 89.19 | 90.05 | 87.93 | 86.91 | 87.10 | 86.72 |
| 5: Minimalist with transfer training | 89.32 | 90.49 | 88.18 | 87.18 | 87.58 | 87.38 |
| 6: Ensemble of 4 and 5 | 90.33 | 91.47 | 89.21 | | | 88.17 |
Entries in italics are the best results in that column
Results of training using different LSTM implementations
| System | Official test: F score (%) | Official test: Precision (%) | Official test: Recall (%) | Internal evaluation: F score (%) | Internal evaluation: Precision (%) | Internal evaluation: Recall (%) |
|---|---|---|---|---|---|---|
| 1: Traditional | 89.04 | 89.57 | 88.52 | 86.75 | 86.03 | 87.49 |
| 4: Traditional with custom embeddings | 89.19 | 90.05 | 87.93 | 86.91 | 87.10 | 86.72 |
| 7: As 1, with default LSTM, and recurrent dropout | 89.11 | 89.23 | 88.98 | 86.93 | 85.86 | 88.01 |
| 8: As 4, with default LSTM, and recurrent dropout | | 89.19 | | | 85.86 | |
Entries in italics are the best results in that column
Results of internal evaluation of minimalist system with different transfer learning strategies
| Predictive transfer | Dictionary transfer | F score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| None | None | 86.85 | 85.10 | 88.68 |
| None | At start | 86.40 | 85.20 | 87.64 |
| None | Interleaved | 86.80 | 85.74 | 87.88 |
| At start | None | 87.14 | 85.47 | |
| At start | After predictive | 87.08 | 86.16 | 88.07 |
| After dictionary | At start | 87.24 | 86.09 | 88.42 |
| At start | Interleaved with dictionary | 87.03 | 85.46 | 88.66 |
| At start | Interleaved | | | 87.58 |
| Interleaved | None | 87.30 | 85.90 | 88.75 |
| Interleaved | At start | 86.88 | 85.59 | 88.20 |
| Interleaved | Interleaved | 87.30 | 86.36 | 88.27 |
Entries in italics are the best results in that column
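The table distinguishes transfer tasks run "at start" from those that are "interleaved" with the main training. The sketch below illustrates one plausible reading of those two terms as epoch orderings. It is an assumption rather than the authors' documented procedure, and `build_schedule`, the task names and the epoch counts are all hypothetical.

```python
def build_schedule(main_epochs, aux_epochs, mode):
    """Return the order of training epochs as a list of task names.

    Assumed meanings (not defined in the source):
      "none"        -> only the main NER task is trained
      "at_start"    -> all auxiliary (transfer) epochs run before the main task
      "interleaved" -> one auxiliary epoch precedes each main epoch
                       (`aux_epochs` is ignored in this mode)
    """
    if mode == "none":
        return ["main"] * main_epochs
    if mode == "at_start":
        return ["aux"] * aux_epochs + ["main"] * main_epochs
    if mode == "interleaved":
        schedule = []
        for _ in range(main_epochs):
            schedule.extend(["aux", "main"])
        return schedule
    raise ValueError(f"unknown mode: {mode!r}")


print(build_schedule(3, 2, "at_start"))     # ['aux', 'aux', 'main', 'main', 'main']
print(build_schedule(2, 0, "interleaved"))  # ['aux', 'main', 'aux', 'main']
```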