Fei Li1,2,3, Yonghao Jin1, Weisong Liu1,2,3, Bhanu Pratap Singh Rawat4, Pengshan Cai4, Hong Yu1,2,3,4.
Abstract
BACKGROUND: The bidirectional encoder representations from transformers (BERT) model has achieved great success in many natural language processing (NLP) tasks, such as named entity recognition and question answering. However, little prior work has explored using this model for an important task in the biomedical and clinical domains, namely entity normalization.
Keywords: BERT; deep learning; electronic health record note; entity normalization; natural language processing
Year: 2019 PMID: 31516126 PMCID: PMC6746103 DOI: 10.2196/14830
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Overview of this paper's methods. Bidirectional encoder representations from transformers (BERT) [11] was trained on Wikipedia text and the BookCorpus dataset. BioBERT [13] was initialized with BERT and fine-tuned using PubMed and PubMed Central (PMC) publications. We initialized EhrBERT, the BERT-based model that was trained using 1.5 million electronic health record (EHR) notes, with BioBERT and then fine-tuned it using unlabeled EHR notes. We further fine-tuned EhrBERT using annotated corpora for the entity normalization task. CDR: Chemical-Disease Relations; MADE: Medication, Indication, and Adverse Drug Events; NCBI: National Center for Biotechnology Information.
Main hyper-parameter settings of EhrBERTa.
| Hyper-parameter | Value |
| Epoch | 15 |
| Maximal sequence length | 128 |
| Batch size | 64 |
| Learning rate | 0.00003 |
| Embedding size | 768 |
| Dropout probability | 0.1 |
| Transformer blocks | 12 |
| Self-attention heads | 12 |
aEhrBERT: bidirectional encoder representations from transformers (BERT)–based model that was trained using 1.5 million electronic health record notes.
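The settings in the table above can be collected into a small configuration object, which also makes two derived quantities explicit. This is a minimal sketch: the class and field names are hypothetical (they are not taken from the paper or from any library), and the even split of the embedding across attention heads is assumed from the standard transformer layout rather than stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EhrBertHyperParams:
    """Values copied from the hyper-parameter table; names are illustrative."""

    epochs: int = 15
    max_seq_length: int = 128
    batch_size: int = 64
    learning_rate: float = 3e-5  # written as 0.00003 in the table
    embedding_size: int = 768
    dropout_prob: float = 0.1
    transformer_blocks: int = 12
    attention_heads: int = 12

    def head_dim(self) -> int:
        # Assumption: the embedding is divided evenly across attention heads.
        if self.embedding_size % self.attention_heads != 0:
            raise ValueError("embedding size must be divisible by the head count")
        return self.embedding_size // self.attention_heads

    def max_tokens_per_batch(self) -> int:
        # Upper bound on word pieces per batch, counting padding positions.
        return self.batch_size * self.max_seq_length


params = EhrBertHyperParams()
print(params.head_dim())              # 64
print(params.max_tokens_per_batch())  # 8192
```

Freezing the dataclass keeps the reported settings from being changed accidentally while experimenting with other values.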
Figure 2. Model architectures. An example of entity normalization is shown, in which the named entity “dyspnea on exertion” is normalized to the term “60845006” in the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) vocabulary (SNOMED International, 2019). The number of classes depends on the vocabularies used in a corpus: about 380,000 (Medical Dictionary for Regulatory Activities [MedDRA] and SNOMED-CT) for the Medication, Indication, and Adverse Drug Events (MADE) 1.0 corpus and about 11,000 (MErged DIsease voCabulary [MEDIC]) for the National Center for Biotechnology Information (NCBI) Disease and Chemical-Disease Relations (CDR) corpora. BERT: bidirectional encoder representations from transformers; C: d_Trm-dimensional representation; [CLS]: classifier token; E: d_emb-dimensional embedding; T: d_Trm-dimensional vector; Trm: bidirectional transformer.
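To make the classification view of entity normalization concrete, the sketch below maps a [CLS] representation C to a probability distribution over vocabulary classes and counts the parameters of such an output layer. It rests on several assumptions that the caption does not confirm: a single linear layer with bias followed by softmax (the usual BERT classification head), a d_Trm of 768 (taken from the embedding size in the hyper-parameter table), and class counts rounded to the "about" figures in the caption. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(c_vector, weights, biases):
    """Assumed head: linear layer plus softmax over concept classes."""
    logits = [
        sum(w * x for w, x in zip(row, c_vector)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def output_layer_parameters(hidden_size, num_classes):
    """Weights plus biases of one linear layer (the bias term is an assumption)."""
    return hidden_size * num_classes + num_classes


# Toy example: a 2-dimensional representation and 3 candidate classes.
best, probs = classify([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]], [0.0] * 3)
print(best)  # 1

# Approximate output-layer sizes for the class counts in the caption.
print(output_layer_parameters(768, 380_000))  # 292220000
print(output_layer_parameters(768, 11_000))   # 8459000
```

Under these assumptions, the output layer for the MADE vocabularies alone would hold roughly 292 million parameters, which illustrates why the number of classes matters for this formulation.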
F1s and standard deviations.
| Corpus and model | F1 (%), mean (SD) | Improvement compared with MetaMap or DNorma (percentage points) |
| MADEb (gold entities)c | | |
| BERTd | 67.87 (0.25) | N/Ae |
| BioBERT | 68.22 (0.11) | N/A |
| EhrBERT500kf | 68.74 (0.14) | N/A |
| EhrBERT1Mg | 68.82 (0.29) | N/A |
| MADE (predicted entities)h | | |
| MetaMap | 38.59 (0) | N/A |
| BERT | 40.81 (0.08) | +2.22 |
| BioBERT | 40.87 (0.06) | +2.28 |
| EhrBERT500k | 40.95 (0.04) | +2.36 |
| EhrBERT1M | 40.95 (0.07) | +2.36 |
| NCBIi | | |
| DNorm | 88.37 (0) | N/A |
| BERT | 89.43 (0.99) | +1.06 |
| EhrBERT500k | 90.00 (0.48) | +1.63 |
| EhrBERT1M | 90.35 (1.12) | +1.98 |
| BioBERT | 90.71 (0.37) | +2.34 |
| CDRj | | |
| DNorm | 89.92 (0) | N/A |
| BERT | 93.11 (0.54) | +3.19 |
| BioBERT | 93.42 (0.10) | +3.50 |
| EhrBERT500k | 93.45 (0.09) | +3.53 |
| EhrBERT1M | 93.82 (0.15) | +3.90 |
aDNorm: disease name normalization.
bMADE: Medication, Indication, and Adverse Drug Events.
cWe used gold entity mentions as input.
dBERT: bidirectional encoder representations from transformers.
eN/A: not applicable.
fEhrBERT500k: BERT-based model that was trained using 500,000 electronic health record notes.
gEhrBERT1M: BERT-based model that was trained using 1 million electronic health record notes.
hWe used MetaMap-predicted entity mentions as input.
iNCBI: National Center for Biotechnology Information.
jCDR: Chemical-Disease Relations.
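The improvement column in the F1 table appears to be the difference between each model's mean F1 and the mean F1 of the baseline (MetaMap or DNorm) on the same corpus. The sketch below recomputes that difference from the reported values as a consistency check. The corpus labels are inferred from the table footnotes, and the data structure and function name are hypothetical.

```python
# (corpus, baseline, baseline F1, {model: (mean F1, reported improvement)})
RESULTS = [
    ("MADE (predicted entities)", "MetaMap", 38.59, {
        "BERT": (40.81, 2.22),
        "BioBERT": (40.87, 2.28),
        "EhrBERT500k": (40.95, 2.36),
        "EhrBERT1M": (40.95, 2.36),
    }),
    ("NCBI", "DNorm", 88.37, {
        "BERT": (89.43, 1.06),
        "EhrBERT500k": (90.00, 1.63),
        "EhrBERT1M": (90.35, 1.98),
        "BioBERT": (90.71, 2.34),
    }),
    ("CDR", "DNorm", 89.92, {
        "BERT": (93.11, 3.19),
        "BioBERT": (93.42, 3.50),
        "EhrBERT500k": (93.45, 3.53),
        "EhrBERT1M": (93.82, 3.90),
    }),
]


def check_improvements(results, tol=0.005):
    """Return rows whose reported improvement differs from F1 minus baseline F1."""
    mismatches = []
    for corpus, baseline, base_f1, models in results:
        for model, (f1, reported) in models.items():
            computed = round(f1 - base_f1, 2)
            if abs(computed - reported) > tol:
                mismatches.append((corpus, model, computed, reported))
    return mismatches


print(check_improvements(RESULTS))  # []
```

An empty list means every reported improvement matches the simple difference to two decimal places.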
P values of the different models for the Medication, Indication, and Adverse Drug Events (predicted entities) corpus.
| Model | BERTa | BioBERT | EhrBERT500kb | EhrBERT1Mc |
| MetaMap | <.001 | <.001 | <.001 | <.001 |
| BERT | | .17 | .02 | .02 |
| BioBERT | | | .04 | .04 |
| EhrBERT500k | | | | .50 |
aBERT: bidirectional encoder representations from transformers.
bEhrBERT500k: BERT-based model that was trained using 500,000 electronic health record notes.
cEhrBERT1M: BERT-based model that was trained using 1 million electronic health record notes.
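One way to read the pairwise P values for the MADE (predicted entities) corpus is to split the model pairs at a significance threshold. The sketch below does this with a threshold of .05, which is an assumption; the source does not state the threshold or the statistical test used. The pairing of row and column models follows the upper-triangular layout of the table, values reported as <.001 are stored as the upper bound 0.001, and the names are hypothetical.

```python
# Pairwise P values, keyed by (row model, column model).
P_VALUES = {
    ("MetaMap", "BERT"): 0.001,         # reported as <.001
    ("MetaMap", "BioBERT"): 0.001,      # reported as <.001
    ("MetaMap", "EhrBERT500k"): 0.001,  # reported as <.001
    ("MetaMap", "EhrBERT1M"): 0.001,    # reported as <.001
    ("BERT", "BioBERT"): 0.17,
    ("BERT", "EhrBERT500k"): 0.02,
    ("BERT", "EhrBERT1M"): 0.02,
    ("BioBERT", "EhrBERT500k"): 0.04,
    ("BioBERT", "EhrBERT1M"): 0.04,
    ("EhrBERT500k", "EhrBERT1M"): 0.50,
}


def split_by_significance(p_values, alpha=0.05):
    """Partition model pairs into those below and at or above alpha (assumed .05)."""
    significant = sorted(pair for pair, p in p_values.items() if p < alpha)
    not_significant = sorted(pair for pair, p in p_values.items() if p >= alpha)
    return significant, not_significant


significant, not_significant = split_by_significance(P_VALUES)
print(len(significant))  # 8
print(not_significant)   # [('BERT', 'BioBERT'), ('EhrBERT500k', 'EhrBERT1M')]
```

Under this assumed threshold, every BERT-based model differs from MetaMap, while BERT versus BioBERT and EhrBERT500k versus EhrBERT1M do not reach significance on this corpus.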
P values of the different models for the Chemical-Disease Relations corpus.
| Model | BERTa | BioBERT | EhrBERT500kb | EhrBERT1Mc |
| DNormd | .004 | <.001 | <.001 | <.001 |
| BERT | | .18 | .22 | .04 |
| BioBERT | | | .41 | .03 |
| EhrBERT500k | | | | .03 |
aBERT: bidirectional encoder representations from transformers.
bEhrBERT500k: BERT-based model that was trained using 500,000 electronic health record notes.
cEhrBERT1M: BERT-based model that was trained using 1 million electronic health record notes.
dDNorm: disease name normalization.
Figure 3. A case study. The left column shows examples where EhrBERT gave valid predictions. The right column shows examples where EhrBERT failed to give valid predictions. The rectangles denote mentions and the weights of the word pieces in these mentions. The darker the color, the larger the weight. Split word pieces are denoted with “##.” The text in green and red indicates gold and predicted answers, respectively. EhrBERT: bidirectional encoder representations from transformers (BERT)-based model that was trained using 1.5 million electronic health record notes.
P values of the different models for the National Center for Biotechnology Information disease corpus.
| Model | BERTa | EhrBERT500kb | EhrBERT1Mc | BioBERT |
| DNormd | .10 | .01 | .04 | .004 |
| BERT | | .25 | .15 | .03 |
| EhrBERT500k | | | .37 | .09 |
| EhrBERT1M | | | | .32 |
aBERT: bidirectional encoder representations from transformers.
bEhrBERT500k: BERT-based model that was trained using 500,000 electronic health record notes.
cEhrBERT1M: BERT-based model that was trained using 1 million electronic health record notes.
dDNorm: disease name normalization.