Fei Li, Weisong Liu, Hong Yu.
Abstract
BACKGROUND: Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems, such as the US Food and Drug Administration Adverse Event Reporting System (FAERS), face challenges such as underreporting. As complementary surveillance, data on ADEs are therefore extracted from electronic health record (EHR) notes via natural language processing (NLP). As NLP develops, many up-to-date machine-learning techniques, such as deep learning and multi-task learning (MTL), have been introduced in this field. However, only a few studies have focused on employing such techniques to extract ADEs.
Keywords: adverse drug event; deep learning; multi-task learning; named entity recognition; natural language processing; relation extraction
Year: 2018 PMID: 30478023 PMCID: PMC6288593 DOI: 10.2196/12159
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Study overview. NER: named entity recognition. RE: relation extraction. BiLSTM: bidirectional long short-term memory. CRF: conditional random field. MTL: multi-task learning. MADE: Medication, Indication, and Adverse Drug Events. HardMTL: multi-task learning model for hard parameter sharing. RegMTL: multi-task learning model for soft parameter sharing based on regularization. LearnMTL: multi-task learning model for soft parameter sharing based on task-relation learning.
Figure 2. NER submodel. For simplicity, here we use "Renal Failure" to illustrate the architecture. For "Renal," the word feature is "Renal," the capitalization feature of the initial character is "R," the POS feature is "JJ," and the character representation is generated by a CNN. NER: named entity recognition. CNN: convolutional neural network. CRF: conditional random field. LSTM: long short-term memory. POS: part of speech.
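The surface features named in Figure 2 (word form, capitalization of the initial character) are straightforward to compute; a minimal sketch in plain Python (the feature names here are illustrative, not the paper's; POS tags and the CNN character representation would come from a tagger and a trained network, so they are omitted):

```python
def surface_features(token):
    """Illustrative surface features for one token, following Figure 2:
    the word form itself and whether its initial character is uppercase."""
    return {
        "word": token,
        "initial_capital": token[:1].isupper(),
    }
```

For the example in the caption, `surface_features("Renal")` yields the word feature "Renal" with the initial-capital flag set, while a lowercase token like "failure" does not set the flag.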
Figure 3. RE submodel. The target entities are "renal failure" (e1) and "antibiotics" (e2). Positions represent token distances to the target entities. RE: relation extraction. LSTM: long short-term memory. POS: part of speech.
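The position features in Figure 3 can be computed as each token's signed offset to the two target entity spans. A minimal sketch under an assumed convention (0 inside a span, negative to the left, positive to the right; the paper's exact offset scheme is not specified in this record):

```python
def position_features(tokens, e1_span, e2_span):
    """Relative token distances to two target entity spans.

    e1_span and e2_span are (start, end) index pairs, end exclusive.
    A token inside a span gets distance 0; tokens before it get
    negative offsets, tokens after it get positive offsets.
    """
    def dist(i, span):
        start, end = span
        if i < start:
            return i - start      # negative: token precedes the entity
        if i >= end:
            return i - end + 1    # positive: token follows the entity
        return 0                  # inside the entity span

    return [(dist(i, e1_span), dist(i, e2_span))
            for i in range(len(tokens))]
```

For "He developed renal failure after antibiotics" with e1 = "renal failure" and e2 = "antibiotics", "He" is 2 tokens before e1 and 5 before e2, so its pair is (-2, -5); these offsets are what the RE submodel would embed alongside word and POS features.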
Figure 4. The high-level view of HardMTL. For conciseness, "LSTM" indicates a BiLSTM layer, and the layers above the BiLSTM layer are denoted as D_NER and D_RE. The forward procedures for an NER instance and an RE instance are indicated by blue and green arrow lines, respectively. HardMTL: multi-task learning model for hard parameter sharing. LSTM: long short-term memory. BiLSTM: bidirectional long short-term memory. CRF: conditional random field. NER: named entity recognition. RE: relation extraction.
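The core idea of hard parameter sharing in Figure 4 is that one shared encoder receives gradients from both tasks, while each task keeps its own decoder. A toy sketch with scalar "layers" standing in for the BiLSTM encoder and the CRF/attention decoders (this is an illustration of the sharing mechanism, not the paper's model):

```python
class HardSharedModel:
    """Toy hard parameter sharing: one shared encoder weight updated by
    both tasks, plus task-specific head weights (stand-ins for the
    shared BiLSTM and the NER/RE decoders of Figure 4)."""

    def __init__(self):
        self.shared_w = 0.5                      # shared "encoder"
        self.heads = {"ner": 1.0, "re": 1.0}     # task-specific "decoders"

    def forward(self, task, x):
        h = self.shared_w * x                    # shared representation
        return self.heads[task] * h              # task-specific decoding

    def sgd_step(self, task, x, y, lr=0.1):
        # Squared-error loss; note the gradient for shared_w flows
        # through whichever task's head is active, so BOTH tasks
        # update the shared parameters over alternating batches.
        err = self.forward(task, x) - y
        grad_head = err * self.shared_w * x
        grad_shared = err * self.heads[task] * x
        self.heads[task] -= lr * grad_head
        self.shared_w -= lr * grad_shared
```

Training alternates NER and RE instances; each step moves its own head, but every step also moves `shared_w`, which is exactly what "hard sharing" means here.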
Figure 5. The high-level view of RegMTL. LSTM_1^NER and LSTM_2^NER indicate the first and second BiLSTM layers of the NER model; LSTM_1^RE and LSTM_2^RE indicate the first and second BiLSTM layers of the RE model. NER: named entity recognition. RE: relation extraction. RegMTL: multi-task learning model for soft parameter sharing based on regularization. BiLSTM: bidirectional long short-term memory. CRF: conditional random field. LSTM: long short-term memory.
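In regularization-based soft sharing, the NER and RE models keep separate parameters, but a penalty term pulls corresponding layers toward each other. The exact penalty used by RegMTL is not given in this record; a common formulation is an L2 distance between paired layer weights, sketched here:

```python
def soft_sharing_penalty(ner_params, re_params, lam=0.01):
    """L2 penalty tying corresponding NER and RE layer parameters.

    Each argument maps a layer name (e.g. "lstm1") to a flat list of
    weights. The penalty pulls paired layers toward each other without
    forcing them to be identical -- the "soft" in soft sharing.
    """
    penalty = 0.0
    for layer in ner_params:
        penalty += sum((a - b) ** 2
                       for a, b in zip(ner_params[layer], re_params[layer]))
    return lam * penalty
```

Added to the sum of the two task losses, this term is zero when paired layers agree exactly and grows quadratically as they diverge, so each task can still specialize where its loss demands it.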
Figure 6. The high-level view of LearnMTL. LearnMTL: multi-task learning model for soft parameter sharing based on task-relation learning. CRF: conditional random field. LSTM: long short-term memory.
Comparison of our model with existing systems on the Medication, Indication, and Adverse Drug Events (MADE) dataset. The microaveraged F1 scores for relation extraction are shown, following the official evaluation report.
| System | Named entity recognition | Relation extraction | F1 |
| Chapman et al | CRFa | Random forest | 59.2 |
| Xu et al | BiLSTMb-CRF | Support vector machine | 59.9 |
| Dandala et al | BiLSTM-CRF | BiLSTM-Attention | 61.7 |
| Our Best (HardMTLc) | BiLSTM-CRF | BiLSTM-Attention | 66.7 |
aCRF: conditional random field
bBiLSTM: bidirectional long short-term memory
cHardMTL: multi-task learning model for hard parameter sharing
Performance (%) of the pipeline and multi-task learning models. The values presented are means over 5 runs of each model. The microaveraged P (precision), R (recall), and F1 over all entity or relation types are shown.
| Method | Entity recognition ||| Relation extraction |||
|  | P | R | F1 | P | R | F1 |
| Pipeline | 85.0 | 83.2 | 84.1 | 69.8 | 62.4 | 65.9 |
| HardMTLa | 85.0 | 84.1 | 84.5 | 70.2 | 63.6 | 66.7 |
| RegMTLb | 84.5 | 84.5 | 84.5 | 66.7 | 63.6 | 65.1 |
| LearnMTLc | 84.5 | 82.8 | 83.6 | 67.2 | 61.5 | 64.2 |
aHardMTL: multi-task learning model for hard parameter sharing
bRegMTL: multi-task learning model for soft parameter sharing based on regularization
cLearnMTL: multi-task learning model for soft parameter sharing based on task relation learning
Performance (%) of each entity type.
| Entity type | P | R | F1 |
| Medication | 91.1 | 92.0 | 91.3 |
| Indication | 65.4 | 64.8 | 64.8 |
| Frequency | 87.1 | 86.5 | 86.3 |
| Severity | 84.6 | 84.7 | 84.7 |
| Dosage | 87.9 | 86.4 | 88.0 |
| Duration | 75.3 | 76.6 | 77.6 |
| Route | 91.6 | 91.9 | 91.9 |
| Adverse drug events | 59.5 | 57.6 | 55.4 |
| SSLIFa | 83.9 | 84.8 | 84.9 |
aSSLIF: any sign, symptom, or disease that is not an ADE or an Indication
Performance (%) of each relation type.
| Relation type | P | R | F1 |
| Severity-Adverse drug events | 55.0 | 54.4 | 54.1 |
| Route-Medication | 81.0 | 82.5 | 82.1 |
| Medication-Indication | 53.9 | 52.5 | 52.9 |
| Dosage-Medication | 80.9 | 79.8 | 81.0 |
| Duration-Medication | 60.3 | 63.7 | 59.5 |
| Frequency-Medication | 77.7 | 78.6 | 78.4 |
| Medication-Adverse drug events | 50.4 | 47.6 | 45.5 |
Results (%) of comparisons between our pipeline model and the MedEx system.
| Relation type | MedEx system ||| Pipeline model |||
|  | P | R | F1 | P | R | F1 |
| Route-Medication | 71.9 | 47.9 | 57.5 | 81.0 | 82.5 | 82.1 |
| Dosage-Medication | 29.7 | 3.5 | 6.2 | 80.9 | 79.8 | 81.0 |
| Duration-Medication | 25.5 | 15.6 | 19.4 | 60.3 | 63.7 | 59.5 |
| Frequency-Medication | 52.5 | 36.2 | 42.8 | 77.7 | 78.6 | 78.4 |