Identifying adverse drug reactions from free-text electronic hospital health record notes.

Arthur Wasylewicz^1, Britt van de Burgt^1, Aniek Weterings^2, Naomi Jessurun^3, Erik Korsten^1,4, Toine Egberts^5,6, Arthur Bouwman^4,7, Marieke Kerskes^8, René Grouls^8, Carolien van der Linden^2.

Abstract

BACKGROUND: Adverse drug reactions (ADRs) are estimated to be the fifth most common cause of hospital death. Up to 50% are potentially preventable and a significant number are recurrent (reADRs). Clinical decision support systems have been used to prevent reADRs using structured reporting concerning the patient's ADR experience, which in current clinical practice is poorly performed. Identifying ADRs directly from free text in electronic health records (EHRs) could circumvent this.
AIM: To develop strategies to identify ADRs from free-text notes in electronic hospital health records.
METHODS: In stage I, the EHRs of 10 patients were reviewed to establish strategies for identifying ADRs. In stage II, complete EHR histories of 45 patients were reviewed for ADRs and compared to the strategies programmed into a rule-based model. ADRs were classified using MedDRA and included in the study if the Naranjo causality score was ≥1. Seriousness was assessed using the European Medicines Agency's important medical event list.
RESULTS: In stage I, two main search strategies were identified: keywords indicating an ADR and specific prepositions followed by medication names. In stage II, the EHRs contained a median of 7.4 (range 0.01-18) years of medical history covering over 35 000 notes. A total of 318 unique ADRs were identified of which 63 were potentially serious and 179 (sensitivity 57%) were identified by the rule. The method falsely identified 377 ADRs (positive predictive value 32%). However, it also identified an additional eight ADRs.
CONCLUSION: Two key strategies were developed to identify ADRs from hospital EHRs using free-text notes. The results appear promising and warrant further study.

Keywords: adverse drug event; adverse drug reaction; clinical decision support; clinical decision support system; drug allergy; free-text; natural language processing; text-mining

Year: 2021 PMID: 34468999 PMCID: PMC9292762 DOI: 10.1111/bcp.15068

Source DB: PubMed Journal: Br J Clin Pharmacol ISSN: 0306-5251 Impact factor: 3.716

What is already known about this subject

Recurrent adverse drug reactions (ADRs) are common and significantly influence morbidity, mortality and medical costs. Many recurrent ADRs are preventable and can be attributed to unintended represcription. Clinical decision support systems were implemented to prevent unintended represcription, but such systems only function when the ADR is registered as structured information, which in current clinical practice is poorly done.

What this study adds

Identifying ADRs directly from hospital free‐text electronic health record (EHR) notes using an automated tool is promising, although sensitivity and specificity need further improvement. Nearly a third of all registered ADRs could, however, not be found in physician notes. The seriousness of an ADR influences the chance that the rule‐based model finds it.

INTRODUCTION

Adverse drug reactions (ADRs), including allergic responses, frequently occur and significantly influence morbidity, mortality and medical costs. About 3.6‐6.5% of hospitalizations are related to an ADR. Furthermore, 10‐15% of patients develop an ADR during hospitalization, resulting in death in 0.05‐0.25% of cases. ADRs are the fifth most common cause of hospital deaths. Moreover, 28‐56% of all ADRs are potentially preventable. Different approaches have been used to define preventability, and most of these present difficulties for translation into interventions. A significant number of potentially preventable ADRs are recurrent ADRs (reADRs) (10‐30% of all ADRs, 13‐50% of medication‐related hospitalizations). reADRs have a different form of preventability compared to first‐occurrence ADRs: knowledge of the patient's response to a drug at a certain dose in a certain context is already available, making them easier to prevent. Preventable reADRs have multiple origins. The most important cause is unintended represcription, defined as the represcription of medication previously intentionally stopped due to an ADR (eg, the represcription of hydrochlorothiazide stopped due to hyponatraemia in an elderly woman). To prevent unintended represcriptions and the risk of reADRs, clinical decision support systems (CDSSs) have been implemented to alert prescribers when a medication is represcribed after it was previously stopped due to an ADR. Currently, however, CDSSs only function when the ADR is registered as structured information at the level of the individual patient within an ADR module linked to or part of the computerized physician order entry (CPOE) system of the electronic health record (EHR). In current clinical practice, this is poorly performed due to time constraints, inadequate IT systems, a lack of peer support and a failure to acknowledge the importance of structurally registering ADRs.
Healthcare professionals frequently describe ADRs in clinical notes and discharge summaries using free‐text entries, which is not effective in preventing unintended represcription. Identifying ADRs directly from free‐text EHR notes could solve the issue of underreporting in a structured format by healthcare professionals. In recent years, progress has been made in identifying ADRs from free text. Honingman et al developed an algorithm to screen primary care records. Iqbal et al developed and validated an algorithm to detect specific ADRs related to antidepressants and antipsychotics in psychiatric hospital EHRs. Aramaki et al developed and tested an ADR identification algorithm using Japanese discharge summaries. The sensitivity of the different algorithms was approximately 60% for general ADR identification and up to 90% for specific ADRs. Previous studies have focused on specific medication, specific ADRs, selected notes and specific settings. No studies have been identified that use a general approach to detect ADRs from all free text available in a hospital EHR system. Investigating specific notes may result in identifying only a fraction of the reported ADRs; focusing on specific ADRs automatically overlooks other ADRs. Moreover, previous studies have not assessed the causality and seriousness of the identified results. Therefore, the aim of this study was to develop strategies to identify ADRs from free‐text notes in hospital EHRs.

METHODS

Design and setting

The study was performed at the Catharina Hospital, Eindhoven, the Netherlands, a 696‐bed teaching hospital that used CS‐EZIS (version 5.2, Chipsoft B.V., Amsterdam, the Netherlands) for its EHR system. This EHR system was implemented in stages, launching in 2008 and adopting paperless recording from 2015 onwards. Medical records before 2008 are available (as scanned PDFs) as part of the multimedia module. Within the EHR system there are distinct modules, for example a CPOE module, a module for structured ADR registration, a CDSS module and a module for free‐text EHR notes. Inside the free‐text EHR module, different types of EHR notes may be distinguished (eg, physician notes, nursing notes, pathology notes, radiology notes and operation notes). An EHR note is registered at a specific time and could contain multiple entries such as medical history, physical examinations, additional findings, summaries and therapeutic plans. Figure 1 provides a graphical representation of the EHR structure. To supply additional medication‐related clinical decision support, the hospital uses Gaston Pharma (Gaston Medical, Eindhoven, the Netherlands), which is linked to the EHR database.

FIGURE 1

On the left is a graphical representation of the EHR including the different modules. The free‐text notes included from the different modules are marked grey. On the right is an example of a free‐text EHR note with two potential ADRs

Stage I: Identification of search strategies

To discover strategies for identifying ADRs, the EHRs of 10 random patients from internal medicine and geriatric departments were manually screened and supplemented with strategies devised by the researchers. ADRs retrieved from the manual review were categorized into key identification strategies. These were subsequently fine‐tuned by adding different words with the same meaning, commonly used abbreviations for these words, and common typing and spelling errors. Based on the false negatives, letter combinations or text strings were identified which were to be ignored, followed by variations, abbreviations, typing and spelling errors. The strategies were programmed into a rule‐based model using the available CDSS, Gaston Pharma. The model output included a text string containing the identified keywords (determined by the strategy), the entire free‐text EHR note and the EHR notes without the disregarded text strings. One output could contain one or more ADRs.
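
As an illustration only, the fine‐tuning described above (adding variants of a keyword and ignoring disregarded text strings) could be sketched in Python as follows. The English variants and disregarded strings are hypothetical stand‐ins, since the study used Dutch terms (Table S1), and the function names are assumptions of this sketch rather than part of Gaston Pharma.

```python
import re

# Hypothetical variants of one keyword: synonym spellings and a common typo.
KEYWORD_VARIANTS = ["side effect", "side-effect", "sideeffect", "side efect"]

# Hypothetical text strings to disregard before matching.
DISREGARDED_STRINGS = ["no known side effects", "leaflet on side effects"]


def strip_disregarded(note: str) -> str:
    """Return the note with every disregarded string removed (case-insensitive)."""
    cleaned = note
    for phrase in DISREGARDED_STRINGS:
        cleaned = re.sub(re.escape(phrase), " ", cleaned, flags=re.IGNORECASE)
    return cleaned


def find_keywords(note: str) -> list:
    """Return the keyword variants still present after removing disregarded strings."""
    cleaned = strip_disregarded(note).lower()
    return [variant for variant in KEYWORD_VARIANTS if variant in cleaned]
```

With these toy lists, a note stating "Patient reports no known side effects." produces no hit, whereas "Nausea, likely a side-effect of metformin." is flagged.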

Stage II: Inclusion of patients' EHRs

The performance of the rule‐based model was assessed using 45 additional EHRs, which were compared to a manual EHR review. The EHRs of 45 consecutive patients were included in the study if the patients had been hospitalized for over 24 hours in the departments of geriatrics (15), internal medicine (15) or oncology (15). The inclusion order was based on reverse chronological discharges before 1 June 2018. A complete history of the free‐text EHR notes was included. Scanned or imported (PDF) documents were excluded.

Stage II: EHR reviews, definitions and classification

The manual EHR review was performed independently by two assessors (a clinical pharmacist and a physician in training) using a predefined protocol, included in Supporting Information. The EHRs were searched for free‐text notes containing potential ADRs. The ADRs were defined according to the World Health Organization (WHO): “a response to a drug which is noxious and unintended, and which occurs at doses normally used in man for the prophylaxis, diagnosis, or therapy of disease, or for the modification of physiological function”. Type A‐D potential ADRs were included: type A, augmented pharmacological effects; type B, bizarre, including allergic and nonimmune drug sensitivities; type C, chronic effects; and type D, delayed effects, including carcinogenesis and teratogenesis. In cases where the two assessors did not reach a consensus, a third assessor (a member of the Dutch Pharmacovigilance Centre, LAREB) gave the final decision. Symptoms or diseases with multifactorial causes, including medication, were included as potential ADRs (eg, “hyponatraemia due to malnutrition and hydrochlorothiazide use”). Duplicate entries (ie, the same ADR occurring during the same hospitalization) were not included. reADRs were scored separately. An ADR was considered recurrent if the medication was represcribed or the ADR occurred during a separate hospitalization. Free‐text notes and CPOE data were not searched to determine whether special measures were taken to modify the risk of recurrence when represcribing (eg, dose reduction). Anatomical Therapeutic Chemical (ATC) classification was used to code the medication associated with the ADR. An ADR having more than one medication as the possible cause was included as a single ADR with all separately coded medications.
Contrary to pharmacovigilance requirements, ADRs without specific drug names mentioned, but with a drug group mentioned (eg, hyponatraemia due to antidepressant use) were included in the study, as these can still present important additions to the care process and medical history. The rule‐based model used the Dutch G‐standard database, including all generic medicine names, trade names and group names registered in the Netherlands. The ADRs were classified using the Medical Dictionary for Regulatory Activities (MedDRA version 23.1). MedDRA provides validated standardized hierarchical structure terminology, which is used by regulatory authorities, post‐marketing pharmacovigilance institutes and pharmaceutical manufacturers. The ADRs were classified using the lowest hierarchy, being the lower‐level term, while the preferred term, which is used in the summary of product characteristics (SmPC), was used to match the ADRs to references in the SmPC. For example, the lower‐level terms tingling of extremity, pins and needles and peripheral neuropathy all fall within the preferred term paraesthesia. ADRs were categorized as potentially serious using the corresponding European Medicines Agency's Important Medical Events list. The causality of the ADRs was assessed by a clinical pharmacist trained in pharmacovigilance using the Naranjo algorithm. Only ADRs with a Naranjo score of ≥1 were included.
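
The inclusion threshold can be written as a small helper. This is a minimal sketch that assumes the conventional Naranjo categories (doubtful ≤0, possible 1‐4, probable 5‐8, definite ≥9); the function names are hypothetical and the scoring questionnaire itself is not reproduced.

```python
def naranjo_category(score: int) -> str:
    """Map a total Naranjo score to its conventional causality category."""
    if score >= 9:
        return "definite"
    if score >= 5:
        return "probable"
    if score >= 1:
        return "possible"
    return "doubtful"


def is_included(score: int) -> bool:
    """Study inclusion rule: only ADRs with a Naranjo score of at least 1."""
    return score >= 1
```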

Stage II: Data collection

At the moment of hospitalization, characteristics such as gender, age, total medications (including over‐the‐counter medications) and treatment specialisms were collected from the patients' EHRs. Medical history and laboratory results were collected to calculate the Charlson Comorbidity Index. Moreover, the following information was collected to characterize the data: the number of hospitalizations (≥24 hours) and ambulant visits (including hospitalizations <24 hours), medical specialisms, record history, the number of EHR notes and the number of words (calculated using spaces) and characters used. The following data was collected for each EHR note containing ADRs: ADRs, medication involved, search strategy, surrounding paragraph or context (including the space between words, date, form and type of healthcare professional). Research Manager (Cloud9, Deventer) was used to record, edit and save the anonymized data. Venn diagram plotter version 1.5.5 was used to construct the Venn diagram.

Stage II: Data analysis

If an alert generated by the rule contained multiple ADRs, they were all considered to be identified. True positives (TPs) were ADRs identified by the manual and rule‐based EHR reviews. False positives (FPs) were identifications by the rule‐based EHR review but not the manual review. False negatives (FNs) were ADRs not identified by the rule‐based EHR review. Sensitivity was calculated as TPs/(TPs + FNs). The positive predictive value (PPV) was calculated as TPs/(TPs + FPs). FNs and FPs were further analysed to improve the applied search strategies and search for additional strategies to improve future versions of the tool. FPs were also analysed to provide recommendations for improving the list of disregarded text strings and context.
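
The definitions above can be expressed with set operations. The sketch below is illustrative: the ADR identifiers are made‐up toy values, and the function name `evaluate` is an assumption rather than part of the study's tooling.

```python
def evaluate(manual_ids: set, rule_ids: set) -> dict:
    """Compare rule-based identifications against the manual review."""
    tp = len(manual_ids & rule_ids)   # found by both reviews
    fp = len(rule_ids - manual_ids)   # found by the rule only
    fn = len(manual_ids - rule_ids)   # missed by the rule
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"TP": tp, "FP": fp, "FN": fn, "sensitivity": sensitivity, "PPV": ppv}


# Toy example with hypothetical identifiers.
manual = {"adr1", "adr2", "adr3", "adr4"}
rule = {"adr1", "adr2", "fp1", "fp2", "fp3"}
result = evaluate(manual, rule)
```

In this toy example two of the four manually identified ADRs are found by the rule, giving a sensitivity of 0.5, and two of its five alerts are correct, giving a PPV of 0.4.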

RESULTS

Based on the 10 EHR records, five key strategies for identifying EHR notes containing ADRs were identified (Table 1). Supporting Information Table S1 provides a full overview, including disregarded text strings and original Dutch words. The first strategy (S1) used keywords implying one or multiple ADRs, including conjugations of (a) drug‐induced, (b) allergy, (c) side effect, (d) intolerance, (e) reaction and (f) toxicity. The second strategy (S2) included a search of 13 different prepositions followed by drug groups, names, therapies or their abbreviations. An example could be “pins and needles after FOLFOX cycle.” Supporting Information Table S1 provides a full list of the added abbreviations used in S1 and S2. The third strategy (S3) used free‐text entries titled allergy and anaphylaxis. Such free‐text entries were used before the ADR module was introduced in 2015. The fourth strategy (S4) searched the complication registration module for drug‐related complications. The final strategy (S5) searched for ADRs registered in the ADR module, including coded and free‐text entries.

TABLE 1

Summary of ADR identification strategies used in the rule

Number	Search strategies		Included trigger words ^e
S1	Keywords implying an ADR	Conjugations of	Drug‐induced
			Allergy
			Side‐effect
			Intolerance
			Toxicity
			Reaction
S2a	Prepositions followed closely ^b by a drug group ^c , a generic drug name, a drug brand, trade name or abbreviation of a drug ^c or drug therapy ^d	Conjugations of	By
			With
			After
			Of
			On
			Since
S2b	Abbreviations using the included prepositions	Conjugations of	a.r. (as a result)
			b.o. (based on)
			a.c.o. (as a consequence of)
S3	Content of forms labelled ^a	Conjugations of	Allergies:
S3	Content of forms labelled ^a	Conjugations of	Anaphylaxis:
S4	Content of complication registration containing key field drug‐induced	…	…
S5	Content of ADR module	…	…

Abbreviations: ADR, adverse drug reaction.

^a Forms labelled allergy and anaphylaxis using free‐text entries were employed prior to the introduction of the ADR module in 2015.

^b The maximum number of characters between the preposition and the drug name (ie, their proximity) was set at 16.

^c The drug group names were based on the Anatomical Therapeutic Chemical (ATC) therapeutic subgroup, pharmacological subgroup, chemical subgroup or chemical substance (ie, second to fifth levels of ATC main groups classified by WHO).

^d Examples are PPI (proton‐pump inhibitor), HCTZ (hydrochlorothiazide), FOLFOX (combination therapy of fluorouracil and oxaliplatin). A full list of abbreviations is provided in Table S2.

^e English translations of the trigger words are presented here; Dutch trigger words are presented in Table S1.
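
Strategies S1 and S2 from Table 1 lend themselves to a regular‐expression sketch. The version below is a hedged illustration, not the Gaston Pharma rule: it uses the English translations of the trigger words, approximates their conjugations with assumed patterns, uses a tiny hypothetical drug lexicon in place of the G‐standard, and assumes that the 16‐character limit counts every character between the preposition and the drug name.

```python
import re

# S1: keywords implying an ADR; the conjugation patterns are assumptions.
S1_PATTERN = re.compile(
    r"\b(drug[- ]induced|allerg\w*|side[- ]?effects?|intoleran\w*|toxic\w*|reactions?)\b",
    re.IGNORECASE,
)

PREPOSITIONS = ["by", "with", "after", "of", "on", "since"]
DRUG_NAMES = ["hydrochlorothiazide", "HCTZ", "FOLFOX", "PPI"]  # hypothetical lexicon
MAX_GAP = 16  # maximum number of characters between preposition and drug name

# S2: a preposition followed within MAX_GAP characters by a drug name.
S2_PATTERN = re.compile(
    r"\b(?:" + "|".join(PREPOSITIONS) + r")\b"
    r".{0," + str(MAX_GAP) + r"}?"
    r"\b(?:" + "|".join(re.escape(name) for name in DRUG_NAMES) + r")\b",
    re.IGNORECASE,
)


def find_triggers(note: str) -> list:
    """Return (strategy, matched text) pairs for one free-text note."""
    hits = [("S1", match.group(0)) for match in S1_PATTERN.finditer(note)]
    hits += [("S2", match.group(0)) for match in S2_PATTERN.finditer(note)]
    return hits
```

Applied to the example from the text, "pins and needles after FOLFOX cycle", this sketch yields a single S2 hit on "after FOLFOX". A note in which the preposition and the drug name are separated by more than 16 characters yields no S2 hit.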

Stage II: Patient and data characteristics

Table 2 presents the patient and data characteristics included in the 45 EHRs. The mean age of the patients was 68 years (range 21‐92) and 64.4% were female. During the most recent hospitalization, patients had a median Charlson Comorbidity Index score of 5 (range 0‐13) and used a median of eight (range 0‐20) different medications. Patients had a median of three hospitalizations (range 1‐39) and 60 ambulant visits (ie, hospital stay <24 hours) (range 2‐433), resulting in a median medical history of 7.4 years (range 0.01‐18). The median number of free‐text EHR notes per patient was 585 (range 41‐2820). These notes comprised a median of 41 921 words (range 4070‐259 750) and a median of 449 179 characters (range 22 027‐2 594 750) per patient. This resulted in approximately 35 000 free‐text EHR notes for review.

TABLE 2

Patient and data characteristics

Variable	Value	Range
Mean age in years	68	21‐92
Female, n (%)	29 (64.4)	n/a
Variable	Median	Range
Charlson comorbidity index at last hospitalization	5	0‐13
Unique medication used at last hospitalization (n)	8	0‐20
Hospitalizations ^a	3	1‐39
Ambulant visits ^b	60	2‐433
Medical record history (years)	7.4	0.01‐18
FT EHR notes per patient	585	41‐2820
Words ^c per patient	41 921	4070‐259 750
Characters per patient	449 179	22 027‐2 594 750

Abbreviations: EHR, electronic health record; FT, free text.

^a Hospitalizations were >24 hours; hospitalizations <24 hours were included as ambulant visits.

^b Ambulant visits included telephone and video consultations.

^c The number of spaces was used to estimate the number of words.

Stage II: Inclusion of ADRs

Figure 2 provides a flowchart showing the inclusion of potential ADRs discovered during the manual EHR review. A total of 643 potential ADRs were identified. During matching, 39 potential reADRs were detected and the remaining potential ADRs (n = 269) were identified as duplicates. Excluding the duplicates and reADRs resulted in 326 unique potential ADRs. After excluding eight potential ADRs with a Naranjo score <1, 318 unique ADRs remained.

FIGURE 2

Inclusion and exclusion of potential ADRs. pADRs, potential adverse drug reactions; reADRs, recurrent adverse drug reactions

Stage II: Type of EHR notes

ADRs were found in different types of EHR notes. Most ADRs (68%, n = 216) were cited in physician notes, including ambulant, ER and admission notes. However, 17% (n = 55) of ADRs were only found in nursing notes. The remaining identified ADRs were only found in other types of EHR notes, such as dietician or pharmacist notes. ADRs included in the study were recorded by 206 individual healthcare professionals, divided over 12 medical specialisms.

Stage II: ADR characteristics

The median Naranjo score for the included ADRs was 4 (range 1‐6). Fifteen ADRs were judged probable (score 5‐8) and no ADRs were scored definite (score ≥9). Overall, patients had a median of four ADRs (range 0‐32). The median number of ADRs was six (range 1‐32) in the oncology EHRs and two (range 0‐26) in the internal medicine and geriatric EHRs. Supporting Information Table S3 provides an overview of the number of ADRs per system organ class. A fifth (19.8%, n = 63) of all ADRs were classified as potentially serious. Supporting Information Table S4 provides an overview of all potentially serious ADRs and related medication. Twenty of these were related to chemotherapy, including myelosuppression (n = 6), polyneuropathy (n = 7), hepatotoxicity (n = 3) and pulmonary embolism (n = 1). Serious ADRs not related to chemotherapy were renal failure (n = 6), myelosuppression (n = 5), hepatotoxicity (n = 4) and ileus (n = 2). Cardiac problems were frequently registered, including bradycardia (n = 3), QT prolongation (n = 2), ventricular tachycardia (VT) (n = 1) and cardiovascular collapse (n = 1). One anaphylactic reaction and one case of allergic angioedema were also identified. Most ADRs (87%, n = 278) were stated in the SmPC of a causative medicine. Five percent (n = 14) could have been related to the ADRs cited in the SmPC, albeit not at the preferred term level; 3% (n = 10) of symptoms had no specific drug mentioned in the ADR entry, so no match was possible; and 5% (n = 16) of symptoms had no reference in the SmPC. A few examples of ADRs without reference in the SmPC were paraesthesia due to exemestane, polyneuropathy due to oxycodone and urine retention due to midazolam. Supporting Information Table S5 provides the full list of ADRs with no reference in the SmPC. A total of 39 reADRs were identified, representing over 10% of the identified ADRs, distributed over 17 patients. Twelve of the 39 reADRs were potentially serious, seven of which were associated with chemotherapy. One reADR resulted in an acute hypersensitivity reaction due to represcription during hospitalization.

Stage II: Comparison of rule‐based and manual EHR reviews

The rule‐based model identified 556 potential ADRs: 179 unique identifications matched the ADRs obtained from the manual EHR review and 377 were FPs. Of the 318 ADRs identified in the manual EHR review, 179 were also identified by the rule‐based review, resulting in a sensitivity of 57% and a PPV of 32%. However, the rule also identified eight additional ADRs with a Naranjo score ≥1 that were not found in the manual review, of which one was classified as serious. Figure 3 presents a Venn diagram of the EHR review methods and the overlap therein.

FIGURE 3

Venn diagram presenting unique adverse drug reactions (ADRs). The blue circle (n = 318), including the green portion, represents the total number of unique ADRs identified by the manual electronic health record (EHR) review. The red circle (n = 556) including the green and yellow portions represents the total number of unique ADRs identified by the rule‐based EHR review (true positives + false positives). The red section (n = 377) represents the false positives. The green section (n = 179) represents the number of true positives. The yellow circle represents ADRs found only by the rule‐based EHR review

Stage II: Analysis of rule‐based EHR strategies

Table 3 presents the TPs, FPs and PPVs for the different search strategies, including their stratifications. The total number of TPs summed over the strategies is higher than the number of unique ADRs, because some ADRs were correctly identified by multiple search strategies. The rule‐based model correctly identified 179 unique ADRs. Of these 179 ADRs, 159 were identified using only one strategy, 19 were identified using two strategies (S2 with + S1 drug‐induced, n = 7; S2 with + S2 various, n = 7; S1 drug‐induced + S2 of, n = 2; S2 by + S2 of, n = 2; S1 drug‐induced + S1 allergy, n = 1) and four were identified using three strategies (S1 drug‐induced + S1 toxicity + S2 of, n = 2; S1 allergy + S1 reaction + S2 on, n = 1; S2 by + S2 with + S2b a.c.o. [as a consequence of], n = 1), which adds up to 206 true positive identifications.

TABLE 3

True positives and false positives per search strategy identifying ADRs

Number	Search strategies	Included trigger words		TPs		FPs		PPV
				n	%	n	%	%
S1	Keywords implying an ADR	Conjugations of	Drug‐induced	20	10	5	1	80
			Allergy	17	8	80	21	18
			Side effect	21	10	10	3	68
			Intolerance	0	0	0	0	n/a
			Reaction	1	0	2	1	33
			Toxicity	13	6	1	0	93
S1 Total/overall ADRs found by keywords				72	35	98	26	42
S2a	Prepositions followed closely ^b by a drug group ^c , a generic drug name, a drug brand, trade name or abbreviation of a drug, or drug therapy ^d	Conjugations of	By	16	8	19	5	41
			With	55	27	44	12	56
			After	12	6	67	18	15
			Of	31	15	62	16	33
			On	5	2	73	19	6
			Since	2	1	12	3	14
S2b	Abbreviations using the included prepositions	Conjugations of	As a result of/because of	0	0	1	0	0
			Based on	3	1	0	0	100
			As a consequence of	4	2	1	0	80
S2 Total/overall ADRs found by a combination of preposition and drug name				128	62	278	74	31
S3	Content of forms labelled ^a	Conjugations of	Allergies:	0	0	0	0	n/a
S3	Content of forms labelled ^a	Conjugations of	Anaphylaxis:	0	0	0	0	n/a
S3 Total/overall ADRs found in labelled allergy and anaphylaxis forms				0	0	0	0	n/a
S4	Content of complication registration containing key field drug‐induced	…	…	0	0	0	0	n/a
S5	Content of ADR module	…	…	6	3	0	0	100
S5 Total/overall ADRs found in ADR module				6	3	0	0	100
Total ADRs identified				206	100	377	100	35

Abbreviations: ADR, adverse drug reaction; FP, false positive; PPV, positive predictive value; TP, true positive.

^a Forms labelled allergy and anaphylaxis using free‐text entries were applied prior to the ADR module's introduction in 2015.

^b The maximum number of characters (ie, proximity between a preposition and drug name) was set at 16.

^c The drug group names were based on the Anatomical Therapeutic Chemical (ATC) therapeutic subgroup, pharmacological subgroup, chemical subgroup or chemical substance (ie, second to fifth levels of ATC main groups classified by WHO).

^d Examples are PPI (proton‐pump inhibitor), HCTZ (hydrochlorothiazide), FOLFOX (combination therapy of fluorouracil and oxaliplatin). A full list of abbreviations is provided in Table S2.

Overall, the search strategy using prepositions followed by a drug name or group (S2) accounted for 62% (n = 125) of the identified ADRs, while using keywords (S1) accounted for 35% (n = 72), and only 3% (n = 6) were identified using the ADR module (S5). Within S1 the most effective keyword was toxicity (PPV of 93%), which identified 6% (n = 13) of the ADRs. Less effective, although with a higher yield, were the keywords drug‐induced (PPV 80%, n = 20) and side effect (PPV 68%, n = 21). Within S2, the preposition with the highest yield was “with” (46 TPs, PPV 32%). The term “based on” had a PPV of 100%, although it identified only two ADRs. S2 was responsible for 74% of the total FPs, followed by words forming abbreviations of allergy (21%, n = 80). Naranjo causality score and SmPC reference did not markedly increase or decrease the sensitivity of the rule‐based review, nor did the system organ class or the type of medication. However, ADR potential seriousness increased the sensitivity to 67% (41/66) compared to 55% (138/252) for nonserious ADRs.
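
As a worked check of how the per‐keyword PPVs relate to the counts, the S1 rows of Table 3 can be recomputed as below. This is a small sketch; the counts are copied from Table 3, while the variable and function names are assumptions.

```python
from typing import Optional

# (TPs, FPs) per S1 keyword, copied from Table 3.
S1_COUNTS = {
    "drug-induced": (20, 5),
    "allergy": (17, 80),
    "side effect": (21, 10),
    "intolerance": (0, 0),
    "reaction": (1, 2),
    "toxicity": (13, 1),
}


def ppv_percent(tp: int, fp: int) -> Optional[float]:
    """PPV in percent, or None when the keyword generated no alerts (n/a)."""
    total = tp + fp
    return None if total == 0 else 100 * tp / total


# Totals over all S1 keywords.
s1_tp = sum(tp for tp, _ in S1_COUNTS.values())
s1_fp = sum(fp for _, fp in S1_COUNTS.values())
s1_ppv = ppv_percent(s1_tp, s1_fp)
```

Rounded to whole percentages, this reproduces the S1 figures in Table 3: 93% for toxicity, 18% for allergy, and 42% for S1 overall (72 TPs against 98 FPs).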

Stage II: False negatives analysis

Table 4 shows the analysis and categorization of the false negatives (n = 139). S2 accounted for most of the false negatives, 41% (n = 57). Within S2, missing synonyms for drug names (31.7%, n = 44) were the most common cause and missing abbreviations (4.3%, n = 6) were the second most common cause. Other search strategies using S1 and S4 accounted for six (4.3%) and three (2.2%) false negatives, respectively. Two additional strategies were uncovered from the analysis of the false negatives. The most promising additional strategy (aS6) was usage of MedDRA terms combined with drug names in close proximity to each other (16 characters), which identified an additional 29 ADRs. The second additional strategy (aS7) was adding abbreviations of ‘cannot tolerate’, which added an additional two positively identified ADRs. For 30.2% (n = 42) of false‐negative ADRs, no simple rule‐based identification strategies were uncovered. ADRs not mentioned in physician notes were less likely to be identified and were responsible for 44% (54/122) of the FNs.

TABLE 4

Analysis of false negatives

Number	Search strategies (n = 139)		n	%
	Missing conjugations of	Drug‐induced	3	2.2
		Allergy	0	0.0
		Side effect	3	2.2
		Intolerance	0	0.0
		Reaction	0	0.0
		Toxicity	0	0.0
S1	Potential improvement: Using keywords implying ADRs		6	4.3
	>16 characters between the preposition and drug name		2	1.4
	Missing synonyms for drug names		44	31.7
	Missing abbreviations		6	4.3
	Missing prepositions		2	1.4
	DD		3	2.2
S2	Potential improvement: Using prepositions followed closely ^a by a drug group ^b , ^d , a generic drug name, a drug brand, trade name or abbreviation of a drug, or drug therapy ^c , ^d		57	41.0
S3	Potential improvement: In ADRs found in labelled allergy and anaphylaxis forms		0	0.0
	Specific missing complication fields		3	2.2
S4	Potential improvement: Content of complication registration containing key field drug‐induced		3	2.2
S5	Potential improvement: Found in ADR module		0	0.0
	MedDRA + drug name		3	2.2
	Drug name + MedDRA		24	17.3
	Missing synonym of MedDRA term		2	1.4
aS6	Opportunity for additional strategy: MedDRA term mentioned in text combined with drug name ^d		29	20.9
	Cannot tolerate		2	1.4
aS7	Opportunity for additional strategy: Abbreviations of cannot tolerate		2	1.4
	No obvious additional strategy ^e		42	30.2

Abbreviations: ADR, adverse drug reactions; DD, differential diagnosis.

^a The maximum number of characters between the preposition and drug was 16.

^b Drug group names used were based on the ATC therapeutic subgroup, pharmacological subgroup, chemical subgroup or chemical substance (ie, second to fifth levels of ATC main groups classified by WHO).

^c Examples being PPI (proton‐pump inhibitor), HCTZ (hydrochlorothiazide), FOLFOX (combination therapy of fluorouracil and oxaliplatin). A full list of abbreviations is provided in Table S2.

^d MedDRA term and drug name are mentioned within 16 characters of each other.

^e No simple rule‐based strategy was thought of to identify these ADRs.
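
As a reading aid, the percentage column of Table 4 can be recomputed from the strategy-level counts. The short Python sketch below is not part of the study; it only uses the summary-row counts printed in the table, and the recomputed shares agree with the printed ones to within a few tenths of a percentage point.

```python
# Summary-row counts as read from Table 4 (false negatives per strategy).
STRATEGY_COUNTS = {
    "S1": 6,
    "S2": 57,
    "S3": 0,
    "S4": 3,
    "S5": 0,
    "aS6": 29,
    "aS7": 2,
    "No obvious additional strategy": 42,
}


def percentages(counts):
    """Return the total count and each category's share in percent (1 decimal)."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


total, shares = percentages(STRATEGY_COUNTS)
print(total)  # the strategy rows add up to the 139 false negatives
for name, share in shares.items():
    print(f"{name}: {share}%")
```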


DISCUSSION

This study describes the first steps in the development of an automated tool for identifying ADRs using free text in hospital EHRs. To our knowledge, this is the first study describing strategies to identify ADRs from a hospital EHR using all types of free‐text EHR notes and including all types of ADRs. Furthermore, it is the first to consider the causality of the potential ADRs. During stage I, the manual review of 10 EHRs, two promising strategies were identified: keywords indicating an ADR and specific prepositions followed closely by medication names. In stage II, 45 complete EHR histories were manually reviewed and compared to strategies built into a rule‐based model. Despite the early development stage, the rule‐based model achieved a sensitivity of 57% and a PPV of 32%. Analysis of the FNs revealed that both S1 and S2 could be improved substantially.
Studies of previous ADRs involving hospitalizations have demonstrated that each patient handover is accompanied by information loss, particularly during handovers from hospitals to primary care. This study supports these findings within the same hospital and EHR setting. In 32% of cases, no reference was found in physician notes to ADRs recorded by nurses, pharmacy technicians, pharmacists or other healthcare professionals. These findings also support the hypothesis that focusing on specific EHR notes only partially identifies previous ADRs. Moreover, only 2% of the ADRs had a structured registration, enabling CDSS alerting. Recurrence of ADRs was common: 17 of the 45 patients studied experienced a reADR and one patient had three recurrences. One of these reADRs resulted in an acute hypersensitivity reaction due to unintended represcription during hospitalization. While not formally assessed, at least 10% of the ADRs appeared to have been preventable had a warning been shown during represcription.
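
To make the second strategy (a preposition followed closely by a medication name) more concrete, the following Python sketch shows one way such a rule could be written as a single regular expression. It is an illustration under stated assumptions, not the model used in the study: the prepositions and the drug lexicon are hypothetical English stand-ins for the Dutch lists in Tables S1 and S2, and the 16-character limit is applied to the text between the preposition and the drug term.

```python
import re

# Hypothetical English stand-ins for the Dutch prepositions in Table S1.
PREPOSITIONS = ["due to", "after", "on", "under"]
# Hypothetical lexicon mapping synonyms and abbreviations to a generic name.
DRUG_LEXICON = {
    "hydrochlorothiazide": "hydrochlorothiazide",
    "hctz": "hydrochlorothiazide",
    "amoxicillin": "amoxicillin",
}
MAX_GAP = 16  # maximum characters allowed between preposition and drug term


def build_pattern(prepositions, drug_terms, max_gap):
    """Compile: preposition, then at most max_gap characters, then a drug term."""
    # Longest alternatives first so multi-word entries are preferred.
    prep = "|".join(re.escape(p) for p in sorted(prepositions, key=len, reverse=True))
    drug = "|".join(re.escape(d) for d in sorted(drug_terms, key=len, reverse=True))
    return re.compile(
        rf"\b(?P<prep>{prep})\b(?P<gap>.{{0,{max_gap}}}?)\b(?P<drug>{drug})\b",
        re.IGNORECASE,
    )


PATTERN = build_pattern(PREPOSITIONS, DRUG_LEXICON, MAX_GAP)


def find_candidates(note):
    """Return (preposition, generic drug name) for each candidate in a note."""
    return [
        (m.group("prep").lower(), DRUG_LEXICON[m.group("drug").lower()])
        for m in PATTERN.finditer(note)
    ]


print(find_candidates("Hyponatremia due to HCTZ, stopped."))
print(find_candidates("Rash after a long course of oral amoxicillin"))
```

In this sketch the abbreviation HCTZ is resolved through the lexicon, which mirrors why missing synonyms and abbreviations produced false negatives. The second note returns no candidate because more than 16 characters separate the preposition from the drug name.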
Analysis of the FNs revealed possibilities for fine‐tuning the discovered strategies, such as extending the library of synonyms and abbreviated medication names. However, the analysis also revealed that additional strategies are needed to achieve the desired sensitivity. An obvious strategy would be to include symptoms and side effects followed or preceded by medication names. While powerful, this strategy may be prone to falsely identifying disease symptoms as an ADR. Moreover, although an extensive, coded database of ADRs was readily available, many of the ADRs were either misspelled, abbreviated or described in such a way that they were not readily identified in the MedDRA database. As with the G‐standard medication database used in the developed tool, the MedDRA database will require extension to include frequently used synonyms and abbreviations.
The FP analysis demonstrated that natural language processing techniques are required to understand the context of trigger words, for example recognizing when effects are considered positive (eg, hypernatremia resolved after starting hydrochlorothiazide). However, several simple modifications could substantially reduce FPs. The first would be to extend the library of disregarded text strings; this could reduce the number of FPs resulting from the trigger word allergy in particular. Second, medicine names or abbreviations thereof must be screened to identify those having additional meanings.
One of the limitations of the study methodology was that the identification strategies were based on EHRs originating from a single hospital, using one EHR system, while language use may differ between hospitals and regions. Furthermore, only EHRs of patients recently admitted to a ward focusing on internal medicine were included. The language used by healthcare professionals may vary according to their specialisms.
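
The suggestion above of extending the library of disregarded text strings can be sketched in Python as a masking step that runs before the trigger word is searched. The phrases below are hypothetical English examples; the strings actually ignored by the rule‐based model are listed in Table S1.

```python
import re

# Trigger: allergy, allergic, allergies (hypothetical English stand-in).
TRIGGER = re.compile(r"\ballerg\w*", re.IGNORECASE)
# Hypothetical disregarded strings; the study's own list is in Table S1.
IGNORED_STRINGS = ["no known allergies", "allergy: none", "denies allergies"]


def mask_ignored(note, ignored=IGNORED_STRINGS):
    """Replace each disregarded string with spaces, keeping character offsets."""
    masked = note
    for phrase in ignored:
        masked = re.sub(
            re.escape(phrase),
            lambda m: " " * len(m.group()),
            masked,
            flags=re.IGNORECASE,
        )
    return masked


def has_trigger(note):
    """True if the trigger word occurs outside all disregarded strings."""
    return TRIGGER.search(mask_ignored(note)) is not None


print(has_trigger("No known allergies. Lungs clear."))
print(has_trigger("No known allergies, but allergic to latex"))
```

Masking with spaces rather than deleting the phrase keeps the offsets of the remaining text unchanged, which matters if later rules depend on character distances.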
Scanned documents were discarded, thereby potentially missing ADRs, particularly since referral letters often contain ADR information. These limitations may have resulted in a failure to discover key identification strategies and an overestimation of the rule's performance. Nevertheless, the EHR history contained notes from ambulant visits and hospitalizations related to several medical specialisms (n = 12) and the ADRs were recorded by a large number of diverse healthcare professionals (n = 206).
The Naranjo algorithm was used to assess causality of the ADRs. There is much debate on the reliability of this and other algorithms to assess causality because of problems with reproducibility and validity, and “no method is universally accepted for causality assessment of ADRs”. The Naranjo algorithm was chosen as it is still the preferred method for causality assessment by pharmacovigilance authorities and healthcare professionals in the Netherlands. ADRs rated at least possible (Naranjo score ≥1) were included. It could be argued that only a score of ≥5 (probable) or even ≥9 (definite) should be used for inclusion. However, the primary aim of the study was to discover strategies identifying EHR notes possibly containing ADRs, and for this purpose it is useful to include all possible ADRs. Excluding type E and F ADRs could be seen as a limitation. Current CDSSs, however, would not be able to generate alerts on E‐ and F‐type ADRs.
The first step to further develop the tool would be to translate the search strategies and logic into a programming language more suitable for natural language processing (eg, Python or R). This process would also create the possibility of adding fuzzy logic and using artificial intelligence techniques such as machine learning.
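
One simple reading of the "fuzzy" step is approximate string matching, which would address the misspelled reaction terms noted earlier. The Python sketch below uses `difflib` from the standard library; that choice, the mini‐lexicon and the similarity cutoff of 0.8 are all assumptions for illustration and are not taken from the study.

```python
from difflib import get_close_matches

# Hypothetical mini-lexicon standing in for MedDRA terms.
LEXICON = ["hyponatremia", "hypokalemia", "nausea", "pruritus", "rash"]


def normalize_term(token, lexicon=LEXICON, cutoff=0.8):
    """Map a possibly misspelled token to its closest lexicon term, or None."""
    matches = get_close_matches(token.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(normalize_term("hyponatraemia"))  # spelling variant of a lexicon term
print(normalize_term("fracture"))       # not close to any lexicon term
```

A lower cutoff would catch more misspellings but would also map unrelated words onto reaction terms, so the threshold would need to be tuned against manually reviewed notes.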
The developed rule‐based model retrieved items of text referring to ADRs, but it did not extract and code the ADRs and associated medication, which would be required to avoid duplicating identified ADRs and is essential before feeding ADRs back to the EHR for use in a CDSS. Therefore, the second step would be to automatically extract and code the ADRs from the identified text strings. Also, for a tool to fully utilize all available free text in the EHR, optical character recognition software must be considered before processing the text. At the back end of the tool, a CDSS could be used to extract valuable information to contextualize the retrieved ADR. For example, if the tool returns hypernatremia due to diuretic, the CDSS can retrieve the specific medication and dose used from the CPOE. After such developments, the tool should be tested on a different hospital EHR to study the generalizability and usability of the tool.
Using ADRs registered in free text as input for a CDSS to alert physicians would be a considerable advance in reducing the number of unintended represcriptions. It is important to consider, however, that there is also an overall underreporting of ADRs by healthcare professionals; the implementation of tools to detect ADRs from free text will therefore never solve the entire problem. Considerable attention should thus also be given to directly improving ADR registration by patients as well as healthcare professionals. Education and electronic reminders can help to improve the feeling of support from the social environment and recognition of the importance of correct ADR registration. Also, improving EHR systems in such a way as to make it easier and less time‐consuming to properly register an ADR can markedly improve registration. Introducing patient self‐reporting within the EHR patient portal would possibly also increase the number of ADRs registered.

CONCLUSION

Two key strategies were developed to identify ADRs from free text in a hospital EHR. These strategies show promise, warranting further study and the development of a tool to alert healthcare professionals to previously experienced ADRs.

CONFLICT OF INTEREST

None declared.

CONTRIBUTORS

A.Wa., A.We., M.K., R.G. and C.vdL. contributed to the conception and initial design of the study. T.E. and E.K. contributed to the revised study design. A.Wa. and A.We. contributed to the acquisition of the data. N.J. contributed to performing the causality assessment of the potential adverse drug events and to validation of the MedDRA coding. A.Wa., A.We. and B.vdB. contributed to the analysis of the data and drafting of the manuscript. N.J., E.K., T.E., A.B., M.K., R.G. and C.vdL. critically revised the interpretation and analysis of the data and manuscript. All authors agree to be fully accountable for ensuring the integrity and accuracy of the work. All authors have read and approved the final manuscript.

SUPPORTING INFORMATION

TABLE S1 Complete overview of Dutch keywords used in the rule‐based model identification of ADRs, including ignored text strings
TABLE S2 Complete list of abbreviations of drugs, drug groups and drug therapies
TABLE S3 ADRs per MedDRA system organ class
TABLE S4 Number of serious ADRs per system organ class and related medication
TABLE S5 Adverse drug reactions found without a reference in the SmPC, sorted by Naranjo score
