Categorizing medications from unstructured clinical notes.

Faisal Farooq, Shipeng Yu, Vikram Anand, Balaji Krishnapuram.

Abstract

One of the important pieces of information in a patient's clinical record is the information about their medications. Besides administration details, it also includes the category of the medication, i.e. whether the patient was taking the medication at home, or whether it was administered in the Emergency Department, during the course of stay, or on discharge. Unfortunately, much of this information is presently embedded in unstructured clinical notes, e.g. ER records and History & Physical documents. This information is required for adherence to quality and regulatory guidelines or for retrospective analysis, e.g. CMS reporting. Extracting it manually is labor-intensive. This paper explains in detail a statistical NLP system developed to extract such information. We trained a Maximum Entropy Markov model to categorize instances of medication names into previously defined categories. The system was tested on a variety of clinical notes from different institutions and achieved an average accuracy of 91.3%.

Year: 2013 PMID: 24303296 PMCID: PMC3814480

Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc

Introduction

Medications form an important entity in the documentation of patient encounters. The information about medications consists of administration details (dosage, route etc.) as well as temporal information (taking at home, administered in ED, prescribed at discharge etc.). This information is indispensable for various purposes, notably diagnosis, treatment, retrospective analysis for quality and clinical adherence, research and clinical trials. For example, the Centers for Medicare & Medicaid Services (CMS) requires hospitals to report adherence to guidelines for various quality indicators (Heart Failure, Pneumonia, Acute Myocardial Infarction and Surgical Care). Amongst these are metrics such as whether a pneumonia (PN) patient received an antibiotic within a defined time period of arrival, or whether not doing so was documented (e.g. if the patient was on an antibiotic at home just prior to arrival). Thus, it is important to know the category of any medication that is identified in the clinical record of a patient. Some of the common categories are Home Medications, Between Arrival and Admission (e.g. ED), During Stay and Discharge.

However, much of this information is available only in unstructured (free) text. Typically, it is embedded deep in the text of various types of documents. In spite of recommendations to push this information into structured data, much of it is still unstructured. This is often because medications are reconciled (home, hospital stay etc.) at discharge time, e.g. by stopping, continuing or changing them, and this reconciliation information is easy to enter using spoken text. Reading a record and extracting such information manually is labor-intensive. This paper explains in detail the system developed to extract such information. After briefly discussing the related work, we present the data, our methodology and the results obtained.

Related Work

Information extraction from clinical text has recently received a lot of attention. Researchers are applying machine learning and natural language processing to text mining in clinical systems. Friedman et al3 discuss the potential of using NLP techniques in the medical domain, and also provide a comparative overview of the state-of-the-art NLP tools applied to biomedical text. Literature 4,5,6 provides a survey of various approaches to information extraction from biomedical text, including named entity tagging and extracting relationships between different entities and between different texts. In general, the systems that have been designed by various researchers can be classified into two broad categories: i) Rule Based and ii) Learning Based. Rule based systems8,9,10 encode knowledge into grammars, if-else rules and regular expressions. They can be easy to implement, but they suffer from major drawbacks such as poor maintainability and very low generalization: the rules engine is hard to maintain, and the number of rules can increase exponentially. On the other hand, learning based systems11,12 take advantage of data and the inherent statistical knowledge. They do not require any hand crafting of rules and are easy to maintain. However, they require hand labeled data on which to train, and such data can be laborious to create. Given the nature of the problem, where it is not feasible to encode diverse rules into a small subset, it is nevertheless better to invest a one-time effort in the creation of such datasets (often called ground truth). Researchers have also focused on medication information extraction. Jagannathan et al7 assessed four commercial NLP engines for their ability to extract medication information. Uzuner et al13 describe various systems that were built as part of the i2b2 challenge. These systems, however, were dedicated to extracting information such as medication name, dose, route, frequency etc.
In the current work, we focus on automatically classifying medications into 5 categories: 1) Home, 2) Between Arrival and Admission, 3) During Stay, 4) Discharge and 5) None. Whereas the first 4 categories are self-explanatory, we defined the fifth category to account for the fact that medication names are sometimes mentioned in some other context, e.g. orders and allergies. Table 1 provides some real examples of medications in their corresponding contexts. We propose a maximum entropy based learning method which uses statistical and natural language features that are easy to extract and do not require a high degree of domain expertise. The method is highly scalable and can be easily retrained as more data becomes available.

Table 1.

Example of medication instances (

Category	Examples
Home	Had Z-Pak previously. No other home medications.
	HOME MEDICATIONS:1. Bactrim.
	levofloxacin for two weeks
Between Arrival and Admission	ED COURSE: Considered the differential for PE despite a therapeutic INR on coumadin.
	IV levaquin infusing from ED
	This patient was given an aspirin initially upon arrival.
During Stay	Ancef hung at this time
	IV Zosyn done infusing
	times a day, Lovenox 40 mg subcutaneous daily,
Discharge	This nurse called into MyLocalDrugStore on Month 99, 2011, Levaquin
	discharge instructions gone over with patient and stated understanding. prescriptions for Levaquin, Augmentin
	Family requesting oral antibiotic for patient to take post discharge. This nurse spoke with Dr. Doe who wrote
None	Screening for methicillin resistant staphylococcus aureus was positive.
	I will give the patient p.o. Zithromax because of her pain during infusion and will give her Exelon for her
	ALLERGIES: Penicillin and iodine.
	patietn agreeable to receiving primaxin via IV
	He gets hives with bactrim

Data Collection and Annotation

To account for variety in the data, we collected data from four different institutions. Due to confidentiality agreements, we are unable to divulge the names of these institutions. However, it is worth mentioning that all of these hospitals are medium to large size institutions and are geographically very distant from each other within the continental US. The documents were extracted for one quarter's worth of patient visits and represented various types, viz. ED Records, H&Ps, Consultation Reports and Discharge Summaries. To create the ground truth data, we developed an annotation tool to visualize and label data. This tool loaded one document at a time and highlighted single instances of the medications identified. Since our aim was to develop a module that could be used in the quality adherence space as required by CMS, we used a list of medications provided by CMS in its guidelines16. Alternatively, more comprehensive lists2 or automated methods can also be used for this step. The tool performed simple string matching (accounting for minor spelling variation) on the tokenized document in order to locate the instances of medications. These instances were then highlighted one at a time, and the annotator (a clinical expert, e.g. a nurse) could make the appropriate selection (using a radio button corresponding to each class) and click ‘Next’ to go to the next instance, repeating until the end of the document. This process was repeated for each document in the database until we had 560 documents from four different institutions (almost evenly distributed across the four document types mentioned above). After labeling, we had around 1200 instances of medications in these documents, thus averaging 2.14 medications per document, with a range of zero to 20 medications in a single document.
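
The lookup step described above (string matching against a medication list, tolerant of minor spelling variation) might be sketched as follows. This is only an illustration under stated assumptions: the paper does not say how spelling variation was handled, so the use of Python's `difflib`, the `cutoff` value, and the short `MEDICATION_LIST` are all hypothetical stand-ins rather than the authors' tool or the CMS list.

```python
import difflib
import re

# Hypothetical stand-in for the CMS medication list (the real list is not reproduced here).
MEDICATION_LIST = ["levaquin", "zosyn", "bactrim", "augmentin", "zithromax"]


def find_medications(text, lexicon=MEDICATION_LIST, cutoff=0.85):
    """Return (token, matched_name, start_offset) for tokens that closely match a lexicon entry."""
    hits = []
    for match in re.finditer(r"[A-Za-z][A-Za-z\-]*", text):
        token = match.group(0)
        # get_close_matches tolerates small edits such as a doubled or dropped letter.
        close = difflib.get_close_matches(token.lower(), lexicon, n=1, cutoff=cutoff)
        if close:
            hits.append((token, close[0], match.start()))
    return hits


hits = find_medications("IV levaquin infusing from ED; prescriptions for Levaquinn, Augmentin")
for token, name, offset in hits:
    print(f"{offset:3d}  {token!r} -> {name}")
```

A higher `cutoff` reduces false matches on ordinary words at the cost of missing more heavily misspelled names; the right trade-off would have to be tuned on real notes.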

Feature Extraction

Before extracting any features from the data, we passed it through a preprocessing module. This module performed tokenization, sentence boundary detection and stop word removal. It also performed class grouping, e.g. all numbers (except dates) were replaced by $NUM, all measurement units (mg, cm, ml) by $QUANT and all instances of medication names found by the system by $MED. The feature extraction was guided by some basic principles. We did not want features that would be extremely hard to generate or would be specific to institutions or document types. In addition, we wanted features that would generalize across languages and take the language structure into account. Lastly, we wanted features that a clinical expert would understand and often use in their judgment (sometimes unknowingly). Some of the feature classes that we experimented with are described below.
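
A minimal sketch of the class-grouping step is given below. The token pattern, the tiny `MEDS`, `UNITS` and `STOP_WORDS` sets, and the function name `preprocess` are assumptions made for illustration; the sketch also skips sentence boundary detection and does not implement the date exception mentioned above.

```python
import re

# All three sets are illustrative assumptions, not the resources used in the paper.
MEDS = {"levaquin", "lovenox", "zosyn"}
UNITS = {"mg", "cm", "ml"}
STOP_WORDS = {"the", "a", "an", "of"}

NUMBER = r"\d+(?:\.\d+)?"
WORD = r"[A-Za-z]+(?:-[A-Za-z]+)*"


def preprocess(sentence):
    """Tokenize, drop stop words, and map tokens to $MED / $QUANT / $NUM classes."""
    grouped = []
    for token in re.findall(f"{NUMBER}|{WORD}", sentence):
        lowered = token.lower()
        if lowered in MEDS:
            grouped.append("$MED")
        elif lowered in UNITS:
            grouped.append("$QUANT")
        elif re.fullmatch(NUMBER, token):
            grouped.append("$NUM")  # note: dates are not special-cased in this sketch
        elif lowered in STOP_WORDS:
            continue
        else:
            grouped.append(lowered)
    return grouped


print(preprocess("Lovenox 40 mg subcutaneous daily"))
```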

Context:

The phrasal context (the words surrounding the medication, i.e. the $MED instance in this case) often provides information about its group. For example, “She is allergic to $MED and $MED”, “Patient received antibiotics while in the emergency department. This consisted of both intravenous $MED and $MED”, “$MED IV complete that was hung in ER”. As we note in these cases, the tokens surrounding the $MED instance often describe the context of the medication. These are often referred to as n-gram features. Whereas classically n-grams mean the left context of the token in question, there is no reason to limit the scope, and we extracted n-gram features from either side of the $MED instance. Needless to say, we performed stemming and canonicalization of the tokens while extracting these features.
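
The two-sided context features can be illustrated with a short sketch. The feature naming scheme (`L1=`, `R1=`, `LBI=`, `RBI=`), the window size and the function name are hypothetical choices, and stemming is omitted; the paper does not specify the exact n-gram templates used.

```python
def context_features(tokens, index, window=2):
    """Build unigram and bigram context features on both sides of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    features = []
    # Positional unigrams: L1 is the token immediately to the left, R1 to the right.
    for distance, token in enumerate(reversed(left), start=1):
        features.append(f"L{distance}={token}")
    for distance, token in enumerate(right, start=1):
        features.append(f"R{distance}={token}")
    # One bigram per side, when enough context is available.
    if len(left) >= 2:
        features.append("LBI=" + "_".join(left[-2:]))
    if len(right) >= 2:
        features.append("RBI=" + "_".join(right[:2]))
    return features


tokens = ["she", "is", "allergic", "to", "$MED", "and", "$MED"]
print(context_features(tokens, index=4))
```

For the first `$MED` in the allergy example, the left bigram `allergic_to` is the kind of cue that would be expected to point toward the None class.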

Tense and Negation:

In order to supplement the context (token) level features with richer knowledge, we also extracted tense (past, present, future) and negation information using the Stanford Parser18. Worth mentioning is the fact that examples like “We will give the patient $MED” are labeled by the clinical experts as class 5 (None). The reason is that this is ‘intended’ rather than ‘performed’. In many situations, this can change and the medication may actually never be administered. Thus, this feature proves to be extremely beneficial.

Part of Speech:

One of the most common ‘language’ features is the part of speech (POS) information, which assigns each token in a sentence its grammatical category. For example, the sentence “We are going to put the patient on $MED.” would yield the following POS information: We/PRP are/VBP going/VBG to/TO put/VB the/DT patient/NN on/IN $MED/NNP ./. We used Mayo Clinic’s cTakes1 POS tagger. In addition, we also extracted phrase-level information such as Noun Phrase (NP), Verb Phrase (VP) etc. using the cTakes chunker. The former is often referred to as deep parsing and the latter as shallow parsing.

Document metadata:

Document metadata such as document type and document date often provides useful insights about the content of the document. For example, if a medication instance is described as being administered two days ago and the document is an ED Report, then that instance most likely refers to a medication being taken at home. Whereas documents like Discharge Summaries, Consultation Reports and Progress Notes represent a uniform distribution of all 5 classes, some document types are more representative of some classes than others. For example, an ED Report mostly represents Classes 1, 2 and 5 and an H&P mostly represents 1, 2, 3 and 5. The document type also often acts as a tie breaker. As an illustration, without knowledge of the document type, the instance in the statement “The patient was given $MED” can equally represent Classes 1, 2 and 3. However, if this statement is found in an ED Record, then the tie breaks in favor of Class 2. Thus, the document metadata provides useful information. In order to account for unknown document types, we added a category Unknown to the types of documents.

Heading:

A document is often divided into sections, with each section starting with a heading. For example:

KNOWN ALLERGIES: Amiodarone Hydrochloride, Niacin
HOME MEDS: Tylenol, Levaquin, Insulin

Even though this is useful information, it is often not easy to detect headings. We used simple rules like uppercase, title case, semi-colon, paragraph etc. to detect that certain tokens are headings and used this as a feature. An advantage of statistical systems is that all features are weighted by the learning algorithm, and if we are not sure of an extracted feature, it is automatically assigned a low weight.

All of these groups of features were used to extract numerous features (often thousands to tens of thousands), which were fed into a learning algorithm for classification. The next section describes the learning and classification algorithm we used in the system.
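
The kind of surface rules mentioned for heading detection could look roughly like the sketch below. The regular expression, the length limit and the function names are assumptions for illustration only; the paper lists the cues (uppercase, title case, punctuation, paragraph position) but not the exact rules.

```python
import re

# Hypothetical rule: a short capitalized phrase at the start of a line, closed by ':' or ';'.
HEADING_RE = re.compile(r"^\s*([A-Z][A-Za-z &/]{1,40}?)\s*[:;]")


def detect_heading(line):
    """Return a normalized heading if the line starts with one, else None."""
    match = HEADING_RE.match(line)
    if not match:
        return None
    candidate = match.group(1).strip()
    # Accept only ALL CAPS or Title Case phrases; ordinary sentences are rejected.
    if candidate.isupper() or candidate.istitle():
        return candidate.upper()
    return None


def heading_features(lines):
    """Pair every line with the most recent heading seen, as a feature string."""
    current = "NO_HEADING"
    features = []
    for line in lines:
        heading = detect_heading(line)
        if heading:
            current = heading
        features.append((line, "HEADING=" + current))
    return features


for line, feature in heading_features([
    "KNOWN ALLERGIES: Amiodarone Hydrochloride, Niacin",
    "HOME MEDS: Tylenol, Levaquin, Insulin",
    "She takes them daily.",
]):
    print(feature, "|", line)
```

Because rules like these misfire on some lines, it matters that the resulting feature is only one weighted input among many rather than a hard decision.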

Classification

We used Maximum Entropy Markov Models for classification. The motivating idea behind maximum entropy is that the most uniform model should be preferred when no information is present, and whatever information is present should constrain that uniform model. Thus, the Maximum Entropy model prefers the most uniform model satisfying the constraints, which has the exponential form

\[ P(c \mid F) = \frac{1}{Z(F)} \exp\Big( \sum_i \lambda_i f_i(F, c) \Big), \]

where Z(F) is the normalizing factor that makes the probabilities sum to one over the categories. The λi’s represent parameters that are estimated during the course of the Maximum Entropy training. The fi’s are any real-valued functions describing features of the (feature, class) relationship, F represents the features extracted, and c represents a medication category (1–5). Since Maximum Likelihood training overfits (Nigam et al.17), we use Maximum A Posteriori training with Gaussian priors over feature functions. The prior probability of the model is the product of Gaussians of each feature value λi with variance σi²:

\[ P(\Lambda) = \prod_i \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\Big( -\frac{\lambda_i^2}{2\sigma_i^2} \Big). \]

The training has been implemented using the Improved Iterative Scaling (IIS) algorithm which is of the Quasi-Newton family of numerical algorithms. We used the openNLP MaxEnt system for training and classification with the setting as mentioned above.
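
To make the exponential form of a maximum entropy classifier concrete, P(c | F) proportional to exp(Σ λi fi(F, c)), the following sketch computes class probabilities and the log of a zero-mean Gaussian prior. It is not the openNLP implementation: it assumes binary indicator features keyed by (feature, class) pairs, a single shared variance, and made-up weights chosen purely for illustration.

```python
import math

CLASSES = ["Home", "ArrivalToAdmission", "DuringStay", "Discharge", "None"]


def maxent_posterior(active_features, weights, classes=CLASSES):
    """P(c | F) for binary indicator features; weights maps (feature, class) -> lambda."""
    scores = {c: sum(weights.get((f, c), 0.0) for f in active_features) for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp_scores.values())  # normalizer Z(F)
    return {c: v / z for c, v in exp_scores.items()}


def log_gaussian_prior(weights, variance=1.0):
    """Log of a zero-mean Gaussian prior over the lambdas, up to an additive constant."""
    return -sum(w * w for w in weights.values()) / (2.0 * variance)


# Made-up weights, for illustration only.
weights = {("LBI=allergic_to", "None"): 2.0, ("HEADING=HOME MEDS", "Home"): 1.5}

print(maxent_posterior([], weights))                   # no evidence: uniform over 5 classes
print(maxent_posterior(["LBI=allergic_to"], weights))  # evidence shifts mass toward None
print(log_gaussian_prior(weights))
```

With no active features every class gets probability 0.2, which is the "most uniform model" behavior; MAP training would maximize the data log-likelihood plus `log_gaussian_prior`, which penalizes large weights.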

Experiments and Results

The data was split into training and test sets. We did not have any separate validation (parameter tuning) sets. This was taken care of by the default configuration in the openNLP toolkit. We performed a wide variety of experiments to validate the impact of features, generalization and scalability. They are described as follows:

Feature Validation:

To evaluate each individual feature, we split the data evenly into training and test sets. We started off with the simplest context (n-gram) feature described in the above section and trained our model, after which we evaluated the model performance on the test data. After that, we added one feature group at a time and evaluated the differential gain. We added the feature groups in the following order: a) Context, b) +POS, c) +Tense, d) +Negation, e) +Metadata, f) +Heading. Figure 2 shows the gain in accuracy from adding a feature group to the existing models. Note that even though the actual accuracy numbers would certainly depend on the order in which the features are added, the average gains would be similar.

Leave One Out (Document Type):

In this case, we trained on all the data except the data corresponding to one document type. We tested on the held out data (corresponding to a document type never encountered during training). We repeated this such that each document type was held out once and then calculated the average of the accuracy scores. The average accuracy in this setup was 83.9%. This experiment gives us an idea of generalization within an institution in case unseen document types are encountered.

Leave One Out (Institution):

Our next experiment was aimed at estimating the generalization across institutions. We held out all the data from one institution and trained on the other 3. We evaluated the trained model on the held out institution. We repeated the experiment until every institution had been held out and then averaged the accuracy scores. The average accuracy in this setup was 82.9%.
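
Both hold-out experiments (by document type and by institution) follow the same leave-one-group-out pattern, which can be sketched generically. The dictionary keys, the function names and the toy majority-class model in the usage example are assumptions for illustration; they are not the classifier or the data from the paper.

```python
from collections import Counter, defaultdict


def leave_one_group_out(instances, group_key):
    """Yield (held_out_group, train, test), holding out each group exactly once."""
    groups = defaultdict(list)
    for instance in instances:
        groups[instance[group_key]].append(instance)
    for held_out in sorted(groups):
        test = groups[held_out]
        train = [x for name, items in groups.items() if name != held_out for x in items]
        yield held_out, train, test


def average_accuracy(instances, group_key, train_fn, predict_fn):
    """Train on all groups but one, test on the held-out group, and average the accuracies."""
    scores = []
    for _, train, test in leave_one_group_out(instances, group_key):
        model = train_fn(train)
        correct = sum(predict_fn(model, x) == x["label"] for x in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Toy usage with a majority-class "model"; labels and institutions are invented.
toy = (
    [{"institution": "A", "label": 1}] * 3
    + [{"institution": "B", "label": 1}, {"institution": "B", "label": 2}]
    + [{"institution": "C", "label": 2}] * 2
)
majority = lambda train: Counter(x["label"] for x in train).most_common(1)[0][0]
print(average_accuracy(toy, "institution", majority, lambda model, x: model))
```

Passing `"doc_type"` (or whatever key holds the document type) instead of `"institution"` gives the document-type variant of the experiment.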

Overall Results:

The final experiment we conducted was randomly splitting the data into a training set (75%) and a test set (25%) and performing a cross-fold validation. The average accuracy was 91.3 ± 2.11%.

Discussion

We achieved encouraging results using the system. In this section, we will also present some cases that we believe the system should be able to identify correctly, but is not doing so (1–2 in Table 2). Such cases show that there is some room for improvement. However, we will also show examples of some cases which are extremely hard (3–4 in Table 2). Example 3 is classified as Class 5 because it is an Order, which would make it correct. However, according to the experts since there is a second Signed entry by a nurse, this means the nurse actually administered the medication.

Table 2

Example failed cases (Med is in bold italics. O is system output and GT is ground truth)

	Example	O	GT
1.	$MED was not started here in the ER because the patient is given $MED for an unknown reason	2	1
2.	MEDICATIONS: $MED, $MED, $MED (which she stopped 3 days ago), $MED	1	5
3.	Order: $MED Signed: Dr. John Doe MM/DD/YYYY HH:MM:SS Signed: Jane Doe, RN MM/DD/YYYY HH:MM:SS	5	2
4.	HOME MEDS: 1. $MED, 2. $MED She is a non-compliant patient and does not take her meds.	1	5

Conclusion and Future work

In this paper, we presented a system that classifies an instance of a medication into one of the pre-defined categories. We used statistical natural language processing and machine learning to address the problem and achieved very encouraging results. The rationale for this system is CMS quality reporting, where many quality indicator questions to be reported require this information. This is being performed manually in many hospitals. Even though this use case in itself is large enough to warrant addressing the challenge, the system can be used in many other cases as well. Guideline adherence to best practices (e.g. aspirin within 24 hrs), support for med reconciliation tools, interactive discharge summaries etc. are other use cases for such a system. As part of the future work, we plan to improve the detection of headings (maybe make it stochastic) and are also looking into decomposing a document into logical sections, which will provide useful features for the system.
