George Karystianis, Therese Sheppard, William G Dixon, Goran Nenadic.
Abstract
BACKGROUND: Free-text medication prescriptions contain detailed instruction information that is key when preparing drug data for analysis. The objective of this study was to develop a novel model and automated text-mining method to extract detailed structured medication information from free-text prescriptions and explore their variability (e.g. optional dosages) in primary care research databases.
Year: 2016 PMID: 26860263 PMCID: PMC4748480 DOI: 10.1186/s12911-016-0255-x
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Examples of prescription instructions represented in our model
| Prescription | dn_min | dn_max | df_min | df_max | di_min | di_max | dose unit |
|---|---|---|---|---|---|---|---|
| take 2 tablets 4 times a day | 2 | 2 | 4 | 4 | 1 | 1 | tablet |
| 2 tabs qid | 2 | 2 | 4 | 4 | 1 | 1 | tablet |
| a half to one tablet to 2 three times a day when required | 0.5 | 2 | 0 | 3 | 1 | 1 | tablet |
| 10 mg to be taken weekly | 10 | 10 | 1 | 1 | 7 | 7 | mg |
| 2 with each meal | 2 | 2 | 3 | 3 | 1 | 1 | ? |
| take 2.5 ml twice a day | 2.5 | 2.5 | 2 | 2 | 1 | 1 | ml |
| half a tablet twice a day when required | 0.5 | 0.5 | 0 | 2 | 1 | 1 | tablet |
| 2 puffs 6 hrly prn | 2 | 2 | 0 | 4 | 1 | 1 | puff |
| 1 to 3 every day | 1 | 3 | 1 | 1 | 1 | 1 | ? |
| one or two to be taken every 4 to 6 hours | 1 | 2 | 4 | 6 | 1 | 1 | ? |
| take as directed | 1 | ? | ? | ? | 1 | ? | - |
| apply as needed | 1 | 1 | 0 | ? | 1 | ? | - |
dn_min is dose number (minimum), dn_max is dose number (maximum), df_min is dose frequency (minimum), df_max is dose frequency (maximum), di_min is dose interval (minimum), di_max is dose interval (maximum). Additional file 1: Table S1 contains examples of frequent Latin abbreviations
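As an illustration of the model in the table above, the seven attributes can be held in a small record type. The following Python sketch is not the authors' implementation: the class name `DoseInstruction`, the helper `daily_dose_range`, the use of `None` for "?", and the reading of the dose interval as a number of days (suggested by the "weekly" row, where the interval is 7) are all assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DoseInstruction:
    """One prescription instruction in the min/max model (hypothetical type).

    None stands for the "?" entries of the table.
    """
    dn_min: Optional[float]  # dose number (minimum)
    dn_max: Optional[float]  # dose number (maximum)
    df_min: Optional[float]  # dose frequency (minimum)
    df_max: Optional[float]  # dose frequency (maximum)
    di_min: Optional[float]  # dose interval (minimum), assumed to be in days
    di_max: Optional[float]  # dose interval (maximum), assumed to be in days
    dose_unit: Optional[str] = None


def daily_dose_range(d: DoseInstruction) -> Optional[Tuple[float, float]]:
    """Return (lowest, highest) dose per day, or None if any value is unknown.

    Assumption: daily dose = dose number * dose frequency / dose interval.
    """
    numeric = (d.dn_min, d.dn_max, d.df_min, d.df_max, d.di_min, d.di_max)
    if any(v is None for v in numeric):
        return None
    lowest = d.dn_min * d.df_min / d.di_max
    highest = d.dn_max * d.df_max / d.di_min
    return (lowest, highest)


# Rows taken from the table above.
four_times_daily = DoseInstruction(2, 2, 4, 4, 1, 1, "tablet")   # take 2 tablets 4 times a day
when_required = DoseInstruction(0.5, 0.5, 0, 2, 1, 1, "tablet")  # half a tablet twice a day when required
weekly = DoseInstruction(10, 10, 1, 1, 7, 7, "mg")               # 10 mg to be taken weekly
as_directed = DoseInstruction(1, None, None, None, 1, None)      # take as directed

for example in (four_times_daily, when_required, weekly, as_directed):
    print(example.dose_unit, daily_dose_range(example))
```

Under these assumptions, a "when required" instruction yields a range that starts at zero, while an instruction containing any "?" yields no range at all.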
Fig. 1 The two-step approach for the extraction of structured dose information from CPRD prescription instructions
Examples of rules for the recognition of dosage attributes in medication data
The rules were implemented in MinorThird [29], and we use its notation here. Only the part in brackets (the string of interest) is extracted as a mention (i.e., an annotation); the rest of the rule, if any, specifies the context or anchors. The rules use explicit matching of spans (e.g., eq(‘times’)) and dictionary matches for single-word terms (e.g., a(verb), which matches verbs that indicate the administration of a medication, such as take or insert) and for multiword terms (e.g., @period; see Additional file 1: Table S2). Number models numerical expressions, including those belonging to the dictionary “number”; a(timeUnitLy) matches prescription text against the dictionary “timeUnitLy”, whose entries are adverbs of time (e.g., “daily”, “weekly”); @perTimeUnit recognises syntactic patterns from the dictionary “perTimeUnit” that combine numeric and word phrases (e.g., “four times a day”, “2 times per week”); a(timeUnit) identifies words from the “timeUnit” dictionary (see Additional file 1: Table S2).
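The idea of extracting only a bracketed string while the rest of the rule serves as context can be approximated with regular-expression capture groups. The sketch below is written in plain Python, not in MinorThird notation, and it does not reproduce the authors' rules: the tiny dictionaries, the two patterns, and the function names are hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical miniature dictionaries; the real ones are in Additional file 1.
NUMBER_WORDS = {"half": 0.5, "one": 1, "two": 2, "three": 3, "four": 4}
VERBS = ["take", "insert"]
TIME_UNITS = ["day", "week"]

NUMBER = r"\d+(?:\.\d+)?|" + "|".join(NUMBER_WORDS)
VERB_ALT = "|".join(VERBS)
UNIT_ALT = "|".join(TIME_UNITS)

# The capture group plays the role of the bracketed part of a rule;
# everything outside it is context only.
DOSE_NUMBER_RULE = re.compile(
    r"\b(?:" + VERB_ALT + r")\s+(" + NUMBER + r")\b", re.IGNORECASE
)
FREQUENCY_RULE = re.compile(
    r"\b(" + NUMBER + r")\s+times\s+(?:a|per)\s+(?:" + UNIT_ALT + r")\b",
    re.IGNORECASE,
)


def to_number(text):
    """Convert a digit string or a dictionary number word to a float."""
    lowered = text.lower()
    if lowered in NUMBER_WORDS:
        return float(NUMBER_WORDS[lowered])
    return float(lowered)


def extract(instruction):
    """Return the captured dose number and frequency (None when no rule fires)."""
    dose = DOSE_NUMBER_RULE.search(instruction)
    freq = FREQUENCY_RULE.search(instruction)
    return {
        "dose_number": to_number(dose.group(1)) if dose else None,
        "frequency": to_number(freq.group(1)) if freq else None,
    }


print(extract("take 2 tablets 4 times a day"))
print(extract("Take one tablet three times a day"))
print(extract("take 2.5 ml twice a day"))
```

In this sketch "twice" is deliberately not covered, so the last instruction returns a dose number but no frequency, which shows how coverage depends on the dictionaries and patterns supplied.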
The accuracy of the medication attribute extraction
| Dosage attribute | True positives (out of 220) | Accuracy (%) |
|---|---|---|
| dose number (minimum) | 211 | 95.9 |
| dose number (maximum) | 210 | 95.4 |
| dose frequency (minimum) | 210 | 95.4 |
| dose frequency (maximum) | 207 | 94.0 |
| dose interval (minimum) | 220 | 100.0 |
| dose interval (maximum) | 217 | 98.6 |
| dose unit | 220 | 100.0 |
| (macro) accuracy | | 97.0 |
Accuracy is shown for each attribute, considered separately. Macro accuracy represents an average of the accuracy values across different attributes
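The macro accuracy can be checked as an unweighted mean of the per-attribute accuracy values reported in the table. The short Python sketch below uses the reported percentages as given; the variable names are ours.

```python
# Per-attribute accuracy (%) as reported in the table above.
reported_accuracy = {
    "dose number (minimum)": 95.9,
    "dose number (maximum)": 95.4,
    "dose frequency (minimum)": 95.4,
    "dose frequency (maximum)": 94.0,
    "dose interval (minimum)": 100.0,
    "dose interval (maximum)": 98.6,
    "dose unit": 100.0,
}

# Macro accuracy: each attribute contributes equally to the average.
macro_accuracy = sum(reported_accuracy.values()) / len(reported_accuracy)
print(round(macro_accuracy, 1))  # prints 97.0
```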
Medication prescription variability in the most common CPRD prescription instructions
| Prescriptions with | Number of such prescriptions (out of 56,114) | Prescriptions percentage |
|---|---|---|
| all medication elements as “?” | 406 | 0.7 % |
| at least one element as “?” | 11,696 | 20.8 % |
| dn_min ≠ dn_max | 6,278 | 11.1 % |
| df_min ≠ df_max | 10,249 | 18.2 % |
| di_min ≠ di_max | 55 | 0.1 % |
| no dose units | 36,111 | 65.4 % |
dn_min is dose number (minimum), dn_max is dose number (maximum), df_min is dose frequency (minimum), df_max is dose frequency (maximum), di_min is dose interval (minimum), di_max is dose interval (maximum)
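The categories in the table above can be derived mechanically from the structured attributes. The Python sketch below shows one way to do so on a handful of rows from the examples table; it is not the authors' analysis code. It assumes that `None` encodes "?", that "elements" means the six numeric attributes, that a min/max pair counts as different only when both values are known, and that a dose unit given as "?" or "-" counts as missing.

```python
from collections import Counter

NUMERIC_FIELDS = ("dn_min", "dn_max", "df_min", "df_max", "di_min", "di_max")


def differs(low, high):
    """True when both values are known and unequal (assumption, see lead-in)."""
    return low is not None and high is not None and low != high


def variability_flags(rec):
    """Return the list of table categories that one structured record falls into."""
    values = [rec[name] for name in NUMERIC_FIELDS]
    flags = []
    if all(v is None for v in values):
        flags.append("all elements unknown")
    if any(v is None for v in values):
        flags.append("at least one element unknown")
    if differs(rec["dn_min"], rec["dn_max"]):
        flags.append("dn_min != dn_max")
    if differs(rec["df_min"], rec["df_max"]):
        flags.append("df_min != df_max")
    if differs(rec["di_min"], rec["di_max"]):
        flags.append("di_min != di_max")
    if rec["dose_unit"] is None:
        flags.append("no dose units")
    return flags


def record(dn_min, dn_max, df_min, df_max, di_min, di_max, dose_unit):
    """Build a record dict from positional values (hypothetical helper)."""
    values = (dn_min, dn_max, df_min, df_max, di_min, di_max)
    rec = dict(zip(NUMERIC_FIELDS, values))
    rec["dose_unit"] = dose_unit
    return rec


# Sample rows from the examples table, not the 56,114 CPRD instructions.
sample = [
    record(2, 2, 4, 4, 1, 1, "tablet"),        # take 2 tablets 4 times a day
    record(1, 3, 1, 1, 1, 1, None),            # 1 to 3 every day
    record(0.5, 0.5, 0, 2, 1, 1, "tablet"),    # half a tablet twice a day when required
    record(1, None, None, None, 1, None, None),  # take as directed
]

counts = Counter(flag for rec in sample for flag in variability_flags(rec))
for flag, count in counts.items():
    print(f"{flag}: {count} of {len(sample)}")
```

Changing any of the stated assumptions, for example counting a known minimum paired with an unknown maximum as variable, would change the counts, so the definitions need to be fixed before comparing figures across datasets.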