Irena Spasic, Dominik Krzeminski, Padraig Corcoran, Alexander Balinsky.
Abstract
BACKGROUND: Clinical trials are an important step in introducing new interventions into clinical practice by generating data on their safety and efficacy. Clinical trials need to ensure that participants are similar so that the findings can be attributed to the interventions studied and not to some other factors. Therefore, each clinical trial defines eligibility criteria, which describe characteristics that must be shared by the participants. Unfortunately, the complexities of eligibility criteria may not allow them to be translated directly into readily executable database queries. Instead, they may require careful analysis of the narrative sections of medical records. Manual screening of medical records is time-consuming, thus negatively affecting the timeliness of the recruitment process.
Keywords: clinical trial; electronic medical records; eligibility determination; machine learning; natural language processing
Year: 2019 PMID: 31674914 PMCID: PMC6913747 DOI: 10.2196/15980
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. The three premarketing phases of a clinical trial.
Description of the eligibility criteria, as provided in the annotation guidelines used for the National Natural Language Processing Clinical Challenge shared task.
| ID | Criterion | Time period | Default |
| ABDOMINAL | Intra-abdominal surgery, small or large intestine resection, or small bowel obstruction | Any | Not met |
| ADVANCED-CAD | Advanced artery disease | Present | Not met |
| ALCOHOL-ABUSE | Alcohol use exceeds weekly recommended limits | Present | Not met |
| ASP-FOR-MI | Use of aspirin to prevent myocardial infarction | Any | Not met |
| CREATININE | Serum creatinine is above the upper limit of normal | Any | Not met |
| DIETSUPP-2MOS | Use of dietary supplements (excluding vitamin D) | Past 2 months | Not met |
| DRUG-ABUSE | Drug abuse | Any | Not met |
| ENGLISH | Speaks English | Any | Met |
| HBA1c | Glycated hemoglobin value is between 6.5 and 9.5 | Any | Not met |
| KETO-1YR | Diagnosed with ketoacidosis | Past year | Not met |
| MAJOR-DIABETES | Major diabetes-related complication | Any | Not met |
| MAKES-DECISIONS | Able to make decisions for themselves | Present | Met |
| MI-6MOS | Myocardial infarction | Past 6 months | Not met |
Figure 2. System architecture.
A selection of rule-based punctuation removal examples.
| Rule target | Input | Output |
| Prescription | q. a.m. q. Sunday tab. | qam q Sunday tab |
| Vitamin | vit. D MVit. | vit D MVit |
| Personal title | Dr. Harold Nutter Harold Nutter, Ph.D. | Dr Harold Nutter Harold Nutter, PhD |
| Shorthand x | hx. of migraines sx. of depression Rx. for cpap | hx of migraines sx of depression Rx for cpap |
| Species name | E. coli C. diff H. pylori | E coli C diff H pylori |
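The punctuation rules in the table above could be approximated with a short list of regular expressions. The following is a minimal sketch, not the c2s2 implementation: the patterns and the function name `strip_abbreviation_periods` are assumptions made for illustration, and they cover only the examples shown in the table.

```python
import re

# Illustrative patterns only; each group mirrors one row of the table.
_RULES = [
    # Prescription: "q. a.m." collapses to "qam"
    (re.compile(r"\bq\.\s*a\.m\."), "qam"),
    # Prescription: "q." and "tab." lose the period
    (re.compile(r"\b(q|tab)\."), r"\1"),
    # Vitamin: "vit." and "MVit."
    (re.compile(r"\b(vit|MVit)\."), r"\1"),
    # Personal title: "Dr." and "Ph.D."
    (re.compile(r"\bDr\."), "Dr"),
    (re.compile(r"\bPh\.D\."), "PhD"),
    # Shorthand x: "hx.", "sx.", "Rx."
    (re.compile(r"\b([A-Za-z]x)\."), r"\1"),
    # Species name: capital initial followed by a lowercase epithet
    (re.compile(r"\b([A-Z])\.(?=\s+[a-z])"), r"\1"),
]


def strip_abbreviation_periods(text: str) -> str:
    """Apply each rule in order and return the rewritten text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

Rule order matters in this sketch: the specific `q. a.m.` pattern has to run before the generic `q.` pattern, otherwise the output would be `q a.m.` rather than `qam`.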
Examples of text normalization.
| Example | Surface forms | Normalized form | Relevance |
| 1 | mom, father, sister | family member | filtering |
| 2 | FH, FHx, FamHx | family history | filtering |
| 3 | whiskey, vodka, beer | alcohol | ALCOHOL-ABUSE |
| 4 | Lantus, Humalog, NPH | insulin | MAJOR-DIABETES |
| 5 | DM2, DMII, NIDDM | diabetes mellitus 2 | MAJOR-DIABETES |
| 6 | CRRT, CRRTX | continuous renal replacement therapy | MAJOR-DIABETES |
| 7 | ARF | acute renal failure | MAJOR-DIABETES |
| 8 | CKD | chronic kidney disease | MAJOR-DIABETES |
| 9 | BB, bblocker, betablocker | beta blocker | ADVANCED-CAD |
| 10 | ECG, EKG | electrocardiogram | ADVANCED-CAD |
| 11 | ICD | implantable cardioverter defibrillator | ADVANCED-CAD |
| 12 | CVD | cardiovascular disease | ADVANCED-CAD |
| 13 | MI, heart attack | myocardial infarction | MI-6MOS, ASP-FOR-MI, ADVANCED-CAD |
| 14 | STEMI | ST elevation myocardial infarction | MI-6MOS, ASP-FOR-MI, ADVANCED-CAD |
| 15 | ASA, ECASA | aspirin | ASP-FOR-MI |
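Normalization of this kind can be pictured as a dictionary lookup over surface forms. The sketch below is an assumption about how such a lookup might be written, not the authors' code: `NORMALIZATION` holds a few entries copied from the table, `normalize` is a hypothetical helper, and matching is case-sensitive so that acronyms such as MI are not confused with ordinary words.

```python
import re

# A handful of entries copied from the table; the real resource is larger.
NORMALIZATION = {
    "mom": "family member",
    "father": "family member",
    "sister": "family member",
    "FHx": "family history",
    "DM2": "diabetes mellitus 2",
    "NIDDM": "diabetes mellitus 2",
    "ECG": "electrocardiogram",
    "EKG": "electrocardiogram",
    "MI": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "ASA": "aspirin",
}

# Longest surface forms first so multiword entries win over shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(NORMALIZATION, key=len, reverse=True))
    + r")\b"
)


def normalize(text: str) -> str:
    """Replace every known surface form with its normalized form."""
    return _PATTERN.sub(lambda m: NORMALIZATION[m.group(1)], text)
```

For example, `normalize("FHx: father had MI, EKG normal, on ASA")` rewrites each abbreviation to the normalized form listed in the table.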
Examples of word abstraction.
| Example | Surface forms | Semantic type | Abstraction | Relevance |
| 1 | marijuana, heroin, ecstasy | Pharmacologic substance | Illicit drug | DRUG-ABUSE |
| 2 | beta blocker, nitroglycerin, CCB | Pharmacologic substance | Heart medication | ADVANCED-CAD |
| 3 | crestor, advicor, compactin | Pharmacologic substance | Statin | ADVANCED-CAD |
| 4 | vitamin C, calcium, primrose oil | Pharmacologic substance | Supplement | DIETSUPP-2MOS |
| 5 | turmeric, green tea, cinnamon | Food | Supplement | DIETSUPP-2MOS |
| 6 | vodka, beer, wine | Food | Alcohol | ALCOHOL-ABUSE |
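One way to read the word abstraction table is as a mapping from a surface form to a semantic type and a coarser class, which can then be counted per document. The sketch below assumes that reading; the source does not describe how abstractions are consumed downstream, and `ABSTRACTION` and `abstraction_counts` are hypothetical names.

```python
import re
from collections import Counter

# Entries copied from the table: surface form -> (semantic type, abstraction).
ABSTRACTION = {
    "marijuana": ("Pharmacologic substance", "Illicit drug"),
    "heroin": ("Pharmacologic substance", "Illicit drug"),
    "crestor": ("Pharmacologic substance", "Statin"),
    "primrose oil": ("Pharmacologic substance", "Supplement"),
    "turmeric": ("Food", "Supplement"),
    "green tea": ("Food", "Supplement"),
    "vodka": ("Food", "Alcohol"),
    "wine": ("Food", "Alcohol"),
}


def abstraction_counts(text: str) -> Counter:
    """Count how often each abstraction class is mentioned in the text."""
    lowered = text.lower()
    counts = Counter()
    for surface, (_semantic_type, abstraction) in ABSTRACTION.items():
        hits = re.findall(r"\b" + re.escape(surface) + r"\b", lowered)
        counts[abstraction] += len(hits)
    return +counts  # unary plus drops classes with a zero count
```

Counting by class rather than by word lets rare surface forms such as "primrose oil" contribute to the same signal as common ones.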
Rule-based feature extraction.
| Tag | Feature | Extraction [a] | Examples [b] |
| MEDRX | Prescription instructions | Regular expressions | po q4h prn |
| KIDMED | Kidney medication | Lexicon (221 entries) [c] | Thymoglobulin |
| BRPMED | Blood pressure medication | — [d] | Avapro |
| HRTMED | Heart medication | — | Plavix |
| HRTTRT | Heart treatment | Regular expressions | Re |
| HRTISC | Heart ischemia | Regular expressions | Electro |
| HRTANG | Angina | Regular expressions | Chest wall heaviness |
| HRTCAD | Any of the HRT tags above + explicit references to CAD | Regular expressions | Given his extensive cardiac history |
| ASPFMI | Aspirin for heart problems | Regular expressions | Start on heparin |
| SPLMNT | Supplement (strong evidence) | Lexicon (67 entries) + regular expressions | Ibuprofen 800 mg |
| DFCNCY | Supplement (weak evidence) | Lexicon (27 entries) + regular expressions | |
| MNTCAP | Mental capacity | Regular expressions | Increasing |
| DRGADD | Substance abuse | Lexicon (17 entries) + regular expressions | History of |
| NOENGL | Does not speak English | Lexicon (66 entries) + regular expressions | An |
| ALCABS | Alcohol abuse | Lexicon (7 entries) + regular expressions | |
| ALCSTP | Stopped drinking alcohol | Regular expressions | Alcoholism 10 |
| KETACD | Ketoacidosis | Regular expressions | Ketones positive |
| KIDDAM | Kidney problems | Regular expressions | |
| DMCMPL | Diabetic complications | Regular expressions | |
| ABDMNL | Abdominal surgery or small bowel obstruction | Regular expressions | Gastric |
| HIGHCRT | High creatinine | Regular expressions + information extraction | Blood urea nitrogen/ |
| GLYHMG | Glycated hemoglobin in a given interval | Information extraction | |
[a] All lexicons and regular expressions are available from the c2s2 GitHub repository [44].
[b] Italic typeset is used to indicate the types of text features targeted by lexicons and regular expressions.
[c] KIDMED, BRPMED, HRTMED are organized into a single lexicon of 221 entries.
[d] Not applicable.
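The GLYHMG tag relies on information extraction rather than a plain pattern match, because the HBA1c criterion depends on a numeric value falling between 6.5 and 9.5. A minimal sketch of that idea follows. The regular expression, the function names `hba1c_values` and `glyhmg`, and the inclusive bounds are all assumptions made for illustration; the actual rules are in the c2s2 repository cited above.

```python
import re

# Hypothetical pattern: an HbA1c mention followed, within a few non-digit
# characters, by a number with an optional decimal part.
_HBA1C = re.compile(
    r"\b(?:hba1c|hgba1c|a1c|glycated hemoglobin)\b"
    r"[^0-9\n]{0,20}?(\d{1,2}(?:\.\d+)?)",
    re.IGNORECASE,
)


def hba1c_values(text: str) -> list:
    """Return every HbA1c value found in the text, in order of appearance."""
    return [float(m.group(1)) for m in _HBA1C.finditer(text)]


def glyhmg(text: str, low: float = 6.5, high: float = 9.5) -> bool:
    """True if any extracted value lies in [low, high] (bounds assumed inclusive)."""
    return any(low <= value <= high for value in hba1c_values(text))
```

Separating extraction from the interval test keeps the threshold adjustable without touching the pattern.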
Figure 3. Distribution of class labels.
Features used in rule-based classification.
| ID | Features |
| ALCOHOL-ABUSE | ALCABS, ALCSTP |
| DRUG-ABUSE | DRGADD |
| ENGLISH | NOENGL |
| KETO-1YR | KETACD |
| MAKES-DECISIONS | MNTCAP |
| MI-6MOS | BRPMED, HRTMED, HRTTRT, HRTISC, HRTANG, HRTCAD, ASPFMI |
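The default labels in the eligibility criteria table suggest a simple reading of rule-based classification: a criterion keeps its default unless the relevant tag is found in the record. The sketch below assumes exactly that and nothing more. It covers only the criteria that depend on a single tag, because the source does not say how multiple tags (for example ALCABS and ALCSTP) or time periods are combined; `SINGLE_TAG_RULES` and `classify` are hypothetical names.

```python
# Criterion -> (tag, default label). Defaults come from the eligibility
# criteria table; tags come from the table of rule-based features.
SINGLE_TAG_RULES = {
    "DRUG-ABUSE": ("DRGADD", "not met"),
    "ENGLISH": ("NOENGL", "met"),
    "KETO-1YR": ("KETACD", "not met"),
    "MAKES-DECISIONS": ("MNTCAP", "met"),
}


def classify(criterion: str, tags_found: set) -> str:
    """Return 'met' or 'not met' for a single-tag criterion."""
    tag, default = SINGLE_TAG_RULES[criterion]
    if tag in tags_found:
        # Assumed rule: evidence for the tag overturns the default label.
        return "not met" if default == "met" else "met"
    return default
```

Under this assumption, a record with no NOENGL evidence is labeled `met` for ENGLISH, which matches the default listed for that criterion.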
Figure 4. Summary of cross-validation results. SVM: support vector machines; LR: logistic regression; NB: naïve Bayesian; GTB: gradient tree boosting; HBA1c: glycated hemoglobin.
Detailed holdout test results.
| ID | Met [a] | | | Not met [a] | | | Overall | Baseline [b] | | c2s2 [c] |
| | P [d] (%) | R [e] (%) | F [f] (%) | P (%) | R (%) | F (%) | F (%) | F (%) | System | Rank |
| ABDOMINAL | 64.86 | 80.00 | 71.64 | 87.76 | 76.79 | 81.90 | 76.77 | 90.64 | Rules | 4 |
| ADVANCED-CAD | 83.02 | 97.78 | 89.80 | 96.97 | 78.05 | 86.49 | 88.14 | 88.14 | c2s2 | 1 |
| ALCOHOL-ABUSE | 22.22 | 66.67 | 33.33 | 98.70 | 91.57 | 95.00 | 64.17 | | Hybrid | 2 |
| ASP-FOR-MI | 87.67 | 94.12 | 90.78 | 69.23 | 50.00 | 58.06 | 74.42 | 77.34 | HNN [g] | 2 |
| CREATININE | 80.00 | 83.33 | 81.63 | 93.44 | 91.94 | 92.68 | 87.16 | 89.75 | Rules | 2 |
| DIETSUPP-2MOS | 78.85 | 93.18 | 85.42 | 91.18 | 73.81 | 81.58 | 83.50 | 89.53 | Hybrid | 4 |
| DRUG-ABUSE | 40.00 | 66.67 | 50.00 | 98.77 | 96.39 | 97.56 | 73.78 | | Hybrid | 2 |
| ENGLISH | 91.25 | 100.00 | 95.42 | 100.00 | 46.15 | 63.16 | 79.29 | 97.66 | Hybrid | 4 |
| HBA1c | 100.00 | 82.86 | 90.62 | 89.47 | 100.00 | 94.44 | 92.53 | 93.82 | Rules | 2 |
| KETO-1YR | 0.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 50.00 | | All | 1 |
| MAJOR-DIABETES | 85.00 | 79.07 | 81.93 | 80.43 | 86.05 | 83.15 | 82.54 | 86.02 | Hybrid | 2 |
| MAKES-DECISIONS | 97.62 | 98.80 | 98.20 | 50.00 | 33.33 | 40.00 | 69.10 | | HNN | 2 |
| MI-6MOS | 33.33 | 50.00 | 40.00 | 94.59 | 89.74 | 92.11 | 66.05 | | Rules | 4 |
| Overall [h] (microaveraged) | 83.97 | 91.29 | 87.47 | 93.54 | 87.86 | 90.61 | 89.04 | 91.11 | Hybrid | 4 |
[a] The binary classification task involves 2 classes (met and not met). The results are provided for each class separately and then combined into the overall F value.
[b] The best results from 3 related studies are used as the baseline. They are named after the approach they used: rules [34], hybrid [17], and HNN [36]. The baseline results in italics were calculated on the basis of at most eight positive examples, which account for less than 10% of the test data.
[c] c2s2: Cardiff Cohort Selection System.
[d] P: precision.
[e] R: recall.
[f] F: F measure.
[g] HNN: hierarchical neural network.
[h] The overall values provided in the bottom row have been microaveraged across the 13 classifiers.
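The P, R, and F columns are related by the usual harmonic-mean formula, which can be checked against the ABDOMINAL row. The sketch below also shows microaveraging over pooled counts. The helper names are hypothetical, the source does not report the underlying true-positive, false-positive, or false-negative counts, and the observation that the Overall column equals the unweighted mean of the two class F values is inferred from the reported numbers rather than stated in the source.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(counts: list) -> tuple:
    """Microaverage over classifiers; counts holds (tp, fp, fn) tuples."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r, f_measure(p, r)


# ABDOMINAL row, "met" class: P = 64.86, R = 80.00
f_met = f_measure(64.86, 80.00)  # about 71.64, as reported

# Unweighted mean of the two reported class F values for ABDOMINAL
overall = (71.64 + 81.90) / 2    # 76.77, matching the Overall column
```

Microaveraging pools the counts before computing P and R, so classifiers with more positive examples weigh more heavily in the bottom row than rare criteria such as KETO-1YR.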
Detailed holdout test results for ADVANCED-CAD.
| System | Met | | | Not met | | | Overall |
| | P [a] (%) | R [b] (%) | F [c] (%) | P (%) | R (%) | F (%) | F (%) |
| c2s2 [d] | 83.02 | 97.78 | 89.80 | 96.97 | 78.05 | 86.49 | 88.14 |
| Hybrid | 74.55 | 91.11 | 82.00 | 87.10 | 65.85 | 75.00 | 78.50 |
| Rules | 67.80 | 88.89 | 76.92 | 81.48 | 53.66 | 64.71 | 70.81 |
| HNN [e] | 77.36 | 91.11 | 83.67 | 87.88 | 70.73 | 78.38 | 81.03 |
[a] P: precision.
[b] R: recall.
[c] F: F measure.
[d] c2s2: Cardiff Cohort Selection System.
[e] HNN: hierarchical neural network.