Jinying Chen1, Jiaping Zheng2, Hong Yu1,3.
Abstract
BACKGROUND: Many health organizations allow patients to access their own electronic health record (EHR) notes through online patient portals as a way to enhance patient-centered care. However, EHR notes are typically long and contain abundant medical jargon that can be difficult for patients to understand. In addition, many medical terms in patients' notes are not directly related to their health care needs. One way to help patients better comprehend their own notes is to reduce information overload and help them focus on medical terms that matter most to them. Interventions can then be developed by giving them targeted education to improve their EHR comprehension and the quality of care.
Keywords: electronic health records; information extraction; learning to rank; natural language processing; supervised learning
Year: 2016 PMID: 27903489 PMCID: PMC5156821 DOI: 10.2196/medinform.6373
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Overview of our approach: building the FOCUS corpus (Step 1), developing FOCUS (Step 2), and evaluation (Step 3). FOCUS: Finding impOrtant medical Concepts most Useful to patientS; EHR: electronic health record; rankSVM: ranking support vector machine.
Figure 2. Objective function used in training the ranking support vector machine.
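The equation image for Figure 2 is not reproduced in this record. For orientation only, the textbook soft-margin objective of a ranking SVM is sketched below; it may differ in notation or detail from the form the authors used. All symbols are assumptions introduced here: \mathbf{x}_{k,i} is the feature vector of candidate term i in note k, \mathbf{w} is the weight vector, C is the regularization constant, and \xi_{i,j,k} are slack variables. The constraints range over every pair in which term i should be ranked above term j within note k.

```latex
\begin{aligned}
\min_{\mathbf{w},\,\xi}\quad & \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{(i,j,k)} \xi_{i,j,k} \\
\text{subject to}\quad & \mathbf{w}^{\top}\mathbf{x}_{k,i} \ge \mathbf{w}^{\top}\mathbf{x}_{k,j} + 1 - \xi_{i,j,k}, \\
& \xi_{i,j,k} \ge 0 .
\end{aligned}
```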
Figure 3. Equation defining a combined feature of TL and maxWL. TL: term length (ie, the number of words in a candidate term); maxWL: length, in characters, of the longest word in a candidate term.
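The two quantities named in the Figure 3 caption can be computed directly from a candidate term. The sketch below follows the caption's definitions and assumes simple whitespace tokenization; the function names are hypothetical. The equation that combines TL and maxWL into a single feature is not recoverable from this record, so it is deliberately left out.

```python
def term_length(term: str) -> int:
    """TL: number of words in the candidate term (whitespace tokenization assumed)."""
    return len(term.split())


def max_word_length(term: str) -> int:
    """maxWL: character length of the longest word in the candidate term."""
    return max((len(word) for word in term.split()), default=0)


if __name__ == "__main__":
    # Example terms taken from the topic table in this record.
    for term in ["large B cell lymphoma", "gastroesophageal reflux", "warfarin"]:
        print(term, term_length(term), max_word_length(term))
```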
Figure 4. Equations defining the topic feature.
Figure 5. Prediction function of the random forest.
Statistics of the FOCUS[a] corpus.
| Characteristics of the FOCUS corpus | Value |
| Number of notes | 90 |
| Number of words per EHR[b] note, mean (SD) | 816 (133) |
| Number of candidate terms identified by MetaMap per EHR note, mean (SD) | 250 (42) |
| Number of important medical terms identified by physicians per EHR note, mean (SD) | 9 (5) |
[a] FOCUS: Finding impOrtant medical Concepts most Useful to patientS.
[b] EHR: electronic health record.
The 8 major topics in the FOCUS[a] corpus.
| UMLS[b] semantic type | Number of important terms | Example terms |
| Disease or syndrome | 295 | autoimmune hemolytic anemia, gastroesophageal reflux, pancytopenia, Sjogren's syndrome, osteoporosis |
| Organic chemical | 88 | atenolol, vincristine, warfarin, Wellbutrin, Zocor |
| Finding | 59 | alopecia, hematuria, hypertension, NSTEMI (non-ST-elevation myocardial infarction), retinopathy |
| Neoplastic process | 35 | dermoid, large B cell lymphoma, pancreatic neoplasm, thyroid nodule |
| Therapeutic or preventive procedure | 34 | chemotherapy, dialysis, immunosuppression, kidney transplantation, pancreatectomy |
| Amino acid, peptide, or protein[c] | 30 | basal insulin, Rituxan, Neupogen, Synthroid, hemoglobin A1C, HPL (human placental lactogen) |
| Pathologic function | 25 | atrial fibrillation, autonomic dysfunction, BPH (benign prostatic hyperplasia), microscopic hematuria, systolic dysfunction |
| Diagnostic procedure | 17 | thyroid ultrasound, echocardiogram, endoscopy, biopsy, cardiac catheterization |
[a] FOCUS: Finding impOrtant medical Concepts most Useful to patientS.
[b] UMLS: Unified Medical Language System.
[c] Electronic health record terms in this topic were split into 2 subtopics: medicine (denoted by their ingredients) and laboratory measure.
Performance of different natural language processing systems.
| System | P5[a] | R5[b] | F5[c] | P10[d] | R10[e] | F10[f] | AUC-ROC_ranking[g] | AUC-ROC_KE[h] |
| Adapted KEA++[i] | 0.333 | 0.211 | 0.239 | 0.281 | 0.362 | 0.292 | 0.890 | 0.780 |
| RF[j] | 0.409 | 0.267 | 0.299 | 0.339 | 0.416 | 0.346 | 0.891 | 0.821 |
| FOCUS[k] | 0.462 | 0.305 | 0.341 | 0.369 | 0.464 | 0.381 | 0.940 | 0.866 |
| P value | .01 | .01 | .01 | .045 | .03 | .02 | <.001 | <.001 |
[a] P5: precision at rank 5.
[b] R5: recall at rank 5.
[c] F5: F-score at rank 5.
[d] P10: precision at rank 10.
[e] R10: recall at rank 10.
[f] F10: F-score at rank 10.
[g] AUC-ROC_ranking: area under the receiver operating characteristic curve computed on the candidate terms extracted by a system.
[h] AUC-ROC_KE: area under the receiver operating characteristic curve (KE: keyphrase extraction) computed by using all the gold-standard important terms as positive examples.
[i] KEA++: extension of the keyphrase extraction algorithm KEA.
[j] RF: random forest.
[k] FOCUS: Finding impOrtant medical Concepts most Useful to patientS.
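The rank-based metrics in the table above (precision, recall, and F-score at rank 5 or 10) can be illustrated for a single note as follows. This is a minimal sketch using the standard definitions, not the authors' evaluation code; the function name and the example term lists are hypothetical, and how per-note scores were averaged across the corpus is not stated in this record.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f_at_k(
    ranked_terms: Sequence[str], gold_terms: Set[str], k: int
) -> Tuple[float, float, float]:
    """Score the top-k ranked candidate terms of one note against its gold-standard terms."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Count how many of the top-k candidates are gold-standard important terms.
    hits = sum(1 for term in ranked_terms[:k] if term in gold_terms)
    precision = hits / k
    recall = hits / len(gold_terms) if gold_terms else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator > 0 else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical system ranking and physician annotations for one note.
    ranked = ["atenolol", "hypertension", "biopsy", "fever", "dialysis"]
    gold = {"hypertension", "dialysis", "warfarin"}
    print(precision_recall_f_at_k(ranked, gold, k=5))
```

Because the F-score is computed per note before any averaging, a corpus-level F-score need not equal the harmonic mean of the corpus-level precision and recall.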
Performance of natural language processing systems with and without the additional features.
| System | P5[a] | R5[b] | F5[c] | P10[d] | R10[e] | F10[f] | AUC-ROC_ranking[g] | AUC-ROC_KE[h] |
| FOCUS-base[i] | 0.413 | 0.256 | 0.295 | 0.331 | 0.401 | 0.337 | 0.911 | 0.840 |
| FOCUS[j] | 0.462 | 0.305 | 0.341 | 0.369 | 0.464 | 0.381 | 0.940 | 0.866 |
| P value | .03 | .02 | .02 | .003 | <.001 | .001 | <.001 | <.001 |
| RF-base[k] | 0.349 | 0.219 | 0.251 | 0.303 | 0.381 | 0.315 | 0.848 | 0.781 |
| RF[l] | 0.409 | 0.267 | 0.299 | 0.339 | 0.416 | 0.346 | 0.891 | 0.821 |
| P value | .003 | .01 | .01 | .01 | .10 | .046 | <.001 | <.001 |
[a] P5: precision at rank 5.
[b] R5: recall at rank 5.
[c] F5: F-score at rank 5.
[d] P10: precision at rank 10.
[e] R10: recall at rank 10.
[f] F10: F-score at rank 10.
[g] AUC-ROC_ranking: area under the receiver operating characteristic curve computed on the candidate terms extracted by a system.
[h] AUC-ROC_KE: area under the receiver operating characteristic curve (KE: keyphrase extraction) computed by using all the gold-standard important terms as positive examples.
[i] FOCUS-base: Finding impOrtant medical Concepts most Useful to patientS; uses only the baseline features.
[j] FOCUS: Finding impOrtant medical Concepts most Useful to patientS; uses the baseline features plus the additional features.
[k] RF-base: random forest; uses only the baseline features.
[l] RF: random forest; uses the baseline features plus the additional features.
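The AUC-ROC columns summarize how well a system's scores separate important terms from the remaining candidates. One common way to compute this area is as the fraction of (important, unimportant) term pairs that the system orders correctly, counting ties as half. The sketch below uses that pairwise definition and is not the authors' code; the function name and the example scores are hypothetical. It corresponds to the ranking variant, which is computed over the extracted candidates only; the handling of gold-standard terms that a system never extracted (needed for the KE variant) is not described in this record and is not modeled here.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Pairwise AUC-ROC: share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for s, is_important in zip(scores, labels) if is_important]
    negatives = [s for s, is_important in zip(scores, labels) if not is_important]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical ranking scores for four candidate terms in one note.
    print(auc_roc([0.9, 0.8, 0.3, 0.1], [True, False, True, False]))
```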