Jinying Chen, Abhyuday N Jagannatha, Samah J Fodeh, Hong Yu.
Abstract
BACKGROUND: Medical terms are a major obstacle to patients' comprehension of their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions give patients easy access to helpful information while they read their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is costly, it is beneficial, and even necessary, to first identify the terms that are important for patient EHR comprehension.
Keywords: electronic health records; information extraction; lexical entry selection; natural language processing; transfer learning
Year: 2017; PMID: 29089288; PMCID: PMC5686421; DOI: 10.2196/medinform.8531
Source DB: PubMed; Journal: JMIR Med Inform
Figure 1. Overview of development of the adapted distant supervision (ADS) natural language processing system to rank candidate terms mined from electronic health record (EHR) corpora: data extraction (steps 1 and 2), ADS (step 3), and evaluation (step 4). CHV: consumer health vocabulary.
Figure 2. Equations for feature mapping functions used in feature space augmentation (1), objective function used in supervised distant supervision (2), and average precision (3).
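The equations named in the Figure 2 caption are not reproduced in this record. As a rough illustration only, the sketch below assumes that feature space augmentation uses the common three-block mapping (shared, source-only, and target-only copies of the feature vector) and that average precision is the mean of precision at each rank holding a positive item. The function names `augment_source`, `augment_target`, and `average_precision` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def augment_source(x: Sequence[float]) -> List[float]:
    """Assumed mapping for a source-domain vector: <x, x, 0>."""
    x = list(x)
    return x + x + [0.0] * len(x)


def augment_target(x: Sequence[float]) -> List[float]:
    """Assumed mapping for a target-domain vector: <x, 0, x>."""
    x = list(x)
    return x + [0.0] * len(x) + x


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of precision@k over the ranks k that hold a positive item.

    ranked_labels lists gold labels (1 = important term, 0 = not)
    in the order produced by the ranking system, best first.
    """
    hits = 0
    total = 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / k
    # With no positive items the metric is undefined; return 0.0 by convention here.
    return total / hits if hits else 0.0
```

Under this assumed mapping, a single classifier trained on the augmented vectors can learn weights shared across domains in the first block and domain-specific corrections in the other two.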
Performance of different natural language processing systems on the evaluation set under 4 conditions using 100, 200, 500, and 1000 target-domain training examples (a).
| System | AUC-ROC (b) | | | | Average precision | | | |
| | 100 | 200 | 500 | 1000 | 100 | 200 | 500 | 1000 |
| SourceOnly | 0.739 | 0.739 | 0.739 | 0.739 | 0.811 | 0.811 | 0.811 | 0.811 |
| TargetOnly | 0.728 | 0.749 | 0.769 | 0.782 | 0.799 | 0.816 | 0.833 | 0.844 |
| ADS-fsa (c) | 0.746 | 0.756 | | | 0.815 | 0.823 | | |
| ADS-sds (d) | *0.751* | *0.759* | 0.775 | 0.786 | *0.819* | *0.826* | 0.838 | 0.847 |
t99 for ADS-fsa vs ADS-sds (e), listed left to right for the conditions with P ≤.05: 4.25, 2.79, 8.78, 3.81, 3.04, 11.58
P value, in the same order: <.001, .01, <.001, <.001, .003, <.001
(a) The highest performance scores are italicized.
(b) AUC-ROC: area under the receiver operating characteristic curve.
(c) ADS-fsa: adapted distant supervision-feature space augmentation.
(d) ADS-sds: adapted distant supervision-supervised distant supervision.
(e) The P values for the differences between ADS-fsa and SourceOnly, ADS-sds and SourceOnly, ADS-fsa and TargetOnly, and ADS-sds and TargetOnly are <.001 (t99 ranges from 4.84 to 133.31) for all metrics under all conditions. For the difference between ADS-fsa and ADS-sds, P values are reported only when P ≤.05, together with the corresponding t99 values.
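The t99 values reported for these comparisons carry 99 degrees of freedom, which is what a paired t test over 100 matched runs would produce. The record does not describe the test procedure, so the following is only a minimal sketch under that assumption; the function name `paired_t` and the sample scores in the usage note are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired scores a and b.

    a[i] and b[i] are assumed to be the scores of two systems on the
    same run i (for example, the same random training sample).
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

For example, `paired_t([3, 5, 4, 6], [1, 2, 3, 3])` yields a t statistic with 3 degrees of freedom; 100 paired runs would give the 99 degrees of freedom seen in the table notes.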
Performance of different ADS-sds (a) systems implemented by using all types of features or by dropping each individual type of feature, under 4 conditions using 100, 200, 500, and 1000 target-domain training examples (b).
| ADS-sds system | AUC-ROC (c) | | | | Average precision | | | |
| | 100 | 200 | 500 | 1000 | 100 | 200 | 500 | 1000 |
| ADS-sds-ALL (d) | 0.751 | 0.759 | 0.775 | 0.786 | 0.819 | 0.826 | 0.838 | 0.847 |
| ADS-sds-woWE (e) | 0.711 | 0.718 | 0.726 | 0.733 | 0.780 | 0.785 | 0.793 | 0.799 |
| t99 | 30.37 | 32.74 | 59.92 | 112.25 | 36.61 | 39.63 | 81.04 | 124.15 |
| P value | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 |
| ADS-sds-woSem (f) | 0.753 | 0.760 | 0.772 | 0.782 | 0.823 | 0.829 | 0.838 | 0.845 |
| ADS-sds-woATR (g) | 0.751 | 0.759 | 0.774 | 0.786 | 0.819 | 0.826 | 0.838 | 0.847 |
| ADS-sds-woGTF (h) | 0.740 | 0.749 | 0.765 | 0.777 | 0.813 | 0.821 | 0.833 | 0.842 |
| t99 | 13.04 | 9.50 | 14.85 | 22.55 | 8.12 | 6.49 | 11.52 | 23.07 |
| P value | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 |
| ADS-sds-woTL (i) | 0.741 | 0.751 | 0.767 | 0.778 | 0.807 | 0.815 | 0.829 | 0.838 |
| t99 | 11.21 | 10.81 | 19.78 | 25.58 | 16.43 | 17.15 | 34.50 | 41.72 |
| P value | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 |
t99 for ADS-sds-woSem vs ADS-sds-ALL, listed left to right for the conditions with P ≤.05: 4.63, 12.28, 3.18, 4.00, 4.55
P value, in the same order: <.001, <.001, .002, <.001, <.001
(a) ADS-sds: adapted distant supervision-supervised distant supervision.
(b) P values are reported only when P ≤.05, together with the corresponding t99 values, for the differences between each implementation and ADS-sds-ALL.
(c) AUC-ROC: area under the receiver operating characteristic curve.
(d) ADS-sds-ALL: ADS-sds with all types of features.
(e) ADS-sds-woWE: ADS-sds without word embedding.
(f) ADS-sds-woSem: ADS-sds without semantic features.
(g) ADS-sds-woATR: ADS-sds without features derived from automatic term recognition.
(h) ADS-sds-woGTF: ADS-sds without general-domain term frequency.
(i) ADS-sds-woTL: ADS-sds without term length.