Le Zheng1,2, Yue Wang2,3, Shiying Hao2, Andrew Y Shin2, Bo Jin4, Anh D Ngo4, Medina S Jackson-Browne4, Daniel J Feller4, Tianyun Fu4, Karena Zhang2, Xin Zhou5, Chunqing Zhu4, Dorothy Dai4, Yunxian Yu6, Gang Zheng3, Yu-Ming Li5, Doff B McElhinney2, Devore S Culver7, Shaun T Alfreds7, Frank Stearns4, Karl G Sylvester2, Eric Widen4, Xuefeng Bruce Ling2,6.
Abstract
BACKGROUND: Diabetes case finding based on structured medical records does not fully identify diabetic patients whose medical histories related to diabetes are available in the form of free text. Manual chart reviews have been used but involve high labor costs and long latency.
Keywords: data mining; diabetes mellitus; electronic medical record; natural language processing
Year: 2016 PMID: 27836816 PMCID: PMC5124114 DOI: 10.2196/medinform.6328
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. A schematic presentation of the natural language processing (NLP)–based algorithm integrated into the statewide diabetes mellitus case finding and surveillance. The clinical note was preprocessed and identified to generate the decision. The knowledge bases, statistical model, and the gold standard datasets form the basis of the NLP engine. ICD: International Classification of Diseases; NLM: US National Library of Medicine; MeSH: Medical Subject Headings; EMR: electronic medical record; HIE: health information exchange; PPV: positive predictive value; SNOMED CT: Systematized Nomenclature of Medicine – Clinical Terms.
Figure 2. Cohort construction of the study. ICD9: International Classification of Diseases, Ninth Revision; DM: diabetes mellitus; MDS: multidimensional scaling.
Figure 3. Equations describing the modeling process of the natural language processing (NLP)–based algorithm.
Figure 4. The multidimensional scaling (MDS) plots of the training result. This analysis was aimed at detecting meaningful underlying dimensions, for example, 1 and 2, which allow the explanation of the observed similarities (distances) between the investigated subjects. The axes of the MDS plots represent no real sizes and thus were marked as dimension 1 and dimension 2 without units. The red dots and blue triangles, indicating the positive and negative samples, were clearly separated. The “false positives” are circled in the plot. Chart reviews showed that these were notes with a genuine diagnosis of diabetes mellitus.
Figure 5. List of the top 30 clinical variables included in the diabetes mellitus natural language processing (NLP)–based model. BMI: body mass index.
Figure 6. Performance evaluation of the proposed case finding algorithm. Top: the contingency tables on blind test and prospective gold standard datasets. Middle: the positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity of the validation based on the retrospective blind testing subcohort and prospective cohort. Bottom: the prospective case finding results in the total population. DM: diabetes mellitus; GS: gold standard; ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification; NLP: natural language processing.
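The four measures named in the Figure 6 caption all follow from a 2x2 contingency table of algorithm output against the gold standard. The sketch below shows one way to compute them with their standard definitions. The class and function names are hypothetical, and the counts are made-up placeholders for illustration only; they are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Counts of algorithm output versus gold standard (hypothetical helper)."""
    tp: int  # flagged as DM by the algorithm, DM in the gold standard
    fp: int  # flagged as DM by the algorithm, not DM in the gold standard
    fn: int  # not flagged by the algorithm, DM in the gold standard
    tn: int  # not flagged by the algorithm, not DM in the gold standard


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a row or column of the table is empty.
    return numerator / denominator if denominator else float("nan")


def evaluate(table: ContingencyTable) -> dict:
    """Compute PPV, NPV, sensitivity, and specificity from a 2x2 table."""
    return {
        "ppv": _ratio(table.tp, table.tp + table.fp),
        "npv": _ratio(table.tn, table.tn + table.fn),
        "sensitivity": _ratio(table.tp, table.tp + table.fn),
        "specificity": _ratio(table.tn, table.tn + table.fp),
    }


# Placeholder counts for illustration only (NOT taken from the study).
example = ContingencyTable(tp=90, fp=10, fn=5, tn=95)
metrics = evaluate(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV depend on how common diabetes is in the evaluated cohort, whereas sensitivity and specificity do not, which may be one reason to report results for a retrospective subcohort and a prospective cohort separately.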