BACKGROUND: This cross-sectional retrospective study used Natural Language Processing (NLP) to extract tobacco-use variables from clinical notes documented in the Electronic Health Record (EHR). OBJECTIVE: To develop a rule-based algorithm for determining a patient's present tobacco-use status. METHODS: Clinical notes (n = 5,371 documents) from 363 patients were mined and classified by NLP software into four classes: "Current Smoker", "Past Smoker", "Nonsmoker" and "Unknown". Two coders manually classified these documents into the same four classes, producing the document-level gold standard classification (DLGSC). The same two coders then derived a tobacco-use status for each patient from the statuses of that patient's individual documents, producing the patient-level gold standard classification (PLGSC). The DLGSC was compared with the NLP results, and the PLGSC with the results of the rule-based algorithm. RESULTS: The initial Cohen's kappa (n = 1,000 documents) was 0.9448 (95% CI = 0.9281-0.9615), indicating strong agreement between the two raters. For a subsequent set of 371 documents, Cohen's kappa was 0.9889 (95% CI = 0.979-1.000). The F-measures for the four classes were 0.700, 0.753, 0.839 and 0.988 for the document-level classification and 0.580, 0.771, 0.730 and 0.933 for the patient-level classification, respectively. CONCLUSIONS: NLP and the rule-based algorithm showed utility for deriving the present tobacco-use status of patients. Ongoing work targets further improvement in precision to enhance the translational value of the tool.
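The two evaluation measures named in the abstract, Cohen's kappa for inter-rater agreement and the per-class F-measure, can be illustrated with a minimal Python sketch. It is not the study's code: the function names `cohens_kappa` and `f_measure` and the six toy labels are hypothetical and chosen only for illustration, and the F-measure is assumed to be the balanced F1 (harmonic mean of precision and recall), which the abstract does not specify.

```python
from collections import Counter

# The four classes named in the abstract.
CLASSES = ["Current Smoker", "Past Smoker", "Nonsmoker", "Unknown"]


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


def f_measure(gold, predicted, label):
    """Balanced F-measure (F1) for one class against a gold standard."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, NOT data from the study.
coder_a = ["Current Smoker", "Past Smoker", "Nonsmoker",
           "Unknown", "Nonsmoker", "Current Smoker"]
coder_b = ["Current Smoker", "Past Smoker", "Nonsmoker",
           "Unknown", "Nonsmoker", "Past Smoker"]
nlp_out = ["Current Smoker", "Past Smoker", "Nonsmoker",
           "Unknown", "Unknown", "Current Smoker"]

kappa = cohens_kappa(coder_a, coder_b)
# Treat coder_a as the gold standard for this illustration only.
scores = {c: f_measure(coder_a, nlp_out, c) for c in CLASSES}
```

Here `kappa` measures agreement between the two hypothetical coders, and `scores` holds one F-measure per class, mirroring the four per-class values reported in the abstract.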
Keywords:
Data mining; decision support systems, clinical; electronic health records; health information systems; information storage and retrieval; smoking
Authors: Lisa Simon; Enihomo Obadan-Udoh; Alfa-Ibrahim Yansane; Arti Gharpure; Steven Licht; Jean Calvo; James Deschner; Anna Damanaki; Berit Hackenberg; Muhammad Walji; Heiko Spallek; Elsbeth Kalenderian Journal: Appl Clin Inform Date: 2019-05-29 Impact factor: 2.342
Authors: AlokSagar Panny; Harshad Hegde; Ingrid Glurich; Frank A Scannapieco; Jayanth G Vedre; Jeffrey J VanWormer; Jeffrey Miecznikowski; Amit Acharya Journal: Methods Inf Med Date: 2022-04-05 Impact factor: 1.800
Authors: Braja G Patra; Mohit M Sharma; Veer Vekaria; Prakash Adekkanattu; Olga V Patterson; Benjamin Glicksberg; Lauren A Lepow; Euijung Ryu; Joanna M Biernacka; Al'ona Furmanchuk; Thomas J George; William Hogan; Yonghui Wu; Xi Yang; Jiang Bian; Myrna Weissman; Priya Wickramaratne; J John Mann; Mark Olfson; Thomas R Campion; Mark Weiner; Jyotishman Pathak Journal: J Am Med Inform Assoc Date: 2021-11-25 Impact factor: 7.942