Stephen T Wu (1), Young J Juhn (2), Sunghwan Sohn (1), Hongfang Liu (1)
(1) Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA.
(2) Department of Community Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, Minnesota, USA.
Abstract
OBJECTIVE: To specify the problem of patient-level temporal aggregation from clinical text and introduce several probabilistic methods for addressing that problem. The patient-level perspective differs from the prevailing natural language processing (NLP) practice of evaluating at the term, event, sentence, document, or visit level.
METHODS: We used an existing pediatric asthma cohort with manual annotations. After generating a basic feature set via standard clinical NLP methods, we introduced six methods of aggregating time-distributed features from the document level to the patient level. These aggregation methods were used to classify patients according to their asthma status in two hypothetical settings: retrospective epidemiology and clinical decision support.
RESULTS: In both settings, solid patient classification performance was obtained with machine learning algorithms on a number of evidence aggregation methods. Sum aggregation obtained the highest F1 score, 85.71%, in the retrospective epidemiological setting, and a probability density function-based method obtained the highest F1 score, 74.63%, in the clinical decision support setting. Multiple techniques also estimated the diagnosis date (index date) of asthma with promising accuracy.
DISCUSSION: The clinical decision support setting is a more difficult problem. We rule out some aggregation methods rather than determining the best overall aggregation method, since our preliminary data set represented a practical setting in which manually annotated data were limited.
CONCLUSION: Results contrasted the strengths of several aggregation algorithms in different settings. Multiple approaches exhibited good patient classification performance, and also predicted the timing of estimates with reasonable accuracy.
Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Keywords:
Asthma epidemiology; Information extraction; Natural language processing; Patient classification
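The abstract names Sum aggregation and reports F1 scores without giving details, so the following is only a minimal sketch of what summing document-level features up to the patient level might look like, together with the standard F1 computation. The function names, the `(patient_id, features)` input shape, and the feature names in the usage note are hypothetical illustrations, not the authors' implementation or data.

```python
from collections import defaultdict


def sum_aggregate(doc_features):
    """Sum document-level feature values into one feature vector per patient.

    doc_features: iterable of (patient_id, {feature_name: value}) pairs,
    one pair per clinical document (an assumed input shape).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for patient_id, features in doc_features:
        for name, value in features.items():
            totals[patient_id][name] += value
    # Convert nested defaultdicts to plain dicts for easier downstream use.
    return {pid: dict(feats) for pid, feats in totals.items()}


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

As a usage illustration with made-up inputs, `sum_aggregate([("p1", {"wheeze": 1}), ("p1", {"wheeze": 2})])` collapses two documents for patient `p1` into a single patient-level vector; the aggregated vectors would then be passed to whichever classifier is being evaluated.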