OBJECTIVE: To determine how well statistical text mining (STM) models can identify falls within clinical text associated with an ambulatory encounter.

MATERIALS AND METHODS: A total of 2241 patients were selected who had a fall-related ICD-9-CM E-code or matched injury diagnosis code recorded while being treated as outpatients at one of four sites within the Veterans Health Administration. All clinical documents within a 48-hour window of the recorded E-code or injury diagnosis code for each patient were obtained (n = 26,010; 611 distinct document titles) and annotated for falls. Logistic regression, support vector machine, and cost-sensitive support vector machine (SVM-cost) models were trained on a stratified sample of 70% of documents from one location (dataset Atrain) and then applied to the remaining unseen documents (datasets Atest-D).

RESULTS: All three STM models obtained area under the receiver operating characteristic curve (AUC) scores above 0.950 on the four test datasets (Atest-D). The SVM-cost model obtained the highest AUC scores, ranging from 0.953 to 0.978. It also achieved F-measure values ranging from 0.745 to 0.853, sensitivity from 0.890 to 0.931, and specificity from 0.877 to 0.944.

DISCUSSION: The STM models performed well across a large, heterogeneous collection of document titles. The models also generalized to the other sites, including a traditionally bilingual site that had distinctly different grammatical patterns.

CONCLUSIONS: The results of this study suggest that STM-based models have the potential to improve surveillance of falls. The evidence that STM is a robust technique for mining clinical documents is also encouraging for other surveillance-related topics.
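The evaluation measures reported above (sensitivity, specificity, F-measure, and AUC) can be illustrated with a minimal sketch. This is not the authors' code: the labels, scores, and the 0.5 decision threshold below are made-up toy values, the function names are hypothetical, and the F-measure is assumed to be the balanced F1 form, which the abstract does not specify.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = fall, 0 = no fall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true fall documents that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-fall documents that were correctly left unflagged."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F1 (assumed form): harmonic mean of precision and sensitivity."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a random positive document scores
    higher than a random negative one, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): annotated labels and classifier scores for 8 documents.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("F-measure:  ", f_measure(tp, fp, fn))
print("AUC:        ", auc(y_true, scores))
```

Sensitivity, specificity, and F-measure depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a study might report both kinds of measure.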
Keywords:
Accidental Falls; Ambulatory Care; Electronic Health Records; Text Mining