David W Frost (1), Shankar Vembu (2), Jiayi Wang (2), Karen Tu (3), Quaid Morris (4), Howard B Abrams (5)
1. Division of General Internal Medicine, University of Toronto, Ontario, Canada; University Health Network, Toronto, Ontario; OpenLab at University Health Network, Toronto, Ontario; University of Toronto, Ontario, Canada. Electronic address: david.frost@uhn.ca.
2. Donnelly Center for Cellular and Biomolecular Research, Toronto, Ontario; University of Toronto, Ontario, Canada.
3. Department of Family and Community Medicine and Institute of Health Policy, Management and Evaluation, University of Toronto, Ontario, Canada; University Health Network, Toronto, Ontario; University of Toronto, Ontario, Canada; Institute for Clinical Evaluative Sciences, Toronto, Ontario.
4. Donnelly Center for Cellular and Biomolecular Research, Toronto, Ontario; Banting and Best Department of Medical Research, Toronto, Ontario; Department of Medical Genetics, University of Toronto, Ontario, Canada; Department of Electrical and Computer Engineering, University of Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Ontario, Canada; University of Toronto, Ontario, Canada.
5. Division of General Internal Medicine, University of Toronto, Ontario, Canada; University Health Network, Toronto, Ontario; OpenLab at University Health Network, Toronto, Ontario; University of Toronto, Ontario, Canada.
Abstract
BACKGROUND: A small proportion of patients account for a high proportion of healthcare use. Accurate preemptive identification may facilitate tailored intervention. We sought to determine whether machine learning techniques using text from a family practice electronic medical record can be used to predict future high emergency department use and total costs by patients who are not yet high emergency department users or high cost to the healthcare system.

METHODS: Text from fields of the cumulative patient profile within an electronic medical record of 43,111 patients was indexed. Separate training and validation cohorts were created. After processing, 11,905 words were used to fit a logistic regression model. The primary outcomes of interest in the 12 months after prediction were 3 or more emergency department visits and being in the top 5% in healthcare expenditures. Outcomes were assessed through linkage to administrative databases housed at the Institute for Clinical Evaluative Sciences.

RESULTS: In the model to predict frequent emergency department visits, after excluding patients who were high emergency department users in the previous year, the area under the receiver operating characteristic curve was 0.71. By using the same methodology, the model to predict the top 5% in total system costs had an area under the receiver operating characteristic curve of 0.76.

CONCLUSIONS: Machine learning techniques can be applied to analyze free text contained in electronic medical records. This dataset is more predictive of patients who will generate future high costs than future emergency department visits. It remains to be seen whether these predictions can be used to reduce costs by early interventions in this cohort of patients.
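The general approach described in METHODS (words from record text as features, a logistic regression fit on a training cohort, and the area under the receiver operating characteristic curve measured on a separate validation cohort) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the tokenizer, the binary word-presence features, the stochastic gradient descent fit with an L2 penalty, the hyperparameters, and the tiny synthetic records, which are invented and not drawn from the study. The abstract does not state the authors' preprocessing or fitting procedure, so this is not their implementation.

```python
import math
import re


def tokenize(text):
    """Lowercase and keep alphabetic runs (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts):
    """Map each distinct training word to a feature index."""
    vocab = {}
    for text in texts:
        for word in tokenize(text):
            vocab.setdefault(word, len(vocab))
    return vocab


def featurize(text, vocab):
    """Binary bag-of-words: indices of vocabulary words present in the text."""
    return {vocab[w] for w in tokenize(text) if w in vocab}


def predict(weights, bias, features):
    """Logistic regression probability for one sparse feature set."""
    z = bias + sum(weights[j] for j in features)
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, n_features, epochs=200, lr=0.5, l2=0.01):
    """Stochastic gradient descent; L2 is applied only to active features."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, y in zip(rows, labels):
            error = predict(weights, bias, features) - y
            bias -= lr * error
            for j in features:
                weights[j] -= lr * (error + l2 * weights[j])
    return weights, bias


def roc_auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic, invented profile text; label 1 marks a hypothetical
# "high use in the following 12 months" outcome.
train = [
    ("copd chf diabetes insulin", 1),
    ("chf dialysis anticoagulant", 1),
    ("copd oxygen chf", 1),
    ("seasonal allergy", 0),
    ("ankle sprain", 0),
    ("allergy eczema", 0),
]
valid = [
    ("chf copd", 1),
    ("dialysis insulin", 1),
    ("eczema", 0),
    ("sprain allergy", 0),
]

vocab = build_vocab(text for text, _ in train)
train_rows = [featurize(text, vocab) for text, _ in train]
train_labels = [y for _, y in train]
weights, bias = fit_logistic(train_rows, train_labels, len(vocab))

valid_scores = [predict(weights, bias, featurize(t, vocab)) for t, _ in valid]
valid_labels = [y for _, y in valid]
auc = roc_auc(valid_scores, valid_labels)
print(f"validation AUC on toy data: {auc:.2f}")
```

The toy records are separable by construction, so the validation value printed here says nothing about real-world performance; the reported values of 0.71 and 0.76 come from the study's own cohorts. The pairwise AUC definition used in `roc_auc` is quadratic in cohort size and is chosen for readability, not for use at the scale of 43,111 patients.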