Sizheng Steven Zhao (1,2,3), Chuan Hong (4), Tianrun Cai (3,4), Chang Xu (3), Jie Huang (3), Joerg Ermann (3,4), Nicola J Goodson (1,2), Daniel H Solomon (3,4,5), Tianxi Cai (4,6), Katherine P Liao (3,4).
1. Institute of Ageing and Chronic Disease, University of Liverpool. 2. Department of Academic Rheumatology, Aintree University Hospital, Liverpool, UK. 3. Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital. 4. Harvard Medical School. 5. Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital. 6. Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Abstract
OBJECTIVES: To develop classification algorithms that accurately identify patients with axial spondyloarthritis (axSpA) in electronic health records, and to compare the performance of algorithms incorporating free-text data against approaches using only International Classification of Diseases (ICD) codes.

METHODS: An enriched cohort of 7853 eligible patients was created from the electronic health records of two large hospitals using automated searches (≥1 ICD code combined with simple text searches). Key disease concepts were extracted from free-text data using natural language processing (NLP) and combined with ICD codes to develop algorithms. We created both supervised regression-based algorithms (on a training set of 127 axSpA cases and 423 non-cases) and unsupervised algorithms to identify patients with a high probability of having axSpA from the enriched cohort. Their performance was compared against classifications using ICD codes only.

RESULTS: NLP extracted four disease concepts of high predictive value: ankylosing spondylitis (AS), sacroiliitis, HLA-B27 and spondylitis. The unsupervised algorithm, incorporating both the NLP concept and the ICD code for AS, identified the greatest number of patients. With the probability threshold set to attain 80% positive predictive value, it identified 1509 axSpA patients (mean age 53 years, 71% male). Sensitivity was 0.78, specificity 0.94 and area under the curve 0.93. The two supervised algorithms performed similarly but identified fewer patients. All three outperformed traditional approaches using ICD codes alone (area under the curve 0.80-0.87).

CONCLUSION: Algorithms incorporating free-text data can accurately identify axSpA patients in electronic health records. Large cohorts identified using these novel methods offer exciting opportunities for future clinical research.
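The thresholding step described in the results (choosing a probability cut-off that attains a target positive predictive value, then reporting sensitivity and specificity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `confusion_counts` and `threshold_for_ppv` are hypothetical, the probabilities and labels in the usage note are toy values, and the sketch assumes a chart-reviewed validation set in which each patient has a predicted probability and a 0/1 gold-standard label.

```python
from typing import List, Optional, Tuple


def confusion_counts(
    probs: List[float], labels: List[int], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when patients with prob >= threshold are called cases."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_case = p >= threshold
        if predicted_case and y == 1:
            tp += 1
        elif predicted_case and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def threshold_for_ppv(
    probs: List[float], labels: List[int], target_ppv: float
) -> Optional[float]:
    """Return the lowest observed probability whose PPV meets the target.

    Scanning from the lowest candidate upwards keeps as many patients as
    possible classified as cases while still meeting the PPV target.
    Returns None if no candidate threshold reaches the target.
    """
    for t in sorted(set(probs)):
        tp, fp, _, _ = confusion_counts(probs, labels, t)
        if tp + fp > 0 and tp / (tp + fp) >= target_ppv:
            return t
    return None


def sensitivity_specificity(
    probs: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Sensitivity = tp / (tp + fn); specificity = tn / (tn + fp).

    Assumes the labels contain at least one case and one non-case.
    """
    tp, fp, tn, fn = confusion_counts(probs, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)
```

As a usage example with toy data (not study data), calling `threshold_for_ppv(probs, labels, 0.80)` on `probs = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]` and `labels = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]` selects the threshold 0.60, at which five of the six patients called cases are true cases. Because PPV is not guaranteed to rise monotonically with the threshold, the sketch checks every observed probability rather than using a binary search.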