Kathleen E Corey (1,2), Uri Kartoun (3,4), Hui Zheng (5), Stanley Y Shaw (3,4).
1. Gastrointestinal Unit, Massachusetts General Hospital, 55 Fruit Street, Blake 4, Boston, MA, 02114, USA. kcorey@partners.org.
2. Harvard Medical School, Boston, MA, USA.
3. Harvard Medical School, Boston, MA, USA.
4. Center for Systems Biology, Massachusetts General Hospital, Boston, MA, USA.
5. Biostatistics Center, Massachusetts General Hospital, Boston, MA, USA.
Abstract
BACKGROUND AND AIMS: Nonalcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease worldwide. Risk factors for NAFLD progression and liver-related outcomes remain incompletely understood because computational methods for identifying affected patients are lacking. The present study sought to design a classification algorithm for NAFLD within the electronic medical record (EMR) to support the development of large-scale longitudinal cohorts.

METHODS: We implemented feature selection using logistic regression with adaptive LASSO. A training set of 620 patients was randomly selected from the Research Patient Data Registry at Partners Healthcare. To establish the true NAFLD diagnosis, we performed chart reviews and accepted either documentation of a biopsy or a clinical diagnosis of NAFLD. Candidate model variables included laboratory measurements, diagnosis codes, and concepts extracted from medical notes. Variables with P < 0.05 were included in the multivariable analysis.

RESULTS: The NAFLD classification algorithm included the number of natural language mentions of NAFLD in the EMR, the lifetime number of ICD-9 codes for NAFLD, and the triglyceride level. This classification algorithm was superior to an algorithm using ICD-9 data alone, with an AUC of 0.85 versus 0.75 (P < 0.0001), and led to the creation of a new independent cohort of 8458 individuals with a high probability of NAFLD.

CONCLUSIONS: The NAFLD classification algorithm is superior to ICD-9 billing data alone. This approach is simple to develop and deploy and can be applied across different institutions to create EMR-based cohorts of individuals with NAFLD.
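The comparison reported in RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the coefficients in `COEF`, the toy records in `PATIENTS`, and the function names `nafld_probability` and `auc` are all hypothetical, since the abstract gives neither fitted coefficients nor patient-level data. The sketch only shows how a logistic score built from the three selected variables could be compared, by AUC, against a score that uses the ICD-9 code count alone.

```python
import math

# Hypothetical coefficients for illustration only; the fitted values are not
# reported in the abstract.
COEF = {
    "intercept": -2.0,
    "nlp_mentions": 0.30,    # number of NAFLD mentions in clinical notes
    "icd9_codes": 0.50,      # lifetime number of NAFLD ICD-9 codes
    "triglycerides": 0.004,  # triglyceride level (assumed mg/dL)
}


def nafld_probability(nlp_mentions, icd9_codes, triglycerides):
    """Logistic model: probability of NAFLD from the three selected variables."""
    z = (
        COEF["intercept"]
        + COEF["nlp_mentions"] * nlp_mentions
        + COEF["icd9_codes"] * icd9_codes
        + COEF["triglycerides"] * triglycerides
    )
    return 1.0 / (1.0 + math.exp(-z))


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up records: (nlp_mentions, icd9_codes, triglycerides, chart_review_label)
PATIENTS = [
    (6, 0, 210, 1),
    (0, 2, 180, 1),
    (3, 1, 150, 1),
    (1, 0, 250, 1),
    (0, 1, 90, 0),
    (0, 0, 120, 0),
    (1, 0, 100, 0),
    (0, 0, 300, 0),
]

labels = [p[3] for p in PATIENTS]
combined_scores = [nafld_probability(p[0], p[1], p[2]) for p in PATIENTS]
icd_only_scores = [p[1] for p in PATIENTS]  # ICD-9 code count used as the score

auc_combined = auc(labels, combined_scores)
auc_icd_only = auc(labels, icd_only_scores)

if __name__ == "__main__":
    print(f"combined model AUC: {auc_combined:.3f}")
    print(f"ICD-9 only AUC:     {auc_icd_only:.3f}")
```

On real data the coefficients would come from fitting an L1-penalized logistic regression on the chart-reviewed training set, and the two AUCs would be compared with a formal statistical test rather than by inspection.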