Population-Level Prediction of Type 2 Diabetes From Claims Data and Analysis of Risk Factors.

Narges Razavian, Saul Blecker, Ann Marie Schmidt, Aaron Smith-McLallen, Somesh Nigam, David Sontag.

Abstract

We present a new approach to population health, in which data-driven predictive models are learned for outcomes such as type 2 diabetes. Our approach enables risk assessment from readily available electronic claims data on large populations, without additional screening cost. The proposed model uncovers early and late-stage risk factors. Using administrative claims, pharmacy records, healthcare utilization, and laboratory results of 4.1 million individuals between 2005 and 2009, an initial set of 42,000 variables was derived that together describe the full health status and history of every individual. Machine learning was then used to methodically enhance the predictive variable set and fit models predicting onset of type 2 diabetes in 2009-2011, 2010-2012, and 2011-2013. We compared the enhanced model with a parsimonious model consisting of known diabetes risk factors in a real-world environment, where missing values are common. Furthermore, we analyzed novel and known risk factors emerging from the model in different age groups and at different stages before onset. A parsimonious model using 21 classic diabetes risk factors resulted in an area under the ROC curve (AUC) of 0.75 for diabetes prediction within a 2-year window following the baseline. The enhanced model increased the AUC to 0.80, with about 900 variables selected as predictive (p < 0.0001 for differences between AUCs). Similar improvements were observed for models predicting diabetes onset 1-3 years and 2-4 years after baseline. The enhanced model improved positive predictive value by at least 50% and identified novel surrogate risk factors for type 2 diabetes, such as chronic liver disease (odds ratio [OR] 3.71), high alanine aminotransferase (OR 2.26), esophageal reflux (OR 1.85), and history of acute bronchitis (OR 1.45). Liver risk factors emerge later in the process of diabetes development compared with obesity-related factors such as hypertension and high hemoglobin A1c.
In conclusion, population-level risk prediction for type 2 diabetes using readily available administrative data is feasible and has better prediction performance than classical diabetes risk prediction algorithms on very large populations with missing data. The new model enables intervention allocation at national scale quickly and accurately and recovers potentially novel risk factors at different stages before the disease onset.
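The abstract reports three kinds of quantities: AUC, positive predictive value, and odds ratios. As a reading aid, the sketch below shows how each is defined, using only the Python standard library. It is not the authors' pipeline: the labels, scores, and 2x2 counts are made-up toy values, the function names are hypothetical, and the abstract does not state how its odds ratios were estimated, so the plain unadjusted definition is assumed here.

```python
def auc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs in which the case
    receives the higher score; ties count as half a pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_at_k(labels, scores, k):
    """Positive predictive value among the k highest-scored individuals."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = [y for _, y in ranked[:k]]
    return sum(top) / len(top)


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Unadjusted odds ratio from a 2x2 table (assumed definition)."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Toy data (hypothetical): 1 = developed diabetes in the prediction window.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
baseline_scores = [0.9, 0.4, 0.3, 0.8, 0.5, 0.2, 0.1, 0.1]
enhanced_scores = [0.9, 0.7, 0.3, 0.8, 0.2, 0.2, 0.1, 0.1]

print(round(auc(labels, baseline_scores), 3))          # 0.733
print(round(auc(labels, enhanced_scores), 3))          # 0.867
print(round(ppv_at_k(labels, baseline_scores, 3), 3))  # 0.333
print(round(ppv_at_k(labels, enhanced_scores, 3), 3))  # 0.667
print(round(odds_ratio(30, 70, 10, 90), 3))            # 3.857
```

On this toy data the second score vector ranks cases above non-cases more often, which raises both the AUC and the PPV among the three highest-risk individuals. The gains reported in the abstract are of the same kind, though the figures here have no connection to the study's results.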

Keywords:  big data analytics; data mining; disease prediction; longitudinal study; machine learning; predictive analytics; risk assessment

Year: 2015; PMID: 27441408; DOI: 10.1089/big.2015.0020

Source DB: PubMed; Journal: Big Data; ISSN: 2167-6461; Impact factor: 2.128


