BACKGROUND: The ability to identify the risk factors related to an adverse condition, e.g., a heart failure (HF) diagnosis, is important for improving care quality and reducing cost. Existing approaches for risk factor identification are either knowledge-driven (from guidelines or the literature) or data-driven (from observational data). No existing method provides a model that effectively combines expert knowledge with data-driven insight for risk factor identification. METHODS: We present a systematic approach to enhance known knowledge-based risk factors with additional potential risk factors derived from data. The core of our approach is a sparse regression model with regularization terms that correspond to both knowledge-driven and data-driven risk factors. RESULTS: The approach is validated using a large dataset containing 4,644 heart failure cases and 45,981 controls. The outpatient electronic health records (EHRs) for these patients include diagnoses, medications, and lab results from 2003 to 2010. We demonstrate that the proposed method can identify complementary risk factors that are not among the existing known factors and can better predict the onset of HF. We quantitatively compare different sets of risk factors in the context of predicting onset of HF, using the area under the ROC curve (AUC) as the performance metric. The combined knowledge-driven and data-driven risk factors significantly outperform knowledge-based risk factors alone. Furthermore, a cardiologist confirmed that the additional risk factors are clinically meaningful. CONCLUSION: We present a systematic framework for combining knowledge-driven and data-driven insights for risk factor identification. We demonstrate the power of this framework in the context of predicting onset of HF, where our approach successfully identifies intuitive and predictive risk factors beyond a set of known HF risk factors.
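The abstract does not state the exact form of the sparse regression objective, so the following is only a minimal sketch of the general idea under stated assumptions: a logistic regression with a per-feature L1 penalty, where knowledge-based features are left unpenalized (so they are always retained) and data-driven candidates must overcome a penalty to enter the model. The synthetic data, the penalty values, and all function names (`fit_sparse_logistic`, `soft_threshold`, `auc`) are hypothetical and are not taken from the paper. The two feature sets are then compared by AUC, as in the RESULTS.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def predict(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]

def fit_sparse_logistic(X, y, penalties, step=0.5, n_iter=300):
    # Proximal gradient descent on: mean logistic loss + sum_j penalties[j] * |w_j|.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, p, label in zip(X, predict(X, w, b), y):
            err = p - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= step * grad_b / n
        w = [soft_threshold(w[j] - step * grad_w[j] / n, step * penalties[j])
             for j in range(d)]
    return w, b

def auc(scores, labels):
    # Probability that a random case outscores a random control (ties count 0.5).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic data (assumption): features 0-1 play the role of known risk factors,
# feature 2 is an informative data-driven candidate, features 3-5 are noise.
random.seed(0)
X, y = [], []
for _ in range(300):
    row = [random.gauss(0.0, 1.0) for _ in range(6)]
    logit = 1.0 * row[0] + 1.0 * row[1] + 1.5 * row[2] - 1.0
    X.append(row)
    y.append(1 if random.random() < sigmoid(logit) else 0)

# Knowledge-only baseline: a very large penalty excludes every data-driven feature.
w_known, b_known = fit_sparse_logistic(X, y, [0.0, 0.0] + [1e6] * 4)
# Combined model: known factors unpenalized, data-driven candidates under a mild L1 penalty.
w_combined, b_combined = fit_sparse_logistic(X, y, [0.0, 0.0] + [0.05] * 4)

auc_known = auc(predict(X, w_known, b_known), y)
auc_combined = auc(predict(X, w_combined, b_combined), y)
print("knowledge-only weights:", [round(v, 3) for v in w_known], "AUC:", round(auc_known, 3))
print("combined weights:      ", [round(v, 3) for v in w_combined], "AUC:", round(auc_combined, 3))
```

The AUCs here are computed on the training data purely for brevity; a faithful comparison would evaluate on held-out patients. Leaving the known factors unpenalized is one simple way to encode expert knowledge, and other choices (for example, a smaller but nonzero penalty on them) would fit the same description.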