Gregor Stiglic1, Fei Wang2, Adam Davey3, Zoran Obradovic3. 1. University of Maribor, Maribor, Slovenia. 2. IBM T.J. Watson Research Center, Yorktown Heights, NY. 3. Temple University, Philadelphia, PA.
Abstract
BACKGROUND: Regulations and privacy concerns often hinder the exchange of healthcare data between hospitals and other healthcare providers. Sharing predictive models built on the original data and combining their outputs offers an alternative that enables efficient prediction of outcomes on new cases. Although many techniques exist for combining the outputs of different predictive models, few studies attempt to interpret the results obtained from ensemble-learning methods. METHODS: We propose a novel approach to classification based on models from different hospitals that achieves a high level of performance along with comprehensibility of the obtained results. Our approach is based on regularized sparse regression models arranged in two hierarchical levels and exploits the interpretability of the obtained regression coefficients to rank the contribution of each hospital to outcome prediction. RESULTS: The proposed approach was used to predict 30-day all-cause readmissions for pediatric patients in 54 Californian hospitals. Using repeated holdout evaluation on more than 60,000 hospital discharge records, we compared the proposed approach to alternative approaches. The performance of the two-level classification model was measured using the Area Under the ROC Curve (AUC), with an additional evaluation that uncovered the importance and contribution of each single data source (i.e., hospital) to the final result. The results for the best distributed model (AUC=0.787, 95% CI: 0.780-0.794) demonstrate no significant difference in AUC performance compared to a single elastic net model built on all available data (AUC=0.789, 95% CI: 0.781-0.796). CONCLUSIONS: This paper presents a novel approach to improved classification with shared predictive models for environments where centralized collection of data is not possible.
The comparable classification performance, combined with the improved interpretability of results, demonstrates the effectiveness of our approach.
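The two-level scheme described in METHODS can be sketched as follows: each hospital fits its own elastic-net model, and a second-level sparse model is fit on the hospitals' predicted probabilities, so its coefficients rank each hospital's contribution. This is a minimal illustration on synthetic data using scikit-learn; the hospital counts, penalties, and data here are illustrative assumptions, not the authors' actual implementation.

```python
# Two-level elastic-net stacking sketch (illustrative assumptions only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_hospitals, n_features = 5, 20

# Level 1: each "hospital" fits an elastic-net model on its own local data;
# only the fitted models (not the raw records) would be shared.
hospital_models = []
for _ in range(n_hospitals):
    X = rng.normal(size=(500, n_features))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)
    m = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=1.0, max_iter=5000)
    m.fit(X, y)
    hospital_models.append(m)

# Level 2: a sparse meta-model is fit on the hospitals' predicted
# probabilities for new cases; its coefficients rank each source.
X_new = rng.normal(size=(1000, n_features))
y_new = (X_new[:, 0] + 0.5 * X_new[:, 1]
         + rng.normal(size=1000) > 0).astype(int)
P = np.column_stack([m.predict_proba(X_new)[:, 1]
                     for m in hospital_models])

X_tr, X_te, y_tr, y_te = train_test_split(P, y_new, random_state=0)
meta = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000)
meta.fit(X_tr, y_tr)

# Hospitals ranked by absolute meta-coefficient (larger = more influential).
ranking = np.argsort(-np.abs(meta.coef_[0]))
print("hospital ranking:", ranking)
```

Because the meta-model is itself regularized and sparse, a near-zero coefficient indicates a hospital whose model adds little to the combined prediction, which is the interpretability property the abstract emphasizes.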