Theodora S Brisimi (1), Ruidi Chen (1), Theofanie Mela (2), Alex Olshevsky (1), Ioannis Ch Paschalidis (3), Wei Shi (4).
(1) Department of Electrical & Computer Engineering, and Division of Systems Engineering, Boston University, 8 Saint Mary's St., Boston, MA 02215, United States.
(2) Electrophysiology Lab/Arrhythmia Service, Massachusetts General Hospital, 55 Fruit St., Boston, MA 02114, United States.
(3) Department of Electrical & Computer Engineering, and Division of Systems Engineering, Boston University, 8 Saint Mary's St., Boston, MA 02215, United States; Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215, United States. Electronic address: yannisp@bu.edu.
(4) Department of Electrical & Computer Engineering, and Division of Systems Engineering, Boston University, 8 Saint Mary's St., Boston, MA 02215, United States; School of Electrical & Computer Engineering, Arizona State University, Tempe, AZ, United States.
Abstract
BACKGROUND: In an era of "big data," computationally efficient and privacy-aware solutions for large-scale machine learning problems have become crucial, especially in the healthcare domain, where large amounts of data are stored in different locations and owned by different entities. Past research has focused on centralized algorithms, which assume the existence of a central data repository (database) that stores and can process the data from all participants. Such an architecture, however, can be impractical when data are not centrally located, does not scale well to very large datasets, and introduces single-point-of-failure risks that could compromise the integrity and privacy of the data. Given the large volumes of data spread across hospitals and individuals, a decentralized, computationally scalable methodology is much needed.
OBJECTIVE: We aim to solve a binary supervised classification problem, predicting hospitalizations for cardiac events, using a distributed algorithm. We seek to develop a general decentralized optimization framework that enables multiple data holders to collaborate and converge to a common predictive model without explicitly exchanging raw data.
METHODS: We focus on the soft-margin l1-regularized sparse Support Vector Machine (sSVM) classifier. We develop an iterative cluster Primal Dual Splitting (cPDS) algorithm for solving the large-scale sSVM problem in a decentralized fashion. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the data holders to collaborate while keeping every participant's data private.
RESULTS: We test cPDS on the problem of predicting hospitalizations due to heart disease within a calendar year, based on information in the patients' Electronic Health Records prior to that year. cPDS converges faster than centralized methods at the cost of some communication between agents. It also converges faster and with less communication overhead than an alternative distributed algorithm. In both cases, it achieves similar prediction accuracy, measured by the Area Under the Receiver Operating Characteristic Curve (AUC) of the classifier. We extract important features discovered by the algorithm that are predictive of future hospitalizations, thus providing a way to interpret the classification results and inform prevention efforts.
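For readers unfamiliar with the sSVM named under METHODS, the following is a minimal sketch of one common way to write a soft-margin l1-regularized SVM objective: a weighted sum of hinge losses plus an l1 penalty on the coefficients. The function name `ssvm_objective`, the weights `C` and `rho`, and the exact form of the objective are illustrative assumptions, not taken from the paper. The sketch only evaluates the objective on data held in one place; it does not implement cPDS or any decentralized scheme.

```python
import numpy as np


def ssvm_objective(beta, beta0, X, y, C=1.0, rho=0.1):
    """Evaluate an assumed sparse soft-margin SVM objective.

    beta  : (d,) coefficient vector
    beta0 : scalar intercept
    X     : (n, d) feature matrix
    y     : (n,) labels in {-1, +1}
    C     : weight on the hinge loss (hypothetical default)
    rho   : weight on the l1 penalty (hypothetical default)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)

    # Signed margin of every sample under the linear classifier.
    margins = y * (X @ beta + beta0)

    # Soft-margin (hinge) loss: zero once a sample is classified
    # correctly with margin at least 1.
    hinge = np.maximum(0.0, 1.0 - margins)

    # The l1 penalty pushes coefficients to exactly zero, which is
    # what makes the resulting classifier sparse.
    return C * hinge.sum() + rho * np.abs(beta).sum()
```

Under these assumptions, a larger `rho` favors fewer nonzero coefficients, which is the property that lets a small set of predictive features be read off the fitted model.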