Yong Li (1), Xiaoqian Jiang (2), Shuang Wang (3), Hongkai Xiong (1), Lucila Ohno-Machado (3).
1. EE Department, Shanghai Jiaotong University, Shanghai, China, 200240.
2. Department of Biomedical Informatics, UC San Diego, La Jolla, California, USA. x1jiang@ucsd.edu.
3. Department of Biomedical Informatics, UC San Diego, La Jolla, California, USA.
Abstract
OBJECTIVE: To develop an accurate logistic regression (LR) algorithm to support federated data analysis of vertically partitioned distributed data sets.
MATERIAL AND METHODS: We propose a novel technique that solves the binary LR problem by dual optimization to obtain a global solution for vertically partitioned data. We evaluated this new method, VERTIcal Grid lOgistic regression (VERTIGO), in artificial and real-world medical classification problems in terms of the area under the receiver operating characteristic curve, calibration, and computational complexity. We assumed that the institutions could "align" patient records (through patient identifiers or hashed "privacy-protecting" identifiers), and also that they both had access to the values for the dependent variable in the LR model (eg, that if the model predicts death, both institutions would have the same information about death).
RESULTS: The solution derived by VERTIGO has the same estimated parameters as the solution derived by applying classical LR. The same is true for discrimination and calibration over both simulated and real data sets. In addition, the computational cost of VERTIGO is not prohibitive in practice.
DISCUSSION: There is a technical challenge in scaling up federated LR for vertically partitioned data. When the number of patients m is large, our algorithm has to invert a large Hessian matrix. This is an expensive operation of time complexity O(m^3) that may require large amounts of memory for storage and exchange of information. The algorithm may also not work well when the number of observations in each class is highly imbalanced.
CONCLUSION: The proposed VERTIGO algorithm can generate accurate global models to support federated data analysis of vertically partitioned data.
Published by Oxford University Press on behalf of the American Medical Informatics Association 2015. This work is written by US Government employees and is in the public domain in the US.
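To make the dual approach in the abstract more concrete, here is a minimal sketch of how an L2-regularized binary LR model could be fit from vertically partitioned data. It is not the authors' implementation. It assumes labels coded as -1/+1, no intercept term, a linear kernel, and a standard textbook dual of regularized LR; the function names (`local_gram`, `fit_dual`, `recover_beta`), the penalty `lam`, and the toy data are all hypothetical. Each party shares only an m x m Gram matrix over its own columns, and the Newton step solves an m x m system, which is where the O(m^3) cost mentioned in the abstract would arise. Whether sharing Gram matrices is acceptable from a privacy standpoint is a separate question that this sketch does not address.

```python
import math

def local_gram(X_block):
    """One party's m x m linear kernel over its own columns: K = X X^T."""
    return [[sum(a * b for a, b in zip(r, s)) for s in X_block] for r in X_block]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting, O(m^3)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def dual_objective(alpha, Q, lam):
    """(1 / (2 lam)) a^T Q a + sum_i [a_i log a_i + (1 - a_i) log(1 - a_i)]."""
    m = len(alpha)
    quad = sum(alpha[i] * Q[i][k] * alpha[k] for i in range(m) for k in range(m))
    ent = sum(a * math.log(a) + (1 - a) * math.log(1 - a) for a in alpha)
    return quad / (2 * lam) + ent

def fit_dual(grams, y, lam=1.0, iters=50, tol=1e-9):
    """Damped Newton on the dual; needs only the Gram matrices and the labels."""
    m = len(y)
    # Q_ik = y_i y_k * (sum over parties of K_party[i][k])
    Q = [[y[i] * y[k] * sum(K[i][k] for K in grams) for k in range(m)] for i in range(m)]
    alpha = [0.5] * m
    for _ in range(iters):
        grad = [sum(Q[i][k] * alpha[k] for k in range(m)) / lam
                + math.log(alpha[i] / (1 - alpha[i])) for i in range(m)]
        if max(abs(g) for g in grad) < tol:
            break
        H = [[Q[i][k] / lam + (1 / (alpha[i] * (1 - alpha[i])) if i == k else 0.0)
              for k in range(m)] for i in range(m)]
        step = solve(H, grad)  # the m x m solve that dominates the cost
        f0, t = dual_objective(alpha, Q, lam), 1.0
        while t > 1e-12:  # halve the step until it is feasible and not uphill
            trial = [a - t * s for a, s in zip(alpha, step)]
            if all(0 < v < 1 for v in trial) and dual_objective(trial, Q, lam) <= f0 + 1e-12:
                alpha = trial
                break
            t *= 0.5
    return alpha

def recover_beta(X_block, y, alpha, lam=1.0):
    """Each party maps the shared dual solution back to its own coefficients."""
    d = len(X_block[0])
    return [sum(a * yi * row[j] for a, yi, row in zip(alpha, y, X_block)) / lam
            for j in range(d)]

# Hypothetical toy data: six aligned patients, party A holds two features, party B one.
X_a = [[0.5, 1.2], [1.5, -0.3], [-0.7, 0.8], [-1.1, -0.4], [0.3, -1.5], [2.0, 0.1]]
X_b = [[1.0], [0.2], [-0.5], [-1.3], [0.7], [-0.2]]
y = [1, 1, -1, -1, 1, -1]  # outcome known to both parties, coded -1/+1

alpha = fit_dual([local_gram(X_a), local_gram(X_b)], y, lam=1.0)
beta_a = recover_beta(X_a, y, alpha, lam=1.0)
beta_b = recover_beta(X_b, y, alpha, lam=1.0)
print(beta_a, beta_b)
```

Under these assumptions, concatenating `beta_a` and `beta_b` should give the same coefficients as fitting the regularized LR model on the pooled feature matrix, which mirrors the abstract's claim that the federated and classical solutions coincide. The dual has one variable per patient rather than one per feature, which is why the cost grows with m.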
Keywords:
dual optimization; federated data analysis; logistic regression; vertically partitioned data