Martine De Cock1, Rafael Dowsley2, Anderson C A Nascimento3, Davis Railsback3, Jianwei Shen3, Ariel Todoki3. 1. School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA. mdecock@uw.edu. 2. Faculty of Information Technology, Monash University, Clayton, 3800, Australia. 3. School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA.
Abstract
BACKGROUND: In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technology challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique known as secure multi-party computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is a necessity to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand. METHODS: Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient-descent-based algorithm for training a logistic regression-like model with a clipped ReLU activation function, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao's garbled circuits, and a series of cryptographic engineering optimizations to improve the performance. RESULTS: For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition.
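The abstract mentions secure multiplications enabled by trusted-initializer correlated randomness but does not detail the protocols. As an illustrative sketch only, the standard Beaver-triple technique shows how such correlated randomness lets two parties multiply additively secret-shared values; the modulus and all function names below are our own assumptions, not the paper's implementation.

```python
import secrets

P = 2**61 - 1  # prime modulus for additive secret sharing (illustrative choice)

def share(x):
    # Split x into two additive shares modulo P; each share alone reveals nothing.
    r = secrets.randbelow(P)
    return r, (x - r) % P

def beaver_triple():
    # The trusted initializer samples random a, b and distributes
    # shares of a, b, and a*b to the two computing parties.
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)

def secure_mul(x_sh, y_sh):
    # Party i holds x_sh[i] and y_sh[i]; the triple lets the parties
    # compute shares of x*y without learning x or y.
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    # The parties open d = x - a and e = y - b; since a, b are uniformly
    # random, d and e leak nothing about x and y.
    d = (x_sh[0] - a0 + x_sh[1] - a1) % P
    e = (y_sh[0] - b0 + y_sh[1] - b1) % P
    # z = c + d*b + e*a + d*e = x*y; the constant d*e is added by one party.
    z0 = (c0 + d * b0 + e * a0 + d * e) % P
    z1 = (c1 + d * b1 + e * a1) % P
    return z0, z1

def reconstruct(sh):
    # Add the two shares modulo P to recover the secret.
    return (sh[0] + sh[1]) % P
```

With this primitive, one secure multiplication costs one triple plus one round of opening d and e, which is why triple generation by the trusted initializer can be pushed into an offline phase.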
CONCLUSIONS: In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high-dimensional genome data distributed across a local area network.
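To make the training setup concrete, here is a plaintext reference of gradient descent with a clipped ReLU activation standing in for the sigmoid. The specific clipping form below (0 for x < -1/2, x + 1/2 on [-1/2, 1/2], 1 for x > 1/2) is one common piecewise-linear surrogate and is an assumption on our part; the paper's secure protocols would run the same arithmetic on secret shares rather than in the clear.

```python
import numpy as np

def clipped_relu(x):
    # Piecewise-linear surrogate for the sigmoid:
    # 0 for x < -1/2, x + 1/2 on [-1/2, 1/2], 1 for x > 1/2.
    return np.clip(x + 0.5, 0.0, 1.0)

def train_logreg(X, y, lr=0.1, epochs=100):
    # Plaintext reference of the gradient-descent loop; in the secure
    # setting, every multiplication here becomes a secure multiplication
    # on secret-shared values.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        preds = clipped_relu(X @ w)
        grad = X.T @ (preds - y) / len(y)
        w -= lr * grad
    return w
```

Because the activation is piecewise linear rather than an exponential, it is far cheaper to evaluate under secret sharing, which is the motivation for replacing the sigmoid in the secure setting.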