Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

Trang T Le, W Kyle Simmons, Masaya Misaki, Jerzy Bodurka, Bill C White, Jonathan Savitz, Brett A McKinney.

Abstract

MOTIVATION: Classifying individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, in domains such as bioinformatics, where the number of features p is much larger than the number of observations n (p ≫ n), these differential privacy methods are susceptible to overfitting.
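For readers unfamiliar with the reusable holdout, the following is a minimal Python sketch of a Thresholdout-style query, the mechanism proposed by Dwork and colleagues. It releases the training-set estimate unless that estimate differs from the holdout estimate by more than a Laplace-noised threshold. When the gap is larger, it releases a noised holdout value and spends one unit of budget. The class name `Thresholdout`, its parameters (`threshold`, `sigma`, `budget`) and the Laplace sampler are illustrative assumptions. They are not taken from the privateEC code.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class Thresholdout:
    """Illustrative Thresholdout-style reusable holdout (hypothetical names)."""

    def __init__(self, threshold: float, sigma: float, budget: int, seed: int = 0):
        self.threshold = threshold  # T: tolerated train/holdout gap
        self.sigma = sigma          # noise scale
        self.budget = budget        # number of holdout answers allowed
        self.rng = random.Random(seed)
        self.noisy_threshold = threshold + laplace(2 * sigma, self.rng)

    def query(self, train_value: float, holdout_value: float):
        """Answer one query, e.g. the accuracy of a candidate feature set."""
        if self.budget < 1:
            return None  # budget spent: the holdout can no longer be consulted
        eta = laplace(4 * self.sigma, self.rng)
        if abs(holdout_value - train_value) > self.noisy_threshold + eta:
            # Gap too large (training estimate looks overfit): pay from the budget,
            # refresh the noisy threshold and release a noised holdout value.
            self.budget -= 1
            self.noisy_threshold = self.threshold + laplace(2 * self.sigma, self.rng)
            return holdout_value + laplace(self.sigma, self.rng)
        # Gap small: release the training estimate; the holdout value is not released.
        return train_value


# Example queries; with sigma > 0 the decision near the threshold is randomised.
guard = Thresholdout(threshold=0.04, sigma=0.01, budget=10, seed=1)
print(guard.query(0.80, 0.79), guard.query(0.95, 0.55))
```

Setting `sigma=0` makes the sketch deterministic. That is convenient for checking the control flow, but it removes the privacy guarantee the noise is meant to provide.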
METHODS: We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy-preserving classification, while also preventing overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, in which the temperature represents the privacy threshold. We use the concept of evaporative cooling of atomic gases from thermal statistical physics to perform backward stepwise privacy-preserving feature selection.
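The abstract does not spell out the update rule. The following Python sketch therefore only illustrates the general idea of Boltzmann-weighted backward elimination, under our own assumptions:

* Each remaining feature is removed ("evaporates") with probability proportional to exp(-(s_i - s_min)/T), where s_i is its importance score.
* The temperature T is lowered geometrically after every step.

At high T the removal is nearly uniform, so selection is more random. As T approaches 0, the rule reduces to deleting the lowest-scoring feature. The function names, the cooling schedule and the use of fixed scores are hypothetical. This is not the published private Evaporative Cooling update.

```python
import math
import random


def boltzmann_weights(scores, temperature):
    """Boltzmann factors exp(-(s - s_min) / T); the weakest feature gets weight 1."""
    s_min = min(scores.values())
    return {f: math.exp(-(s - s_min) / temperature) for f, s in scores.items()}


def evaporate(scores, keep, remove_per_step=1, t0=1.0, cooling=0.9, seed=0):
    """Hypothetical Boltzmann-weighted backward elimination.

    scores: dict of feature -> importance (higher = more important). Scores are
    held fixed here; a real pipeline would presumably rescore after each step.
    Returns the sorted names of the `keep` surviving features.
    """
    rng = random.Random(seed)
    remaining = dict(scores)
    temperature = t0
    while len(remaining) > keep:
        for _ in range(min(remove_per_step, len(remaining) - keep)):
            weights = boltzmann_weights(remaining, temperature)
            names = list(weights)
            # Low-scoring features are the most likely to "evaporate".
            victim = rng.choices(names, weights=[weights[f] for f in names])[0]
            del remaining[victim]
        # Cool down: removal becomes greedier as the temperature drops.
        temperature = max(temperature * cooling, 1e-12)
    return sorted(remaining)


scores = {"a": 5.0, "b": 4.0, "c": 0.1, "d": 0.2}
print(evaporate(scores, keep=2, t0=1e-6))  # near T = 0: drops c, then d -> ['a', 'b']
print(evaporate(scores, keep=2, t0=100.0, seed=7))  # hot: close to random removal
```

Because the weights are shifted by the minimum score, the largest factor is always 1. This avoids overflow even at very low temperatures, and negative scores (which Relief-type methods can produce) are handled correctly.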
RESULTS: On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout (thresholdout), the reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling achieves higher classification accuracy and, judged on an independent validation set, does not overfit. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder.
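One reason Relief-type scoring can help when attributes interact is how it is computed. Each instance is compared with its nearest same-class neighbours (hits) and nearest other-class neighbours (misses), so a feature can score well even when it has no marginal association with the class. The following Python sketch is a basic ReliefF for a two-class outcome, using Manhattan distance on range-scaled features and k hits and k misses. It is applied to a hypothetical XOR simulation in which neither signal feature has a main effect. This is a textbook-style illustration, not the Relief-F configuration used by private Evaporative Cooling. The simulation parameters are our own.

```python
import random


def relieff(X, y, k=5):
    """Basic ReliefF weights for a two-class outcome.

    X: list of numeric rows, y: list of class labels (two classes).
    Distances are Manhattan over range-scaled features; each instance
    uses its k nearest hits (same class) and k nearest misses. For two
    classes the usual prior weighting of misses equals 1, so it is omitted.
    """
    n, p = len(X), len(X[0])
    # Range of each feature; constant features get span 1 to avoid division by zero.
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(p)]

    def diff(j, a, b):
        return abs(X[a][j] - X[b][j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(p))

    w = [0.0] * p
    for i in range(n):
        # Neighbours sorted by distance (ties broken by index for reproducibility).
        ranked = sorted((dist(i, o), o) for o in range(n) if o != i)
        hits = [o for _, o in ranked if y[o] == y[i]][:k]
        misses = [o for _, o in ranked if y[o] != y[i]][:k]
        for j in range(p):
            # Penalise features that differ between near neighbours of the same class,
            # reward features that differ between near neighbours of different classes.
            w[j] -= sum(diff(j, i, h) for h in hits) / (n * k)
            w[j] += sum(diff(j, i, m) for m in misses) / (n * k)
    return w


# Hypothetical simulation: class = x1 XOR x2 (pure interaction), x3 and x4 are noise.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    row = [rng.randint(0, 1) for _ in range(4)]
    X.append(row)
    y.append(row[0] ^ row[1])

w = relieff(X, y, k=5)
print([round(v, 3) for v in w])
```

On data like this, the two interacting features should receive positive weights, while the noise features stay at or below zero. A univariate comparison of `x1` or `x2` between classes would show essentially no difference.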
AVAILABILITY AND IMPLEMENTATION: Code is available at http://insilico.utulsa.edu/software/privateEC
CONTACT: brett-mckinney@utulsa.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Year:  2017        PMID: 28472232      PMCID: PMC5870708          DOI: 10.1093/bioinformatics/btx298

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

