Md Mohaiminul Islam, Noman Mohammed, Yang Wang, Pingzhao Hu.
Abstract
Proper analysis of high-dimensional human genomic data is necessary to advance our understanding of fundamental biological questions such as disease associations and drug sensitivity. However, such data contain sensitive private information and can be used to uniquely identify an individual (i.e., a privacy violation). Therefore, raw genomic datasets cannot be publicly published or shared with researchers. The recent success of deep learning (DL) in diverse problems has proved its suitability for analyzing large volumes of high-dimensional genomic data. Still, DL-based models leak information about their training samples. To overcome this challenge, we can incorporate differential privacy mechanisms into the DL analysis framework, as differential privacy can protect individuals' privacy. We propose a differential privacy-based DL framework to solve two biological problems: breast cancer status (BCS) and cancer type (CT) classification, and drug sensitivity prediction. To predict BCS and CT from genomic data, we built a differentially private (DP) deep autoencoder (dpAE) on private gene expression datasets that performs low-dimensional data representation learning. We used dpAE features to build multiple DP binary classifiers to predict BCS and CT for any individual. To predict drug sensitivity, we used the Genomics of Drug Sensitivity in Cancer (GDSC) dataset. We extracted GDSC's dpAE features to build our DP drug sensitivity prediction model for 265 drugs. Evaluation of our proposed DP framework shows that it achieves better prediction performance for BCS, CT, and drug sensitivity than previously published DP work.
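The paper's exact training code is not reproduced here, but the core mechanism behind differentially private deep-learning models such as dpAE is DP-SGD (per-example gradient clipping plus calibrated Gaussian noise, with a Rényi-DP accountant tracking the privacy budget). A minimal sketch of one DP-SGD step for logistic regression follows; all function names and hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One differentially private SGD step for logistic regression.

    Per-example gradients are clipped to L2 norm <= clip_norm, summed,
    and Gaussian noise with std noise_multiplier * clip_norm is added
    before averaging (the standard DP-SGD recipe). All hyperparameters
    here are illustrative, not the paper's settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    preds = 1.0 / (1.0 + np.exp(-X @ w))           # sigmoid predictions
    per_example_grads = (preds - y)[:, None] * X    # log-loss gradient per sample
    # Clip each per-example gradient
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Sum, add calibrated Gaussian noise, then average and step
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / n

# Toy usage on synthetic data (feature 0 determines the label)
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
for _ in range(50):
    w = dp_sgd_step(w, X, y, rng=rng)
```

In a full framework, an RDP accountant would convert the noise multiplier, sampling rate, and step count into the reported (ε, δ) guarantee; the step above only shows the noisy gradient computation itself.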
Keywords: Rényi differential privacy; breast cancer; deep learning; differential privacy; omics data
Year: 2022 PMID: 35814415 PMCID: PMC9259987 DOI: 10.3389/fonc.2022.879607
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 5.738
Figure 1. Proposed deep learning-based differentially private framework for classification and linear regression tasks on privacy-sensitive biological data. (A) Pipeline for building the underlying data representation learning model (i.e., dpAE) with private data. (B) DL architecture of dpAE. (C) DL architecture of our proposed differentially private classifier (i.e., dpClassM). (D) Pipeline for predicting drug sensitivity in cell lines. (E) DL architecture of our proposed differentially private linear drug sensitivity regressor (i.e., dpRegM).
Figure 2. Comparison of prediction performance between our proposed Rényi differentially private binary (ER+/-) classifiers and baseline models for different privacy budgets on the METABRIC (34) dataset. (A, B) Comparison of ER+/- classifiers in terms of mean accuracy (%) and mean AUC, respectively, from the 10-times repeated experiments. (C, D) Comparison of standard deviations of accuracy (%) and AUC, respectively, from the 10-times repeated experiments.
Figure 3. Comparison of prediction performance between our proposed Rényi differentially private binary (ER+/- or cancer type) classifiers and baseline models on the METABRIC (34) and TCGA (33) datasets when the privacy budget is 1.0. (A, B) Comparison of binary (ER+/- and cancer type) classifiers in terms of mean accuracy (%) and mean AUC, respectively, from the 10-fold cross-validation. (C, D) Comparison of standard deviations of accuracy (%) and AUC, respectively, from the 10-fold cross-validation.
Comparison of drug sensitivity prediction performance in terms of average Spearman's rank correlation coefficients of differentially private and non-private models.
| Framework | Dataset for representation learning | Privacy status | Spearman's rank correlation coefficient |
|---|---|---|---|
| ( | Redistributed TCGA | Private | 0.25 |
| ( | None | Private | 0.18 |
| ( | None | Non-private | 0.26 |
| Bayesian DP ( | None | Private | 0.20 (STD 0.057) |
| Proposed framework | METABRIC | Private | 0.20 (STD 0.051) |
| Proposed framework | METABRIC | Non-private | 0.22 (STD 0.043) |
| Proposed framework | TCGA (Original) | Private | 0.26 (STD 0.045) |
| Proposed framework | TCGA (Original) | Non-private | 0.28 (STD 0.044) |
The privacy budget was ε = 1.0 for all differentially private models. The "Proposed framework" means the differentially private model, and STD represents the standard deviation.
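The table above ranks models by the average Spearman's rank correlation between predicted and observed drug responses. As a reminder of how that metric is computed (the function name and toy data below are illustrative, not from the paper):

```python
import numpy as np

def spearman_rho(pred, obs):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    def ranks(a):
        a = np.asarray(a, dtype=float)
        order = np.argsort(a)
        r = np.empty(len(a), dtype=float)
        r[order] = np.arange(1, len(a) + 1)
        # Average the ranks of tied values
        for v in np.unique(a):
            mask = a == v
            r[mask] = r[mask].mean()
        return r
    rp, ro = ranks(pred), ranks(obs)
    return np.corrcoef(rp, ro)[0, 1]

# Perfectly monotone predictions give rho = 1.0 even if the scale differs
print(spearman_rho([0.1, 0.4, 0.2], [1.0, 3.0, 2.0]))  # -> 1.0
```

Because it depends only on ranks, this metric rewards getting the relative ordering of drug sensitivities right, which is why it is a common choice for comparing private and non-private regressors whose predictions may be shifted or rescaled by the added noise.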