| Literature DB >> 24778614 |
Anand D Sarwate1, Sergey M Plis2, Jessica A Turner3, Mohammad R Arbabshirani4, Vince D Calhoun4.
Abstract
The growth of data sharing initiatives for neuroimaging and genomics represents an exciting opportunity to confront the "small N" problem that plagues contemporary neuroimaging studies while further understanding the role genetic markers play in the function of the brain. When it is possible, open data sharing provides the most benefits. However, some data cannot be shared at all due to privacy concerns and/or risk of re-identification. Sharing other data sets is hampered by the proliferation of complex data use agreements (DUAs) which preclude truly automated data mining. These DUAs arise because of concerns about the privacy and confidentiality for subjects; though many do permit direct access to data, they often require a cumbersome approval process that can take months. An alternative approach is to only share data derivatives such as statistical summaries-the challenges here are to reformulate computational methods to quantify the privacy risks associated with sharing the results of those computations. For example, a derived map of gray matter is often as identifiable as a fingerprint. Thus alternative approaches to accessing data are needed. This paper reviews the relevant literature on differential privacy, a framework for measuring and tracking privacy loss in these settings, and demonstrates the feasibility of using this framework to calculate statistics on data distributed at many sites while still providing privacy.Entities:
Keywords: collaborative research; data integration; data sharing; neuroimaging; privacy
Year: 2014 PMID: 24778614 PMCID: PMC3985022 DOI: 10.3389/fninf.2014.00035
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 4.081
Figure 1System for differentially private classifier aggregation from many sites. The N sites each train a classifier on their local data to learn vectors {}. These are used by an aggregator to compute new features for its own data set. The aggregator can learn a classifier using its own data using a non-private algorithm (if its data is public) or a differentially private algorithm (if its data is private).
Figure 2Classification error rates for the mixed private-public case (A) and the fully-private case (B). In both cases the combined differentially private classifier performs significantly better than the individual classifiers. The difference is statistically significant even after Bonferroni correction (to account for multiple sites) with corrected p-values below 1.8 × 10−33. Results thus motivate the use of differential privacy for sharing of brain imaging and genetic data to enable quick access to data which is either hard to access for logical reasons or not available for open sharing at all.