Tonya White (1, 2), Elisabet Blok (1), Vince D. Calhoun (3).
Abstract
Collaborative networks and data sharing initiatives are broadening the opportunities for the advancement of science. These initiatives offer greater transparency in science, allowing external research groups to reproduce, replicate, and extend research findings. Further, larger datasets offer the opportunity to identify homogeneous patterns within subgroups of individuals, where these patterns may be obscured by the heterogeneity of the neurobiological measure in smaller samples. However, data sharing and data pooling initiatives are not without their challenges, especially with new laws that may at first glance appear quite restrictive for open science initiatives. Interestingly, what is key to some of these new laws (e.g., the European Union's General Data Protection Regulation) is that they provide greater control of data to those who "give" their data for research purposes. Thus, the most important element in data sharing is allowing the participants to make informed decisions about how they want their data to be used and, within the law of the specific country, following the participants' wishes. When this framework encompasses obtaining thorough informed consent and allowing participants to determine the extent to which they want their data shared, many of the ethical and legal obstacles are reduced to mere monsters under the bed. In this manuscript we discuss the many options and obstacles for data sharing, from fully open, to federated learning, to fully closed. Importantly, we highlight the intersection of data sharing, privacy, and data ownership, and present specific examples that we believe are informative to the neuroimaging community.
Keywords: ENIGMA; HIPAA; data ownership; data sharing; general data protection regulation
Year: 2020 PMID: 32621651 PMCID: PMC8675413 DOI: 10.1002/hbm.25120
Source DB: PubMed Journal: Hum Brain Mapp ISSN: 1065-9471 Impact factor: 5.038
A sampling of sharing approaches and their trade‐offs
| What is shared | Centralized full data | Centralized individual features | Voxel‐based and machine learning | Information content | Compute load | Custom subject‐level models | Privacy |
|---|---|---|---|---|---|---|---|
| Nothing | No | No | No | None | None | No | Highest |
| Privatized intermediates (e.g., COINSTAC [Plis et al., …]) | No | No | Yes | High | Med‐low | Yes | Higher |
| Intermediates (e.g., COINSTAC [Plis et al., …]) | No | No | Yes | High | Med‐low | Yes | High |
| Group coordinates (e.g., Brainmap [Fox & Lancaster, …]) | No | No | Yes | Low | Low | No | High |
| Features (e.g., dataShield [Wolfson et al., …]) | No | Yes | Yes | Med‐high | Med‐low | Yes | Med‐high |
| Data (temporarily) (e.g., ViPAR [Carter et al., 2016]) | Yes (private) | Yes | Yes | Med‐high | Med‐high | Yes | High |
| Group maps (e.g., neurovault [Gorgolewski et al., …]) | No | No | Yes | Med‐low | Med‐low | No | High |
| Meta data (e.g., ENIGMA [Thompson et al., …]) | No | No | No | Med‐low | Med‐low | Yes | Med |
| Mega data (e.g., ENIGMA [Thompson et al., …]) | Yes | Yes | Yes | Med | Med | Yes | Med |
| Preprocessed data | Yes | Yes | Yes | High | High | Yes | Med |
| NIfTI data | Yes | Yes | Yes | High | High | Yes | Low |
| DICOM data | Yes | Yes | Yes | High | High | Yes | Low |
| Everything | Yes | Yes | Yes | Highest | Highest | Yes | Lowest |
Decentralized algorithms can also include additional privacy protection, for example by adding structured noise to the derivatives before they are sent to the aggregator (e.g., differential privacy).
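To make "adding noise before sending to the aggregator" concrete, the sketch below applies the Laplace mechanism, one common differential-privacy choice, to a single site's mean. The footnote does not specify which noise mechanism is meant, so Laplace noise is an assumption. The helper names (`laplace_noise`, `private_site_mean`), the clipping bounds, the privacy budget `epsilon`, and the volume values are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace(0, scale)-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_site_mean(values, lower, upper, epsilon, rng):
    """Hypothetical helper: epsilon-DP estimate of one site's mean.

    Clipping to [lower, upper] bounds how far any one participant can move
    the mean, by at most (upper - lower) / n when n is treated as public.
    That bound is the sensitivity used to scale the Laplace noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must be non-empty")
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / n
    sensitivity = (upper - lower) / n
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Usage: each site perturbs its own derivative before it leaves the site.
rng = random.Random(42)
site_volumes = [1200.0, 1310.5, 1275.2, 1190.8, 1402.3, 1250.0]  # made-up values
noisy_mean = private_site_mean(site_volumes, lower=800.0, upper=1800.0,
                               epsilon=1.0, rng=rng)
```

A smaller `epsilon` or a smaller site gives more noise. This is the privacy/utility trade-off behind the "Privatized intermediates" row ranking above plain "Intermediates" on privacy.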
Because COINSTAC preprocessing for a given site can be pre‐computed once, the computational demands for subsequent analyses can be much lower (e.g., if one wants to combine a remote large‐N dataset with a local smaller‐N dataset).
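The example in the preceding footnote, combining a remote large-N dataset with a local smaller-N one, can be illustrated with additive summary statistics. The sketch below is not COINSTAC's actual pipeline. It assumes a simple linear regression `y = a + b*x`, where each site shares only five aggregate numbers. The remote summary is computed once and reused, and pooling is plain addition. The class name `SiteSummary` and all data values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Hypothetical sufficient statistics for a fit of y = a + b*x.

    Only these aggregates leave a site; individual rows stay local.
    """
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float

    @staticmethod
    def from_data(xs, ys):
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have equal length")
        return SiteSummary(
            n=len(xs),
            sum_x=sum(xs),
            sum_y=sum(ys),
            sum_xx=sum(x * x for x in xs),
            sum_xy=sum(x * y for x, y in zip(xs, ys)),
        )

    def __add__(self, other):
        # Summaries are additive, so pooling sites is elementwise addition.
        return SiteSummary(
            self.n + other.n,
            self.sum_x + other.sum_x,
            self.sum_y + other.sum_y,
            self.sum_xx + other.sum_xx,
            self.sum_xy + other.sum_xy,
        )

    def fit(self):
        """Return (intercept, slope) of the ordinary least-squares line."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if denom == 0:
            raise ValueError("x has no variance; slope is undefined")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return intercept, slope


# Usage: the remote summary is computed once; only the local part changes.
remote_summary = SiteSummary.from_data([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
local_summary = SiteSummary.from_data([2.5, 3.5], [5.0, 7.1])
intercept, slope = (remote_summary + local_summary).fit()
```

The pooled fit is identical to fitting the concatenated rows. The expensive pass over the large remote dataset, however, happens only once.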
Derivatives are privately aggregated.
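The preceding footnote does not name a protocol for aggregating derivatives privately. One well-known technique is secure aggregation by pairwise additive masking, in which the aggregator learns only the total. The sketch below substitutes that technique and is not necessarily what the cited systems use. For brevity it draws every pairwise mask from one random generator. A real protocol would have each pair of sites derive its shared mask independently, for example via key agreement. `MODULUS` and the helper names are hypothetical, and the values are assumed to be non-negative integers (e.g., fixed-point encoded) whose total stays below `MODULUS`.

```python
import random

MODULUS = 2 ** 61 - 1  # hypothetical; must exceed the true total


def pairwise_masks(num_sites, rng):
    """Net mask per site; all masks sum to 0 modulo MODULUS.

    Simulated centrally for illustration only: in practice each pair of
    sites (i, j) would agree on its mask without any third party.
    """
    masks = [0] * num_sites
    for i in range(num_sites):
        for j in range(i + 1, num_sites):
            r = rng.randrange(MODULUS)  # shared secret of sites i and j
            masks[i] = (masks[i] + r) % MODULUS  # site i adds it
            masks[j] = (masks[j] - r) % MODULUS  # site j subtracts it
    return masks


def masked_submissions(local_values, rng):
    """Each site hides its value behind its net mask before sending it."""
    masks = pairwise_masks(len(local_values), rng)
    return [(v + m) % MODULUS for v, m in zip(local_values, masks)]


def aggregate(submissions):
    """The aggregator sees only masked values; the masks cancel in the sum."""
    return sum(submissions) % MODULUS


# Usage: three sites contribute per-site counts.
rng = random.Random(7)
submissions = masked_submissions([12, 30, 7], rng)
total = aggregate(submissions)  # 49
```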
It has been shown in multiple cases that even group averages can reveal unanticipated information about individuals.
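One simple way a group average can leak individual information is a differencing attack. If two published means cover groups that differ by exactly one participant, that participant's value follows by arithmetic, because the value equals n times the larger group's mean minus (n - 1) times the smaller group's mean. The sketch below illustrates this one mechanism and does not claim it covers the cases the footnote alludes to. The function name and scores are hypothetical.

```python
def recover_left_out_value(mean_all, n_all, mean_without, n_without):
    """Differencing attack on two published group means.

    If the second group is the first minus one participant, that
    participant's value is n_all * mean_all - n_without * mean_without.
    """
    if n_all - n_without != 1:
        raise ValueError("the two groups must differ by exactly one participant")
    return n_all * mean_all - n_without * mean_without


# Usage with made-up scores: a report of the mean of all four participants
# and a later report that excludes one of them.
scores = [10.0, 20.0, 30.0, 40.0]
mean_all = sum(scores) / len(scores)                  # 25.0
mean_without = sum(scores[:-1]) / (len(scores) - 1)   # 20.0
leaked = recover_left_out_value(mean_all, 4, mean_without, 3)  # 40.0
```

Neither mean alone identifies anyone, but together they expose the excluded participant's exact score. This is one reason the table does not rank group-level sharing as maximally private.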
FIGURE 1 The pros and cons of data sharing from the perspective of funding agencies, the public, and researchers