Sergey M Plis, Anand D Sarwate, Dylan Wood, Christopher Dieringer, Drew Landis, Cory Reed, Sandeep R Panta, Jessica A Turner, Jody M Shoemaker, Kim W Carter, Paul Thompson, Kent Hutchison, Vince D Calhoun.
Abstract
The field of neuroimaging has embraced the need for sharing and collaboration. Data sharing mandates from public funding agencies and major journal publishers have spurred the development of data repositories and neuroinformatics consortia. However, efficient and effective data sharing still faces several hurdles. For example, open data sharing is on the rise but is not suitable for sensitive data that are not easily shared, such as genetic data. Current approaches can be cumbersome (for example, negotiating multiple data sharing agreements). There are also significant data transfer, organization, and computational challenges. Centralized repositories only partially address these issues. We propose a dynamic, decentralized platform for large-scale analyses called the Collaborative Informatics and Neuroimaging Suite Toolkit for Anonymous Computation (COINSTAC). The COINSTAC solution can include data missing from central repositories, allows pooling of both open and "closed" repositories by developing privacy-preserving versions of widely used algorithms, and incorporates the tools within an easy-to-use platform enabling distributed computation. We present an initial prototype system, which we demonstrate on two multi-site data sets without aggregating the data. In addition, by iterating across sites, the COINSTAC model enables meta-analytic solutions to converge to "pooled-data" solutions (i.e., as if the entire data were in hand). More advanced approaches such as feature generation, matrix factorization models, and preprocessing can be incorporated into such a model. In sum, COINSTAC enables access to the many currently unavailable data sets, a user-friendly, privacy-enabled interface for decentralized analysis, and a powerful solution that complements existing data sharing solutions.
Keywords: brain imaging; data sharing; decentralized algorithms; decentralized processing; privacy
Year: 2016 PMID: 27594820 PMCID: PMC4990563 DOI: 10.3389/fnins.2016.00365
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Table: The challenges addressed by COINSTAC.
| Challenge | COINSTAC approach |
| Privacy protection for subjects | Differential privacy models |
| Data sharing concerns for investigators | Raw data are not shared; they are used locally and only summaries are shared |
| Complex processing streams | A platform for automated, iterated, distributed algorithms |
| Centralized compute resources | Local computation on input data, centralized aggregation |
| Quality assurance/data heterogeneity | Anomaly detection and validated quality control methods (planned) |
| Ease of use | Passive on the investigator's part; data reuse occurs without interruption |
Figure 1. An overview diagram of how COINSTAC organizes research on (potentially sensitive) decentralized data.
Figure 2. COINSTAC vision for multisite data analysis relative to the current most common approaches.
Figure 3. Comparison of djICA ISI with that of pooled data ICA. (A) Effect of increasing the number of subjects in an experiment with two sites. Distributed ICA has performance competitive with centralized ICA even though the data are split across different sites and only derivatives are shared. (B) Effect of splitting a total subject pool of 2048 subjects among an increasing number of sites. There was no visible change in performance despite the data being distributed.
Figure 4. Architecture of the COINSTAC implementation. The client is a desktop application written using standard web technologies on top of Electron (2015) and Node.js (2015). Electron enables installation on any major OS and access to low-level system utilities not available to browser-based web apps. The server is a RESTful API running on top of Node.js (2015) and Hapi.js (2015). Data are stored in multiple CouchDB (2015) datastores. Client-server data transfer takes place via the RESTful web service and the synchronization mechanisms built into CouchDB (2015).
Figure 5. User interface examples and parts of the COINSTAC workflow: (A) login to the system or account creation, (B) viewing (after creation) the list of consortia the user is involved in, (C) view of a single consortium, and (D) adding files to a project within a consortium. The consortia list in (B) demonstrates granular control: parts (or all) of a local data pool can be involved in different consortia (studies). At the same time, a data provider only allows their data to be used for the agreed-upon purposes of a consortium; to use these data for something else, a different consortium needs to be joined. The individual consortium view in (C) can show the progress of ongoing computations and other monitoring information. Although the client is essentially a web application, adding data to a consortium in COINSTAC is a simple point-and-click operation using standard OS tools with which users are already familiar.
Figure 6. Nonlinear 2D embedding using t-SNE on a dataset of 10,000 structural MRI scans collected at the Mind Research Network (stored in . The colors are added at the post-processing stage and signify two different field strengths (i.e., two scanners) used for data collection. Notably, the data are visibly different for the two scanners. We also show that despite this, the coding of effects such as age is remarkably similar in the two scanners, providing support for cross-scanner studies which incorporate calibration factors (Panta et al., 2016).
Single-shot averaging
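[Pseudocode listing of 5 steps; the step text was lost in extraction. Surviving fragments: step 3 begins with "Node" and step 5 begins with "A".]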
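The steps of single-shot averaging are not legible in this copy, so the following is only a minimal sketch of the general idea the name suggests: each site fits a model on its own data, and an aggregator averages the fitted parameters. The one-dimensional linear model, the synthetic data, and all function names (`local_fit`, `single_shot_average`, `make_site`) are assumptions made for illustration and are not taken from the paper.

```python
import random

def local_fit(xs, ys):
    """Run at one site: least-squares slope and intercept on that site's data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def single_shot_average(sites):
    """Run at the aggregator: average the per-site parameter estimates."""
    # Only the fitted (slope, intercept) pairs leave a site, never the raw data.
    estimates = [local_fit(xs, ys) for xs, ys in sites]
    k = len(estimates)
    return tuple(sum(e[i] for e in estimates) / k for i in range(2))

def make_site(rng, n, slope=2.0, intercept=-1.0, noise=0.1):
    """Hypothetical synthetic data for one site: y = slope * x + intercept + noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0, noise) for x in xs]
    return xs, ys

rng = random.Random(0)
sites = [make_site(rng, 200) for _ in range(4)]
slope_hat, intercept_hat = single_shot_average(sites)
print(slope_hat, intercept_hat)
```

Because each site communicates exactly once, the result is a meta-analytic estimate and need not equal the pooled-data solution.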
Decentralized gradient descent
[Pseudocode listing of 19 steps; most of the step text was lost in extraction. Surviving fragments: steps 1, 4, 8-10, and 14-16 begin with "A"; steps 5 and 6 begin with "Node"; step 12 reads "η ← η/2 ⊳ reduce step size if overshoot".]
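Only the step-halving rule (η ← η/2 on overshoot) is legible in this listing, so the sketch below is one plausible reading rather than the paper's algorithm: nodes return the loss and gradient on their local data, the aggregator sums them, takes a gradient step, and halves the step size whenever the global loss goes up. The linear model, the synthetic data, the stopping rule, and every name here are assumptions.

```python
import random

def local_loss_and_grad(w, xs, ys):
    """Run on a node: summed squared-error loss and gradient for y ~ w[0]*x + w[1]."""
    loss = g_slope = g_icpt = 0.0
    for x, y in zip(xs, ys):
        r = w[0] * x + w[1] - y
        loss += 0.5 * r * r
        g_slope += r * x
        g_icpt += r
    return loss, (g_slope, g_icpt)

def aggregate(w, sites):
    """Run on the aggregator: combine per-site sums into a mean loss and gradient."""
    n = sum(len(xs) for xs, _ in sites)
    parts = [local_loss_and_grad(w, xs, ys) for xs, ys in sites]
    loss = sum(p[0] for p in parts) / n
    grad = tuple(sum(p[1][i] for p in parts) / n for i in range(2))
    return loss, grad

def decentralized_gd(sites, eta=3.0, tol=1e-10, max_iter=1000):
    """Gradient descent over all sites; eta is halved whenever a step overshoots."""
    w = (0.0, 0.0)
    loss, grad = aggregate(w, sites)
    for _ in range(max_iter):
        candidate = tuple(wi - eta * gi for wi, gi in zip(w, grad))
        new_loss, new_grad = aggregate(candidate, sites)
        if new_loss > loss:
            eta /= 2  # overshoot: reject the step and reduce the step size
            continue
        improvement = loss - new_loss
        w, loss, grad = candidate, new_loss, new_grad
        if improvement < tol:
            break
    return w, eta

# Hypothetical synthetic data: four sites drawn from y = 2x - 1 + noise.
rng = random.Random(1)
sites = []
for _ in range(4):
    xs = [rng.uniform(-1, 1) for _ in range(200)]
    ys = [2.0 * x - 1.0 + rng.gauss(0, 0.1) for x in xs]
    sites.append((xs, ys))

w, final_eta = decentralized_gd(sites)
print(w, final_eta)
```

Since the summed per-site gradients equal the gradient of the pooled loss, iterating in this way approaches the pooled-data least-squares fit, which matches the abstract's point about iterating across sites.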
Differentially private single-shot averaging
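[Pseudocode listing of 5 steps; the step text was lost in extraction. Surviving fragments: step 3 begins with "Node" and step 5 begins with "A".]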
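The steps of the differentially private variant are likewise not legible here. As a hedged illustration of the general pattern, the sketch below swaps in a simpler task than the paper's: each site releases the mean of values clipped to a known range, perturbed with Laplace noise calibrated to that mean's sensitivity, and the aggregator averages the noisy releases. The bounded-mean task, the choice of the Laplace mechanism, and all names and parameters (`lo`, `hi`, `epsilon`) are assumptions.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) sample: the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_local_mean(values, lo, hi, epsilon, rng):
    """Run on a node: clipped mean plus Laplace noise (output perturbation)."""
    clipped = [min(max(v, lo), hi) for v in values]
    # Replacing one record moves the clipped mean by at most (hi - lo) / n.
    sensitivity = (hi - lo) / len(clipped)
    return sum(clipped) / len(clipped) + laplace(rng, sensitivity / epsilon)

def dp_single_shot_average(sites, lo, hi, epsilon, rng):
    """Run on the aggregator: average the noisy per-site means."""
    noisy = [private_local_mean(v, lo, hi, epsilon, rng) for v in sites]
    return sum(noisy) / len(noisy)

# Hypothetical data: four sites, 500 values each, uniform on [0, 1].
rng = random.Random(2)
sites = [[rng.uniform(0, 1) for _ in range(500)] for _ in range(4)]
estimate = dp_single_shot_average(sites, lo=0.0, hi=1.0, epsilon=1.0, rng=rng)
print(estimate)
```

The noise scale shrinks as the local sample size grows, so larger sites pay a smaller accuracy cost for the same `epsilon`. A production implementation would also need a noise sampler that is safe against floating-point attacks, which this sketch does not attempt.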
Differentially private stochastic gradient descent
[Pseudocode listing of 11 steps; the step text was lost in extraction. Surviving fragments: steps 1, 4, and 9 begin with "A"; steps 5, 6, and 7 begin with "Node".]
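With the step text missing, the following sketch shows only the building block usually associated with differentially private stochastic gradient descent, namely clipping each gradient and adding noise before it leaves a node. It is not a reconstruction of the paper's listing. The Gaussian noise, the one-sample-per-round sampling, the step-size schedule, and all names and defaults are assumptions, and no calibration of `sigma` to a privacy budget is shown.

```python
import math
import random

def clipped_noisy_gradient(w, xs, ys, clip, sigma, rng):
    """Run on a node: one-sample gradient, clipped to norm `clip`, plus Gaussian noise."""
    i = rng.randrange(len(xs))
    r = w[0] * xs[i] + w[1] - ys[i]
    g = (r * xs[i], r)
    norm = math.hypot(g[0], g[1])
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return tuple(scale * gi + rng.gauss(0.0, sigma * clip) for gi in g)

def dp_sgd(sites, rounds=5000, eta0=0.5, decay=20.0, clip=1.0, sigma=0.5, seed=0):
    """Run on the aggregator: average the noisy node gradients and take a decaying step."""
    rng = random.Random(seed)
    w = (0.0, 0.0)
    for t in range(rounds):
        msgs = [clipped_noisy_gradient(w, xs, ys, clip, sigma, rng) for xs, ys in sites]
        grad = tuple(sum(m[i] for m in msgs) / len(msgs) for i in range(2))
        eta = eta0 / (1.0 + t / decay)  # assumed schedule, shrinking roughly like 1/t
        w = tuple(wi - eta * gi for wi, gi in zip(w, grad))
    return w

# Hypothetical synthetic data: four sites drawn from y = 2x - 1 + noise.
data_rng = random.Random(4)
sites = []
for _ in range(4):
    xs = [data_rng.uniform(-1, 1) for _ in range(200)]
    ys = [2.0 * x - 1.0 + data_rng.gauss(0, 0.1) for x in xs]
    sites.append((xs, ys))

w = dp_sgd(sites)
print(w)
```

Clipping bounds how much any single record can influence a message, which is what lets added noise mask that record. Turning `sigma` and the number of rounds into an overall privacy guarantee requires a separate accounting step.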