Samir Das, Tristan Glatard, Christine Rogers, John Saigle, Santiago Paiva, Leigh MacIntyre, Mouna Safi-Harab, Marc-Etienne Rousseau, Jordan Stirling, Najmeh Khalili-Mahani, David MacFarlane, Penelope Kostopoulos, Pierre Rioux, Cecile Madjar, Xavier Lecours-Boucher, Sandeep Vanamala, Reza Adalat, Zia Mohaddes, Vladimir S Fonov, Sylvain Milot, Ilana Leppert, Clotilde Degroot, Thomas M Durcan, Tara Campbell, Jeremy Moreau, Alain Dagher, D Louis Collins, Jason Karamchandani, Amit Bar-Or, Edward A Fon, Rick Hoge, Sylvain Baillet, Guy Rouleau, Alan C Evans.
Abstract
Data sharing is becoming more of a requirement as technologies mature and as global research and communications diversify. As a result, researchers are looking for practical solutions, not only to enhance scientific collaborations, but also to acquire larger amounts of data and to access specialized datasets. In many cases, the realities of data acquisition present a significant burden; gaining access to public datasets therefore allows for more robust analyses and broadly enriched data exploration. To answer this demand, the Montreal Neurological Institute has announced its commitment to Open Science, harnessing the power of making both clinical and research data available to the world (Owens, 2016a,b). As such, the LORIS and CBRAIN (Das et al., 2016) platforms have been tasked with the technical challenges specific to the institutional-level implementation of open data sharing, including:
- Comprehensive linking of multimodal data (phenotypic, clinical, neuroimaging, biobanking, genomics, etc.).
- Secure database encryption, specifically designed for institutional and multi-project data sharing, ensuring subject confidentiality (using multi-tiered identifiers).
- Querying capabilities with multiple levels of single-study and institutional permissions, allowing public data sharing for all consented and de-identified subject data.
- Configurable pipelines and flags to facilitate acquisition and analysis, as well as access to High Performance Computing clusters for rapid data processing and sharing of software tools.
- Robust workflows and quality control mechanisms ensuring transparency and consistency in best practices.
- Long-term storage (and web access) of data, reducing loss of institutional data assets.
- Enhanced web-based visualization of imaging, genomic, and phenotypic data, allowing for real-time viewing and manipulation of data from anywhere in the world.
- Numerous modules for data filtering, summary statistics, and personalized and configurable dashboards.
Implementing the vision of Open Science at the Montreal Neurological Institute will be a concerted undertaking that seeks to facilitate data sharing for the global research community. Our goal is to utilize the years of experience in multi-site collaborative research infrastructure to implement the technical requirements to achieve this level of public data sharing in a practical yet robust manner, in support of accelerating scientific discovery.
Keywords: bids; big data; cyberinfrastructure; data sharing; neuroimaging; neuroscience; open science framework; workflow
Year: 2017 PMID: 28111547 PMCID: PMC5216036 DOI: 10.3389/fninf.2016.00053
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 4.081
Figure 1. MNI data flow from internal institutional repository to public-facing Open Science platform. At the institutional level, data are organized within individual studies and are only accessible by users approved by the study's principal investigator. Subjects participating in multiple studies are assigned unique IDs for each study. When data are shared to the Open MNI repository, a subject's data will be linked across all studies by a new unique subject ID.
Figure 2. Information flow for de-identification. Identifying subject information is encrypted and protected at each step. Subject information is collected (1) and then iteratively hashed (2) by a PBKDF2 algorithm using a SHA1 function to generate an Internal Hash value. This Internal Hash is mapped to a unique subject ID for each study (3); this mapping is stored in a database only accessible by database administrators (Internal MNI LORIS). Users of the Internal MNI LORIS platform will reference each study participant by this unique private ID, such that an individual enrolled in different studies will be registered under different subject IDs. For datasets that are selected for sharing via the Open MNI LORIS platform, the Internal Hash value for each subject is encrypted again (4) using a secure key known only to database administrators, such that data cannot be easily linked back to private subject IDs. At the same time, data are further anonymized and images de-faced (facial features removed) (5) during transfer from the Internal MNI platform to the public-facing Open MNI LORIS data platform.
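The identifier steps (2) to (4) described in the Figure 2 caption can be sketched in Python. This is a minimal illustration under stated assumptions, not the LORIS implementation: the iteration count, the salt handling, the input normalization, the `StudyRegistry` class, and the ID formats are all hypothetical, and a keyed HMAC-SHA256 stands in for the second encryption step, whose actual algorithm the caption does not name.

```python
import hashlib
import hmac
import secrets

# Assumed value; the source does not state an iteration count.
PBKDF2_ITERATIONS = 100_000


def internal_hash(identifying_info: str, salt: bytes) -> bytes:
    """Step 2: iteratively hash subject information with PBKDF2 over SHA-1."""
    # Normalization is an assumption so that trivial formatting
    # differences do not produce different hashes.
    normalized = identifying_info.strip().lower().encode("utf-8")
    return hashlib.pbkdf2_hmac("sha1", normalized, salt, PBKDF2_ITERATIONS)


class StudyRegistry:
    """Step 3: hypothetical admin-only map from (hash, study) to a private ID."""

    def __init__(self) -> None:
        self._ids: dict = {}

    def subject_id(self, ihash: bytes, study: str) -> str:
        key = (ihash, study)
        if key not in self._ids:
            # Random suffix: the private ID reveals nothing about the hash.
            self._ids[key] = f"{study}-{secrets.token_hex(8)}"
        return self._ids[key]


def open_subject_id(ihash: bytes, admin_key: bytes) -> str:
    """Step 4: re-key the internal hash with a secret held by administrators.

    HMAC-SHA256 is a stand-in for the unspecified encryption step.
    """
    return hmac.new(admin_key, ihash, hashlib.sha256).hexdigest()[:16]
```

In this sketch the same person receives a different private ID in each study, while the re-keyed value is identical across studies, which is what lets the open repository link one subject's data without exposing the private IDs.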
Figure 3. Imaging workflow from subject registration to data sharing in Open Science, detailing processes for radiological reviews, quality control, and dissemination.
Figure 4. Biobanking workflow from subject registration to data sharing in Open Science, detailing data collection, sample tracking, quality control, and dissemination.
Figure 5. Clinical/Behavioral workflow from subject registration to data sharing in Open Science, detailing data validation, range checks, data integrity flags, and interactive statistics interface at the study and institutional levels.
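The range checks and data integrity flags named in the clinical/behavioral workflow caption can be illustrated with a short Python sketch. The field names, limits, and flag wording below are hypothetical and are not taken from LORIS; the sketch only shows the general idea of validating a record against per-field limits and collecting flags for review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRule:
    low: float
    high: float


# Hypothetical instrument fields and limits, for illustration only.
RULES = {
    "age_years": RangeRule(0, 120),
    "score_total": RangeRule(0, 30),
}


def validate(record: dict) -> list[str]:
    """Return one integrity flag per field that is missing, non-numeric, or out of range."""
    flags = []
    for name, rule in RULES.items():
        value = record.get(name)
        if value is None:
            flags.append(f"{name}: missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            flags.append(f"{name}: not numeric")
        elif not rule.low <= value <= rule.high:
            flags.append(f"{name}: out of range [{rule.low}, {rule.high}]")
    return flags
```

A record that passes every rule yields an empty list. Any flags returned could be stored alongside the record so that reviewers see why an entry needs attention before it is shared.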
Figure 6. LORIS and CBRAIN interaction (Das et al., 2016). Datasets hosted in LORIS' data-sharing platform are pushed to the CBRAIN distributed computing platform, processed there, and returned to the central LORIS repository. Data can be downloaded or disseminated at any stage. Custom tools and pipelines can be packaged and mounted on CBRAIN for use by a research group or larger community of investigators.
C-BIG repository overview.
| Repository | Description | Data types |
| --- | --- | --- |
| Imagebank | Multi-modal, raw/processed neuroimaging data | MRI, PET, MEG, EEG, Spectroscopy |
| Biobank | Biospecimen data | Blood, saliva, skin, muscle & nerve biopsies, whole brains, cerebrospinal fluid |
| Genetic | Summary genetic data | SNPs, CNVs, CpG, GWAS |
| Phenotypic | Behavioral, clinical data | Instruments, Assessments, Questionnaires |
Data types and description of data that will be stored in the MNI's C-BIG Repository.