Samir Das1,2,3, Rida Abou-Haidar4,5, Henri Rabalais4,5, Sonia Denise Lai Wing Sun5,6, Zaliqa Rosli4,5, Krishna Chatpar4,5, Marie-Noëlle Boivin5, Mahdieh Tabatabaei5, Christine Rogers4,5, Melanie Legault4,5, Derek Lo4,5, Clotilde Degroot5, Alain Dagher5,7, Stephanie O M Dyke4,5,8, Thomas M Durcan6,9, Annabel Seyller5, Julien Doyon5,7, Viviane Poupon5, Edward A Fon5,8, Angela Genge5,6, Guy A Rouleau5, Jason Karamchandani5,6, Alan C Evans4,5,7.
Abstract
In January 2016, the Montreal Neurological Institute-Hospital (The Neuro) declared itself an Open Science organization. This vision extends beyond efforts by individual scientists to release individual datasets or software tools, or to build platforms that provide for the free dissemination of such information. It involves multiple stakeholders and an infrastructure that considers governance, ethics, computational resourcing, physical design, workflows, training, education, and intra-institutional reporting structures. The C-BIG repository was built in response as The Neuro's institutional biospecimen and clinical data repository. It collects biospecimens as well as clinical, imaging, and genetic data from patients with neurological disease and from healthy controls. It is aimed at helping scientific investigators, in both academia and industry, advance our understanding of neurological diseases and accelerate the development of treatments. Because many neurological diseases are quite rare, their small patient populations present several challenges to researchers. Overcoming these challenges requires aggregating datasets from various projects and locations. The C-BIG repository achieves this goal and stands as a scalable working model for institutions to collect, track, curate, archive, and disseminate multimodal data from patients. In November 2020, a Registered Access layer was made available to the wider research community (https://cbigr-open.loris.ca), and in May 2021 fully open data will be released to complement the Registered Access data. This article outlines many aspects of The Neuro's transition to Open Science by describing the data to be released, C-BIG's full capabilities, and the design aspects that were implemented for effective data sharing.
Keywords: Biobank; Database; Genetic; Interoperability; Open Science; Registered access
Year: 2021 PMID: 34003431 PMCID: PMC9537233 DOI: 10.1007/s12021-021-09516-9
Source DB: PubMed Journal: Neuroinformatics ISSN: 1539-2791
Fig. 1 Architectural diagram of the various components used in C-BIG to illustrate the workflows involved in acquiring, curating, processing, and disseminating data. Software: The LORIS platform undergoes continual development with regular releases to improve the functionality, security and interface of the C-BIG repository. Data Acquisition: C-BIG acquires a number of data modalities from consenting patients whose unique IDs are tracked via the SPI patient registry based on patient preference and their risk tolerance. Internal Database: Data are standardized and housed in the C-BIG Internal Database, where metadata/data can be viewed and manipulated by lab technicians and researchers across multiple projects via a web-based repository organizing the metadata in modules specific to their needs. Anonymization: Datasets are then anonymized to reduce the risk of re-identification. High Performance Computing: Any HPC system can be leveraged, depending on user preferences. Public Access: Datasets are made available via the public layer, where user access is regulated by tiers determined by the nature of the dataset. The data can be queried in a granular manner by researchers wanting to do specific processing and analysis, or seeking summary statistics, documentation or quality control results.
Example of metadata fields shared across all specimens in the database
| Field | Value |
|---|---|
| | Serum |
| | Cryotube Vial |
| | TOSI0000001 |
| | Visit 01 |
| | TOSI |
| | CRU-MNI |
| | 500 μL |
| | 0 |
| | −80 °C |
| | Available |
Example of how key to SOP relations can differ across SOPs
| Isolation of Serum from Whole Blood - SOP BB-P-0003 | | DNA Extraction from Whole Blood - SOP BB-P-0009 | |
|---|---|---|---|
| key | value | key | value |
| Milky Serum | FALSE | DNA Quantification Date | 2019-10-29 |
| Hemolyzed | TRUE | DNA Concentration (ng/μL) | 178.9 |
| Hemodialysis Index | 2 | 260/280 Ratio | 1.86 |
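The table suggests that each SOP defines its own set of keys, so process-specific attributes can be stored as key-value pairs and checked against the SOP they belong to. The following is a minimal Python sketch of that idea. The `SOP_SCHEMAS` dictionary, the `validate` function, and the value types are illustrative assumptions; they are not taken from the C-BIG or LORIS code base.

```python
from datetime import date

# Hypothetical per-SOP schemas: key name -> expected Python type.
# SOP codes and key names come from the table; the types are assumed.
SOP_SCHEMAS = {
    "BB-P-0003": {  # Isolation of Serum from Whole Blood
        "Milky Serum": bool,
        "Hemolyzed": bool,
        "Hemodialysis Index": int,
    },
    "BB-P-0009": {  # DNA Extraction from Whole Blood
        "DNA Quantification Date": date,
        "DNA Concentration (ng/μL)": float,
        "260/280 Ratio": float,
    },
}


def validate(sop_code, record):
    """Return a list of problems found in `record` for the given SOP."""
    schema = SOP_SCHEMAS[sop_code]
    problems = []
    for key, expected_type in schema.items():
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], expected_type):
            problems.append(f"wrong type for {key}")
    for key in record:
        if key not in schema:
            problems.append(f"unexpected key: {key}")
    return problems


# Example records using the values shown in the table.
serum_record = {"Milky Serum": False, "Hemolyzed": True, "Hemodialysis Index": 2}
dna_record = {
    "DNA Quantification Date": date(2019, 10, 29),
    "DNA Concentration (ng/μL)": 178.9,
    "260/280 Ratio": 1.86,
}
```

Calling `validate("BB-P-0003", serum_record)` returns an empty list, while validating the DNA record against the serum SOP reports every serum key as missing and every DNA key as unexpected. Keeping the schema per SOP, not per specimen type, is one possible design; the source does not say how C-BIG stores these relations.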
Fig. 2 Data flow and interoperability chart between the various subsystems
List of tests that were conducted to ensure proper functionality of the C-BIG system
| Test type | Details |
|---|---|
| System configurations | Tests were run routinely against validated database configurations and imported study parameters. |
| Atomic operations | Every atomic Biobank operation¹ was tested independently at each test deployment. |
| Biobank integration | Operation combinations² were tested, with an acceptable sample of permutations succeeding. |
| Biobank usability | Usability tests were performed frequently by clinical users to keep the interface intuitive. |
| Scalability testing | The system was loaded with a high volume of data to identify and correct algorithmic inefficiencies. |
| Integration testing | The entire system was tested using real scenarios. Deployment was contingent on a 100% pass rate. |

¹ Atomic operations are unique actions users can take in the module (add a new biospecimen, edit a container, discard a used pool).
² Operation combinations are sequences of ordered atomic operations (create a sample, pool it, aliquot the pool, and discard it).
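The distinction between atomic operations and operation combinations can be illustrated with a toy model. The sketch below is not LORIS or C-BIG code: `ToyBiobank`, its methods, and the status strings other than "Available" are hypothetical, chosen only to mirror the examples in the footnotes.

```python
class ToyBiobank:
    """Hypothetical in-memory stand-in for a biobank module."""

    def __init__(self):
        self.items = {}  # item id -> {"kind", "status", "parents"}
        self._next_id = 1

    def _add(self, kind, parents=()):
        item_id = self._next_id
        self._next_id += 1
        self.items[item_id] = {
            "kind": kind,
            "status": "Available",
            "parents": tuple(parents),
        }
        return item_id

    def _require_available(self, item_id):
        if self.items[item_id]["status"] != "Available":
            raise ValueError(f"item {item_id} is not available")

    # Atomic operations
    def create_sample(self):
        return self._add("sample")

    def pool(self, sample_ids):
        # Check every input first so a failure leaves no partial state.
        for sample_id in sample_ids:
            self._require_available(sample_id)
        for sample_id in sample_ids:
            self.items[sample_id]["status"] = "Pooled"
        return self._add("pool", sample_ids)

    def aliquot(self, item_id, count):
        self._require_available(item_id)
        return [self._add("aliquot", [item_id]) for _ in range(count)]

    def discard(self, item_id):
        self._require_available(item_id)
        self.items[item_id]["status"] = "Discarded"


def check_atomic_operations():
    """Exercise each atomic operation independently on a fresh bank."""
    bank = ToyBiobank()
    sample = bank.create_sample()
    assert bank.items[sample]["status"] == "Available"

    bank = ToyBiobank()
    pool = bank.pool([bank.create_sample(), bank.create_sample()])
    assert bank.items[pool]["kind"] == "pool"

    bank = ToyBiobank()
    assert len(bank.aliquot(bank.create_sample(), 3)) == 3

    bank = ToyBiobank()
    sample = bank.create_sample()
    bank.discard(sample)
    assert bank.items[sample]["status"] == "Discarded"


def check_operation_combination():
    """Ordered sequence: create samples, pool them, aliquot the pool, discard it."""
    bank = ToyBiobank()
    samples = [bank.create_sample() for _ in range(2)]
    pool = bank.pool(samples)
    aliquots = bank.aliquot(pool, 2)
    bank.discard(pool)
    return bank, samples, pool, aliquots
```

Testing each operation on a fresh instance keeps failures attributable to a single action, whereas the combination check verifies that state carried from one step to the next stays consistent.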
Fig. 3 Biobank data entry form for biospecimens
Fig. 4 Biobank specimen page displays specimen metadata, processing stages & life cycle
Fig. 5 Biobank container page with graphic display of the container dimensions and contents
Fig. 6 The number of biospecimens collected over time in the C-BIG since 2008
Fig. 7 (LEFT) C-BIG currently houses data from 1720 patients (931 males/789 females) in 88 disease groups across 20 projects and 19 sites, 49% with a clinical diagnosis of Parkinson’s disease. (RIGHT) Over 32,500 biospecimen samples have been collected and archived in C-BIG. The storage infrastructure includes 367 Matrix Boxes, and 12 Freezers and Cryogenic Tanks
Metadata for 585 sequenced patients in 8 different disease groups
| Disease Name | Number of Patients Sequenced |
|---|---|
| Atypical Parkinsonism | 4 |
| Essential Tremor | 1 |
| Gaucher Disease | 1 |
| Lewy Body Dementia | 3 |
| Multiple System Atrophy | 2 |
| Parkinson Disease | 568 |
| Progressive Supranuclear Palsy | 5 |
| Wilson Disease | 1 |
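
The per-disease counts can be checked against the stated total of 585 with a few lines of Python. The counts are copied from the table; the share computed for Parkinson disease is derived here and is not a figure reported in the source.

```python
# Counts of sequenced patients per disease group, copied from the table.
sequenced = {
    "Atypical Parkinsonism": 4,
    "Essential Tremor": 1,
    "Gaucher Disease": 1,
    "Lewy Body Dementia": 3,
    "Multiple System Atrophy": 2,
    "Parkinson Disease": 568,
    "Progressive Supranuclear Palsy": 5,
    "Wilson Disease": 1,
}

total = sum(sequenced.values())
group_count = len(sequenced)

# Derived value (not stated in the source): fraction of sequenced
# patients who belong to the Parkinson disease group.
parkinson_share = sequenced["Parkinson Disease"] / total

print(f"{total} patients in {group_count} groups; "
      f"Parkinson disease share = {parkinson_share:.1%}")
```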